mpledger@cti1.UUCP (Mark Pledger) (08/20/90)
Help! I have three lex & yacc questions which I need help on. They are
listed below. (Please note that I have posted this article to a few
newsgroups for wider coverage.)

PROBLEM 1 ------------------------------------ :-(

When I compile then link using the command line options below, I
always get the "environ" referencing error. The only way I've found
to get around this is to compile and link without the "-c" compiler
option. Why is this happening and what is "environ"? I think it's
the char *environ[] string which is passed to each program from the
shell. Is this correct?

    CTI-> cc -c main.c lex.yy.c
    CTI-> ld main.o lex.yy.o -ll -lc 2>&1 > ubtest.err
    undefined                   first referenced
     symbol                         in file
    environ                         /lib/libc.a
    ld fatal: Symbol referencing errors. No output written to a.out
    CTI->

PROBLEM 2 ------------------------------------ :-(

When I run yylex() from the sample code below, if no matching integer
is found it prints the yytext[] anyway. Why? I have it doing nothing
in the lex definition file (also shown below). What's causing it to
print out the token found?

[lex source code]

    delim     [ \t\n]
    ws        {delim}+
    letter    [A-Za-z'_']
    digit     [0-9]
    number    {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
    var       {letter}({letter}|{digit})*
    %%
    {ws}            { /* do nothing */ }
    "APPLICATION"   { return(APP); }
    "IF"            { return(IF); }
    "FORM"          { return(FORM); }
    {var}           { /* do nothing */ }
    {number}        { /* do nothing */ }
    "/*"            { skipcomments(); }
    %%
    skipcomments()
    ...
[C source code]

    while (1) {
        switch( yylex() ) {
            case APP:   /* APPLICATION key word found */
                printf("\n%s",yytext);
                break;
            case FORM:  /* FORM key word found */
                printf("\n%s",yytext);
                break;
            case IF:    /* IF key word found */
                printf("\n%s", yytext);
                break;
            case UVAR:  /* user var word found */
                break;
            case 0:     /* end of file reached */
                printf("\nEOF reached.\n");
                exit(0);
                break;
        } /* switch */
    } /* while */
    } /* main() */

[screen dump from program's output]

    CTI-> ld main.o lex.yy.o -ll -lc 2>&1 > ubtest.err
    CTI-> ubtest < ubtest.txt
    CTI-> APPLICATION(==);(==);
    EOF reached.
    CTI->...usr2/mpledger/ub>

PROBLEM 3 ------------------------------------ :-(

I asked this before but didn't get a response. I'm trying to write a
grammar for a context, case-insensitive grammar (Unify RDBMS command
language). I know that you can specify lower or upper case letters
by defining letters in the lex rules section. In my example
I am using letters [A-Za-z'_'] to match upper or lower case characters
and possibly the underscore. My question is this: how can you get
lex to match a reserved word you have declared, whether it's upper case
or not? For example, Unify has the reserved command word "application".
I wish to scan for this word using lex and return to yyparse() whether
the reserved word "application" is found as "application", "APPLICATION",
"Application", "APPlication", etc. What do you specify in lex to do
this? I tried using "APPLICATION" in the rules section (e.g.
"APPLICATION" { return(APP); } where APP is a #define). However, this
only works if the token found is already capitalized.

Any help with the above-mentioned problems would be greatly appreciated.
I have read the dragon book and recently bought O'Reilly's book Lex &
Yacc (which I don't think is very good!). Thanks in advance.
-- Sincerely, Mark Pledger -------------------------------------------------------------------------- CTI | (703) 685-5434 [voice] 2121 Crystal Drive | (703) 685-7022 [fax] Suite 103 | Arlington, DC 22202 | mpledger@cti.com -------------------------------------------------------------------------- "To boldly go where no 'C' has gone before" --------------------------------------------------------------------------
ssdken@watson.Claremont.EDU (Ken Nelson) (08/21/90)
> PROBLEM 3 ------------------------------------ :-(
> I asked this before but didn't get a response. I'm trying to write a
> grammar for a context, case-insensitive grammar (Unify RDBMS command
> language). I know that you can specify lower or upper case letters
> by defining letters in the lex rules section. In my example
> I am using letters [A-Za-z'_'] to match upper or lower case characters
> and possibly the underscore. My question is this, how can you get
> lex to match a reserved word you have declared, whether it's upper case
> or not. For example, Unify has the reserved command word "application".
> I wish to scan for this word using lex and return to yyparse() whether
> the reserved word "application" is found as "application", "APPLICATION",
> "Application", "APPlication", etc. What do you specify in lex to do
> this. I tried using "APPLICATION" in the rules section (e.g.
> "APPLICATION" { return(APP); } where APP is a #define). However, this
> only works if the token found is already capitalized.

Try this:

    [Aa][Pp][Pp][Ll][Ii][Cc][Aa][Tt][Ii][Oo][Nn]   { return _APPLICATION; }

This will match no matter what combination of case is used.

Hope this helps.

Ken Nelson
Principal Engineer
Software Systems Design
3627 Padua Av.
Claremont, CA 91711
(714) 624-3402
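Typing the bracketed form by hand for every reserved word is tedious and easy to get wrong, so it may be worth generating it. The following is only a sketch: `ci_pattern` is a hypothetical helper name, not anything lex provides, and it assumes plain alphanumeric keywords (it does no escaping of lex metacharacters).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of lex): expand a keyword such as "if"
 * into the bracketed lex pattern "[Ii][Ff]".  Non-letters are copied
 * unchanged and are NOT escaped, so this only suits plain keywords.
 * Returns 0 on success, -1 if the output buffer is too small. */
int ci_pattern(const char *word, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (; *word != '\0'; word++) {
        unsigned char c = (unsigned char)*word;
        int len;

        if (isalpha(c))
            len = snprintf(out + n, outsize - n, "[%c%c]",
                           toupper(c), tolower(c));
        else
            len = snprintf(out + n, outsize - n, "%c", c);

        /* snprintf reports the length it wanted; detect truncation. */
        if (len < 0 || (size_t)len >= outsize - n)
            return -1;
        n += (size_t)len;
    }
    return 0;
}
```

Calling `ci_pattern("application", buf, sizeof buf)` fills `buf` with the same pattern shown in the reply above; the result can then be pasted into the rules section.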
vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (08/21/90)
In article <266@cti1.UUCP> mpledger@cti1.UUCP (Mark Pledger) writes:
\\\
>PROBLEM 1 ------------------------------------ :-(
>
>When I compile then link using the command line options below, I
>always get the "environ" referencing error. The only way I've
\\\

Don't know -- what does your "main" look like? Does it really have
a main()? This looks like libc is trying to set up the stuff
to run your program but can't find it...

>PROBLEM 2 ------------------------------------ :-(
>
>When I run yylex() from the sample code below, if no matching integer
>is found it prints the yytext[] anyway. Why? I have it doing nothing
\\\

If something doesn't match any pattern it gets printed. Maybe you
should have a pattern at the bottom like:

    .|\n    ;

So that anything else that matches a single character will
get flushed.

>PROBLEM 3 ------------------------------------ :-(
\\\
>and possibly the underscore. My question is this, how can you get
>lex to match a reserved word you have declared, whether it's upper case
>or not. For example, Unify has the reserved command word "application".
\\\

There are a couple of ways here.

Firstly, you can define the input() macro to call your own
character-getting routine. You may also have to redefine unput()
if your method of getting characters is significantly
different from getc(). (I presume all you will do is a
toupper(getc(f)) or something similar.)

Secondly, I understand from the ``documentation'' for Lex that
you can define translate tables. This may be an ideal place
to learn how to use this feature, which I've managed to
avoid for a couple of decades!

-Kym Horsell
=====
Lexing and Yaccing for years and still sane(?)
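A minimal sketch of the character-getting routine suggested in the first approach above. The name `upcase_getc` is hypothetical, and how it is hooked in depends on the lex implementation: in AT&T lex, input() is a macro in the generated file that can be #undef'ed and redefined (and is expected to yield 0 at end of file), whereas flex uses YY_INPUT instead. Treat the wiring as an assumption to check against your own lex.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical replacement for getc() in a scanner's input path:
 * returns the next character of `f` folded to upper case.
 * EOF is passed through unchanged so the caller can still detect it. */
int upcase_getc(FILE *f)
{
    int c = getc(f);

    return (c == EOF) ? EOF : toupper(c);
}
```

One consequence of folding at the input level is that yytext is upper-cased too, so the original spelling of identifiers and string literals is lost. If that matters, folding only inside the keyword lookup is the safer choice.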
chryses@xurilka.UUCP (Lapus Lazuli) (08/21/90)
-- Phong T. Co (Lapus Lazuli) | One in your belly, and one for Rudi, chryses@xurilka.UUCP | You got what you gave by the heel of my bootie, dada Indugu Inc. | Bang, bang, out! like an old cherootie, Montreal, CANADA | I'm coming for you. -- Kate Bush
chryses@xurilka.UUCP (Lapus Lazuli) (08/21/90)
In article <8144@jarthur.Claremont.EDU> ssdken@watson.Claremont.EDU (Ken Nelson) writes:
>> by defining letters in the lex rules section. In my example
>> I am using letters [A-Za-z'_'] to match upper or lower case characters
>> and possibly the underscore. My question is this, how can you get
>> lex to match a reserved word you have declared, whether it's upper case
>> or not. For example, Unify has the reserved command word "application".
>Try this:
>[Aa][Pp][Pp][Ll][Ii][Cc][Aa][Tt][Ii][Oo][Nn] { return _APPLICATION; }
>this will match no matter what combination of case is used.
>

Sorry about the blank article, I had a lapse of motor control.

The above method will work, but will generate HUGE lex tables if you
have more than a couple of such keywords. Someone I know tried doing
that with a Pascal-type language and lex ran out of memory.

A better way would be to define an IDENTIFIER token, i.e., a letter or
underscore followed by zero or more letters, digits, or underscores.
Send every such token to a function which converts it to uppercase.
If you have a fixed list of reserved words you can make a hash table
for it. Return the appropriate value if the token is in the hash
table, else just return IDENTIFIER.

There are algorithms out there to generate perfect hash functions
(guaranteed no collisions), given a fixed list of values. The table
is about 3 or 4 times bigger than the list, but if your list is 50
values or less that shouldn't be a problem. I don't remember where I
got mine. Perhaps someone else has an idea.

Happy hacking!

Phong.

> Ken Nelson
-- Phong T. Co (Lapus Lazuli) | One in your belly, and one for Rudi, chryses@xurilka.UUCP | You got what you gave by the heel of my bootie, dada Indugu Inc. | Bang, bang, out! like an old cherootie, Montreal, CANADA | I'm coming for you. -- Kate Bush
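The identifier-then-lookup scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: it swaps the hash table for a sorted array searched with the standard bsearch(), which is simpler and fast enough for a few dozen words, and the token values and the name `classify_identifier` are made up for the example (yacc would normally supply the token codes).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; in a real project yacc generates these. */
enum { IDENTIFIER = 258, APP, FORM, IF };

struct keyword {
    const char *name;   /* upper-case spelling */
    int         token;
};

/* Must stay sorted by name (strcmp order) for bsearch() to work. */
static const struct keyword keywords[] = {
    { "APPLICATION", APP  },
    { "FORM",        FORM },
    { "IF",          IF   },
};

static int cmp_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)elem)->name);
}

/* Return the keyword's token if `text` is a reserved word in any mix
 * of upper and lower case, otherwise IDENTIFIER. */
int classify_identifier(const char *text)
{
    char upper[64];
    size_t i, len = strlen(text);
    const struct keyword *kw;

    if (len >= sizeof upper)
        return IDENTIFIER;      /* longer than any keyword in the table */

    for (i = 0; i < len; i++)
        upper[i] = (char)toupper((unsigned char)text[i]);
    upper[len] = '\0';

    kw = bsearch(upper, keywords,
                 sizeof keywords / sizeof keywords[0],
                 sizeof keywords[0], cmp_keyword);
    return kw ? kw->token : IDENTIFIER;
}
```

With this in place the individual keyword rules can be dropped from the lex file, and the single identifier rule becomes something like `{var} { return classify_identifier(yytext); }`. Because only a copy is upper-cased, yytext keeps the spelling the user typed.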
peter@ficc.ferranti.com (Peter da Silva) (08/21/90)
Assuming you're using a compiler that uses Ritchie-compiler style
switches (say, a UNIX compiler), which is implied by the reference to
/lib/libc.a:

In article <266@cti1.UUCP> mpledger@cti1.UUCP (Mark Pledger) writes:
> When I compile then link using the command line options below, I
> always get the "environ" referencing error. The only way I've

When you use the "-c" that means "don't link". When you specify the two
file names, that means "link". You have confused the compiler.

Make is your friend. Build a makefile that looks sort of like this:

OFILES= main.o lex.yy.o

prog: $(OFILES)
	$(CC) $(CFLAGS) -o prog $(OFILES)

This will compile each program -c to generate a .o file, then link your
.o files together.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com