[comp.lang.c] lex & yacc questions

mpledger@cti1.UUCP (Mark Pledger) (08/20/90)

Help!  I have three lex & yacc questions which I need help on.  They
are listed below.  (Please note that I have posted this article to
a few newsgroups for wider coverage.)


PROBLEM 1 ------------------------------------ :-(

When I compile then link using the command line options below, I
always get the "environ" referencing error.  The only way I've
found to get around this is to compile and link without the "-c"
compiler option.  Why is this happening, and what is "environ"?  I
think it's the char *environ[] string which is passed to each program
from the shell.  Is this correct?

CTI-> cc -c main.c lex.yy.c

CTI-> ld main.o lex.yy.o -ll -lc 2>&1 > ubtest.err

undefined			first referenced
 symbol  			    in file
environ              /lib/libc.a
ld fatal: Symbol referencing errors. No output written to a.out
CTI->


PROBLEM 2 ------------------------------------ :-(

When I run yylex() from the sample code below, if no matching integer
is found it prints the yytext[] anyway.  Why?  I have it doing nothing
in the lex definition file (also shown below).  What's causing it to
print out the token found?


[lex source code]

delim                [ \t\n]
ws                   {delim}+
letter               [A-Za-z'_']
digit                [0-9]
number               {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
var                  {letter}({letter}|{digit})*

%%
{ws}                 { /* do nothing */ }
"APPLICATION"        { return(APP); }
"IF"                 { return(IF); }
"FORM"               { return(FORM); } 
{var}                { /* do nothing */ }
{number}             { /* do nothing */ }
"/*"                 { skipcomments(); }
%%
   
skipcomments()

   ...


[C source code]

while (1)
   {
   switch( yylex() )
      {
      case APP:      /* APPLICATION key word found */
         printf("\n%s",yytext);
         break;

      case FORM:     /* FORM key word found */
         printf("\n%s",yytext);
         break;

      case IF:       /* IF key word found */
         printf("\n%s", yytext);
         break;

      case UVAR:     /* user var word found */
         break;

      case  0:       /* end of file reached */
         printf("\nEOF reached.\n");
         exit(0);
         break;

      } /* switch */

   } /* while */

}  /* main() */


[screen dump from program's output]

CTI-> ld main.o lex.yy.o -ll -lc 2>&1 > ubtest.err
CTI-> ubtest < ubtest.txt                         
CTI-> 

APPLICATION(==);(==);
EOF reached.
CTI->...usr2/mpledger/ub> 



PROBLEM 3 ------------------------------------ :-(

I asked this before but didn't get a response.  I'm trying to write a
grammar for a context-free, case-insensitive language (the Unify RDBMS
command language).  I know that you can specify lower or upper case letters
by defining letters in the lex rules section.  In my example
I am using letters  [A-Za-z'_'] to match upper or lower case characters
and possibly the underscore.  My question is this: how can you get
lex to match a reserved word you have declared, whether it's upper case
or not?  For example, Unify has the reserved command word "application".
I wish to scan for this word using lex and return to yyparse() whether
the reserved word "application" is found as "application", "APPLICATION",
"Application", "APPlication", etc.  What do you specify in lex to do
this?  I tried using "APPLICATION" in the rules section (e.g. 
"APPLICATION" { return(APP); } where APP is a #define).  However, this
only works if the token found is already capitalized.




Any help to the above mentioned problems would be of great help to me.
I have read the dragon book and recently bought O'Reilly's book Lex &
Yacc (which I don't think is very good!).

Thanks in advance.


-- 
Sincerely,


Mark Pledger

--------------------------------------------------------------------------
CTI                              |              (703) 685-5434 [voice]
2121 Crystal Drive               |              (703) 685-7022 [fax]
Suite 103                        |              
Arlington, DC  22202             |              mpledger@cti.com
--------------------------------------------------------------------------
             "To boldly go where no 'C' has gone before" 
--------------------------------------------------------------------------

ssdken@watson.Claremont.EDU (Ken Nelson) (08/21/90)

> PROBLEM 3 ------------------------------------ :-(

> I asked this before but didn't get a response.  I'm trying to write a
> grammar for a context-free, case-insensitive language (the Unify RDBMS
> command language).  I know that you can specify lower or upper case letters
> by defining letters in the lex rules section.  In my example
> I am using letters  [A-Za-z'_'] to match upper or lower case characters
> and possibly the underscore.  My question is this, how can you get
> lex to match a reserved word you have declared, whether it's upper case
> or not.  For example, Unify has the reserved command word "application".
> I wish to scan for this word using lex and return to yyparse() whether
> the reserved word "application" is found as "application", "APPLICATION",
> "Application", "APPlication", etc.  What do you specify in lex to do
> this.  I tried using "APPLICATION" in the rules section (e.g. 
> "APPLICATION" { return(APP); } where APP is a #define).  However, this
> only works if the token found is already capitalized.

Try this:


[Aa][Pp][Pp][Ll][Ii][Cc][Aa][Tt][Ii][Oo][Nn] 	{ return _APPLICATION; }


this will match no matter what combination of case is used.


   Hope this helps.



				Ken Nelson
				Principal Engineer
				Software Systems Design
				3627 Padua Av.
				Claremont, CA 91711
				(714) 624-3402

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (08/21/90)

In article <266@cti1.UUCP> mpledger@cti1.UUCP (Mark Pledger) writes:
\\\
>PROBLEM 1 ------------------------------------ :-(
>
>When I compile then link using the command line options below, I
>always get the "environ" referencing error.  The only way I've
\\\

Don't know. What does your "main" look like? Does it really
have a main()? This looks like libc is trying to set up the
stuff to run your program but can't find it...

>PROBLEM 2 ------------------------------------ :-(
>
>When I run yylex() from the sample code below, if no matching integer
>is found it prints the yytext[] anyway.  Why?  I have it doing nothing
\\\

If something doesn't match any pattern it gets printed. Maybe
you should have a pattern at the bottom like:

.|\n	;

so that any single character not matched by an earlier rule
gets discarded instead of echoed.

>PROBLEM 3 ------------------------------------ :-(
\\\
>and possibly the underscore.  My question is this, how can you get
>lex to match a reserved word you have declared, whether it's upper case
>or not.  For example, Unify has the reserved command word "application".
\\\

There are a couple of ways here. 

Firstly, you can define the input() macro to call your own character-
getting routine.  You may also have to redefine unput() if your method
of getting characters is significantly different from getc().
(I presume all you will do is a toupper(getc(f)) or something similar).

Secondly, I understand from the ``documentation'' for Lex that
you can define translate tables. This may be an ideal place to
learn how to use this feature, which I've managed to avoid for
a couple of decades!

-Kym Horsell
=====

Lexing and Yaccing for years and still sane(?)

chryses@xurilka.UUCP (Lapus Lazuli) (08/21/90)

-- 
Phong T. Co (Lapus Lazuli) |	One in your belly, and one for Rudi,
chryses@xurilka.UUCP	   |	You got what you gave by the heel of my bootie,
dada Indugu Inc.	   |	Bang, bang, out! like an old cherootie,
Montreal, CANADA	   |	I'm coming for you.	-- Kate Bush

chryses@xurilka.UUCP (Lapus Lazuli) (08/21/90)

In article <8144@jarthur.Claremont.EDU> ssdken@watson.Claremont.EDU (Ken Nelson) writes:
>> by defining letters in the lex rules section.  In my example
>> I am using letters  [A-Za-z'_'] to match upper or lower case characters
>> and possibly the underscore.  My question is this, how can you get
>> lex to match a reserved word you have declared, whether it's upper case
>> or not.  For example, Unify has the reserved command word "application".
>Try this:
>[Aa][Pp][Pp][Ll][Ii][Cc][Aa][Tt][Ii][Oo][Nn] 	{ return _APPLICATION; }
>this will match no matter what combination of case is used.
>


Sorry about the blank article, I had a lapse of motor control.


The above method will work, but will generate HUGE lex tables if you have
more than a couple.  Someone I know tried doing that with a Pascal-type
language and lex ran out of memory.

A better way would be to define an IDENTIFIER token, i.e., a letter or underscore
followed by zero or more letters, digits, or underscores.  Send each match to
a function which converts it to uppercase.  If you have a fixed list of
reserved words you can make a hash table for it.  Return the appropriate value
if the token is in the hash table, else just return IDENTIFIER.

There are algorithms out there to generate perfect hash functions (guaranteed
no collisions), given a fixed list of values.  The table is about 3 or 4 times
bigger than the list, but if your list is 50 values or less that shouldn't be
a problem.  I don't remember where I got mine.  Perhaps someone else has an
idea.


Happy hacking!
Phong.



>				Ken Nelson


-- 
Phong T. Co (Lapus Lazuli) |	One in your belly, and one for Rudi,
chryses@xurilka.UUCP	   |	You got what you gave by the heel of my bootie,
dada Indugu Inc.	   |	Bang, bang, out! like an old cherootie,
Montreal, CANADA	   |	I'm coming for you.	-- Kate Bush

peter@ficc.ferranti.com (Peter da Silva) (08/21/90)

Assuming you're using a compiler that uses Ritchie-compiler style
switches (say, a UNIX compiler), which is implied by the reference
to /lib/libc.a:

In article <266@cti1.UUCP> mpledger@cti1.UUCP (Mark Pledger) writes:
> When I compile then link using the command line options below, I
> always get the "environ" referencing error.  The only way I've

When you use the "-c" that means "don't link". When you specify the
two file names, that means "link". You have confused the compiler.

Make is your friend. Build a makefile that looks sort of like this:

	OFILES= main.o lex.yy.o

	prog: $(OFILES)
		$(CC) $(CFLAGS) -o prog $(OFILES)

This will compile each source file with -c to generate a .o file, then link
your .o files together.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com