[comp.unix.wizards] yacc & lex - cupla questions

woodward@uicbert.eecs.uic.edu (07/27/90)

i have been trying to parse a straightforward stream of bytes using the
c-preprocessors lex & yacc.  being a new user of these utilities, i have
a couple of problems for which i'd like to solicit your suggestions:
 
---------------------------------------------------------------------
1.)  how does one redefine the i/o in a yacc/lex piece of code?  i.e.
the code which is generated defaults to stdin and stdout for input and
output, respectively.  i'd like to redefine these defaults w/o having 
to hack on the intermediate c-code, since this is a live production 
project; i'd like to be able to update and modify the program simply by 
saying "make". 

---------------------------------------------------------------------
2.)  how can one get the automagically-defined #defines, which can
normally be created from yacc with the -d flag, to come out when you
use a makefile?  i.e. suppose i have lex.l and yacc.y lex and yacc
source files, respectively, and i have object files defined in my makefile 
called lex.o and yacc.o such that "make" follows default rules to create 
these from the aforementioned source files.   

---------------------------------------------------------------------
3.)  if i have a yacc construct such as:
 
line3	: 	A B C
		{  yacc action sequence }


which indicates that the construct line3 is composed of the 3 tokens
A B and C, in that order ...
 
how can i now assign the values of A, B, and C into local vars of my
choice?  the problem lies in the fact that each of A B and C represent
three calls to lex, and if i pass back a pointer to yytext[] from lex, 
i only retain the value of the last token in the sequence, in this case C, 
when i get to the action sequence in my yacc code.  what if i want to 
be able to select the EXACT ascii tokens for each of A B and C above in 
my yacc code.  how do i do that?


any comments or suggestions would be most heartily appreciated.

jp woodward
univ of ill at chicago 
312-996-0939

chucka@cup.portal.com (Charles - Anderson) (07/27/90)

First suggestion is to buy the new book out from
O'Reilly and Associates, Inc. "lex & yacc". It answers 
your questions quite nicely. 1-800-338-6887

I listed a couple of suggestions below.

>i have been trying to parse a straightforward stream of bytes using the
>c-preprocessors lex & yacc.  being a new user of these utilities, i have
>a couple of problems for which i'd like to solicit your suggestions:
> 
>---------------------------------------------------------------------
>1.)  how does one redefine the i/o in a yacc/lex piece of code?  i.e.
>the code which is generated defaults to stdin and stdout for input and
>output, respectively.  i'd like to redefine these defaults w/o having 
>to hack on the intermediate c-code, since this is a live production 
>project; i'd like to be able to update and modify the program simply by 
>saying "make". 

You can use freopen(3) to reattach stdin to another file, or, if you
want to work at the file-descriptor level, use dup(2).
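A minimal sketch of the freopen route, assuming the scanner still reads stdin (lex's yyin starts out as stdin). The helper name redirect_stdin is made up for illustration; dup2(2) would do the same job one level lower, on file descriptors.

```c
#include <stdio.h>

/* Hypothetical helper: reattach the existing stdin stream to the named
   file, so code that reads stdin (including a lex scanner whose yyin is
   still stdin) now reads the file instead.  Call it before yyparse().
   Returns 0 on success, -1 on failure; note that on failure freopen has
   already closed the old stdin. */
int redirect_stdin(const char *path)
{
    if (freopen(path, "r", stdin) == NULL) {
        perror(path);
        return -1;
    }
    return 0;
}
```

In a program this would typically be called as redirect_stdin(argv[1]) in main, just before yyparse().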

>
>---------------------------------------------------------------------
>2.)  how can one get the automagically-defined #defines, which can
>normally be created from yacc with the -d flag, to come out when you
>use a makefile?  i.e. suppose i have lex.l and yacc.y lex and yacc
>source files, respectively, and i have object files defined in my makefile 
>called lex.o and yacc.o such that "make" follows default rules to create 
>these from the aforementioned source files.   
>

Some make utilities have default rules for lex and yacc
files ending in .l and .y.

You can always force make's hand with explicit dependencies, i.e.:

prog:	prog.c lex.yy.o y.tab.o
	cc prog.c -o prog lex.yy.o y.tab.o -ly -ll

lex.yy.o: 	lex.yy.c y.tab.h
		cc lex.yy.c -c ...

lex.yy.c:	lex.l
		lex lex.l

y.tab.o:	y.tab.c
		cc y.tab.c -c ...

y.tab.c y.tab.h:	y.y
		yacc -d y.y

This way, only the files whose sources have changed are rebuilt.

Solution 2 is to put all the commands under prog: and rebuild
everything wholesale every time.

Solution 3 is to use a shell script and make it a dependency.


>---------------------------------------------------------------------
>3.)  if i have a yacc construct such as:
> 
>line3	: 	A B C
>		{  yacc action sequence }
>
>
>which indicates that the construct line3 is composed of the 3 tokens
>A B and C, in that order ...
> 
>how can i now assign the values of A, B, and C into local vars of my
>choice?  the problem lies in the fact that each of A B and C represent
>three calls to lex, and if i pass back a pointer to yytext[] from lex, 
>i only retain the value of the last token in the sequence, in this case C, 
>when i get to the action sequence in my yacc code.  what if i want to 
>be able to select the EXACT ascii tokens for each of A B and C above in 
>my yacc code.  how do i do that?
>

The book recommends having an input line that gives the file name,
parameters, etc., just as if it were part of a yacc specification.
Make it the first line of input and you get your file name.
You can fall back to a default, or make it a fatal error, if you do
not get that first line.

The book talks about redirection, opening multiple files as needed.

>
>any comments or suggestions would be most heartily appreciated.
>
>jp woodward
>univ of ill at chicago 
>312-996-0939

ronald@atcmp.nl (Ronald Pikkert) (07/27/90)

From article <1990Jul26.175831.1216@uicbert.eecs.uic.edu>, by woodward@uicbert.eecs.uic.edu:
<> ---------------------------------------------------------------------
<> 1.)  how does one redefine the i/o in a yacc/lex piece of code? 

The generated code in fact reads from yyin and writes to yyout, and these
are initialized to stdin and stdout respectively.
They can be reassigned without hacking the generated code, as shown in the
example, by using an auxiliary function outside of the yylex() body.

<> 
<> ---------------------------------------------------------------------
<> 2.)  how can one get the automagically-defined #defines, which can
<> normally be created from yacc with the -d flag, to come out when you
<> use a makefile?  

Just like make will use the CFLAGS variable for cc, it will use
LFLAGS for lex and YFLAGS for yacc. So YFLAGS= -d in your makefile
will do.


<> ---------------------------------------------------------------------
<> 3.)  
<> the problem lies in the fact that each of A B and C represent
<> three calls to lex, and if i pass back a pointer to yytext[] from lex, 
<> i only retain the value of the last token in the sequence

You will have to copy the scanned text into some dynamically
allocated memory.



/------------------------ example lex scanner  -----------------------------/
%{
       /* this C code will be inserted before the yylex() C code */
#include <stdlib.h>	/* malloc, exit */
#include <string.h>	/* strlen, strcpy */
#include "y.tab.h"	/* token codes and yylval, from yacc -d */

       static char *strdupl(const char *s);

       /* call this function before the first call to yylex() */
       void initlex(FILE *some_file_pointer)
       {
	       yyin = some_file_pointer;
       }

%}

constant		[0-9][0-9]*

%%

%{

         /*	initial C code that will be executed on every call of yylex() */

%}


{constant}	{
			yylval.xxxxxx = strdupl(yytext);
			return(CONSTANT);
		}

.		{
			return(yytext[0]);
		}

%%

/*
	use malloc to save any scanned text for use in the yacc program
*/
static char *strdupl(const char *s)
{
	char *hlp;

	hlp = malloc(strlen(s) + 1);	/* +1 for the terminating NUL */
	if (hlp == NULL) {
		fprintf(stderr, "strdupl: out of memory\n");
		exit(1);
	}
	strcpy(hlp, s);
	return(hlp);
}
/------------------------ end example -----------------------------/


-
Ronald Pikkert                 E-mail: ronald@atcmp.nl
@ AT Computing b.v.            Tel:    080 - 566880
Toernooiveld
6525 ED  Nijmegen

martin@mwtech.UUCP (Martin Weitzel) (07/27/90)

In article <1990Jul26.175831.1216@uicbert.eecs.uic.edu> woodward@uicbert.eecs.uic.edu writes:
>
>i have been trying to parse a straightforward stream of bytes using the
>c-preprocessors lex & yacc.  being a new user of these utilities, i have
>a couple of problems for which i'd like to solicit your suggestions:

Since the "standard docs" for lex + yacc are very terse (not to say
incomplete in many places), I think I'll make this a followup rather
than an emailed answer. Now, let's see where the problems are ...

>---------------------------------------------------------------------
>1.)  how does one redefine the i/o in a yacc/lex piece of code?  i.e.
>the code which is generated defaults to stdin and stdout for input and
>output, respectively.  i'd like to redefine these defaults w/o having 
>to hack on the intermediate c-code, since this is a live production 
>project; i'd like to be able to update and modify the program simply by 
>saying "make". 

The "calling" tree in a lex+yacc application, when it comes to reading
input and you do not change anything, is normally:

   (main or whatever)
	---> yyparse
		---> yylex
			---> input[Macro]
				---> getc(yyin) [yyin defaults to stdin]

If you want to read from some source other than stdin, you have several
points where you can change something. (In the very simplest case you
could even change nothing and use the input redirection of UNIX.)

.sidenote on
Though there are often good reasons, I sometimes wonder why a program
cares about file arguments at all instead of using stdin only. I once
found it annoying that there were programs like "tr" which don't handle
file arguments in the unix style ... until I learned that it's so easy
to put such programs in a shell wrapper like "cat $* | tr .....".
.sidenote off

If your lex-generated program has to read from a source other than stdin,
just fopen the file and assign the returned FILE pointer to yyin. The
latter is a global symbol in the object file which results from compiling
what lex generated. If you prefer separate compilation you must declare
it as "extern FILE *yyin;" in the module where you assign to it. The
right place for this will probably be the one where you call yyparse
in the call tree above. Note that the standard main program from the
yacc library is not linked if you supply your own, so you can play
any games you like before calling yyparse.
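A sketch of that assignment as it might look in your own main, before yyparse is called. The helper set_scanner_input is hypothetical, and yyin is defined here only so the fragment compiles on its own; in a real program lex.yy.c already defines it, and this module would say extern FILE *yyin; instead.

```c
#include <stdio.h>

/* Stand-in definition so this sketch compiles alone; in a real build
   write "extern FILE *yyin;" because lex.yy.c already defines yyin. */
FILE *yyin;

/* Hypothetical helper: point the scanner at the named file, keeping stdin
   when path is NULL or "-" (the usual UNIX convention).  Returns 0 on
   success, -1 if the file cannot be opened. */
int set_scanner_input(const char *path)
{
    FILE *fp;

    if (path == NULL || (path[0] == '-' && path[1] == '\0')) {
        yyin = stdin;
        return 0;
    }
    fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);
        return -1;
    }
    yyin = fp;
    return 0;
}
```

With the real lex and yacc output linked in, main would call set_scanner_input(argc > 1 ? argv[1] : NULL) and, if that returns 0, go on to call yyparse().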

The next step up in complexity is to change yylex's input macro. This
is sometimes useful, but I would not recommend doing so until you
have gathered a bit of experience with lex and understand the implications
(but I'm willing to answer questions on this by email).

Finally, you can consider avoiding lex altogether and rolling your own
version of yylex. If you have only the "ancient" lex which is supplied
with most UNIX systems (as opposed to the rewrite "flex", which I believe
is freely redistributable), it could possibly be an advantage to do so,
since lex-generated scanners are known to be less efficient than
hand-written ones. (I have no exact metrics for that, and the
comparisons made are often based on trivial scanners, which are
easily written by hand. In any case I would recommend using lex
as a prototyping tool during development.)
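To give an idea of what rolling your own involves, here is a hedged sketch of a hand-written yylex roughly equivalent to the lex example in Ronald Pikkert's article: digit strings return CONSTANT, any other character is returned as itself, and 0 signals end of input. Unlike that example it also skips blanks and newlines. The value 257 for CONSTANT is an assumption standing in for the code yacc -d would put in y.tab.h, and yyin/yytext are defined locally only so the sketch compiles alone.

```c
#include <ctype.h>
#include <stdio.h>

#define CONSTANT 257            /* assumption: normally comes from y.tab.h */
#define TEXT_MAX 256

FILE *yyin;                     /* set by the caller; NULL means stdin */
char yytext[TEXT_MAX];          /* text of the most recent token */

/* Hand-written scanner: skips blanks, tabs and newlines, returns CONSTANT
   for a digit string, otherwise the character itself; 0 at end of input. */
int yylex(void)
{
    FILE *in = yyin != NULL ? yyin : stdin;
    int c;

    do
        c = getc(in);
    while (c == ' ' || c == '\t' || c == '\n');

    if (c == EOF)
        return 0;

    if (isdigit(c)) {
        size_t n = 0;
        while (c != EOF && isdigit(c)) {
            if (n < TEXT_MAX - 1)   /* over-long numbers are truncated */
                yytext[n++] = (char)c;
            c = getc(in);
        }
        yytext[n] = '\0';
        if (c != EOF)
            ungetc(c, in);          /* give back the one-character look-ahead */
        return CONSTANT;
    }

    yytext[0] = (char)c;
    yytext[1] = '\0';
    return c;
}
```

The single ungetc is the hand-written counterpart of lex's unput: the scanner had to read one character past the number to see where it ended.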

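For the redirection of output I see no problem at all, since this
is fully under the control of the C program fragments you write in the
actions of the lex+yacc source. (The one exception is lex's default
action, which ECHOes unmatched input to yyout; reassign yyout just as
you reassign yyin.)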

>---------------------------------------------------------------------
>2.)  how can one get the automagically-defined #defines, which can
>normally be created from yacc with the -d flag, to come out when you
>use a makefile?  i.e. suppose i have lex.l and yacc.y lex and yacc
>source files, respectively, and i have object files defined in my makefile 
>called lex.o and yacc.o such that "make" follows default rules to create 
>these from the aforementioned source files.   

If you use lex + yacc with the Unix tool make, you can add your
own explicit dependencies or change the default rules and add your
own commands there. There is no "catch all" method for this - several
variations, all with their specific advantages and drawbacks, exist -
but if you know make or are willing to learn about make, you can
determine the dependencies between the files generated by lex + yacc in
the same way as if they were normal sources (BTW: I found the book in the
O'Reilly Nutshell Series, "Managing Projects with Make", excellent for
learning about make, though for the basics "The Unix Programming
Environment" by Kernighan + Pike [K+P] is sufficient. The latter is
also recommended for its treatment of lex + yacc.)

One thing should be mentioned here (you also find this in K+P): during
development it's much more probable that the actions in your grammar
will change than that you add new tokens or change the type of
the value stack. Hence when running yacc, the contents of y.tab.c
will often change, but y.tab.h will stay the same. Since both
are generated in one run (yacc -d), and some other targets may
depend on y.tab.h, you will often get unnecessary recompiles caused
by this scheme (BTW: This is a mistake in the design of yacc. A
better choice would have been to let yacc -d create *only* y.tab.h.
If GNU's replacement for yacc, bison, hasn't already done this, it
should add an option switch for that purpose. This would ease "clean"
integration into make-managed projects.)

K+P has a solution for this. Mine is basically the same, just in another
package: write two shell wrappers (or one with an option) for yacc which
generate y.tab.c and y.tab.h separately. For any grammar in a file
"grammar.y", these wrappers should generate the corresponding "grammar.c"
and "grammar.h" files. Since yacc writes its output into y.tab.c and
y.tab.h, the wrappers must rename these files, and before doing so for
y.tab.h this file should be compared (e.g. with cmp(1) or diff(1)) to
grammar.h (if one already exists). Leaving the existing one in place if
nothing has changed avoids the unnecessary recompiles of other modules.
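The wrappers described above are shell scripts built on cmp and mv. To keep all examples in one language, the same move-if-changed step is sketched here in C; the function names are hypothetical, and a wrapper would call it as replace_if_changed("y.tab.h", "grammar.h") right after running yacc -d.

```c
#include <stdio.h>

/* Return 1 if both files can be opened and have identical contents,
   0 otherwise. */
int same_contents(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int same = 0;

    if (fa != NULL && fb != NULL) {
        int ca, cb;
        do {
            ca = getc(fa);
            cb = getc(fb);
        } while (ca == cb && ca != EOF);
        same = (ca == cb);          /* true only if both hit EOF together */
    }
    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return same;
}

/* Move-if-changed: install fresh as target only when the contents differ,
   so an unchanged target keeps its old timestamp and make does not
   recompile everything that depends on it.
   Returns 1 if target was replaced, 0 if it was kept, -1 on error. */
int replace_if_changed(const char *fresh, const char *target)
{
    if (same_contents(fresh, target))
        return remove(fresh) == 0 ? 0 : -1;
    /* ISO C leaves rename() onto an existing file implementation-defined,
       so drop the old target first (failure just means it did not exist). */
    remove(target);
    return rename(fresh, target) == 0 ? 1 : -1;
}
```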

>
>---------------------------------------------------------------------
>3.)  if i have a yacc construct such as:
> 
>line3	: 	A B C
>		{  yacc action sequence }
>
>
>which indicates that the construct line3 is composed of the 3 tokens
>A B and C, in that order ...
> 
>how can i now assign the values of A, B, and C into local vars of my
>choice?  the problem lies in the fact that each of A B and C represent
>three calls to lex, and if i pass back a pointer to yytext[] from lex, 
>i only retain the value of the last token in the sequence, in this case C, 
>when i get to the action sequence in my yacc code.  what if i want to 
>be able to select the EXACT ascii tokens for each of A B and C above in 
>my yacc code.  how do i do that?

Yes, that's a frequently asked one.

Transferring strings from yylex to yyparse (or, more precisely, to the
action which has the relevant tokens on the RHS of its grammar rule) must
be done with care: using pointers to yytext is not feasible here - you must
copy the contents to a safe place. For that purpose you could malloc some
space in the action of yylex (not yyparse!!) which recognizes the token
(see example following below). Your C standard library may also contain
strdup, which does malloc and strcpy all in one, but it's not difficult to
do without. Of course you must be careful here:
   -	malloc may return a NULL-pointer because of memory limits
   -	you must not forget to allocate space for the terminating
	NUL byte; malloc(yyleng + 1) is the right thing!
   -	you must carefully plan for de-allocation, if your
	program should not run out of memory when it analyzes
	some large input
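The copy-to-a-safe-place rule can be checked without lex or yacc at all. In this sketch (all names hypothetical), scan_next stands in for a yylex call that overwrites a shared yytext buffer, and save_token is a checked malloc-plus-strcpy in the spirit of strdup. A pointer kept straight into the buffer ends up showing the later token, while the saved copies do not.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char yytext[64];   /* stand-in for the scanner's shared text buffer */

/* Hypothetical stand-in for one yylex() call: the new token overwrites
   whatever the previous call left in yytext. */
void scan_next(const char *token)
{
    snprintf(yytext, sizeof yytext, "%s", token);
}

/* Copy the current token into memory of its own.  strlen + 1 leaves room
   for the terminating NUL; running out of memory is treated as fatal. */
char *save_token(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p == NULL) {
        fprintf(stderr, "save_token: out of memory\n");
        exit(1);
    }
    strcpy(p, s);
    return p;
}
```

Each saved copy still has to be handed to free() eventually; the excerpts in this article show where that responsibility lies.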

If you transfer pointers to the malloc-ed space via the value stack,
the last chance for free-ing is before the stack is cleared. So, if you
don't copy the pointers which correspond to A, B, and C in the above
example, your last chance is in the grammar action. A short code
excerpt should help to understand what is required:

lex-source -------------------------------------------------------
........
%%
regex-for-token-A	{
		yylval.str = malloc(yyleng + 1);
		if (yylval.str == (char *)0) {
			/* scream and shout and die a horrible death */
			fprintf(stderr, "out of memory\n");
			exit(1);
		}
		strcpy(yylval.str, yytext);
		return(A);
	}
......... etc, same for token B and C
-------------------------------------------------------------------

yacc-source ------------------------------------------------------
......
%union {
	.....
	char *str;
	.....
}
......
%token <str> A B C
......
%%
......
line3 : A B C {
		/* $1, $2, and $3 are pointers to "safe" copies of the
		   original tokens now, but if you don't copy these
		   pointers to variables that will SURVIVE THIS BLOCK,
		   you must clean up before this action ends: */
			free($1);
			free($2);
			free($3);
		/* Be especially careful if you create multiple references
		   to the malloc-ed space or if you transfer one of these
		   further out, say: $$ = $1. In this case you must of course
		   *not* free $1 here; instead the action(s) of the rule(s)
		   where the non-terminal "line3" appears on the RHS are now
		   responsible for doing so. */
	}
......

>
>any comments or suggestions would be most heartily appreciated.

Enough?

Good, lex+yacc lesson ends for today :-).
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

ac1@chive.cs.reading.ac.uk (Andrew Cunningham) (07/29/90)

In article <1990Jul26.175831.1216@uicbert.eecs.uic.edu> woodward@uicbert.eecs.uic.edu writes:
>
>i have been trying to parse a straightforward stream of bytes using the
>c-preprocessors lex & yacc.  being a new user of these utilities, i have
>a couple of problems for which i'd like to solicit your suggestions:
> 
>---------------------------------------------------------------------
>1.)  how does one redefine the i/o in a yacc/lex piece of code?  i.e.
>the code which is generated defaults to stdin and stdout for input and
>output, respectively.  i'd like to redefine these defaults w/o having 
>to hack on the intermediate c-code, since this is a live production 
>project; i'd like to be able to update and modify the program simply by 
>saying "make". 

There are two approaches to this.  You can either close stdin and
reopen it to point to your input file or #define yyinput to be
something which returns the character from your file.  Then, when
lex.yy.c is compiled, instead of calling the yyinput function your
#define is called instead.  E.g.

#define yyinput my_yyinput
int my_yyinput()
  {
    /* get the character you want and return it */
  }

You'll also have to redefine yyunput(c)  if you want to do this.

>---------------------------------------------------------------------
>2.)  how can one get the automagically-defined #defines, which can
>normally be created from yacc with the -d flag, to come out when you
>use a makefile?  i.e. suppose i have lex.l and yacc.y lex and yacc
>source files, respectively, and i have object files defined in my makefile 
>called lex.o and yacc.o such that "make" follows default rules to create 
>these from the aforementioned source files.   

You'll need to specify an explicit rule to do this.  Or, at the
expense of some processor time you might want to run:

y.tab.h: yacc.y
	yacc -d yacc.y
	rm y.tab.c

(This shouldn't take too long, yacc is *fast* compared with the cc stage)

>---------------------------------------------------------------------
>3.)  if i have a yacc construct such as:
> 
>line3	: 	A B C
>		{  yacc action sequence }
>
>
>which indicates that the construct line3 is composed of the 3 tokens
>A B and C, in that order ...
> 
>how can i now assign the values of A, B, and C into local vars of my
>choice?  the problem lies in the fact that each of A B and C represent
>three calls to lex, and if i pass back a pointer to yytext[] from lex, 
>i only retain the value of the last token in the sequence, in this case C, 
>when i get to the action sequence in my yacc code.  what if i want to 
>be able to select the EXACT ascii tokens for each of A B and C above in 
>my yacc code.  how do i do that?

line3: 
	A {atext=strdup(yytext);}
	B {btext=strdup(yytext);}
	C {ctext=strdup(yytext);}

Note: if your grammar is more complex than this, it can lead to
all sorts of conflicts in yacc - when the parser executes an
action it is `committed' to that branch of the parse tree and cannot
backtrack to resolve any ambiguity that might occur (the classic
problem here is if ... then ... else in programming languages).

Hope this information helps.

AndyC

Yours etc,                      | e-mail: ac1@csug.cs.reading.ac.uk
Captain B.J. Smethwick          |------------------------------------------
in a white wine sauce with      | Nobody agrees with my opinions, though
shallots, mushrooms and garlic. | everybody is entitled to them.

pag@hao.hao.ucar.edu (Peter Gross) (07/30/90)

In article <1990Jul26.175831.1216@uicbert.eecs.uic.edu> woodward@uicbert.eecs.uic.edu writes:
>
>i have been trying to parse a straightforward stream of bytes using the
>c-preprocessors lex & yacc.  being a new user of these utilities, i have
>a couple of problems for which i'd like to solicit your suggestions:
> 
>---------------------------------------------------------------------
>1.)  how does one redefine the i/o in a yacc/lex piece of code?  i.e.
>the code which is generated defaults to stdin and stdout for input and
>output, respectively.  i'd like to redefine these defaults w/o having 
>to hack on the intermediate c-code, since this is a live production 
>project; i'd like to be able to update and modify the program simply by 
>saying "make". 
>any comments or suggestions would be most heartily appreciated.

Many respondents have given excellent suggestions for this question,
the general idea being to reassign the FILE *'s yyin and yyout.
How about the closely related, but not quite the same problem of
wanting to do processing to the input (or output): eg., to strip off
the high order bit?  I have an application with the following condition:
input from tty port, RAW, contains 7-bit binary info, but generates
even parity (I have no control over this).  Have to strip off the
parity bit.  Any suggestions?  Note that I prefer not to redefine
yyinput() due to the yyunput() problem. (Maybe that's not such a big
deal -- haven't looked into it yet).

On a related, but tangential matter -- anyone know of some public
domain line-discipline (similar to, but much more flexible than bk(4))
for high speed input (no processing 'cept the previously mentioned
parity bit stripping)?  Currently running under SunOS 3.5.

Thanks to everyone who answered the original question -- lots of good
responses.

--
--peter gross
pag@scg.boulder.co.us	[MX-able]
..ncar!scg!pag		[uucp]
pag%scg@ncar.ucar.edu	[Internet]

martin@mwtech.UUCP (Martin Weitzel) (07/30/90)

In article <2481@onion.reading.ac.uk> ac1@rosemary.cs.reading.ac.uk (Andrew Cunningham) writes:
[Q+A for some problems with lex and yacc; refer to
 previous articles in this thread for more details]

[reading from other source than `yyin']

>[You can] #define yyinput to be
>something which returns the character from your file.  Then, when
>lex.yy.c is compiled, instead of calling the yyinput function your
>#define is called instead.  E.g.
>
>#define yyinput my_yyinput
>int my_yyinput()
>  {
>    /* get the character you want and return it */
>  }
>  
>You'll also have to redefine yyunput(c)  if you want to do this.

From this and one more article in this thread I conclude that there's a
widespread misconception about how things work together. Maybe the above
works with some versions of lex out there, but from looking at the details
of the generated lex source (lex.yy.c) on several systems (XENIX derived
from SysIII; AT&T UNIX SysV; ISC 386/ix derived from SysV), I see that
the above CANNOT WORK as desired.

Here are some details, how the individual routines and functions call
each other:

        main (from lex-library or own)
         |
         V
        yylex --------------------------+-+
         | |                            | |
  +---+  | |  +-----+                   | |
  |   V  V V  V     |           ......  V V .........
  |  input unput    |           :  yyless yyreject  : in the
  |                 |           :.... | ... | . | ..: lex-library
  |                 |                 V     V   V
  |                 |                 yyunput   yyinput
  |                 |                    |         |
  |                 +--------------------+         |
  +------------------------------------------------+

What we should note first is:

	When the next character is needed in yylex, input
	(NOT yyinput!) is called. Normally, input is #defined
	as macro but you can re-#define it, or #undef-ine it
	and make a function with this name visible when you
	compile and link lex.yy.c.

	There is another macro, unput, that must properly undo
	the actions of input, though unput is only called if your
	regular expressions require look-ahead. (If you are not
	*very* experienced with regular expressions, assume that
	there will *always* be look-ahead.)

So if we want to change things here, we must find the right place
for our re-definition, that is, we must write it somewhere into the
".l"-file (with the lex-source), so that it appears *after* the #define
that is automatically generated by lex, but *before* the first use
of input/unput. As the order in which the parts of your ".l"-file
appear in lex.yy.c changed with the evolution of lex, you should
check for the right place if you try this the first time!

The "safest" (ie. most portable) place I've found is right at the
beginning of the second part of the ".l"-file, immediately before
the first regular expression.

file.l ------------------------------------------------
	first part
%%
%{
#undef	input /* ANSI-C requires that, though */
#undef	unput /* other compilers may do without */
#define	input .... whatever ....
#define unput ... as you like ..
%}
first-regex {
	... action ...
}
second-regex {
	.... action ...
}
	... etc. ......
%%
	third part
------------------------------------------------------------

Now for the tricky part: as you see from the above, there are
some routines in the lex library which sometimes need to input
or unput characters. These routines *must* use exactly the
redefined versions of input and unput. How can these routines "link"
to something that is defined as a macro?

The solution can again be seen if we carefully study lex.yy.c, the
source generated by lex. At the end of this source we find the
functions yyinput and yyunput (note the yy-prefix now!), which do no
more and no less than call input and unput. As the two functions are
compiled where our macro definitions are visible, they are the "stubs"
through which the functions in the lex library access our macros.

Again: look at the above scheme showing the calling hierarchy and
try to understand the dependencies. If necessary, study lex.yy.c a bit
further. THEN you might consider writing your own input/unput macros!
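As an illustration of such a pair, here is a sketch aimed at Peter Gross's parity question: an input replacement that clears the high-order bit of every byte, plus a matching unput with its own push-back stack so look-ahead keeps working. It assumes the AT&T-style conventions described above (input returns 0 at end of file). my_input, my_unput and scan_src are hypothetical names, and unlike the stock macros this version does not maintain yylineno.

```c
#include <stdio.h>

#define PUSHBACK_MAX 64

FILE *scan_src;                     /* hypothetical; would be yyin in a real scanner */
static int pushback[PUSHBACK_MAX];  /* characters handed back by my_unput */
static int npushback = 0;

/* Replacement for lex's input(): pushed-back characters come first, then
   fresh bytes with the parity (high-order) bit cleared.  Returns 0 at end
   of input, the old-lex convention, so a byte whose low 7 bits are all
   zero is indistinguishable from end of input. */
int my_input(void)
{
    FILE *in = scan_src != NULL ? scan_src : stdin;
    int c;

    if (npushback > 0)
        return pushback[--npushback];
    c = getc(in);
    return c == EOF ? 0 : (c & 0x7F);
}

/* Replacement for lex's unput(c): the next my_input() returns c again.
   Characters beyond PUSHBACK_MAX are silently dropped. */
void my_unput(int c)
{
    if (npushback < PUSHBACK_MAX)
        pushback[npushback++] = c;
}

/* Hooked into the .l file following the template above (sketch):
 *     %{
 *     #undef  input
 *     #undef  unput
 *     #define input()  my_input()
 *     #define unput(c) my_unput(c)
 *     %}
 */
```

Keeping the push-back stack private to these two functions is what makes them a matched pair: whatever my_unput gives back, my_input hands out again before reading any new byte, which is exactly the property the yyinput/yyunput stubs rely on.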

>  
>>---------------------------------------------------------------------
[managing yacc projects with make]

>You'll need to specify an explicit rule to do this.  Or, at the
>expense of some processor time you might want to run:
>
>y.tab.h: yacc.y
>	yacc yacc.y
	    ^-- insert "-d"-switch
>	rm y.tab.c
>
>(This shouldn't take too long, yacc is *fast* compared with the cc stage)

First, the above is good advice insofar as it generally doesn't
hurt to run yacc only for the purpose of generating y.tab.h. For
that, the "-d" switch must be specified, but IMHO its omission is
simply a typo here.

What I have to criticize is that y.tab.h (as well as y.tab.c) is
more of a "workfile" (IMHO at least) and should be renamed
to something else. So we get:

yacc.h: yacc.y
	yacc -d yacc.y
	mv y.tab.h yacc.h
	rm -f y.tab.c
	   ^^----------- add this for portability, as on some older
	                 systems the exit status of rm is not set
	                 cleanly otherwise (`make' may complain).

(BTW: I'm not quite happy with file names like yacc.y, yacc.h etc. in
the presence of a command called yacc in the same lines here, but I
didn't want to change the original example too much.)

A fine point I already mentioned in an earlier posting follows:
generally the situation is that typical changes in yacc.y will change
y.tab.c, but not y.tab.h (or rather yacc.h in the above example). The
latter changes only if new tokens or new types for the value stack are
introduced, which happens far less frequently than changes to the
actions of the grammar rules. So it is advisable to extend the
above further to:

yacc.h: yacc.y
	yacc -d yacc.y
	test -f yacc.h && cmp -s y.tab.h yacc.h || mv y.tab.h yacc.h
	rm -f y.tab.c

Here the mv is only done if yacc.h doesn't exist or is different from
y.tab.h

>>---------------------------------------------------------------------
[making yytext available in grammar actions]

>
>line3: 
>	A {atext=strdup(yytext);}
>	B {btext=strdup(yytext);}
>	C {ctext=strdup(yytext);}
>	
>Note: if your grammar is more complex than this you can lead to
>all sorts of conflicts in the compiler - when the parser executes an
>action it is `committed' to that branch of the parse tree and cannot
>backtrack to resolve any ambiguity that might occur (the classic
>problem here is if ... then ... else in programming languages).

Again the poster tells something very true here ... but forgets to
mention something *much* more important:

Never, again NEVER, again ***NEVER*** depend on the contents of yytext
being unchanged in the actions of yyparse(%): in yyparse the calls to
yylex, which in turn change the contents of yytext, are slightly
"asynchronous", i.e. there might be a read-ahead of one token and yytext
doesn't contain what you think! (Note: there's not ALWAYS a read-ahead;
it just depends on whether yyparse needs one to decide what to do next!)
The only place where yytext is valid is in the action block following
the regular expression in the lex source.

%: Small note to Chris Torek, who some time ago gave a similar
recommendation in one of his postings: you and a few others who
understand the LALR(1) parsing algorithm used by yyparse, and hence
can decide under which circumstances read-ahead will occur, are
explicitly exempt from the above "never" rule :-)

>
>Hope this information helps.
>
>AndyC

Hope these corrections save some frustration.

P.S. to AndyC: I didn't intend to make your recommendations look bad.
Topics like lex and yacc are really not well covered by the docs, or
at least you have to look very hard to get to the information you need.
Stay tuned ...
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

hollings@poona.cs.wisc.edu (Jeff Hollingsworth) (07/30/90)

In article <32114@cup.portal.com>, chucka@cup.portal.com (Charles -
Anderson) writes:
|> 
|
|> >---------------------------------------------------------------------
|> >1.)  how does one redefine the i/o in a yacc/lex piece of code?  i.e.
|> >the code which is generated defaults to stdin and stdout for input and
|> >output, respectively.  i'd like to redefine these defaults w/o having 
|> >to hack on the intermediate c-code, since this is a live production 
|> >project; i'd like to be able to update and modify the program simply by 
|> >saying "make". 
|> 
|> You can use freopen, or if you wish another file use dup.
|> 

A cleaner approach is to use the file variable yyin, which lex uses to get
its input. Thus yyin = fopen(infile, "r") will do the redirection and leave
standard input alone (check that fopen did not return NULL). Just do this
before calling yyparse().


-------------------------------------------------------------------------------
Jeff Hollingsworth					Work: (608) 262-6617
hollings@cs.wisc.edu					Home: (608) 256-4839