[comp.lang.c] lex & yacc - cupla questions

woodward@uicbert.eecs.uic.edu (07/27/90)

i have been trying to parse a straightforward stream of bytes using lex &
yacc, which generate c code.  being a new user of these utilities, i have
a couple of problems for which i'd like to solicit your suggestions:
 
---------------------------------------------------------------------
1.)  how does one redefine the i/o in a yacc/lex piece of code?  i.e.
the code which is generated defaults to stdin and stdout for input and
output, respectively.  i'd like to redefine these defaults w/o having 
to hack on the intermediate c-code, since this is a live production 
project; i'd like to be able to update and modify the program simply by 
saying "make". 
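A minimal sketch of the usual approach, assuming ordinary lex: the generated scanner reads its input from the FILE pointer yyin, and ECHO (and the default rule for unmatched characters) writes to yyout. Your own main() can therefore point them at other files before calling yyparse(), without touching the generated code or the makefile. redirect_scanner_io is a hypothetical helper name, and the two FILE definitions are stand-ins so the sketch compiles on its own.

```c
#include <stdio.h>

/* Stand-ins so this sketch compiles on its own.  In a real program
   these are defined in the lex-generated lex.yy.c, and this file
   would say "extern FILE *yyin, *yyout;" instead. */
FILE *yyin;
FILE *yyout;

/* Point the scanner at named files instead of stdin/stdout.
   Call it from your own main() before calling yyparse().
   Returns 0 on success, -1 if either file cannot be opened. */
int redirect_scanner_io(const char *inpath, const char *outpath)
{
    FILE *in = fopen(inpath, "r");
    FILE *out;

    if (in == NULL)
        return -1;
    out = fopen(outpath, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    yyin = in;    /* the scanner reads its input from yyin */
    yyout = out;  /* ECHO writes to yyout */
    return 0;
}
```

Since this lives in the user-code section after the second %% of yacc.y (or lex.l), "make" rebuilds everything as before; main() would just call redirect_scanner_io(argv[1], argv[2]), check the result, and then return yyparse().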

---------------------------------------------------------------------
2.)  how can one get the automagically-defined #defines, which can
normally be created from yacc with the -d flag, to come out when you
use a makefile?  i.e. suppose i have lex.l and yacc.y lex and yacc
source files, respectively, and i have object files defined in my makefile 
called lex.o and yacc.o such that "make" follows default rules to create 
these from the aforementioned source files.   

---------------------------------------------------------------------
3.)  if i have a yacc construct such as:
 
line3	: 	A B C
		{  yacc action sequence }


which indicates that the construct line3 is composed of the 3 tokens
A B and C, in that order ...
 
how can i now assign the values of A, B, and C into local vars of my
choice?  the problem lies in the fact that each of A B and C represent
three calls to lex, and if i pass back a pointer to yytext[] from lex, 
i only retain the value of the last token in the sequence, in this case C, 
when i get to the action sequence in my yacc code.  what if i want to 
be able to select the EXACT ascii tokens for each of A B and C above in 
my yacc code.  how do i do that?


any comments or suggestions would be most heartily appreciated.

jp woodward
univ of ill at chicago 
312-996-0939

880716a@aucs.uucp (Dave Astels) (07/28/90)

In article <1990Jul26.175545.959@uicbert.eecs.uic.edu> woodward@uicbert.eecs.uic.edu writes:
>2.)  how can one get the automagically-defined #defines, which can
>normally be created from yacc with the -d flag, to come out when you
>use a makefile?  i.e. suppose i have lex.l and yacc.y lex and yacc
>source files, respectively, and i have object files defined in my makefile 
>called lex.o and yacc.o such that "make" follows default rules to create 
>these from the aforementioned source files.

Check your implementation of MAKE for the rule to go from .y to .o;
there should be a macro for yacc's flags (usually YFLAGS).  Add a '-d' to
that, either in the default rules file or, better, in your makefile.

>3.)  if i have a yacc construct such as:
> 
>line3	: 	A B C
>		{  yacc action sequence }
>
>how can i now assign the values of A, B, and C into local vars of my
>choice?  the problem lies in the fact that each of A B and C represent
>three calls to lex, and if i pass back a pointer to yytext[] from lex, 
>i only retain the value of the last token in the sequence, in this case C, 
>when i get to the action sequence in my yacc code.

return a copy of yytext, not a pointer to it.
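A minimal sketch of such a copy in plain C. The scanner reuses one yytext buffer for every match, so a saved pointer into it shows only the latest token, while a malloc'd copy keeps its own text. copy_lexeme is a hypothetical helper; in a lex action it would be called as copy_lexeme(yytext, yyleng), yyleng being the length of the match.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd, '\0'-terminated copy of the first len
   characters of text.  The caller owns the copy and should free()
   it when done.  Returns NULL if out of memory. */
char *copy_lexeme(const char *text, size_t len)
{
    char *p = malloc(len + 1);

    if (p != NULL) {
        memcpy(p, text, len);
        p[len] = '\0';
    }
    return p;
}
```

strdup() does the same job where it is available (POSIX), though it only became part of standard C with C23.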

-- 
"I liked him better before he died" - McCoy, ST V
===============================================================================
Dave Astels            |  Internet: 880716a@AcadiaU.CA
PO Box 835, Wolfville, |  Bitnet:   880716a@Acadia

timk@xenitec.on.ca (Tim Kuehn) (07/31/90)

(Dave Astels) writes:
>woodward@uicbert.eecs.uic.edu writes:
>>3.)  if i have a yacc construct such as:
>> 
>>line3	: 	A B C
>>		{  yacc action sequence }
>>
>>how can i now assign the values of A, B, and C into local vars of my
>>choice?  the problem lies in the fact that each of A B and C represent
>>three calls to lex, and if i pass back a pointer to yytext[] from lex, 
>>i only retain the value of the last token in the sequence, in this case C, 
>>when i get to the action sequence in my yacc code.
>
>return a copy of yytext, not a pointer to it.

That still doesn't solve the problem, as you don't know *which* rule has
fired, or where to store the text values associated with tokens A B C until
after you've made three (or more) calls to yylex. However, a provision for
handling this is provided by yylval, which can be used with the
%union {} declaration in yacc, and assigned in the actions associated with
the regexps in lex. Where text values are associated with these tokens,
one would include a char pointer declaration in the %union statement
(say char *cval), declare the tokens with that type (%token <cval> A B C),
copy the text value to malloc'd space, and assign the pointer to
yylval.cval before returning from lex. This will be saved on the value
stack yacc maintains as it goes through its states, and can be retrieved
in the action by referring to $1, $2, or $3 (the values of the first,
second, and third elements of the rule, respectively); $$ is the value of
the rule itself (its left-hand side), which the action may set.
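A toy simulation of that mechanism in plain C, not the code yacc actually generates: the scanner fills yylval, the parser pushes a copy of yylval onto a value stack after every token, and the action for line3 reads $1, $2 and $3 from the top three entries. value_stack, scan_token, shift and DOLLAR are hypothetical names used only for the illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Mirrors a grammar declaration such as
       %union { int ival; char *cval; }
       %token <cval> A B C                                     */
typedef union {
    int ival;
    char *cval;
} YYSTYPE;

YYSTYPE yylval;            /* set by the scanner before each return */

#define STACK_MAX 64
static YYSTYPE value_stack[STACK_MAX];
static int sp = 0;         /* number of values on the stack */

/* Scanner side: copy the token text into malloc'd space and hang
   it on yylval.cval. */
void scan_token(const char *text)
{
    size_t len = strlen(text);

    yylval.cval = malloc(len + 1);
    if (yylval.cval != NULL)
        memcpy(yylval.cval, text, len + 1);   /* includes the '\0' */
}

/* Parser side: after each token, push a copy of yylval so that
   later tokens cannot overwrite earlier values. */
void shift(void)
{
    if (sp < STACK_MAX)
        value_stack[sp++] = yylval;
}

/* For a rule with n symbols on the right, $i is this stack entry. */
#define DOLLAR(i, n) (value_stack[sp - (n) + (i) - 1])
```

In a real grammar the action then needs nothing more than { first = $1; second = $2; third = $3; }, where first, second and third are char pointers of your choice.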

Check your yacc/lex manual for more details. 

------------------------------------------------------------------------------
Timothy D. Kuehn	TDK Consulting Services  "Where Quality is Guaranteed"
timk@xenitec.on.ca 	uunet!watmath!maytag!xenitec!timk
119 University Ave. East, Waterloo, Ont., Canada. N2J 2W1 519-888-0766
if no answer 519-742-2036 (w/ans mach) fax: 519-747-0881. Contract services
available in Dos/Unix/Xenix - SW & HW. Clipper, Foxbase/Pro, C, Pascal,
Fortran, Assembler etc. *Useable* dBase program generator under construction
------------------------------------------------------------------------------

ray@vantage.UUCP (Ray Liere) (07/31/90)

I have found the book "Introduction to Compiler Construction with UNIX"
by Schreiner and Friedman (ISBN 0-13-474396-2) to be very helpful --
they start with small projects and then work in to a good-sized compiler.


Ray Liere
Vantage Consulting and Research Corporation
voice: (503)657-7294
uucp: uunet!nwnexus.WA.COM!vantage!ray
       -or-
      uunet!nwnexus!vantage!ray
       -or-
      hplabs!hpubvwa!hpupora!vantage!ray
Internet: ray%vantage@nwnexus.WA.COM

martin@mwtech.UUCP (Martin Weitzel) (08/06/90)

In article <45960003@vantage.UUCP> ray@vantage.UUCP (Ray Liere) writes:
>I have found the book "Introduction to Compiler Construction with UNIX"
>by Schreiner and Friedman (ISBN 0-13-474396-2) to be very helpful --
>they start with small projects and then work in to a good-sized compiler.

This book definitely has its strengths and weaknesses.

If you are completely new to compiler construction and have no plans
to spend half your life reading the classical "Compilers - Principles,
Techniques, and Tools" by Aho, Sethi, and Ullman (ISBN 0-201-10088-6),
the book of Schreiner and Friedman is surely an easier road to walk.
But be aware that Schreiner+Friedman's book centers around writing a
"Small-C" compiler, which is useful as a programming exercise, but
not as a real product. (I'm not saying that the latter was intended,
or that the book promises this.)

Sometimes they use programming conventions that are a little unportable,
but what I always appreciated in Schreiner+Friedman's book were the
"cookbook-style" rules for `error' symbols in the grammar and `yyerrok'
in the actions. What I definitely missed were the start conditions for
lex. Instead they proposed, and worked out at length, how to
recognize C-style comments using regular expressions only, which is
not only much more difficult and less readable, but dangerous too, as
the whole comment must then fit into yytext, which is limited to YYLMAX
characters (usually 200).
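In lex itself the cure is a start condition (declared with %Start in classic lex, or %x in flex) whose rules skip comment text, entered on /* with BEGIN and left on */ with BEGIN 0. The C sketch below models the same two-state idea explicitly, one character at a time, so no comment ever has to fit in a fixed buffer. strip_c_comments is a hypothetical name, and it deliberately ignores string literals.

```c
/* Two scanner states, analogous to lex's initial state and a
   COMMENT start condition. */
enum scan_state { NORMAL, IN_COMMENT };

/* Copy src to dst with C-style comments removed.  Each comment is
   replaced by one space, as C requires.  dst needs room for
   strlen(src) + 1 bytes.  String literals are not recognized. */
void strip_c_comments(const char *src, char *dst)
{
    enum scan_state state = NORMAL;

    while (*src != '\0') {
        if (state == NORMAL) {
            if (src[0] == '/' && src[1] == '*') {
                state = IN_COMMENT;   /* like BEGIN COMMENT; */
                *dst++ = ' ';
                src += 2;
            } else {
                *dst++ = *src++;
            }
        } else {
            if (src[0] == '*' && src[1] == '/') {
                state = NORMAL;       /* like BEGIN 0; */
                src += 2;
            } else {
                src++;                /* discard comment text */
            }
        }
    }
    *dst = '\0';
}
```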

Explanations about how yyparse does its work are given in detail.
If understanding this is not the problem (e.g. if you have access to
the lex+yacc tutorials which were printed in the "Unix Programmer's
Manual" in the days of V7-Unix), I'd rather recommend starting with
"The Unix Programming Environment" by Kernighan+Pike (ISBN 0-13-937681-X).
It has a large chapter where they describe the development of a small
language called the "hoc" (similar to the UNIX "bc") in several steps,
starting with an interpreter for expressions, later adding flow-control
(if, while) and changing the whole thing to a "half-compiler".

In my very personal opinion Kernighan+Pike gives you a better overview
of the general techniques which must be mastered to write a compiler
*and* you have less to read. Note that both books are only partially
dedicated to lex + yacc, as are several other books, which contain small
chapters about lex + yacc(%). Neither is sufficient if you want
to develop a "product", and neither has much treatment of "other
applications" (i.e. if you want to write something other than a
compiler with lex + yacc). For "real" compiler projects there are more
complete books. The new book of Alan Holub, "Compiler Design in C"
(ISBN: 0-13-155151-5) looks promising in this respect, but I haven't
had the time for a closer look.

%: If anybody has already started a list of books which contain
at least one chapter about lex + yacc, I'm willing to contribute what
I know - pointers to such books as well as "personal opinion" about
the quality or usefulness of the material. Sorry that I don't have the
time to start such a list myself within the next few weeks.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

scott@bbxsda.UUCP (Scott Amspoker) (08/07/90)

In article <1990Jul30.202119.2768@xenitec.on.ca> timk@xenitec.UUCP (Tim Kuehn) writes:
>(Dave Astels) writes:
>>woodward@uicbert.eecs.uic.edu writes:
>>>3.)  if i have a yacc construct such as:
>>> 
>>>line3	: 	A B C
>>>		{  yacc action sequence }
>>>
>>>how can i now assign the values of A, B, and C into local vars of my
>>>choice?  the problem lies in the fact that each of A B and C represent
>>>three calls to lex, and if i pass back a pointer to yytext[] from lex, 
>>>i only retain the value of the last token in the sequence, in this case C, 
>>>when i get to the action sequence in my yacc code.

Sometimes you got to get your fingernails a little dirty.  If you wish to
return tokens from lex that have arbitrarily long lexemes (such as
string literals or symbol names) you will have to set up your own
dynamic buffering mechanism to retain such values.
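A minimal sketch of such a mechanism: a growable buffer that doubles its allocation as characters are appended, so a lexeme collected one character at a time (for instance with lex's input() routine) is not bound by the size of yytext. The strbuf type and its functions are hypothetical names.

```c
#include <stdlib.h>

/* Growable character buffer for lexemes of any length. */
typedef struct {
    char  *data;
    size_t len;   /* characters stored, not counting the '\0' */
    size_t cap;   /* bytes allocated */
} strbuf;

void strbuf_init(strbuf *b)
{
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Append one character, doubling the allocation when there is no
   room left for it plus the terminating '\0'.
   Returns 0 on success, -1 if out of memory. */
int strbuf_putc(strbuf *b, char c)
{
    if (b->len + 1 >= b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 16;
        char *p = realloc(b->data, ncap);

        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 0;
}

/* Hand the finished string to the caller and reset the buffer for
   the next lexeme.  Returns NULL if nothing was appended. */
char *strbuf_take(strbuf *b)
{
    char *s = b->data;

    strbuf_init(b);
    return s;
}
```

The string returned by strbuf_take() can go straight into yylval; whoever ends up with it is responsible for free()ing it.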
-- 
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232
unmvax.cs.unm.edu!bbx!bbxsda!scott

staff@cadlab.sublink.ORG (Alex Martelli) (08/12/90)

martin@mwtech.UUCP (Martin Weitzel) writes:
	[...lots of good advice omitted...]
>If understanding this is not the problem (eg. if you have access to
>the lex+yacc tutorials which were printed in the "Unix Programmers
>Manual" in the days of V7-Unix), I'd rather recommend to start with
	[...lots of more good advice omitted...]

Just an aside, note that those tutorials, and gobs of other nice stuff,
can still be obtained quite easily - in Europe from EUUG, in the U.S.,
I believe, from Usenix - as part of the 4.3 BSD Manual Set.  A GREAT
buy at 50 pounds, even if you don't CARE about 4.3 BSD at all,
just for the great, timeless Unix stuff.  (Kernighan & Pike is an even
BETTER buy, I'll admit... so get both!).

-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 45, Bologna, Italia
Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only; any time of day or night).