[comp.compilers] Two pass compiler using YACC?

wsineel@info.win.tue.nl (e.vriezekolk) (08/23/90)

Hi,

We are working on a compiler, using yacc. The compiler will be two-pass,
and we have a different .y file for both passes.  Each .y file is
translated by yacc to pass.1.C and pass.2.C (we are using C++).

The problems come during link-time, for ld, obviously, complains about
multiple defined symbols (such as yylval and yyparse).

This must be a traditional problem. How is it solved?
-- 
Eelco Vriezekolk, wsineel@win.tue.nl
[I hope the grammars for the two passes are the same, and just the { }
actions are different.  I always do the obvious thing, if(pass1)... else ...
in the action routines.  In about half the cases, the action routines do
something simple like look up an identifier in the symbol table, so the
conditional code is buried in the lower level routine.  Having separate yacc
grammars seems to me to be a poor idea if for no other reason than that it is
a big problem to keep the two files in sync when you make a grammar change.
-John]
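
A minimal sketch of the "conditional buried in the lower level routine"
idea, assuming a global pass flag that the driver sets before each call to
yyparse().  The names current_pass, lookup_identifier and the fixed-size
table are hypothetical illustrations, not code from this thread.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 64               /* hypothetical table size */

struct symbol { char name[32]; int address; };

static struct symbol symtab[MAX_SYMS];
static int nsyms = 0;
int current_pass = 1;             /* set by the driver before each yyparse() */

/* Meant to be called from a grammar action.  The action itself is the
 * same in both passes; only this routine looks at the pass flag.
 * Pass 1 enters unknown names, pass 2 only reads and reports misses. */
int lookup_identifier(const char *name)
{
    int i;

    for (i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return i;

    if (current_pass == 1 && nsyms < MAX_SYMS) {
        strncpy(symtab[nsyms].name, name, sizeof symtab[nsyms].name - 1);
        symtab[nsyms].name[sizeof symtab[nsyms].name - 1] = '\0';
        symtab[nsyms].address = -1;   /* not known until later */
        return nsyms++;
    }
    if (current_pass == 2)
        fprintf(stderr, "undefined identifier: %s\n", name);
    return -1;
}
```

With this arrangement a single .y file can serve both passes, since the
action text never mentions the pass.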

-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

mkie@vlsivie.at (Inst.f.Techn.Informatik) (08/23/90)

In article <1364@svin02.info.win.tue.nl>, wsineel@info.win.tue.nl (e.vriezekolk) writes:
> The problems come during link-time, for ld, obviously, complains about
> multiple defined symbols (such as yylval and yyparse).
> 
> This must be a traditional problem. How is it solved?
The symbols that clash can be renamed by a simple sed script:

sed -e 's/yyparse/something_parse/g' \
    -e 's/yylex/something_lex/g' \
    -e 's/yylval/something_lval/g' \
    -e 's/YYSTYPE/something_TYPE/g' \
    [...]

You get the idea.  You will also have to modify the y.tab.h file and the
file that contains yylex().

> [I hope the grammars for the two passes are the same, and just the { }
> actions are different.  I always do the obvious thing, if(pass1)... else ...
> in the action routines.  In about half the cases, the action routines do
> something simple like look up an identifier in the symbol table, so the
> conditional code is buried in the lower level routine.  Having separate yacc
> grammars seems to me to be a poor idea if for no other reason than that it is
> a big problem to keep the two files in sync when you make a grammar change.
> -John]
That's the way I do it too, but sometimes you need two parsers in the
same executable, so this problem does come up at times.

				bye,
					mike

Michael K. Gschwind             mike@vlsivie.at
Technical University, Vienna    mike@vlsivie.uucp
Voice: (++43).1.58801 8144      e182202@awituw01.bitnet
Fax:   (++43).1.569697
[Similar suggestions from zs@munginya.cs.mu.OZ.AU (Zoltan Somogyi),
jouvelot@brokaw.lcs.mit.edu (Pierre Jouvelot), and vramsey@NCoast.ORG
(Cedric Ramsey).]


tbr@virgil.tfic.bc.ca (Tom Rushworth) (08/25/90)

In article <1364@svin02.info.win.tue.nl> wsineel@info.win.tue.nl (e.vriezekolk)
says:
>The problems come during link-time, for ld, obviously, complains about
>multiple defined symbols (such as yylval and yyparse).


I have a number of library routines that need to parse different forms of
input, and I ran into the same problem (multiple yacc parsers in one program).
The brute force solution I used was in my makefile :

#-------------------------------------------------------
parse.c: definition.y
	yacc definition.y
	sed -f Makestatic <y.tab.c >$@
	rm y.tab.c
#-------------------------------------------------------
where Makestatic is:
#-------------------------------------------------------
/^extern int yychar/d
/^extern short yyerrflag/d
/^YYSTYPE/s//static &/
/^short yyexca/s//static &/
/^short yyact/s//static &/
/^short yypact/s//static &/
/^short yypgo/s//static &/
/^short yyr1/s//static &/
/^short yyr2/s//static &/
/^short yychk/s//static &/
/^short yydef/s//static &/
/^int yydebug/s//static &/
/^int yychar/s//static &/
/^int yynerrs/s//static &/
/^short yyerrflag/s//static &/
/^yyparse()/s//static &/
#-------------------------------------------------------

I then #include "parse.c" in the controlling code (instead of compiling it as
a separate module) and supply static versions of any of the externs deleted
by the script.  This all works (at least on my Sun3s & 4s); the only price you
pay is two (or more) copies of the parsing code.
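
A sketch of what the controlling file might look like under this scheme.
So that it compiles by itself, a tiny hand-written yyparse() stands in for
the line #include "parse.c"; that stand-in, the input variable and the
wrapper name parse_definition are all hypothetical.

```c
/* Everything the parser needs is static, so a second file built the
 * same way can have its own yylex, yyerror and yyparse without clashing
 * at link time. */
static const char *input;                 /* text being parsed */

static int yylex(void)
{
    return *input ? *input++ : 0;         /* 0 signals end of input */
}

static void yyerror(const char *msg)
{
    (void)msg;                            /* a real version would report it */
}

/* Stand-in for:  #include "parse.c"
 * (the sed-processed yacc output).  It accepts a string of digits. */
static int yyparse(void)
{
    int tok;

    while ((tok = yylex()) != 0) {
        if (tok < '0' || tok > '9') {
            yyerror("syntax error");
            return 1;
        }
    }
    return 0;
}

/* The only symbol with external linkage in this file. */
int parse_definition(const char *text)
{
    input = text;
    return yyparse();
}
```

Callers elsewhere in the program see only parse_definition(), which
returns 0 on success in the same way yyparse() does.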

bart@videovax.tv.tek.com (Bart Massey) (08/25/90)

In article <1364@svin02.info.win.tue.nl> wsineel@info.win.tue.nl (e.vriezekolk) writes:
> We are working on a compiler, using yacc. The compiler will be two-pass,
> and we have a different .y file for both passes.
...
> The problems come during link-time, for ld, obviously, complains about
> multiple defined symbols (such as yylval and yyparse).
...
> [I hope the grammars for the two passes are the same, and just the { }
> actions are different.  I always do the obvious thing, if(pass1)... else ...
> in the action routines ... > it is a big problem to keep the two files in
> sync when you make a grammar change. -John]

Hmmm.  It seems to me to depend a lot on what is meant by "two-pass".  In the
traditional usage, where the second pass is merely used to resolve forward
references, I'd agree completely.  However, I have heard the phrase
"two-pass" used to describe compilers where the first pass outputs a very
different intermediate form which the second pass parses.  In this case, one
could legitimately want two entirely different grammars.

There are several ways to accomplish this, none of them trivial.  One is to
get a copy of GNU BISON and use the "semantic parsers" feature, which does
what you want.  It needs some slight debugging, however, and is really hard
to figure out.  Another is to get a copy of Berkeley YACC, and hack it up to
do the right thing, which would probably be easy, and would probably be
greatly appreciated by others.  The third, and by far the most traditional,
is to write a sed script which hacks the YACC output so that all the names
are unique.  This is usually some baroque variant on replacing "yy" with
a prefix that is unique to each parser.

Finally, and this is my favorite, you can just hose the intermediate form
through a pipe to a separate program.  This opens up a lot of other neat
possibilities...
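
A minimal sketch of the pipe approach, assuming a POSIX system where
popen() and pclose() are available.  The helper name run_second_pass and
the line-per-record intermediate form are hypothetical; the thread does not
describe a particular format.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for popen/pclose declarations */
#endif
#include <stdio.h>

/* Starts the second pass as a separate program and writes the first
 * pass's output to its standard input, one line per array element.
 * Returns the value of pclose(), or -1 if the program could not be
 * started. */
int run_second_pass(const char *command, const char *const *lines, int nlines)
{
    FILE *out = popen(command, "w");
    int i;

    if (out == NULL)
        return -1;
    for (i = 0; i < nlines; i++)
        fprintf(out, "%s\n", lines[i]);
    return pclose(out);
}
```

Because each pass is then its own executable with its own yyparse, the
link-time clash never arises.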

It really is a shame that none of the 3 parser generators mentioned above
generates modular parsers very well.  Ah well.  Hope this helps,

					Bart Massey
					..tektronix!videovax.tv.tek.com!bart
					..tektronix!reed.bitnet!bart

sja@sirius.hut.fi (Sakari Jalovaara) (08/26/90)

>>The problems come during link-time, for ld, obviously, complains about
>>multiple defined symbols (such as yylval and yyparse).

[solution with a sed script]

How about an include file, say, yyrename.h (warning - untested code
ahead):

	#define yyGLUE1(x,y) x ## y
	#define yyGLUE(x,y)  yyGLUE1 (x, y) /* There may be an
					     * easier way...
					     */
	#define yyparse   yyGLUE (yyparsername,_yyparse)
	#define yylex     yyGLUE (yyparsername,_yylex)
	#define yychar    yyGLUE (yyparsername,_yychar)
	...etc...

and in parse.y:

	%{
	#define yyparsername my_parser
	#include "yyrename.h"

	my_parser_yylex () { ...lexer goes here... }
	%}
									++sja
[No question, there are arbitrarily ugly ways to make this work. -John]
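
A small compilable sketch of the renaming header above, with stand-in
definitions in place of real yacc output.  The parser name my_parser is
taken from the post; yynerrs is added here only as a second example symbol.
The two-level macro matters because the operands of ## are not
macro-expanded, so pasting yyparsername directly would produce
yyparsername_yyparse.

```c
/* Two-level paste: yyGLUE expands its arguments, yyGLUE1 joins them. */
#define yyGLUE1(x, y) x ## y
#define yyGLUE(x, y)  yyGLUE1(x, y)

#define yyparsername my_parser
#define yyparse yyGLUE(yyparsername, _yyparse)
#define yynerrs yyGLUE(yyparsername, _yynerrs)

/* Stand-ins for what yacc would generate.  After preprocessing these
 * define my_parser_yynerrs and my_parser_yyparse. */
int yynerrs = 0;

int yyparse(void)
{
    return yynerrs;       /* 0 means success, as with a real yyparse() */
}
```

Code outside the grammar file that does not include the header calls
my_parser_yyparse() by its full name.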

meissner@osf.org (08/28/90)

In terms of ways of having two or more parsers within a program, one
way that I haven't seen mentioned is the way I did it when
implementing Data General's AOS/VS C compiler.  I implemented the
compiler and preprocessor as one integrated program, and was left with
the decision on how to handle #if.  I didn't particularly want to have
two sets of parser tables in the compiler, because of difficulties in
convincing the parser generator to cooperate (this was a home grown
parser generator that had its own share of problems).

The solution I came up with was to make the top level rule, something
like:

	top:   grammar1
	     | SPECIAL_TOKEN grammar2 ;

That is, having the lexer spit out a special token to go to second
grammar rather than the first.
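
One way the lexer side of this could look, as a hedged sketch.  The token
codes, the canned token stream in raw_lex and the select_grammar function
are hypothetical; in a real build the codes would come from y.tab.h and
raw_lex would be the actual scanner.

```c
/* Hypothetical token codes. */
enum { SPECIAL_TOKEN = 257, NUMBER = 258 };

static int start_token_pending = 0;

/* Stand-in for the real scanner: hands out a fixed token stream. */
static int raw_lex(void)
{
    static const int stream[] = { NUMBER, '+', NUMBER, 0 };
    static int pos = 0;
    int tok = stream[pos];

    if (tok != 0)
        pos++;
    return tok;
}

/* Call before yyparse(): nonzero selects the second grammar. */
void select_grammar(int second)
{
    start_token_pending = second;
}

/* The parser sees SPECIAL_TOKEN first when the second grammar is
 * wanted, which steers it into the second alternative of "top". */
int yylex(void)
{
    if (start_token_pending) {
        start_token_pending = 0;
        return SPECIAL_TOKEN;
    }
    return raw_lex();
}
```

The first grammar needs no marker: when select_grammar(0) is in effect the
parser simply receives the ordinary token stream.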

Of course, if you have a recursive parser (as in my case, processing
#if's within the normal handling of the grammar), you have to save all
of the parser state and restore it when you exit the parser.  Given
that the parser generator I used just spit out the tables, and I had
to write the parser skeleton, it was easy to do.  With YACC where the
skeleton is provided for you, and it doesn't support multiple grammars
like BISON does, you have some work to do....

--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142
[Good point, I once did that too.  Making yacc support recursive parsing
isn't very hard, you just have to make a bunch of static arrays and
variables automatics. -John]

jar@florida.eng.ileaf.com (Jim Roskind x5570) (09/03/90)

>   From: wsineel@info.win.tue.nl (e.vriezekolk)
>   
>   We are working on a compiler, using yacc. The compiler will be two-pass,
>   and we have a different .y file for both passes.  Each .y file is
>   translated by yacc to pass.1.C and pass.2.C (we are using C++).
>   
>   The problems come during link-time, for ld, obviously, complains about
>   multiple defined symbols (such as yylval and yyparse).
>   
>   This must be a traditional problem. How is it solved?

IMHO the cleanest solution lies in use of the C preprocessor (as do so
many solutions :-) ).

Suppose that the only multiply defined externals were yyparse and yyerror.

Create a file called "pass1.h" containing the lines:

#define yyparse pass1_yyparse
#define yyerror pass1_yyerror

Then simply #include this file into the initial section of your yacc
grammar.  This placement will cause the #include directive to appear
sufficiently early in the y.tab.c file that *all* references to
yyparse will be translated appropriately.  External calls to the first
pass should use the new name.
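
A sketch of the calling side, assuming the second grammar includes a
corresponding (hypothetical) pass2.h.  The two parser bodies below are
stand-ins so the fragment compiles by itself; in a real build each would
come from its own y.tab.c.

```c
/* Stand-ins for the two generated parsers after renaming. */
int pass1_yyparse(void) { return 0; }
int pass2_yyparse(void) { return 0; }

/* Hypothetical driver: run the passes in order and stop at the first
 * failure.  Returns 0 on success, otherwise the number of the pass
 * that failed. */
int compile(void)
{
    if (pass1_yyparse() != 0)
        return 1;
    if (pass2_yyparse() != 0)
        return 2;
    return 0;
}
```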

>   [I hope the grammars for the two passes are the same, and just the { }
>   actions are different.  ... -John]

I actually have examples where the grammars belong to distinct phases of
processing, and have very little connection.  For example, in parsing
C, there is 1) the preprocessing parser, 2) the preprocessing
expression evaluation parser, and 3) the C syntax parser.  I actually
implement each of the phases using a yacc based parser, and hence the
grammars are quite distinct.  In the most general case, when the
phases of translation are very "orthogonal", there is a need for
multiple distinct parsers, and the trick for making yacc work in such
an environment is quite useful. (It also applies to multiple flex/lex
based lexers).


Jim Roskind
Independent Consultant
(407)729-4348 or (617)577-9813 x5570
jar@hq.ileaf.com or ...!uunet!leafusa!jar

meissner@osf.org (09/06/90)

| [re using two yacc grammars in the same program]
| IMHO the cleanest solution lies in use of the C preprocessor (as do so
| many solutions :-) ).
| 
| Suppose that the only multiply defined externals were yyparse and yyerror.
| 
| Create a file called "pass1.h" containing the lines:
| 
| #define yyparse pass1_yyparse
| #define yyerror pass1_yyerror
| ...

The problem with this type of solution is that a different version of
yacc may define more yy<xxx> external names.  If this is the case, you
are hosed.  The fact that the yacc your computer vendor ships uses only
those two symbols today means diddly squat when you load the new
release, or go to a different machine.

--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142