[comp.lang.c++] Using OO Languages for Compilers/Interpreters

Will@cup.portal.com (Will E Estes) (04/09/89)

I am curious about how well the object-oriented paradigm applies to
compiler and interpreter development.  I am specifically interested
in hearing about any projects using C++, but anyone who has ever
been involved in a compiler/interpreter development project using any
object-oriented language, please share your experiences.  Would
object-oriented techniques have more benefits for a typeless language
as opposed to one that is strongly typed?  What kinds of performance
degradation might you expect in the parser/code generator/optimizer/etc.?
If you have written a parser/interpreter in C++ can you share the code?
Finally, is there a YACC variant (YACC++ ?) that generates C++ code
for the parser?
Thanks,
Will

fischer@iesd.dk (Lars P. Fischer) (04/09/89)

In article <16915@cup.portal.com> Will@cup.portal.com (Will E Estes) writes:

>Finally, is there a YACC variant (YACC++ ?) that generates C++ code
>for the parser?

The code generated by bison (aka GNU YACC) works fine with C++, at
least with the GNU C++ compiler. Simply write C++ code in the action
parts, and compile with g++. No problem. Using YACC, you get a couple
of warnings, but it works ok.

Using LEX is more tricky, but it can be done (kludge available :-).
Using flex (Vern Paxson's fast LEX) is far better. You'll have to make
a simple patch to the skeleton lexer, but that's easy to spot. I have
a patch if anyone is interested (I've mailed it to Vern, too).

The fact that C++ is pseudo-ANSI-C compatible is sometimes a big help.
The fact that so much high-quality free software is available is
definitely a big help :-). Thanks, all.

/Lars
--
Lars Fischer,  fischer@iesd.dk, {...}!mcvax!iesd!fischer
Any sufficiently advanced technology is indistinguishable from magic.
			-- Arthur C. Clarke

rme@wdl1.UUCP (Richard M Emberson) (04/10/89)

	A small point (but by far the most important one) has been missed.
	Simply having yacc or bison produce C++ code is not important;
	what is important is that their outputs are C++ classes.
	This allows one to have multiple BNF grammars in the same
	executable. (Try linking the output of two yacc grammars in the
	same program - the yacc/bison drivers don't allow multiple
	grammar tables).

	With yacc/bison one dare not use them to make a library, because
	any user of the library cannot then use yacc/bison in the
	application linking that library.

	The yacc++ I produced from yacc yields instances of a class
	(called yacc) defined in a C++ header file. The grammar tables
	are passed in at instance creation. Thus multiple instances can
	exist in the same program and at the same time.

	The same can be done for lex/flex.

								Richard M. Emberson
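
The idea of passing tables in at instance creation can be sketched
roughly as follows. This is not the yacc++ code described above, which
was not posted. As a loudly labelled simplification, a small DFA
transition table stands in for yacc's LR tables, and the names Tables,
TableParser, classify, number_tables and identifier_tables are all
hypothetical. The point is only that each instance carries its own
tables, so two "grammars" can live in one executable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a set of grammar tables: a DFA over two
// input classes. A real yacc driver would hold LR action/goto tables.
struct Tables {
    std::vector<std::vector<int> > next;  // next[state][class]; -1 = reject
    std::vector<bool> accepting;          // accepting[state]
};

// Input class 0 = decimal digit, 1 = lowercase letter, -1 = anything else.
inline int classify(char c) {
    if (c >= '0' && c <= '9') return 0;
    if (c >= 'a' && c <= 'z') return 1;
    return -1;
}

class TableParser {
  public:
    // The tables are supplied at instance creation; no globals involved.
    explicit TableParser(const Tables &t) : tables_(t) {}

    // Returns true if the whole input is accepted by this instance's tables.
    bool parse(const std::string &input) const {
        int state = 0;
        for (std::string::size_type i = 0; i < input.size(); ++i) {
            int cls = classify(input[i]);
            if (cls < 0) return false;
            state = tables_.next[state][cls];
            if (state < 0) return false;
        }
        return tables_.accepting[state];
    }

  private:
    Tables tables_;  // each instance owns its own copy
};

// "Grammar" 1: one or more digits.
inline Tables number_tables() {
    return Tables{{{1, -1}, {1, -1}}, {false, true}};
}

// "Grammar" 2: a letter followed by letters or digits.
inline Tables identifier_tables() {
    return Tables{{{-1, 1}, {1, 1}}, {false, true}};
}

// Usage: two parsers with different tables coexist in one program.
inline void demo_table_parsers() {
    TableParser numbers(number_tables());
    TableParser identifiers(identifier_tables());
    assert(numbers.parse("1989"));
    assert(identifiers.parse("x1"));
    assert(!numbers.parse("x1"));
}
```

Because the driver code is shared and only the data differs, a library
could use one instance internally without stopping an application from
creating its own.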

fischer@iesd.dk (Lars P. Fischer) (04/12/89)

In article <3690007@wdl1.UUCP> rme@wdl1.UUCP (Richard M Emberson) writes:
>	A small (but by far the most important point) has been missed.
>	Simply to have  yacc or bison produce C++ code is not important,
>	but what is important is that their outputs are c++ classes.
>	This allows one to have multiple bnf grammars in the same
>	executable. (Try linking the output of two yacc grammars in the
>	same program - the yacc/bison drivers don't allow multiple
>	grammar tables).

From the BISON manual:

  Most programs that use Bison parse only one language and therefore
  contain only one Bison parser.  But what if you want to parse more
  than one language with the same program?  Here is what you must do:
  
     * Make each parser a pure parser (*note Pure Decl::.).  This gets
       rid of global variables such as `yylval' which would otherwise
       conflict between the various parsers, but it requires an
       alternate calling convention for `yylex' (*note Pure Calling::.).
  
     * In each grammar file, define `yyparse' as a macro, expanding
       into the name you want for that parser.  Put this definition in
       the C declarations section (*note C Declarations::.).  For
       example:
  
            %{
            #define yyparse parse_algol
  .....

The same could be done for (f)lex. 

This is not to say that a C++ parser object would be a bad thing, only
to point out that it can be done with ordinary functions, and that
bison can do it, now.

/Lars
--
Copyright 1989 Lars Fischer; you can redistribute only if your recipients can.
Lars Fischer,  fischer@iesd.dk, {...}!mcvax!iesd!fischer
Any sufficiently advanced technology is indistinguishable from magic.
			-- Arthur C. Clarke
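
The effect of the macro renaming quoted from the manual can be seen in
a small sketch. It is an illustration only: the two sections below
simulate, in a single file, what two separate generated parser files
would contribute, and the function bodies are placeholders rather than
generated code. The name parse_algol comes from the manual excerpt;
parse_pascal and demo_two_parsers are hypothetical. The #undef lines
are needed only because both "files" share one translation unit here.
The pure-parser step from the first bullet is not shown.

```cpp
#include <cassert>

// --- simulated contents of the first generated parser file ---
#define yyparse parse_algol
// The skeleton writes "yyparse", but the preprocessor turns this into a
// definition of parse_algol. Placeholder body: the return value is chosen
// only to tell the two functions apart (a real yyparse returns 0 on success).
int yyparse() { return 1; }
#undef yyparse

// --- simulated contents of the second generated parser file ---
#define yyparse parse_pascal
int yyparse() { return 2; }  // really defines parse_pascal
#undef yyparse

// Usage: both entry points exist under distinct names, so they link together.
inline void demo_two_parsers() {
    assert(parse_algol() == 1);
    assert(parse_pascal() == 2);
}
```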

murren@oman.steinmetz (Brian Murren) (04/14/89)

In article <1706@iesd.dk> fischer@iesd.dk (Lars P. Fischer) writes:
>
>In article <3690007@wdl1.UUCP> rme@wdl1.UUCP (Richard M Emberson) writes:
>>	...text deleted... (Try linking the output of two yacc grammars in the
>>	same program - the yacc/bison drivers don't allow multiple
>>	grammar tables).
>
> Lars Fischer replies:
>     * Make each parser a pure parser (*note Pure Decl::.) ...
>  
>     * In each grammar file, define `yyparse' as a macro ...

A more primitive mechanism :-) than either of the above is to take the
generated parser file (and associated header) and run them through the
stream editor (sed).  For example:

	yacc -d foo.y  # or bison

	sed -e 's/yy/my_parser_/g' y.tab.h > myparser.h
	sed -e 's/yy/my_parser_/g' y.tab.c > myparser.c
	rm y.tab.h y.tab.c

Applying a different filter for each parser will yield the desired
result since each grammar table will exist in its own name space. I have
an application where 5 different parsers peacefully coexist.


Brian Murren
GE Corporate Research and Development Center, Schenectady, NY

Internet:   murren@ge-crd.arpa OR murren@crd.ge.com
UUCP:       murren@oman.steinmetz.ge.com OR uunet!steinmetz!oman!murren

-------------- All opinions expressed are strictly my own -----------------
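
To see what the substitution does to the generated text, here is a
rough C++ equivalent of the fixed-string s/yy/my_parser_/g step. It is
a sketch for illustration, not a replacement for sed; replace_all and
demo_rename are hypothetical names, and the input strings are made-up
fragments rather than real yacc output. Note that the replacement is
purely textual and case-sensitive: upper-case names such as YYSTYPE are
left alone, while any other identifier or string that happens to
contain "yy" is rewritten too.

```cpp
#include <cassert>
#include <string>

// Replace every occurrence of `from` in `text` with `to`, left to right,
// without rescanning the inserted text (like sed's s/from/to/g for a
// fixed, non-regex pattern).
inline std::string replace_all(std::string text, const std::string &from,
                               const std::string &to) {
    if (from.empty()) return text;
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}

// Usage: rename the lower-case yy symbols of one parser.
inline void demo_rename() {
    assert(replace_all("int yyparse(); extern int yychar;", "yy", "my_parser_") ==
           "int my_parser_parse(); extern int my_parser_char;");
    // Upper-case YY names are untouched by the lower-case pattern.
    assert(replace_all("YYSTYPE yylval;", "yy", "my_parser_") ==
           "YYSTYPE my_parser_lval;");
}
```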

krohn@u1100a.UUCP (Eric Krohn) (04/14/89)

In article <3690007@wdl1.UUCP> rme@wdl1.UUCP (Richard M Emberson) writes:
] 
] 	The yacc++ I produced from yacc yields instances of a class
] 	(called yacc) defined in a c++ header file. The grammar tables
] 	are passed in at instance creation. Thus multiple instances can
] 	exist in the same program and at the same time.

At least in System V yaccs, you also need different versions of yyparse
since the rule actions get interpolated into a giant switch statement
at the bottom of yyparse.  A fine use for virtual functions.
(I seem to recall that the first UNIX machine I used (v6 or v7)
had a separate function for the actions; so I believe that yyparse
could be inherited but the auxiliary function would have to be virtual.)

I have an abstract Yacc class that looks like:

class	Yacc	{
  public:
	int yydebug;			/* set to 1 to get debugging */
	int yychar;			/* current input token number */
	YYSTYPE yylval;			/* current token value */
	FILE	*yyin;			/* input file */
	LexLocation current;		/* current file name and line number */

	Yacc (FILE *input, long linenumber, const char *filename);

	virtual	int	yyparse ();
	virtual	int	yylex ();
	virtual	int	yyerror (const char *msg);

	long	yylineno ()	{ return (current.linenumber()); }
	};

This class depends on a straightforward sed script to make all the
yacc tables static, rename yyparse to be a member function, and omit
the global definitions of yychar and yydebug.  C++ automagically
takes care of the rest.

Multiple instances of a class derived from Yacc will run the same
parser and actions.  Different derived classes can run completely
different parsers and actions.

I originally wrote this class with the expectation of having one or
more yacc sub-parsers to handle some ambiguities in the topmost
parser, but I have not yet resorted to this because I've been
able to patch things up in the lexer.

--
Eric J. Krohn
krohn@ctt.ctt.bellcore.com  or  {bcr,bellcore}!u1100a!krohn
Bell Communications Research,	444 Hoes Ln,    Piscataway, NJ 08854
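
The arrangement of one base class with several derived parsers might
look roughly like the sketch below. It is not the class from the post:
MiniYacc, SumParser, ListParser and demo_parsers are hypothetical
names, the base class is heavily simplified (a string instead of a
FILE, an int instead of YYSTYPE, no LexLocation), and the yyparse
bodies are written by hand where the real ones would come from the
sed-processed yacc output.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for an abstract Yacc base class.
class MiniYacc {
  public:
    int yylval;  // value of the most recent number token

    explicit MiniYacc(const std::string &input)
        : yylval(0), text_(input), pos_(0) {}
    virtual ~MiniYacc() {}

    virtual int yyparse() = 0;  // each derived class supplies its parser

    // Shared lexer: returns 'n' for a number (value in yylval), the
    // character itself for anything else, and 0 at end of input.
    virtual int yylex() {
        while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
        if (pos_ >= text_.size()) return 0;
        unsigned char c = static_cast<unsigned char>(text_[pos_]);
        if (std::isdigit(c)) {
            yylval = 0;
            while (pos_ < text_.size() &&
                   std::isdigit(static_cast<unsigned char>(text_[pos_])))
                yylval = yylval * 10 + (text_[pos_++] - '0');
            return 'n';
        }
        ++pos_;
        return c;
    }

  private:
    std::string text_;  // per-instance input, so instances do not interfere
    std::size_t pos_;
};

// Parser 1 accepts "n + n + ..." and leaves the sum in result.
class SumParser : public MiniYacc {
  public:
    int result;
    explicit SumParser(const std::string &s) : MiniYacc(s), result(0) {}
    int yyparse() {
        int tok = yylex();
        if (tok != 'n') return 1;
        result = yylval;
        while ((tok = yylex()) == '+') {
            if (yylex() != 'n') return 1;
            result += yylval;
        }
        return tok == 0 ? 0 : 1;  // 0 = success, as with yacc's yyparse
    }
};

// Parser 2 accepts "n, n, ..." and counts the items.
class ListParser : public MiniYacc {
  public:
    int count;
    explicit ListParser(const std::string &s) : MiniYacc(s), count(0) {}
    int yyparse() {
        int tok;
        do {
            if (yylex() != 'n') return 1;
            ++count;
        } while ((tok = yylex()) == ',');
        return tok == 0 ? 0 : 1;
    }
};

// Usage: two instances of one parser and one instance of another.
inline void demo_parsers() {
    SumParser a("1 + 2 + 3"), b("40 + 2");
    ListParser c("7, 8, 9");
    assert(a.yyparse() == 0 && a.result == 6);
    assert(b.yyparse() == 0 && b.result == 42);
    assert(c.yyparse() == 0 && c.count == 3);
}
```

All parser state lives in the object, which is what lets several
instances, of the same or of different derived classes, run in one
program.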