[comp.lang.c] two

leland@cs (12/07/90)

I have an application that requires two discrete uses of both lex and
yacc.  Since both lex and yacc have hardcoded names (all with the 'yy'
prefix) for a whole bunch of global symbols, there is an immediate
problem:  somehow the two sets must be hidden from each other and
distinguished for any other code that accesses them.

I've tried this kludge:  create a header file that re-#define's all the
names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all
those in set #2 to be named set2yyfoo.  This has worked for me in the
past, but won't in this particular instance because the generated
code includes calls to yyless() and yywrap(), which are in the LEX
library (-ll), the contents of which I cannot rename.  So that doesn't
work.
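
For illustration, here is a minimal sketch of such a renaming header at
work, squeezed into a single file so that it compiles on its own.  The stub
bodies and the three-symbol list are assumptions made for the example; a
real header needs one #define for every yy symbol the generated code
exports, and each set would normally live in its own source file.

```c
/* ---- set #1: what a hypothetical "set1names.h" would contain ---- */
#define yylex  set1yylex
#define yytext set1yytext
#define yyleng set1yyleng

/* Stand-in for the first generated scanner (stub, not real lex output). */
char yytext[] = "first";          /* actually defines set1yytext */
int  yyleng   = 5;                /* actually defines set1yyleng */
int  yylex(void) { return 1; }    /* actually defines set1yylex  */

#undef yylex
#undef yytext
#undef yyleng

/* ---- set #2: what a hypothetical "set2names.h" would contain ---- */
#define yylex  set2yylex
#define yytext set2yytext
#define yyleng set2yyleng

/* Stand-in for the second generated scanner (stub, not real lex output). */
char yytext[] = "second";         /* actually defines set2yytext */
int  yyleng   = 6;                /* actually defines set2yyleng */
int  yylex(void) { return 2; }    /* actually defines set2yylex  */

#undef yylex
#undef yytext
#undef yyleng

/* Other code now refers to set1yylex(), set2yytext, and so on. */
```

The rest of the program then calls set1yylex() and set2yylex() by their
renamed identifiers, and the two sets of globals no longer collide at link
time.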

I've also investigated GNU's replacements for these programs, flex and
bison.  I haven't gotten to the bison documentation yet, but a
quick look at the flex man page implies that these programs still
have the hardcoded global symbol names.  Flex eliminates the need
for the external library, however, so this may solve my problem.  But
it's still a horrible kludge, and it would require that others to whom
I distribute the software maintain flex and/or bison.  I'm looking
for something better.

I may be able to use some special loader options to 'hide' the yy*
symbols when compiling the object modules that contain them, but
the procedure for doing so is not obvious, nor is it standardized across
loader versions.

Has anybody listening gotten into this before?  Can you offer any
suggestions?  Thanks for whatever you may know.

It is difficult to accept that, considering how long these programs have
been around, none of the implementers ever thought to make these globals
static or to add a command-line option to rename them.

Leland Woodbury
-- 
ARPANET/INTERNET: leland@cs.columbia.edu
	  USENET: ...!columbia!cs.columbia.edu!leland
	  BITNET: leland%cs.columbia.edu@cuvmb
	  USMAIL: Columbia Univ., 457 CS, 500 W. 120 St., NYC 10027-6699

gwyn@smoke.brl.mil (Doug Gwyn) (12/07/90)

In article <1990Dec6.200944.13037@cs.columbia.edu>, leland@cs writes:
- I've tried this kludge:  create a header file that re-#define's all the
- names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all
- those in set #2 to be named set2yyfoo.  This has worked for me in the
- past, but won't in this particular instance because the generated
- code includes calls to yyless() and yywrap(), which are in the LEX
- library (-ll), the contents of which I cannot rename.  So that doesn't
- work.

But it almost does -- Since "lex" produces C source, you can #define
set1yyless yyless, etc. before the lex output to be compiled, thereby
turning these selected references back into calls to the shared library
functions.  (I assume the lex library does not maintain internal state.)
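
A small sketch of that idea, assuming an ANSI C preprocessor (which does
not re-expand a macro name inside its own expansion, so the pair of
mutually referring defines terminates).  The STR/XSTR helpers and the
name_of_* functions are hypothetical and exist only to make the resulting
names visible; the three renamed symbols are just examples.

```c
/* The blanket renames from the set #1 header. */
#define yytext set1yytext
#define yyless set1yyless
#define yywrap set1yywrap

/* The extra defines: map the library routines back to their real names. */
#define set1yyless yyless
#define set1yywrap yywrap

/* Helpers that turn the fully expanded identifier into a string. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* yytext stays renamed; yyless and yywrap end up with their library names. */
const char *name_of_yytext(void) { return XSTR(yytext); }
const char *name_of_yyless(void) { return XSTR(yyless); }
const char *name_of_yywrap(void) { return XSTR(yywrap); }
```

Older preprocessors may not treat the circular pair the same way, in which
case simply leaving yyless and yywrap out of the renaming header should
have the same effect.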

mpledger@cti1.UUCP (Mark Pledger) (12/10/90)

There was a recent article in the C User's Journal about 3 months ago
on this topic of using multiple parsers within the same executable.  Somebody
might want to check it out.



-- 
Sincerely,


Mark Pledger

--------------------------------------------------------------------------
CTI                              |              (703) 685-5434 [voice]
2121 Crystal Drive               |              (703) 685-7022 [fax]
Suite 103                        |              
Arlington, VA  22202             |              mpledger@cti.com
--------------------------------------------------------------------------

martin@mwtech.UUCP (Martin Weitzel) (12/10/90)

In article <14674@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <1990Dec6.200944.13037@cs.columbia.edu>, leland@cs writes:
>- I've tried this kludge:  create a header file that re-#define's all the
>- names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all
>- those in set #2 to be named set2yyfoo.  This has worked for me in the
>- past, but won't in this particular instance because the generated
>- code includes calls to yyless() and yywrap(), which are in the LEX
>- library (-ll), the contents of which I cannot rename.  So that doesn't
>- work.
>
>But it almost does -- Since "lex" produces C source, you can #define
>set1yyless yyless, etc. before the lex output to be compiled, thereby
>turning these selected references back into calls to the shared library
>functions.  (I assume the lex library does not maintain internal state.)

Unfortunately things are more complicated. Here is an excerpt from
`nm /usr/lib/libl.a' (UNIX Sys V):
----------------------------------------------------------------------
Symbols from /usr/lib/libl.a[reject.o]:

Name                  Value   Class        Type         Size   Line  Section

reject.c            |        | file |                  |      |     |
yyreject            |       0|extern|            int( )|   270|     |.text
yyracc              |     272|extern|            int( )|   154|     |.text
yyinput             |       0|extern|                  |      |     |
yyleng              |       0|extern|                  |      |     |
yytext              |       0|extern|                  |      |     |
yylsp               |       0|extern|                  |      |     |
yyolsp              |       0|extern|                  |      |     |
yyfnd               |       0|extern|                  |      |     |
yyunput             |       0|extern|                  |      |     |
yylstate            |       0|extern|                  |      |     |
yyprevious          |       0|extern|                  |      |     |
yyoutput            |       0|extern|                  |      |     |
yyextra             |       0|extern|                  |      |     |
yyback              |       0|extern|                  |      |     |


Symbols from /usr/lib/libl.a[yyless.o]:

Name                  Value   Class        Type         Size   Line  Section

yyless.c            |        | file |                  |      |     |
yyless              |       0|extern|            int( )|   107|     |.text
yyleng              |       0|extern|                  |      |     |
yytext              |       0|extern|                  |      |     |
yyunput             |       0|extern|                  |      |     |
yyprevious          |       0|extern|                  |      |     |


Symbols from /usr/lib/libl.a[yywrap.o]:

Name                  Value   Class        Type         Size   Line  Section

yywrap.c            |        | file |                  |      |     |
yywrap              |       0|extern|            int( )|    16|     |.text
----------------------------------------------------------------------

The problem is not some internal state of these functions, but that they
expect a number of external `yyfoo'-symbols, and there is no way to make
them access the `right' ones without rewriting the functions.

So, how hard would it be to rewrite them?

The trivial case is `yywrap'. I hope AT&T doesn't sue me because of reverse
engineering :-), but this function is a one-liner.

	int yywrap() { return 1; }

The two other functions (`yyless' and `yyreject') may have complicated
interactions with a lot of globals, so the best solution is to avoid
them and do manually what is required. This is simple in the case of
`yyless', since it is usually used to push back parts of `yytext' to
the input stream. This can also be done with the `unput()' macro in a loop.
(The library version of `yyless' does this via the `yyunput()' function, but
this function simply calls `unput()', which may have been redefined -
have a look into `lex.yy.c' to understand how things work together.)
In addition the "original" `yyless' adjusts `yytext' and `yyleng'
accordingly. The part that still worries me is the reference to
`yyprevious' within `yyless'. To be sure, you should probably disassemble
the library version of `yyless' - it's not that large.
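
A rough sketch of such a hand-written replacement follows.  It is not the
library code: the name my_yyless, the pushback stack, and next_input are all
assumptions made so the example compiles by itself, whereas in a real
scanner `yytext', `yyleng' and `unput()' come from `lex.yy.c'.  It also
leaves `yyprevious' alone, which is exactly the open question above.

```c
/* Stand-ins for the scanner globals that lex.yy.c normally provides. */
char yytext[200];
int  yyleng;

/* Hypothetical pushback stack standing in for lex's own input buffer. */
static char pushback[200];
static int  npushed = 0;
#define unput(c) (pushback[npushed++] = (char)(c))

/* Keep the first n characters of yytext and push the rest back onto
   the input, last character first, so they are re-read in order. */
void my_yyless(int n)
{
    if (n < 0)
        n = 0;
    while (yyleng > n)
        unput(yytext[--yyleng]);
    yytext[yyleng] = '\0';
}

/* Reads the next pushed-back character, or -1 if there is none. */
int next_input(void)
{
    return npushed > 0 ? pushback[--npushed] : -1;
}
```

With `yytext' holding "foobar", a call of my_yyless(3) leaves "foo" in
`yytext' and makes 'b', 'a', 'r' the next three characters read.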

`yyreject' is best avoided completely because it plays with a lot
of external symbols (the poster of the original question is lucky here, but
others may understand this as a hint to use REJECT - which in turn calls
yyreject() - only as a last resort).

BTW: Another option is to have a common lexer for both sets of input
symbols and use start conditions in lex to select the appropriate ones.
It's a pity that start conditions are insufficiently explained in the
common documentation of lex (if they are mentioned at all).
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

polfer@b11.ingr.com (Dan Polfer) (12/11/90)

In article <1990Dec6.200944.13037@cs.columbia.edu> leland@cs () writes:
>I have an application that requires two discrete uses of both lex and
>yacc. [...]
>I've tried this kludge:  create a header file that re-#define's all the
>names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all
>those in set #2 to be named set2yyfoo.

Trying to redefine all of the symbols can be a real pain in the neck.  Try
to use "#define yy set1yy" or "#define yy set2yy" at the top of your
different parser files, and then have the parser files include the lex
code (catching all of the lex symbols).
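
One caveat worth checking before relying on the prefix form: a standard C
preprocessor replaces whole identifiers only, so a define of `yy' by itself
would not be expected to touch names such as `yytext'.  The sketch below
shows the difference; STR/XSTR and the three functions are hypothetical
helpers used only to display what each name expands to.

```c
#define yy     set1yy       /* matches only the identifier `yy' itself */
#define yylval set1yylval   /* a whole-name define, one per symbol     */

/* Helpers that turn the fully expanded identifier into a string. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* `yytext' is a different identifier from `yy', so it is left alone. */
const char *prefix_result(void) { return XSTR(yytext); }

/* `yylval' has its own define, so it is renamed. */
const char *whole_name_result(void) { return XSTR(yylval); }

/* Only a bare `yy' is affected by the prefix-style define. */
const char *bare_result(void) { return XSTR(yy); }
```

If your preprocessor behaves this way, the per-symbol list is still needed,
although it can live in one header included at the top of each parser file.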

>[...] code includes calls to yyless() and yywrap(), which are in the LEX
>library (-ll), the contents of which I cannot rename.  So that doesn't
>work.

The LEX library (libl.a) includes default objects for yyless, yywrap, reject,
main, and allprint (a routine used by the builtin debug code).  Each of these
routines is fairly trivial, and can be replaced by you with little effort.
For example, the yywrap code in libl.a is equivalent to the following:

    int yywrap ()
    {
        return(1);
    }

yyless is about as tough: you simply use the unput routine to push the last
few characters of yytext back onto the input stream.  Note that there is a
stated limit of 100 characters on unput (although I've observed 200).  Also,
remember that in your lex/yacc files, you must use yytext (not set1yytext)
because the macro definition will take care of the translation.  As far as
unput, input, and output are concerned, LEX usually defines these as macros
so there is no danger unless you have implemented the routines as functions
and undefined the macros.  Replacing the routines with your own will allow
you to kill the link dependency on libl.a.

>I've also investigated GNU's replacements for these programs, flex and
>bison.  I haven't gotten to the bison documentation yet, but a
>quick look at the flex man page implies that these programs still
>have the hardcoded global symbol names.

Flex and Bison can produce code for "isolated" modules (for lack of a better
name), but Bison is from the Free Software Foundation so you need to follow
their usual conventions (your code becomes free, etc).  I'm not sure about
Flex, but I didn't think it had the same restrictions (it's from a California
University, Berkeley maybe?).

    Anyway, hope the above helps!

--

Dan Polfer                             ...uunet!ingr!b29!dap!dan  (UUCP)
Intergraph Corporation                 b29!dap!dan@ingr.com       (Internet)
Huntsville, Al                         (205) 730-6154

martin@mwtech.UUCP (Martin Weitzel) (12/11/90)

In article <332@cti1.UUCP> mpledger@cti1.UUCP (Mark Pledger) writes:
>There was a recent article in the C User's Journal about 3 months ago
>on this topic of using multiple parsers within the same executable.  Somebody
>might want to check it out.

It was in the July 1990 issue (volume 8, number 7), but it doesn't address
the problem with routines from the lex-library.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83