leland@cs (12/07/90)
I have an application that requires two discrete uses of both lex and yacc. Since both lex and yacc have hardcoded names (all with the 'yy' prefix) for a whole bunch of global symbols, there is an immediate problem: somehow the two sets must be hidden from each other and distinguished for any other code that accesses them.

I've tried this kludge: create a header file that re-#define's all the names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all those in set #2 to be named set2yyfoo. This has worked for me in the past, but won't in this particular instance because the generated code includes calls to yyless() and yywrap(), which are in the LEX library (-ll), the contents of which I cannot rename. So that doesn't work.

I've also investigated GNU's replacements for these programs, flex and bison. I haven't gotten to the bison documentation yet, but a quick look at the flex man page implies that these programs still have the hardcoded global symbol names. Flex eliminates the need for the external library, however, so this may solve my problem. But it's still a horrible kludge, and it would require that others to whom I distribute the software maintain flex and/or bison. I'm looking for something better.

I may be able to use some special loader options to 'hide' the yy* symbols when compiling the object modules that contain them, but the procedure for doing so is not obvious, nor is it standardized across loader versions.

Has anybody listening gotten into this before? Can you offer any suggestions? Thanks for whatever you may know.

It is difficult to accept that, considering how long these programs have been around, none of the implementers ever thought to make these globals static or to add a command-line option to rename them.

Leland Woodbury
--
ARPANET/INTERNET: leland@cs.columbia.edu
USENET: ...!columbia!cs.columbia.edu!leland
BITNET: leland%cs.columbia.edu@cuvmb
USMAIL: Columbia Univ., 457 CS, 500 W. 120 St., NYC 10027-6699
gwyn@smoke.brl.mil (Doug Gwyn) (12/07/90)
In article <1990Dec6.200944.13037@cs.columbia.edu>, leland@cs writes:
- I've tried this kludge: create a header file that re-#define's all the
- names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all
- those in set #2 to be named set2yyfoo. This has worked for me in the
- past, but won't in this particular instance because the generated
- code includes calls to yyless() and yywrap(), which are in the LEX
- library (-ll), the contents of which I cannot rename. So that doesn't
- work.
But it almost does -- Since "lex" produces C source, you can #define
set1yyless yyless, etc. before the lex output to be compiled, thereby
turning these selected references back into calls to the shared library
functions. (I assume the lex library does not maintain internal state.)
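A minimal sketch of the rename-then-map-back idea, assuming a hypothetical rename header for set #1 and stand-ins for both the library routine and the lex output (none of this is real generated code). It relies on the preprocessor not re-expanding a macro name inside its own expansion, so `yywrap` becomes `set1yywrap` and then `yywrap` again, and stops there. It can only help for a routine such as yywrap that touches no other yy globals:

```c
#include <string.h>

/* Stand-in for a routine that lives in the lex library (-ll) and
 * therefore has to keep its original name. */
int yywrap(void) { return 1; }

/* Contents of a hypothetical rename header for set #1. */
#define yytext set1yytext
#define yyleng set1yyleng
#define yylex  set1yylex
#define yywrap set1yywrap

/* The map-back: placed before the lex output is compiled, it turns the
 * renamed reference into a call to the shared library routine again. */
#define set1yywrap yywrap

/* Stand-in for the generated scanner; the source still says yyfoo. */
char yytext[32];
int  yyleng;

int yylex(void)
{
    strcpy(yytext, "abc");   /* compiled as set1yytext */
    yyleng = 3;              /* compiled as set1yyleng */
    return yywrap();         /* compiled as plain yywrap() */
}
```

Other code would then refer to `set1yylex()`, `set1yytext` and `set1yyleng`, while the call to `yywrap()` still resolves to the single library copy.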
mpledger@cti1.UUCP (Mark Pledger) (12/10/90)
There was a recent article in the C User's Journal about 3 months ago
on this topic of using multiple parsers within the same executable. Somebody
might want to check it out.
--
Sincerely,
Mark Pledger
--------------------------------------------------------------------------
CTI                              |              (703) 685-5434 [voice]
2121 Crystal Drive               |              (703) 685-7022 [fax]
Suite 103                        |
Arlington, VA 22202              |              mpledger@cti.com
--------------------------------------------------------------------------
martin@mwtech.UUCP (Martin Weitzel) (12/10/90)
In article <14674@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <1990Dec6.200944.13037@cs.columbia.edu>, leland@cs writes:
>- I've tried this kludge: create a header file that re-#define's all the
>- names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all
>- those in set #2 to be named set2yyfoo. This has worked for me in the
>- past, but won't in this particular instance because the generated
>- code includes calls to yyless() and yywrap(), which are in the LEX
>- library (-ll), the contents of which I cannot rename. So that doesn't
>- work.
>
>But it almost does -- Since "lex" produces C source, you can #define
>set1yyless yyless, etc. before the lex output to be compiled, thereby
>turning these selected reference back into calls to the shared library
>functions. (I assume the lex library does not maintain internal state.)

Unfortunately things are more complicated. Here is an excerpt from
`nm /usr/lib/libl.a' (UNIX Sys V):
----------------------------------------------------------------------
Symbols from /usr/lib/libl.a[reject.o]:

Name        | Value | Class  | Type   | Size | Line | Section
reject.c    |       | file   |        |      |      |
yyreject    |     0 | extern | int( ) |  270 |      | .text
yyracc      |   272 | extern | int( ) |  154 |      | .text
yyinput     |     0 | extern |        |      |      |
yyleng      |     0 | extern |        |      |      |
yytext      |     0 | extern |        |      |      |
yylsp       |     0 | extern |        |      |      |
yyolsp      |     0 | extern |        |      |      |
yyfnd       |     0 | extern |        |      |      |
yyunput     |     0 | extern |        |      |      |
yylstate    |     0 | extern |        |      |      |
yyprevious  |     0 | extern |        |      |      |
yyoutput    |     0 | extern |        |      |      |
yyextra     |     0 | extern |        |      |      |
yyback      |     0 | extern |        |      |      |

Symbols from /usr/lib/libl.a[yyless.o]:

Name        | Value | Class  | Type   | Size | Line | Section
yyless.c    |       | file   |        |      |      |
yyless      |     0 | extern | int( ) |  107 |      | .text
yyleng      |     0 | extern |        |      |      |
yytext      |     0 | extern |        |      |      |
yyunput     |     0 | extern |        |      |      |
yyprevious  |     0 | extern |        |      |      |

Symbols from /usr/lib/libl.a[yywrap.o]:

Name        | Value | Class  | Type   | Size | Line | Section
yywrap.c    |       | file   |        |      |      |
yywrap      |     0 | extern | int( ) |   16 |      | .text
----------------------------------------------------------------------

The problem is not some internal state of these functions, but that
they expect a number of external `yyfoo'-symbols, and there is no way
to make them access the `right' ones without rewriting the functions.

So, how hard would it be to rewrite them? The trivial case is `yywrap'.
I hope AT&T doesn't sue me because of reverse engineering :-), but this
function is a one-liner.

	yywrap() { return 1; }

The two other functions (`yyless' and `yyreject') may have complicated
interactions with a lot of globals, so the best solution is to avoid
them and do manually what is required. This is simple in the case of
`yyless', since it is usually used to push back parts of `yytext' to
the input stream. This can also be done with the `unput()' macro in a
loop. (The library version of `yyless' does this via the `yyunput()'
function, but this function simply calls `unput()', which may have been
redefined - have a look into `lex.yy.c' to understand how things work
together.) In addition, the "original" `yyless' adjusts `yytext' and
`yyleng' accordingly. The part that still worries me is the reference
to `yyprevious' within `yyless'. To be sure, you should probably
disassemble the library version of `yyless' - it's not that large.

`yyreject' is best avoided completely because it plays with a lot of
external symbols (the poster of the original question is lucky here,
but others may understand this as a hint to use REJECT - which in turn
calls yyreject() - only as a last resort).

BTW: Another option is to have a common lexer for both sets of input
symbols and use start conditions in lex to select the appropriate ones.
It's a pity that start conditions are insufficiently explained in the
common documentation of lex (if they are mentioned at all).
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
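The manual replacement for `yyless' described above might look roughly like the following. This is a sketch under stated assumptions, not the library's code: `my_yyless`, `pushback` and `npushed` are hypothetical names, the `unput()` macro here is only a stand-in for the one lex puts into `lex.yy.c`, and whatever the library version does with `yyprevious` is not reproduced.

```c
#include <string.h>

/* Stand-ins for the scanner's globals; in real lex output these come
 * from lex.yy.c (possibly renamed per scanner). */
static char yytext[200];
static int  yyleng;

/* Stand-in for lex's unput() macro: remember one character so that it
 * would be read again.  pushback/npushed are hypothetical names. */
static char pushback[200];
static int  npushed;
#define unput(c) (pushback[npushed++] = (char)(c))

/* Keep the first n characters of the current token and push the rest
 * back, last character first, adjusting yytext and yyleng to match. */
static void my_yyless(int n)
{
    if (n < 0 || n > yyleng)
        return;                      /* nothing sensible to do */
    while (yyleng > n)
        unput(yytext[--yyleng]);     /* push back from the end */
    yytext[yyleng] = '\0';           /* token is now n characters long */
}
```

Because the function is compiled together with the scanner, it sees whatever names the scanner's globals have been given, which a copy inside libl.a cannot.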
polfer@b11.ingr.com (? Polfer) (12/11/90)
In article <1990Dec6.200944.13037@cs.columbia.edu> leland@cs () writes:
>I have an application that requires two discrete uses of both lex and
>yacc. [...]
>I've tried this kludge: create a header file that re-#define's all the
>names 'yyfoo' in lex/yacc set #1 to be named, say, set1yyfoo, and all
>those in set #2 to be named set2yyfoo.

Trying to redefine all of the symbols can be a real pain in the neck. Try
to use "#define yy set1yy" or "#define yy set2yy" at the top of your
different parser files, and then have the parser files include the lex code
(catching all of the lex symbols).

>[...] code includes calls to yyless() and yywrap(), which are in the LEX
>library (-ll), the contents of which I cannot rename. So that doesn't
>work.

The LEX library (libl.a) includes default objects for yyless, yywrap,
reject, main, and allprint (a routine used by the builtin debug code).
Each of these routines is fairly trivial, and can be replaced by you with
little effort. For example, the yywrap code in libl.a is equivalent to
the following:

	int yywrap ()
	{
		return(1);
	}

yyless is about as tough: you simply use the unput routine to push the last
few characters of yytext back onto the input stream. Note that there is a
stated limit of 100 characters on unput (although I've observed 200). Also,
remember that in your lex/yacc files, you must use yytext (not set1yytext)
because the macro definition will take care of the translation.

As far as unput, input, and output are concerned, LEX usually defines these
as macros, so there is no danger unless you have implemented the routines as
functions and undefined the macros.

Replacing the routines with your own will allow you to kill the link
dependency on libl.a.

>I've also investigated GNU's replacements for these programs, flex and
>bison. I haven't gotten to the bison documentation yet, but a
>quick look at the flex man page implies that these programs still
>have the hardcoded global symbol names.
Flex and Bison can produce code for "isolated" modules (for lack of a better
name), but Bison is from the Free Software Foundation so you need to follow
their usual conventions (your code becomes free, etc). I'm not sure about
Flex, but I didn't think it had the same restrictions (it's from a California
University, Berkeley maybe?).

Anyway, hope the above helps!
--
Dan Polfer                 ...uunet!ingr!b29!dap!dan (UUCP)
Intergraph Corporation     b29!dap!dan@ingr.com (Internet)
Huntsville, Al             (205) 730-6154
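One caveat when trying the macro approach: the C preprocessor replaces whole identifier tokens, so a define of `yy` alone leaves `yytext` untouched, and the rename needs one #define per symbol. The sketch below assumes two hypothetical parser files squeezed into one source file, with hand-written stand-ins where the lex output would be included; the #undef lines are only there because of that squeezing.

```c
#include <string.h>

/* ---- hypothetical parser file #1 ---- */
#define yytext set1yytext    /* one define per symbol: the preprocessor */
#define yyleng set1yyleng    /* matches whole identifiers, not prefixes */
#define yylex  set1yylex

char yytext[32];             /* stand-in for what lex would generate */
int  yyleng;

int yylex(void)
{
    strcpy(yytext, "one");   /* source says yytext, compiler sees set1yytext */
    yyleng = 3;
    return 1;
}

#undef yytext
#undef yyleng
#undef yylex

/* ---- hypothetical parser file #2 ---- */
#define yytext set2yytext
#define yyleng set2yyleng
#define yylex  set2yylex

char yytext[32];
int  yyleng;

int yylex(void)
{
    strcpy(yytext, "two!");  /* compiled as set2yytext */
    yyleng = 4;
    return 2;
}

#undef yytext
#undef yyleng
#undef yylex
```

Code outside the two parser files would call `set1yylex()` and `set2yylex()` and read `set1yytext` or `set2yytext`; each scanner keeps its own state, while the lex and yacc sources themselves continue to say plain `yytext`.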
martin@mwtech.UUCP (Martin Weitzel) (12/11/90)
In article <332@cti1.UUCP> mpledger@cti1.UUCP (Mark Pledger) writes:
>There was a recent article in the C User's Journal about 3 months ago
>on this topic of using multiple parsers within the same executable. Somebody
>might want to check it out.

It was in the July 1990 issue (volume 8, number 7), but it doesn't address
the problem with routines from the lex-library.
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83