[comp.compilers] Lex and Start Conditions

eifrig@server.cs.jhu.edu (Jonathan Eifrig) (01/24/91)

Here's a Lex usage question concerning start conditions that I encountered:

I want to have my Lex-generated scanner strip out comments.  Conceptually,
this is easy to do with start conditions.  For example:

%Start	NORM COM

%%

<NORM>{natnum}+		{return(ID);}

<NORM>"/*"		{BEGIN COM;}

<COM>"*/"		{BEGIN NORM;}

<COM>.|\n		{ /* Do nothing */}



The idea is to have a simple little rule that eats a single character up and
discards it while searching for the close-comment symbol; thereby stripping
the comment out.  So far, so good.
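
To make the two-state idea concrete, here is a minimal hand-written C sketch
of what the rules above ask the scanner to do, with an explicit state variable
standing in for the start condition.  The names strip_comments and scan_state
are made up for illustration and are not part of lex.  The sketch only copies
characters and does no tokenizing, so it is not what lex actually generates.

```c
#include <stddef.h>

/* Hypothetical names mirroring the NORM and COM start conditions. */
enum scan_state { NORM, COM };

/*
 * Copy src to dst, dropping C-style comments.
 * dst must have room for at least strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
size_t strip_comments(const char *src, char *dst)
{
    enum scan_state state = NORM;  /* the scanner must start in NORM */
    size_t out = 0;

    while (*src != '\0') {
        if (state == NORM && src[0] == '/' && src[1] == '*') {
            state = COM;           /* open-comment rule: BEGIN COM */
            src += 2;
        } else if (state == COM && src[0] == '*' && src[1] == '/') {
            state = NORM;          /* close-comment rule: BEGIN NORM */
            src += 2;
        } else if (state == COM) {
            src += 1;              /* catch-all COM rule: eat one character */
        } else {
            dst[out++] = *src++;   /* NORM: pass the character through */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Calling strip_comments on "12 /* note */ 34" leaves "12  34" in the output
buffer.  Note that the state has to be set to NORM before the first character
is read; an unterminated comment simply swallows the rest of the input.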

This works great, except that the automaton has to be started up in the
correct (meta) state (in this case, NORM).  What is the best way to do this?
I've found two ways of doing this, neither of which is very pretty.

Option 1: Kick the automaton into the NORM state manually before parsing.
This basically involves having a main() like:

main()
{
	...
	BEGIN NORM;
	yyparse();
}

Unfortunately, this requires importing the BEGIN macro into the main program
file, which is sort of unappealing.

Option 2:  Use the undocumented INITIAL start condition.  Substitute INITIAL
for NORM above.  Unfortunately, INITIAL isn't a "real" start condition, so we
can't just BEGIN INITIAL, but have to use BEGIN 0, which is very cheesy.  In
addition, I have no idea how portable it is.  Using "undocumented features"
seems like a bad idea to me.

Does anyone have any other suggestions?

Jack Eifrig					eifrig@cs.jhu.edu
[You could use exclusive states in flex. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

bliss@sp64.csrd.uiuc.edu (Brian Bliss) (01/25/91)

To make your lexer start out in a non-default start condition,
just write a routine lex_init (), and put it in the routines
section of the lex file:

void lex_init () { BEGIN STATE; }

Then call lex_init () from the main program before parsing.
There's no reason the BEGIN itself has to be in the main program.

bb

P.S.  You wouldn't want to use the undocumented INITIAL start condition.
Even if you are in the COMMENT start condition, rules which have
INITIAL start condition will still be recognized.  Then,
lex will try to break up the inside of your comments into tokens
before it realizes that the entire comment is a longer token itself,
and chooses this based on the fact that it is longer.  All of this
makes the lexer take up much more space: a simple Pascal lexer I wrote,
using the INITIAL start condition for normal code (idents, etc.),
produced a 300K object file, but when I added the NORMAL start condition
to every rule which had the INITIAL start condition, the object
file shrank to 30K!

bb
[From bliss@sp64.csrd.uiuc.edu (Brian Bliss)]

peter@objy.com (Peter Moore) (01/25/91)

In article <eifrig.664691313@voronoi.cs.jhu.edu>, eifrig@server.cs.jhu.edu (Jonathan Eifrig) writes:
.....
|> This works great, except that the automaton has to be started up in the
|> correct (meta) state (in this case, NORM).  What is the best way to do this?
This and the documentation are my two biggest complaints with lex.
Here is my solution (with minor flame intact).

	Peter Moore
	peter@objy.com

. |
\n		{
	    /* 
	     * Lex has the misfeature that all unlabeled rules are 
	     * always active.  This makes interference between labeled 
	     * and unlabeled rules a big problem.  To overcome this, 
	     * every rule is labeled, with the normal state being 
	     * NORMAL_MODE.  To get ourselves into NORMAL_MODE at the 
	     * start, we use this rule that matches any character.  It 
	     * simply switches to NORMAL_MODE and pushes back the 
	     * character that triggered it.  Anytime after, this rule 
	     * is shielded by .|\n rules in all the other modes.
	     */
  
	    unput(yytext[0]);
	    START_MODE(NORMAL_MODE);
	    NLSTATE;  /* otherwise beginning of line rules won't work */
	}
[Should work, but I still like exclusive start states in flex. -John]
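
For readers unfamiliar with the inclusive/exclusive distinction, the following
C sketch models which rules are eligible to match in a given start condition.
It assumes the behavior the flex manual describes: a rule with no start
condition is active in INITIAL and in every inclusive (%s or %Start)
condition, but not in an exclusive (%x) one.  The struct, enum, and function
names are hypothetical and exist only for this illustration; nothing here is
lex or flex source.

```c
#include <stdbool.h>

/* Hypothetical start-condition numbers; 0 plays the role of INITIAL. */
enum start_cond { SC_INITIAL = 0, SC_COM = 1 };

struct rule {
    unsigned conds;  /* bitmask of conditions named on the rule; 0 = unlabeled */
};

/*
 * Decide whether a rule may match while the scanner is in `current`.
 * exclusive_mask has a bit set for each condition declared exclusive.
 */
bool rule_active(struct rule r, int current, unsigned exclusive_mask)
{
    unsigned bit = 1u << current;

    if (r.conds & bit)
        return true;               /* rule explicitly names this condition */
    if (r.conds != 0)
        return false;              /* labeled, but only for other conditions */

    /* Unlabeled rule: leaks into every condition that is not exclusive. */
    return (exclusive_mask & bit) == 0;
}
```

With COM inclusive, an unlabeled rule is still active inside a comment, which
is the interference both replies work around by labeling every rule.  With
COM exclusive, only rules that name COM are active there, so no catch-all
shielding rule is needed.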