[net.unix-wizards] Last try: lex help

dfz (12/02/82)

This is an answer to a "lex" question asked on the net-wizards newsgroup.
Please excuse the redundancy: this is my last try!
To: Randy Bentson
Address: harpo!decvax!cca!hplabs!hao!csu-cs!bentson@sri-unix

Lex allows the user to insert
a user-tailored set of states in front of the lex-generated
finite state machine.  These states can be used to subdivide
the lex-generated FSA into as many independent FSAs
as the user desires.  This trick comes in handy when performing
lexical analysis on input that contains delimited sections
whose rules of analysis differ greatly from one another
(e.g. comments or strings vs. C language text).  The reason you
don't see this treatment used for comments in most C compilers
is that comments are stripped out by the C preprocessor rather
than by the lexical analyzer of the C compiler.

The technique for doing this is to use the "%start" directive
(see the lex manual).  I will show an example:
---------------------------------------------------
%start normal comment
%%
<normal> ...	{ code for expression }
<normal> ...	{ code for expression }
...
...
...
<normal>"/*"	{ BEGIN comment; }
<comment>"*/"	{ BEGIN normal; }
<comment>\n	{}
<comment>.	{}
---------------------------------------------------
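For readers without lex at hand, the same two-state idea can be
sketched by hand in C.  This is only an illustration, assuming a
NUL-terminated string as input; the names strip_comments, ST_NORMAL
and ST_COMMENT are hypothetical, and copying a character stands in
for the elided <normal> rules.  A single variable records the current
state, and each branch of the loop plays the part of one group of
prefixed rules:

```c
#include <assert.h>
#include <string.h>

/* The two start states of the lex example (illustrative names only). */
enum comment_state { ST_NORMAL, ST_COMMENT };

/* Hypothetical helper: copy `in` to `out`, dropping C-style comments.
 * `out` needs room for strlen(in) + 1 bytes. */
void strip_comments(const char *in, char *out)
{
    enum comment_state st = ST_NORMAL;      /* plays the role of BEGIN normal; */

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (st == ST_NORMAL) {
            if (strncmp(in, "/*", 2) == 0) { /* comment opener: BEGIN comment */
                st = ST_COMMENT;
                in += 2;
            } else {
                *out++ = *in++;               /* stand-in for the <normal> rules */
            }
        } else {
            if (strncmp(in, "*/", 2) == 0) { /* comment closer: BEGIN normal */
                st = ST_NORMAL;
                in += 2;
            } else {
                in++;                         /* <comment>\n and <comment>. : discard */
            }
        }
    }
    *out = '\0';
}
```

With this sketch, strip_comments("a/* x */b", buf) leaves "ab" in buf.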

In the above example, the lexical rules are divided into
two independent sets of rules; the rules for <comment> are
entirely independent of those for <normal>.  Furthermore,
one can jump from one state (e.g. <normal>) to another (e.g. <comment>)
as part of the action taken upon recognizing an expression in a state.
Thus one has effectively created two independent FSAs.  Note that lex
starts out in its default state, in which only rules without a
<state> prefix are active, so the program must execute BEGIN normal;
once (e.g. in main, before the first call to yylex()) before any
<normal> rule can match.  Lex uses
some sort of push-down stack, so that when you enter another state
(using the BEGIN macro) and then return, your current state is preserved.
All pattern recognition done in the other state effectively removes
the text it recognized from the view of the current state.
This jumping (I should say pushing and popping) can be nested
and even done recursively, although most applications do not
need this.  Below is a second example, which uses
three separate states to recognize normal text, comments, and strings:
---------------------------------------------------
%start normal comment string
%%
<normal> ...	{ code for expression }
<normal> ...	{ code for expression }
...
...
...
<normal>"/*"	{ BEGIN comment; }
<comment>"*/"	{ BEGIN normal; }
<comment>\n	{}
<comment>.	{}
<normal>\"	{ BEGIN string; }
<string>\"	{ BEGIN normal; }
<string> ...	{ code for expression }
...
...
...
---------------------------------------------------
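A similar hand-written C sketch of the three-state version, again with
hypothetical names (strip_comments_keep_strings, SC_NORMAL, SC_COMMENT,
SC_STRING).  It shows why the separate string state matters: a comment
opener between double quotes is copied through instead of starting a
comment.  Like the lex rules above, it does not treat backslash-escaped
quotes specially; a real scanner would need to.

```c
#include <assert.h>
#include <string.h>

/* Three start states, mirroring the second lex example (illustrative names). */
enum scan_state { SC_NORMAL, SC_COMMENT, SC_STRING };

/* Hypothetical helper: copy `in` to `out`, dropping comments but keeping
 * string literals intact, so a comment opener inside quotes is ignored.
 * Backslash-escaped quotes are NOT handled (same as the lex example).
 * `out` needs room for strlen(in) + 1 bytes. */
void strip_comments_keep_strings(const char *in, char *out)
{
    enum scan_state st = SC_NORMAL;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        switch (st) {
        case SC_NORMAL:
            if (strncmp(in, "/*", 2) == 0) {   /* BEGIN comment */
                st = SC_COMMENT;
                in += 2;
            } else {
                if (*in == '"')                 /* BEGIN string */
                    st = SC_STRING;
                *out++ = *in++;                  /* keep normal text and the quote */
            }
            break;
        case SC_COMMENT:
            if (strncmp(in, "*/", 2) == 0) {   /* BEGIN normal */
                st = SC_NORMAL;
                in += 2;
            } else {
                in++;                            /* discard comment text */
            }
            break;
        case SC_STRING:
            if (*in == '"')                     /* closing quote: BEGIN normal */
                st = SC_NORMAL;
            *out++ = *in++;                      /* stand-in for the <string> rules */
            break;
        }
    }
    *out = '\0';
}
```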
I hope I have answered your question.

			Dave Ziffer
			Bell Labs in Naperville, Illinois
			...!decvax!harpo!iwlc7!dfz

tihor (12/03/82)

cmcl2!tihor    Dec  2 14:45:00 1982

It is not clear from your message how one tells lex that one wishes to
push the current state and begin a new invocation of an automaton
already on the stack.  The obvious syntax, BEGIN newautomaton;, seems
to be used to pop back to a previous version of newautomaton already
stacked.
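On the question above: BEGIN name; switches the current start
condition to name.  One way to get the push/pop behaviour under
discussion is to keep an explicit stack in the action code yourself.
A minimal C sketch, assuming nested comments as the use case;
push_state, pop_state, strip_nested_comments and the depth limit
MAX_DEPTH are hypothetical names and values chosen for illustration,
not part of lex:

```c
#include <assert.h>
#include <string.h>

enum nstate { N_NORMAL, N_COMMENT };

#define MAX_DEPTH 32   /* illustrative limit, not from the original post */

/* A tiny explicit state stack (hypothetical helpers, not part of lex). */
struct state_stack {
    enum nstate saved[MAX_DEPTH];
    int depth;
};

static int push_state(struct state_stack *s, enum nstate *cur, enum nstate next)
{
    if (s->depth == MAX_DEPTH)
        return -1;                       /* too deeply nested */
    s->saved[s->depth++] = *cur;         /* remember where we came from */
    *cur = next;
    return 0;
}

static void pop_state(struct state_stack *s, enum nstate *cur)
{
    assert(s->depth > 0);
    *cur = s->saved[--s->depth];         /* return to the state we left */
}

/* Copy `in` to `out`, dropping comments that may nest.
 * Returns 0 on success, -1 on overflow or an unterminated comment. */
int strip_nested_comments(const char *in, char *out)
{
    struct state_stack stack = { .depth = 0 };
    enum nstate cur = N_NORMAL;

    while (*in != '\0') {
        if (strncmp(in, "/*", 2) == 0) {             /* opener: push, enter comment */
            if (push_state(&stack, &cur, N_COMMENT) != 0)
                return -1;
            in += 2;
        } else if (cur == N_COMMENT && strncmp(in, "*/", 2) == 0) {
            pop_state(&stack, &cur);                  /* closer: pop back */
            in += 2;
        } else if (cur == N_COMMENT) {
            in++;                                     /* discard comment text */
        } else {
            *out++ = *in++;                           /* normal text */
        }
    }
    *out = '\0';
    return cur == N_NORMAL ? 0 : -1;
}
```

With this sketch the input a/* x /* y */ z */b yields ab, whereas a
flat two-state scanner would end the comment at the first closer.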