[comp.lang.c] Some more hints for Lex

martin@mwtech.UUCP (Martin Weitzel) (06/01/90)

In article <1990May30.174745.1161@csrd.uiuc.edu> pommu@iis.ethz.ch (Claude Pommerell) writes:
>Lex rules that do not begin with any starting condition <cond>... are
>valid for ALL possible starting conditions. BEGIN 0 resets the Lex
>interpreter in its initial state where it has no explicit starting
>condition, so that only the untagged rules are valid.

Yes, this is a bit confusing for the newcomer (and not so well covered
by the FM). You may look at it as follows: Assume you have several named
start conditions (say S1, S2, ... Sn). There is always *one* additional
condition that you don't have to define (there is sometimes a #define
for it as `INITIAL', but that's also an undocumented feature and so
I'll call it S0 here). All *untagged* rules can then be read as if
they were tagged with <S0,S1,S2,...Sn>.

So the untagged rules are obviously valid for all start conditions *plus*
the one special condition S0 that you can trigger with `BEGIN 0'. (Note
that what Claude has written is absolutely correct - I just supplied an
alternate view that may be helpful for understanding.)
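
To make this view concrete, here is a small C model of it. This is only
a sketch of the idea, not of what lex generates: the names S0, S1, S2,
TAG, UNTAGGED and rule_active are all made up for the illustration. Each
rule carries a bit set of the start conditions it is tagged with, and an
untagged rule simply carries all of them, S0 included.

```c
#include <assert.h>

/* Hypothetical start conditions; S0 stands for the implicit one
 * that `BEGIN 0' selects. */
enum { S0, S1, S2, NCOND };

#define TAG(c)   (1u << (c))
#define UNTAGGED ((1u << NCOND) - 1u)   /* read as <S0,S1,S2> */

/* Nonzero if a rule tagged with `tags' may match while the
 * start condition `current' is active. */
static int rule_active(unsigned tags, int current)
{
    assert(current >= 0 && current < NCOND);
    return (tags & TAG(current)) != 0;
}
```

In this model rule_active(UNTAGGED, c) is true for every c, while
rule_active(TAG(S1), S0) is false: after `BEGIN 0' only the untagged
rules remain.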

[About literally inserted text, enclosed in lines with `%{' and `%}'.]

As the FM says, all indented lines also go to "lex.yy.c". In my
experience %{ ... %} is a little more reliable, but read on.

>However, if you put such an insertion text after "%%" (in the rules
>section of your Lex source), it gets inserted at the start of the body
>of the function that performs the lexical analysis, so you can use it
>to specify an initial condition.

Yes, IMHO also undocumented, but it works all right. Furthermore, C source
*after* the first `%%' but *before* the first rule is copied directly
after the last local variable defined in yylex(). So you can define
some more locals for yylex() in this place. I found this by accident
when playing a little with lex, but it seems very useful. At first I
only used indented lines here to get the text copied to the generated
source, but with at least one version of lex I know, I had to resort
to %{ ... %} to make this work with more than one statement.

[Lex source to skip nested comments]

As Claude already inserted his excerpts of code here, I don't need to
insert mine (which handles strings and char constants as well), but if
anybody is interested, I'll mail him or her mine. (At the risk of
getting religious :-), mine doesn't support nested comments.)

As an additional hint: It is sometimes necessary to preserve a start
condition across returns from yylex(). This can easily be done with a
static variable in yylex(). In all the implementations I know, start
conditions are simply ints, so an int variable will do. You only have
to remember to update your copy of the start condition with every
`BEGIN'. Look at the following code fragment to understand this:

----------------------------
.....
%START INITIAL_START_CONDITION, S1, S2, ...(some more start conditions)
.....
%%
%{
	static int saved_start_condition = INITIAL_START_CONDITION;
	BEGIN saved_start_condition;
%}
some-rule	{
		.... stuff ....
		BEGIN S1; saved_start_condition = S1;
		return (whatever);
}
some-rule	{
		.... stuff ....
		BEGIN S2; saved_start_condition = S2;
		return (whatever);
}
%%
.....
----------------------------
Got the idea?

BTW: It would seem natural and less error prone to write a single
statement
		BEGIN saved_start_condition = Sx;

but given the way `BEGIN' is defined, this won't work unless you put
parentheses around the assignment. If you really need much of this stuff,
it's easy to add some more #defines that automatically change and save
the start condition.
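
To show why the bare assignment fails, here is a stand-alone C sketch.
It assumes that `BEGIN' is defined roughly the way the lex.yy.c of the
lex versions I know defines it (`yybgin = yysvec + 1 +'); other
implementations may differ. The table size, the values of S1 and S2,
and the names BEGIN_AND_SAVE and current_condition are made up for the
example.

```c
#include <assert.h>

struct yysvf { int dummy; };               /* stand-in for lex's state struct */
static struct yysvf yysvec[8];             /* stand-in for the state table */
static struct yysvf *yybgin = yysvec + 1;  /* current start condition */

#define BEGIN yybgin = yysvec + 1 +        /* assumed definition, see above */

enum { S1 = 1, S2 = 2 };                   /* hypothetical start conditions */
static int saved_start_condition = 0;

/* BEGIN saved_start_condition = S1;  would expand to
 *     yybgin = yysvec + 1 + saved_start_condition = S1;
 * which does not compile, because `yysvec + 1 + saved_start_condition'
 * is not an lvalue. Wrapping the assignment in parentheses fixes that.
 */
#define BEGIN_AND_SAVE(c) (BEGIN (saved_start_condition = (c)))

/* Recover the numeric start condition from the yybgin pointer. */
static int current_condition(void)
{
    assert(yybgin >= yysvec + 1);
    return (int)(yybgin - (yysvec + 1));
}
```

With this, `BEGIN_AND_SAVE(S2);' switches to S2 and records it in one
step, and a later `BEGIN saved_start_condition;' brings it back.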

With some #defines (and using start conditions), you can increase the
power of lex considerably. Below you'll find an excerpt from a program
that is somewhat too large to post here. In principle it implements
(explicitly) an automaton with one state that can be saved and
restored(%). With slightly more sophisticated #defines you could
implement a stack to save and restore more than one start condition.

----------------------------
%start PART1
%start LHS RHS ACT CMT STR ALTS SKIP
%start PART3
%%
%{
	static struct yysvf *s;
#	define NEWSTATE s=BEGIN
#	define RESTORE yybgin=s
	/* NEWSTATE sets a new start condition and records it in s,
	 * BEGIN sets a new start condition but leaves s alone, so
	 * the previous condition stays saved,
	 * and RESTORE returns to the condition saved in s.
	*/
	NEWSTATE PART1;
%}
....
<RHS,ALTS>"|"		{ .................	NEWSTATE RHS;	}
<RHS,ALTS>";"		{ .................	NEWSTATE LHS;	}
<LHS,RHS,ALTS>"/*"	{ .................	BEGIN SKIP;	}
<SKIP>"*/"		{ .................	RESTORE;	}
<ACT>\"			{ .................	NEWSTATE STR;	}
....
%{
#	undef NEWSTATE
#	undef RESTORE
%}
----------------------------
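
The stack mentioned above might look like the following sketch. It is
not taken from my program: cur_cond, MAXDEPTH, push_state and pop_state
are made-up names, `BEGIN' is reduced to a plain int assignment so the
fragment compiles on its own, and the numbering of LHS, RHS and SKIP is
arbitrary. I wrote it as two small functions rather than #defines only
to keep it readable.

```c
#include <assert.h>

static int cur_cond = 0;   /* stand-in for lex's current start condition */
#define BEGIN cur_cond =   /* simplified stand-in for lex's BEGIN */

enum { LHS = 1, RHS, SKIP };   /* hypothetical numbering */

#define MAXDEPTH 16
static int cond_stack[MAXDEPTH];
static int cond_depth = 0;

/* Save the current start condition, then switch to `next'. */
static void push_state(int next)
{
    assert(cond_depth < MAXDEPTH);
    cond_stack[cond_depth++] = cur_cond;
    BEGIN next;
}

/* Return to the most recently saved start condition. */
static void pop_state(void)
{
    assert(cond_depth > 0);
    BEGIN cond_stack[--cond_depth];
}
```

A rule like the one for "/*" would then call push_state(SKIP) and the
one for "*/" would call pop_state(), no matter how many conditions
have been stacked in between.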

%: BTW I use this program to reformat yacc sources, so that they can
be more easily processed with awk to extract useful information about
the grammar etc. If anybody is interested in this project, let me
know; I'm willing to share.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83