nag@hpmtlx.HP.COM ($Diwakar_Nag) (05/30/90)
/ hpmtlx:comp.lang.c / eifrig@crabcake.cs.jhu.edu (Jonathan Eifrig) / 6:42 pm May 24, 1990 /
> I have a not-so-basic question about Lex and its input specifications.
> I want to write a simple parser, and in particular want to support
> nested comments. The basic idea is to use Lex's start conditions:
> ....
> ....
> Jack Eifrig
> (eifrig@cs.jhu.edu)

Just use BEGIN <state_name>, for example: BEGIN COMMENT;

Since BEGIN is a macro that refers to global variables defined by lex,
it is a good idea to define a function like InitScanner() in the lex
specification file that simply invokes the BEGIN macro. InitScanner()
can then be called from a different file without worrying about the
global variables that BEGIN uses.

-diwakar
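A minimal C sketch of the wrapping idea, assuming (as with lex) that the
current start state lives in a file-scope variable and that BEGIN is a
macro assigning to it. All names here (scan_state, SET_STATE, TEXT,
COMMENT, EnterComment, ScannerState) are hypothetical stand-ins; real lex
generates its own internal variables.

```c
/* Hypothetical model: the scanner keeps its start state in a file-scope
 * variable, and SET_STATE plays the role of lex's BEGIN macro. */
enum { STATE_0 = 0, TEXT = 1, COMMENT = 2 };

static int scan_state = STATE_0;            /* private to the scanner file */
#define SET_STATE(s) (scan_state = (s))     /* stands in for BEGIN */

/* Exported wrappers: other files call these functions and never touch
 * scan_state or the macro directly. */
void InitScanner(void)  { SET_STATE(TEXT); }     /* e.g. BEGIN Text; */
void EnterComment(void) { SET_STATE(COMMENT); }  /* e.g. BEGIN COMMENT; */
int  ScannerState(void) { return scan_state; }
```

Only the scanner file needs to know how the state is stored; callers see
just the three functions.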
pommerel@sp14.csrd.uiuc.edu (Claude Pommerell) (05/31/90)
Lex rules that do not begin with a start condition <cond> are valid in
ALL start conditions. BEGIN 0 resets the Lex scanner to its initial
state, where no explicit start condition is in effect, so that only the
untagged rules are valid.
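The eligibility test described above can be modelled in a few lines of
C. This is a sketch, not lex's actual implementation: it assumes each
rule records its <cond> prefixes as a bit mask, with start condition 0
being the initial state that BEGIN 0 returns to. COND, UNLABELLED and
rule_active are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of lex's inclusive start conditions.  Bit s of a
 * rule's mask is set for each <cond> prefix the rule carries; a mask of
 * 0 means the rule was written without any prefix. */
typedef unsigned int CondMask;

#define COND(s)    (1u << (s))   /* bit for start condition number s */
#define UNLABELLED 0u            /* rule with no <cond> prefix */

bool rule_active(CondMask rule_conds, int state)
{
    if (rule_conds == UNLABELLED)            /* no prefix: valid in ALL states */
        return true;
    return (rule_conds & COND(state)) != 0;  /* prefixed: only listed states */
}
```

In state 0 every labelled rule fails the mask test, so only the untagged
rules remain, which is the behaviour of BEGIN 0 described above.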
There is a way to solve your problem, Jack. You know from the Lex manual
that any text enclosed between lines starting with "%{" and "%}" is
copied literally into the C program generated by Lex. If you put such an
insertion before the first line starting with "%%" (that is, in the
definitions section of your Lex source), it is placed at global scope in
the C program, so this is the right place to declare externals and the
like.
However, if you put such an insertion after "%%" and before the first
rule (in the rules section of your Lex source), it is inserted at the
start of the body of the function that performs the lexical analysis, so
you can use it to set an initial condition.
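Roughly, the generated scanner then has the shape below. This is a
simplified, hypothetical picture (yylex_sketch, start_state and
prologue_runs are invented names, and real lex output differs in
detail); it shows that definitions-section text lands at file scope while
rules-section text runs each time the scanning function is entered.

```c
#define Text 1                  /* stand-in for the %START number of Text */

/* Text from %{ %} before the first %%: copied to file scope. */
static int commentLevel;

/* Stand-ins for lex's own bookkeeping (hypothetical names). */
static int start_state;         /* what BEGIN assigns */
static int prologue_runs;       /* how often the rules-section text ran */

int yylex_sketch(void)
{
    /* Text from %{ %} after the first %%: copied to the top of the
     * scanning function, so it executes on every call. */
    start_state = Text;         /* BEGIN Text; */
    commentLevel = 0;
    ++prologue_runs;

    /* ...the generated matching loop would follow here; this sketch
     * just reports end of input. */
    return 0;
}
```

Calling yylex_sketch() twice leaves prologue_runs at 2, which matters
whenever the scanner is called once per token.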
This is my Lex source to skip nested C-like comments:
-------------------------------------------------------------------
%{
/* context in recursive C-like comments */
static int commentLevel;
%}
/* Starting conditions to support recursive C-like comments */
%START Text NewCCom InCCom EndCCom
%%
%{
/* Set the initial condition */
BEGIN Text;
commentLevel = 0;
%}
<Text>\/\* { commentLevel = 1; BEGIN InCCom; }
<InCCom>\/ { BEGIN NewCCom; }
<InCCom>\* { BEGIN EndCCom; }
<NewCCom>\* { ++commentLevel; BEGIN InCCom; }
<EndCCom>\/ { if (--commentLevel)
BEGIN InCCom;
else
BEGIN Text;
}
<NewCCom,EndCCom>[^\*\/] { BEGIN InCCom; }
<InCCom>[^\/\*] |
<NewCCom>\/ |
<EndCCom>\* ;
-------------------------------------------------------------------
All the other rules (the ordinary ones that do not depend on comment
context) are prefixed with the start condition <Text>.
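To see what the four start conditions do, here is a character-at-a-time
C model of the same machine. It is a sketch under stated assumptions:
the function name strip_nested and its interface (copy non-comment text
to out, return the nesting depth still open at end of input) are
hypothetical, and it stands in for the lex-generated scanner rather than
replacing it. Each branch is commented with the rule it mirrors.

```c
/* Hypothetical C model of the Text/NewCCom/InCCom/EndCCom machine.
 * out must have room for at least strlen(in) + 1 characters.
 * Returns the comment nesting level left open at end of input. */
enum State { Text, NewCCom, InCCom, EndCCom };

int strip_nested(const char *in, char *out)
{
    enum State state = Text;
    int commentLevel = 0;

    for (; *in; ++in) {
        char c = *in;
        switch (state) {
        case Text:
            if (c == '/' && in[1] == '*') {  /* <Text>\/\* : open outer comment */
                commentLevel = 1;
                state = InCCom;
                ++in;                        /* consume the '*' as well */
            } else {
                *out++ = c;                  /* default action: ECHO */
            }
            break;
        case InCCom:
            if (c == '/')
                state = NewCCom;             /* <InCCom>\/ : maybe a nested opener */
            else if (c == '*')
                state = EndCCom;             /* <InCCom>\* : maybe a closer */
            break;                           /* other chars: <InCCom>[^\/\*] ; */
        case NewCCom:
            if (c == '*') {                  /* <NewCCom>\* : nested comment opens */
                ++commentLevel;
                state = InCCom;
            } else if (c != '/') {
                state = InCCom;              /* [^\*\/] goes back; another '/' stays */
            }
            break;
        case EndCCom:
            if (c == '/')                    /* <EndCCom>\/ : one level closes */
                state = (--commentLevel) ? InCCom : Text;
            else if (c != '*')
                state = InCCom;              /* [^\*\/] goes back; another '*' stays */
            break;
        }
    }
    *out = '\0';
    return commentLevel;
}
```

For example, "a /* x /* y */ z */ b" comes out as "a  b" with a return
value of 0, while an unterminated comment returns a positive depth.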
This solution seems to be portable. I have used it on Alliant, Convex,
and Cray computers without ever having trouble with it. If I run into
problems porting it further, I will post a fix.
Claude Pommerell
(pommy@iis.ethz.ch)
ejp@bohra.cpg.oz (Esmond Pitt) (05/31/90)
In article <1990May30.174745.1161@csrd.uiuc.edu> pommu@iis.ethz.ch (Claude Pommerell) writes:
>
>There is a way to solve your problem, Jack.

There are two even simpler ways. Instead of effectively changing the
initial condition to <Text>, either:

1. Ensure each start state is equipped with enough rules to handle any
possible input, and, as the documentation does state, place all the
unlabelled rules after all the labelled rules; and/or

2. Label all the rules you want applied only in the INITIAL state with
<INITIAL>, so they won't be applied as defaults in other states.

Placing non-labelled rules before labelled rules is probably the single
most common error in writing LEX scripts, even after 15 years.

I don't know why.
--
Esmond Pitt, Computer Power Group
ejp@bohra.cpg.oz
merlyn@iwarp.intel.com (Randal Schwartz) (05/31/90)
In article <116@bohra.cpg.oz>, ejp@bohra (Esmond Pitt) writes:
| There are two even simpler ways.
|
| Instead of effectively changing the initial condition to <Text>, either:
|
| 1. Ensure each start-state is equipped with enough rules to handle any
| possible input, and, as the documentation does state, place all the
| unlabelled rules after all the labelled rules, and/or
|
| 2. Label all the rules you only want applied in the INITIAL state with
| <INITIAL>, so they won't be applied as defaults in other states.
|
| Placing non-labelled rules before labelled rules is probably the single
| most common error in writing LEX scripts, even after 15 years.
|
| I don't know why.

Because it is insufficient. It's not "first match" but "longest match"
that determines which rule fires ("first match" applies only when the
rules match the same length). Given

        ...
        <FOO>a  { something; }
        ...
        ab      { something_else; }
        ...

the "something_else" action runs when "ab" is seen, even in state
"FOO". I know... this bit me once.

Just another lex hacker,
--
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/
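Randal's point can be checked with a small model of lex's rule
selection: among the rules active in the current start condition, the
longest match wins, and rule order only breaks ties between equal-length
matches. This is a hypothetical sketch (struct Rule, pick_rule, example
and FOO are invented, and patterns are plain literal strings rather than
regular expressions).

```c
#include <stddef.h>
#include <string.h>

/* A rule: the start condition it requires (-1 if unlabelled) and the
 * literal text it matches (a simplification of a regular expression). */
struct Rule {
    int cond;
    const char *literal;
};

/* Returns the index of the rule lex would fire, or -1 if none matches. */
int pick_rule(const struct Rule *rules, size_t n, int state, const char *input)
{
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < n; ++i) {
        if (rules[i].cond != -1 && rules[i].cond != state)
            continue;                        /* not active in this state */
        size_t len = strlen(rules[i].literal);
        if (strncmp(input, rules[i].literal, len) != 0)
            continue;                        /* does not match here */
        if (best == -1 || len > best_len) {  /* strictly longer wins; */
            best = (int)i;                   /* equal length keeps the */
            best_len = len;                  /* earlier rule */
        }
    }
    return best;
}

/* The example from the post: <FOO>a listed before the unlabelled ab. */
enum { INITIAL_COND = 0, FOO = 1 };
static const struct Rule example[] = {
    { FOO, "a"  },    /* <FOO>a  { something; }      */
    { -1,  "ab" },    /* ab      { something_else; } */
};
```

Here pick_rule(example, 2, FOO, "ab") returns 1, the unlabelled ab rule,
even though <FOO>a comes first; for input "ax" it returns 0.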
ejp@bohra.cpg.oz (Esmond Pitt) (06/01/90)
In article <1990May31.161800.11133@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
>In article <116@bohra.cpg.oz>, ejp@bohra (Esmond Pitt) [me] writes:
>|
>| Placing non-labelled rules before labelled rules is probably the single
>| most common error in writing LEX scripts, even after 15 years.
>|
>| I don't know why.
>
>Because it is insufficient. It's not "first match", but "longest
>match" that determines rule triggering. ("first match" applies when
>the rules have the same length.)

How does this explain why people put labelled rules after non-labelled
rules, when the manual says not to?
--
Esmond Pitt, Computer Power Group
ejp@bohra.cpg.oz
jeff@samna.UUCP (Jeff Barber) (06/01/90)
In article <1990May30.174745.1161@csrd.uiuc.edu> pommu@iis.ethz.ch (Claude Pommerell) writes:
>However, if you put such an insertion text after "%%" (in the rules
>section of your Lex source), it gets inserted at the start of the body
>of the function that performs the lexical analysis, so you can use it
>to specify an initial condition.

That's okay for this particular situation, but it won't work if your lex
program is the lexical analyzer in a larger program. Placing "BEGIN
start-symbol;" after the first %% causes it to be included at the
beginning of the yylex() function. If your actions are designed to
return a token to a parser (a yacc program, for example), they'll
contain statements like:

        return TOK_IDENTIFIER;

so yylex() is re-entered for every token, and every entry resets the
analyzer's state.

So a better general-purpose solution is to define a function after the
*second* %% that contains the BEGIN statement and is called once to
initialize the analyzer. In your case, we can just create a main()
function with the BEGIN in it. (You've also got some unnecessary states
in here, so I've simplified a bit.)

--------------------Cut Here----------------------------
%{
#include <stdlib.h>   /* for exit() */

/* context in recursive C-like comments */
static int commentLevel = 0;
%}
/* Starting conditions to support recursive C-like comments */
%START Text InCCom
%%
\/\*            { ++commentLevel; BEGIN InCCom; }
<InCCom>\*\/    { if (--commentLevel == 0) BEGIN Text; }
<Text>\*\/      { printf("Syntax error\n"); exit(1); }
<InCCom>.       |
<InCCom>\n      { /* Ignore stuff inside of comments;
                     everything else is echoed by default. */ }
%%
int main(void)
{
    /* Set the initial condition */
    BEGIN Text;
    return yylex();
}
--------------------Cut Here----------------------------

One last thing: it is possible to utter the name of the initial state
("INITIAL"), so if INITIAL were substituted for Text, no state
initialization would be necessary (and neither would our main()
function; it would be supplied by the lex library [ cc ... -ll ]).
(BTW, does anybody know whether this is portable? I don't recall reading
about this INITIAL state in the documentation; I just noticed it in the
lex.yy.c output and discovered by experimentation that lex recognizes it
in an <INITIAL> rule.)

I've directed followups out of comp.lang.c.

Jeff
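Jeff's simplified scanner translates into C roughly as follows. This is
a hypothetical model: scan_init, scan_run and the -1 error return are
invented (the lex version prints "Syntax error" and exits). Because the
start state is set once in scan_init() rather than on entry to the
scanning routine, a comment opened in one call is still open in the next.

```c
/* Hypothetical C model of the two-state scanner (Text, InCCom). */
enum { Text = 1, InCCom = 2 };

static int state;              /* current start condition */
static int commentLevel;       /* nesting depth of open comments */

void scan_init(void)           /* like main()'s one-time BEGIN Text; */
{
    state = Text;
    commentLevel = 0;
}

/* Scans one chunk of input, writing echoed text to out (which needs room
 * for strlen(in) + 1 chars).  Returns 0, or -1 for a stray closer. */
int scan_run(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '/' && in[1] == '*') {          /* unlabelled opener: */
            ++commentLevel;                          /* nests in any state */
            state = InCCom;
            in += 2;
        } else if (in[0] == '*' && in[1] == '/' && state == InCCom) {
            if (--commentLevel == 0)                 /* closer inside comment */
                state = Text;
            in += 2;
        } else if (in[0] == '*' && in[1] == '/' && state == Text) {
            *out = '\0';
            return -1;                               /* "Syntax error" */
        } else {
            if (state != InCCom)
                *out++ = *in;                        /* echoed by default */
            ++in;                                    /* ignored in comments */
        }
    }
    *out = '\0';
    return 0;
}
```

After scan_init(), scanning "a /* b" and then " c */ d" yields "a " and
" d": the second call still knows it is inside the comment.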
norbert@rwthinf.UUCP (Norbert Kiesel) (06/05/90)
Or even better, instead of using lex, use flex! It:

- is GNU copyleft (I think),
- is fully documented,
- has exclusive and inclusive start conditions (normal lex only has
  inclusive start conditions),
- and is *MUCH* faster.

Just check your nearest ftp server. Normally it's stored under /pub/gnu.
The latest version is 2.2.

so long
Norbert
--
Norbert Kiesel, Institut f. Informatik III, RWTH Aachen
Ahornstr. 55, D-5100 Aachen, West Germany
+49 241 80-7266
EUNET:  norbert@rwthi3.uucp
USENET: ...!mcvax!unido!rwthi3!norbert
X.400:  norbert@rwthi3.informatik.rwth-aachen.de
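The inclusive/exclusive distinction comes down to whether unlabelled
rules stay active in a start condition. In flex, %s declares an
inclusive condition (like lex's %START) and %x an exclusive one. A
hypothetical C sketch of the eligibility test (rule_active_x and its
parameters are invented names):

```c
#include <stdbool.h>

#define COND(s)    (1u << (s))   /* bit for start condition number s */
#define UNLABELLED 0u            /* rule written without a <cond> prefix */

/* Inclusive state: unlabelled rules stay active.
 * Exclusive state: only rules labelled with that state apply. */
bool rule_active_x(unsigned rule_conds, int state, bool state_is_exclusive)
{
    if (rule_conds == UNLABELLED)
        return !state_is_exclusive;          /* unlabelled: inclusive only */
    return (rule_conds & COND(state)) != 0;  /* labelled: listed states only */
}
```

With an exclusive comment condition, none of the unlabelled rules fire
inside a comment, which is roughly what labelling every other rule <Text>
achieves by hand earlier in this thread.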