martin@mwtech.UUCP (Martin Weitzel) (06/02/90)
In article <116@bohra.cpg.oz> ejp@bohra.cpg.oz.au (Esmond Pitt) writes:
>In article <1990May30.174745.1161@csrd.uiuc.edu> pommu@iis.ethz.ch (Claude Pommerell) writes:
>> [about changing start conditions at entry to yylex()]
>
>There are two even simpler ways.

`Simpler' mostly depends on your view and expectations ...

>
>Instead of effectively changing the initial condition to <Text>, either:
>
>1. Ensure each start-state is equipped with enough rules to handle any
>possible input, and, as the documentation does state, place all the
>unlabelled rules after all the labelled rules, and/or

Doesn't sound simpler to me.

>
>2. Label all the rules you only want applied in the INITIAL state with
><INITIAL>, so they won't be applied as defaults in other states.

This trades off one undocumented feature (stuff *after* the first '%%' line and *before* the first rule) against another undocumented feature. But to be fair: if you strictly follow some of the lex man pages, nearly every nontrivial lex application uses some undocumented features. For the purpose of writing this, I just looked up:

1.) SVID (1986)
2.) XPG3 (1989)
3.) ISC Programmers Reference Manual
4.) ISC Programmers Guide (1988)

.FLAME ON
Not a single mention of start conditions in (4) at all (neither the syntax in rules, nor the special action BEGIN). Worse, in one example the advice is for a lex program to

	#define BEGIN 1

(believe it or not) as `good programming style' for returning tokens. This finally reveals that the author can never have heard anything about start conditions. The example lex program has the line:

	begin	return (BEGIN);

Please ISC, could you send the person who wrote this guide to a lex+yacc course (BTW: I'm teaching some :-)) before a revised version is produced. Well, the guide does mention that yacc has a feature to supply token defines, but it's bad practice to give advice that turns out to be not only unnecessary, but dangerous too.
The only advice in this guide that isn't close to worthless is to look at the paper about Lex written by Mike Lesk. (You would have done better to print that paper in the guide instead of the section that's in there now.) Another piece of advice there is to check out the reference manual (3). Be aware: IF YOU TRY TO USE LEX WITH THIS REFERENCE, YOU WILL BE ABSOLUTELY LOST. (Better save the time you'd spend trying: rather end work early, go out and have a nice evening - or, again, look for the Lesk paper.)

Ehhm, we are talking about start conditions. The reference manual is *very* silent about them - in fact, it doesn't mention them at all. On the other hand, the author was quite careful to mention that the "-r option is not yet fully operational". (What this option does is tell lex to produce RATFOR source instead of C. Oh, how many times I needed that and wondered why it just didn't "fully" work - but good news, not "yet", that is, the day will come when I can finally switch from C to RATFOR :-).)

But before we beat ISC too much: I suppose they took what they got from somewhere (AT&T?) and only made it a little worse.

Let's look at (1) - and don't say a reference dated 1986 is too old today: the stuff we are talking about has been in lex *much* longer. SVID does a good job of printing a table which shows the regular expression syntax for lex rules (it's quite similar to the "extended regular expressions" of egrep and awk, but there are some differences). In this table you'll find the syntax of start conditions, but not the slightest mention of them or of BEGIN in the rest of the text. So if you read the table without already knowing lex, you'll probably think you must be stupid, because you won't understand what the entry

	<s>r	the occurrence of the regular expression r only when the program is in start condition (state) s

is supposed to tell you. (Again, start conditions or states and the special action BEGIN are not mentioned anywhere else in the section.)

Finally to (2), which seems a not-so-bad rework of the SVID in other areas.
A quick scan through the lex section reveals that it is quite similar to (1), but the table with the regular expression syntax is omitted in favor of a list of differences from extended regular expressions. (BTW: The difference list is not complete.) The same sentence about the <s>r syntax as in (1) appears, but again nothing about start conditions, states, and BEGIN in the rest of the section.
.FLAME OFF

Hello AT&T, is anybody listening? If you haven't revised the manuals recently, please do a complete rewrite of the lex section, but find an author who has sufficient experience with lex+yacc in non-trivial applications *and* who can explain things understandably to mortals. (From many publications I know that such people work at AT&T - and I volunteer to proof-read.)

>
>Placing non-labelled rules before labelled rules is probably the single
>most common error in writing LEX scripts, even after 15 years.
>
>I don't know why.

After reading the above, you probably know why many novices struggle with lex.

Concerning your problem, here are the THREE BIG DISAMBIGUATING RULES:

	1) leftmost
	2) longer
	3) first in source

These tell us: take the input stream and write it down in one line from left to right. Then, whenever two rules could match some part of the input stream, lex chooses the rule whose match starts further to the left, then the rule with the longer match, then the rule that appears first in the (lex) source, with (1) having higher priority than (2), and (2) higher priority than (3).

This is well chosen, because it enables us to do the following:

	"if"	{ ... action for keyword if ... }
	[a-z]+	{ ... action for identifier ... }

which triggers the second rule for " fif " (because of 1), as well as for " iff " (because of 2), but the first rule for " if " (because of 3).

Start conditions are no exception to this rule!

-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83