jimv@radix (Jim Valerio) (09/19/87)
I've been debugging a problem parsing dates in MH mail, and have narrowed it
down to the point where it looks like lex is not doing the right thing with
start conditions.  I'm having this problem on a System Vr3 80386 system, and
have reproduced the problem on a 4.3bsd Vax system.

Here's a trivial lex file that illustrates the problem I am having:

    %{
    #undef input
    char *inpstr = "01";
    input() { return *inpstr++; }
    yywrap() { return 1; }
    main() { yylex(); }
    %}
    %start Z
    %%
    0       { printf("0\n"); BEGIN Z; }
    1       { printf("shouldn't get here!\n"); }
    <Z>1    { printf("1\n"); }
    .       { printf("Error! (%c)\n", yytext[0]); }

When I compile and run this, I get the "shouldn't get here!" message from
processing the "1" digit in the fixed input string "01".  My understanding
of lex leads me to believe that I should instead get the output lines "0"
and then "1".

Am I missing something obvious here, or is there a bug in lex?
If it is a bug, can someone suggest a reasonable workaround?
-- 
Jim Valerio	{verdix,intelca!mipos3,intel-iwarp.arpa}!omepd!radix!jimv
stan@dnlunx.UUCP (09/21/87)
In article <6@radix>, jimv@radix (Jim Valerio) writes:
> [...] it looks like lex is not doing the right thing with
> start conditions [...]
> [...]
> Am I missing something obvious here, or is there a bug in lex?
> If it is a bug, can someone suggest a reasonable workaround?

First of all, when lex tries to match some part of the input string
against the specified rules, it will take the rule with the longest
match.  When more than one rule matches, lex decides in favor of the
rule which is first encountered in the lex file.

After the `BEGIN Z;' statement all rules starting with `<Z>' are active,
together with all rules without a start condition.
-------- ---- --- ----- ------- - ----- ---------

Your example should work if you swap the rules in which the `1' is
matched, resulting in:

    %%
    0       { printf("0\n"); BEGIN Z; }
    <Z>1    { printf("1\n"); }
    1       { printf("shouldn't get here!\n"); }
    .       { printf("Error! (%c)\n", yytext[0]); }

Stan
----
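[The selection rule described above can be sketched in plain C. This is only a toy model, not lex's generated code: `struct rule`, `match_len`, `select_rule` and the two rule tables are hypothetical names made up for illustration, and patterns are limited to literal text plus `.`.]

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a lex rule; these are not lex's real data structures. */
enum { ANY = 0, INITIAL_SC = 1, Z = 2 };

struct rule {
    int cond;            /* ANY means the rule carries no start condition */
    const char *pat;     /* literal text; '.' matches any char except newline */
    const char *action;  /* what the rule's action would print */
};

/* Length matched by pat at the start of in, or 0 if it does not match. */
static size_t match_len(const char *pat, const char *in)
{
    size_t i;
    for (i = 0; pat[i] != '\0'; i++) {
        if (in[i] == '\0')
            return 0;
        if (pat[i] == '.' ? in[i] == '\n' : pat[i] != in[i])
            return 0;
    }
    return i;
}

/* Pick the active rule with the longest match; the earliest rule wins ties.
   Returns the rule index, or -1 if no active rule matches. */
static int select_rule(const struct rule *r, int n, int state, const char *in)
{
    int best = -1;
    size_t best_len = 0;
    for (int k = 0; k < n; k++) {
        if (r[k].cond != ANY && r[k].cond != state)
            continue;              /* rule is not active in this state */
        size_t len = match_len(r[k].pat, in);
        if (len > best_len) {      /* strictly longer only: later rules lose ties */
            best = k;
            best_len = len;
        }
    }
    return best;
}

/* The rules in the order of the original posting. */
static const struct rule original_order[] = {
    { ANY, "0", "0" },
    { ANY, "1", "shouldn't get here!" },
    { Z,   "1", "1" },
    { ANY, ".", "Error!" },
};

/* The same rules with the two `1' rules swapped. */
static const struct rule swapped_order[] = {
    { ANY, "0", "0" },
    { Z,   "1", "1" },
    { ANY, "1", "shouldn't get here!" },
    { ANY, ".", "Error!" },
};
```

[Calling `select_rule(original_order, 4, Z, "1")` picks the unconditioned `1` rule, because it is active in state Z and comes first. The same call on `swapped_order` picks the `<Z>1` rule.]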
ejp@ausmelb.oz.au (Esmond Pitt) (09/22/87)
In article <6@radix> jimv@radix.UUCP (Jim Valerio) writes:
>I've been debugging a problem parsing dates in MH mail, and have narrowed it
>down to the point where it looks like lex is not doing the right thing with
>start conditions.

1. Rules with start-conditions are in effect only within that start-condition.

2. Rules without start-conditions are always in effect.

3. In the event of two rules matching the same text, the first occurring rule
   is chosen.

4. Therefore, rules with start-conditions must precede rules without them.

If you put your rules in this order:

    <Z>1    {blah}
    1       {blah}

you will get the desired result.

There ARE bugs in lex:

1. The metacharacters ^ and $ only work at the literal beginning and end
   respectively of a rule; i.e. they do not work within () brackets, nor can
   they be put within a named rule. For example, all occurrences of ^ in the
   example below represent the character '^', not the beginning of the line.

    FRED    ^FRED
    %%
    {FRED}  printf("FRED = %s\n",yytext);
    (^JOE)  printf("JOE = %s\n",yytext);

2. Constructions like x+/xy: if you input xxxy to this it will return xxx,
   not xx as it should.
-- 
Esmond Pitt, Austec International Ltd
...!seismo!munnari!ausmelb!ejp,ejp@ausmelb.oz.au
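[For the trailing-context case, here is a small C sketch of what `x+/xy` is meant to return under the usual reading of `r/s` (match `r` only when it is followed by `s`, and do not consume `s`). The function name `match_xplus_before_xy` is hypothetical, and the code is hard-wired to this one pattern; it is not how lex implements trailing context.]

```c
#include <stddef.h>
#include <string.h>

/* Length of the text that x+/xy should return at the start of in:
   the longest run of 'x' that is immediately followed by "xy".
   Returns 0 if there is no such run. */
static size_t match_xplus_before_xy(const char *in)
{
    size_t run = 0, best = 0;

    /* Count how many 'x' characters x+ could possibly consume. */
    while (in[run] == 'x')
        run++;

    /* Try every candidate length and keep the longest one
       whose following text starts with the trailing context "xy". */
    for (size_t n = 1; n <= run; n++)
        if (strncmp(in + n, "xy", 2) == 0)
            best = n;

    return best;
}
```

[On the input `xxxy` this returns 2: after `x` the remaining text is `xxy`, after `xxx` it is `y`, and only after `xx` does the remainder begin with `xy`.]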
jbuck@epimass.EPI.COM (Joe Buck) (09/22/87)
In article <6@radix> jimv@radix.UUCP (Jim Valerio) writes:
>I've been debugging a problem parsing dates in MH mail, and have narrowed it
>down to the point where it looks like lex is not doing the right thing with
>start conditions.  I'm having this problem on a System Vr3 80386 system, and
>have reproduced the problem on a 4.3bsd Vax system.

Your problem is that a pattern without a start condition matches
regardless of start condition.  So with the following lex code:

> %start Z
> %%
> 0       { printf("0\n"); BEGIN Z; }
> 1       { printf("shouldn't get here!\n"); }
> <Z>1    { printf("1\n"); }

and the input "01", the first "1" rule matches even though you are in
the Z state, because if no state is given in the rule it always matches.
Solution: reverse the order, to

    0       { printf("0\n"); BEGIN Z; }
    <Z>1    { printf("1\n"); }
    1       { printf("shouldn't get here!\n"); }

since the first rule found is the one that is used.
-- 
- Joe Buck  {uunet,ucbvax,sun,decwrl,<smart-site>}!epimass.epi.com!jbuck
	    Old internet mailers: jbuck%epimass.epi.com@uunet.uu.net
john@frog.UUCP (John Woods, Software) (09/23/87)
In article <6@radix>, jimv@radix (Jim Valerio) writes:
> %start Z
> %%
> 0       { printf("0\n"); BEGIN Z; }
> 1       { printf("shouldn't get here!\n"); }
> <Z>1    { printf("1\n"); }
> .       { printf("Error! (%c)\n", yytext[0]); }
>

Two keys from the LEX documentation:

(1) Rules with no start condition are always active.
(2) When two rules match the same input string, the first is preferred.

Try changing your rules to

    %start Z
    %%
    0       { printf("0\n"); BEGIN Z; }
    <Z>1    { printf("1\n"); }
    1       { printf("shouldn't get here!\n"); }
    .       { printf("Error! (%c)\n", yytext[0]); }

-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw@eddie.mit.edu

Maybe it's the sound of a WET RAG hitting a smooth WEASEL!
chris@mimsy.UUCP (Chris Torek) (09/23/87)
>In article <6@radix> jimv@radix.UUCP (Jim Valerio) writes:
>>... it looks like lex is not doing the right thing with start conditions.

In article <1516@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>Your problem is that a pattern without a start condition matches
>regardless of start condition.  So with the following lex code:
>> 0       { printf("0\n"); BEGIN Z; }
>> 1       { printf("shouldn't get here!\n"); }
>> <Z>1    { printf("1\n"); }
>... Solution: reverse the order, to
> 0       { printf("0\n"); BEGIN Z; }
> <Z>1    { printf("1\n"); }
> 1       { printf("shouldn't get here!\n"); }

Or use

    <INITIAL>1  { printf("shouldn't get here!\n"); }
    <Z>1        { printf("1\n"); }

Lex begins in state INITIAL; if there are no `%state's or BEGIN
directives, it stays that way forever.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
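[A toy C model of the `<INITIAL>` variant, for anyone who wants to see why rule order stops mattering here. The names `struct rule`, `first_match`, `INITIAL_SC` and `with_initial` are hypothetical and patterns are single characters only; this is a sketch of the idea, not lex internals.]

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical start-condition tags; ANY marks a rule with no <...> prefix. */
enum { ANY = 0, INITIAL_SC = 1, Z = 2 };

struct rule {
    int cond;            /* start condition the rule is restricted to */
    char ch;             /* single-character pattern */
    const char *action;  /* what the rule's action would print */
};

/* Return the action of the first active rule whose pattern equals c,
   or NULL if no active rule matches. */
static const char *first_match(const struct rule *r, int n, int state, char c)
{
    for (int k = 0; k < n; k++)
        if ((r[k].cond == ANY || r[k].cond == state) && r[k].ch == c)
            return r[k].action;
    return NULL;
}

/* Original rule order, but with the plain `1' rule tagged <INITIAL>. */
static const struct rule with_initial[] = {
    { ANY,        '0', "0" },
    { INITIAL_SC, '1', "shouldn't get here!" },
    { Z,          '1', "1" },
};
```

[In state Z the `<INITIAL>1` rule is skipped, so `first_match(with_initial, 3, Z, '1')` yields the `<Z>1` action even though that rule comes later in the file.]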
ejp@ausmelb.oz.au (Esmond Pitt) (09/29/87)
In article <6@radix>, jimv@radix (Jim Valerio) writes:
> [...] it looks like lex is not doing the right thing with
> start conditions [...]
> [...]
> Am I missing something obvious here, or is there a bug in lex?
> If it is a bug, can someone suggest a reasonable workaround?

[1000's of people pointed out lex's rule-selection rules]

Another thing to mention is that rules can be made to be active *only* in the
initial condition, by using the start-condition name INITIAL, i.e.:

    <Z>blah         grumble;
    <INITIAL>blah   mumble;
-- 
Esmond Pitt, Austec International Ltd
...!seismo!munnari!ausmelb!ejp,ejp@ausmelb.oz.au