lee@uhccux.UUCP (Greg Lee) (05/07/88)
When a flex-created program is asked to reject a string matched by a pattern ending in $, it appears that one too few characters are pushed back, in some circumstances. The following demonstrates the problem: the program "test" loops endlessly, pushing back and accepting the last character of a line.

Greg, lee@uhccux.uhcc.hawaii.edu

---------contents of file test.l-------
%%

.*$     { printf("%s REJECT\n", yytext);
          REJECT;
        }

.*$     { printf("%s ACCEPT\n", yytext);
        }

.       |
\n      ;

%%
main()
{
        yylex();
}
----------------eof--------------------
---------contents of file xtext--------
Text ending with X
----------------eof--------------------
(There is a newline after the "X".)

Test program made with flex -r gives the following results.
Output of "test <xtext":
---------begin test output-------------
Text ending with X REJECT
Text ending with ACCEPT
X REJECT
ACCEPT
X REJECT
ACCEPT
X REJECT
ACCEPT
and so on, ad infinitum.
---------------------------------------

Test program made with lex gives the following results.
Output of "test <xtext":
---------begin test output-------------
Text ending with X REJECT
Text ending with X ACCEPT
REJECT
ACCEPT
----------end test output--------------
vern%lbl-pistachio@lbl-rtsg.arpa (Vern Paxson) (05/07/88)
The problem pointed out by Greg Lee, in which a scanner built from the input:

> %%
>
> .*$     { printf("%s REJECT\n", yytext);
>           REJECT;
>         }
>
> .*$     { printf("%s ACCEPT\n", yytext);
>         }
>
> .       |
> \n      ;
>
> %%

loops indefinitely, is half bug and half feature. The bug part is fixed by the patch (to the files "flex.skel" and "flexskeldef.h") at the end of this message.

The feature part is that with this set of rules, the rule ".*$" \should/ loop forever. On an input like "foobar\n" the first rule will match seven characters of text (since trailing context is included in the length of the text matched) and then put back the newline. Once the input "foobar" has been accepted, the buffer contains "\n". This is matched by rules #1, 2, and 4, so the first rule is again used, this time with an empty yytext. Since no text is consumed, the scanner keeps going through this sequence indefinitely.

I don't know what lex does to avoid this problem, though I made a test and it's not doing what I'd expect, which is eating up the "\n" input with the newline rule.

As far as flex is concerned, I view the looping as a bug in the set of input rules. The fix is to rearrange the rules:

%%

\n      ;

.*$     { printf("%s REJECT\n", yytext);
          REJECT;
        }

.*$     { printf("%s ACCEPT\n", yytext);
        }

.       ;

%%

Vern

*** Release-1.0/distribution/flex.skel  Sun Apr 10 21:18:34 1988
--- flex.skel   Fri May 6 22:48:52 1988
***************
*** 28,33 ****
--- 28,36 ----
  YY_DECL
      {
      int yy_n_chars, yy_lp, yy_iii, yy_buf_pos, yy_act;
+ #ifdef FLEX_REJECT_ENABLED
+     int yy_full_match;
+ #endif


  %% user's declarations go here
***************
*** 41,46 ****
--- 44,55 ----
      goto get_next_token;

  do_action:
+
+ #ifdef FLEX_REJECT_ENABLED
+     /* remember matched text in case we back up due to trailing context */
+     yy_full_match = yy_c_buf_p;
+ #endif
+
      for ( ; ; )
          {
          YY_DO_BEFORE_ACTION
***************
*** 55,64 ****
              case YY_NEW_FILE:
                  break; /* begin reading from new file */

-             case YY_DO_DEFAULT:
-                 YY_DEFAULT_ACTION;
-                 break;
-
              case YY_END_TOK:
                  return ( YY_END_TOK );

--- 64,69 ----
***************
*** 239,252 ****
                  }
              }

!         /* if we got this far, then we didn't find any accepting
!          * states
!          */
!
!         /* so that the default applies to the first char read */
!         ++yy_c_buf_p;
!
!         yy_act = YY_DO_DEFAULT;
          }
      }

--- 244,250 ----
                  }
              }

!         YY_FATAL_ERROR( "no match in flex scanner - possible NULL in input" );
          }
      }

*** Release-1.0/distribution/flexskeldef.h      Sun Apr 10 21:07:36 1988
--- flexskeldef.h       Fri May 6 22:51:39 1988
***************
*** 33,38 ****
--- 33,39 ----
  #define REJECT \
  { \
  YY_DO_BEFORE_SCAN; /* undo effects of setting up yytext */ \
+ yy_c_buf_p = yy_full_match; /* restore possibly backed-over text */ \
  ++yy_lp; \
  goto find_rule; \
  }