[comp.sources.bugs] flex reject bug

lee@uhccux.UUCP (Greg Lee) (05/07/88)

When a flex-created program is asked to reject a string
matched by a pattern ending in $, it appears that, in some
circumstances, one character too few is pushed back.
The following demonstrates the problem: the program "test"
loops endlessly, pushing back and accepting the last character
of a line.
	Greg, lee@uhccux.uhcc.hawaii.edu

---------contents of file test.l-------
%%

.*$	{	printf("%s REJECT\n", yytext);
		REJECT;
	}

.*$	{	printf("%s ACCEPT\n", yytext);
	}

. |
\n	;

%%

main()
{	yylex();
}
----------------eof--------------------

---------contents of file xtext--------
Text ending with X
----------------eof--------------------
(There is a newline after the "X".)

The test program made with flex -r gives the following results.

Output of "test <xtext":
---------begin test output-------------
Text ending with X REJECT
Text ending with  ACCEPT
X REJECT
 ACCEPT
X REJECT
 ACCEPT
X REJECT
 ACCEPT
and so on, ad infinitum.
---------------------------------------

The test program made with lex gives the following results.

Output of "test <xtext":
---------begin test output-------------
Text ending with X REJECT
Text ending with X ACCEPT

 REJECT

 ACCEPT
----------end test output--------------

vern%lbl-pistachio@lbl-rtsg.arpa (Vern Paxson) (05/07/88)

The problems pointed out by Greg Lee concerning the input:

> %%
> 
> .*$     {       printf("%s REJECT\n", yytext);
> 		REJECT;
> 	}
> 
> .*$     {       printf("%s ACCEPT\n", yytext);
> 	}
> 
> . |
> \n      ;
> 
> %%

looping indefinitely are half-bug and half-feature.  The bug part
is fixed by the patch (to the files "flex.skel" and "flexskeldef.h")
at the end of this message.  The feature part is that with this set
of rules, the rule ".*$" \should/ loop forever.  On an input like
"foobar\n" the first rule will match seven characters of text (since
trailing context is included in the length of the text matched) and
then put back the newline.  Once the input "foobar" has been accepted,
the buffer contains "\n".  This is matched by rules 1, 2, and 4 (both
".*$" rules and the "\n" rule), so the first rule is used again, this
time with an empty yytext.  Since no text is consumed, the scanner keeps
going through this sequence indefinitely.
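
The no-progress step can be sketched in plain C.  This is only a model
of the behavior described above, not flex's actual matcher; the names
match_dot_star_eol and count_empty_matches are made up for illustration,
and the model assumes the trailing newline is always put back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the pattern ".*$": a run of non-newline
 * characters that is immediately followed by a newline.  Returns the
 * number of characters left in yytext once the newline (the trailing
 * context) has been put back, or -1 if no newline follows. */
int match_dot_star_eol(const char *buf)
{
    size_t n = 0;

    assert(buf != NULL);
    while (buf[n] != '\0' && buf[n] != '\n')
        ++n;
    return buf[n] == '\n' ? (int)n : -1;
}

/* Apply the rule up to max_steps times, advancing only by the text
 * actually consumed, and count how many matches consumed nothing. */
int count_empty_matches(const char *input, int max_steps)
{
    const char *p = input;
    int empty = 0;

    for (int i = 0; i < max_steps; ++i) {
        int len = match_dot_star_eol(p);
        if (len < 0)
            break;          /* no newline ahead: the rule cannot match */
        if (len == 0)
            ++empty;        /* matched, but the scan position is unchanged */
        p += len;
    }
    return empty;
}
```

Under this model, count_empty_matches("foobar\n", 5) consumes "foobar"
on the first step and then matches the empty string on each of the
remaining four steps, which is the sequence that never terminates.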

I don't know what lex does to avoid this problem, though I ran a test and
it does not do what I'd expect, which is to eat up the "\n" input with
the newline rule.  As far as flex is concerned, I view the looping as
a bug in the set of input rules.  The fix is to rearrange the rules so
that the newline rule comes first:

	%%

	\n	;

	.*$     {       printf("%s REJECT\n", yytext);
			REJECT;
		}

	.*$     {       printf("%s ACCEPT\n", yytext);
		}

	.	;

	%%
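
Why the reordering helps can be sketched in C.  This is a rough model,
not flex's implementation: it assumes the scanner takes the longest
match, breaks ties in favor of the earliest rule, and (as described
above) counts trailing context in the length being compared.  The names
rule_kind, scan_length and pick_rule are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum rule_kind { DOT_STAR_EOL, ANY_CHAR, NEWLINE };

/* Length used to compare candidate matches, or -1 for no match.
 * Assumption: for ".*$" the trailing newline counts toward the length. */
int scan_length(enum rule_kind kind, const char *buf)
{
    size_t n = 0;

    assert(buf != NULL);
    switch (kind) {
    case DOT_STAR_EOL:
        while (buf[n] != '\0' && buf[n] != '\n')
            ++n;
        return buf[n] == '\n' ? (int)n + 1 : -1;
    case ANY_CHAR:
        return (buf[0] != '\0' && buf[0] != '\n') ? 1 : -1;
    case NEWLINE:
        return buf[0] == '\n' ? 1 : -1;
    }
    return -1;
}

/* Index of the winning rule: the longest match wins, and the earliest
 * rule wins a tie.  Returns -1 if no rule matches. */
int pick_rule(const enum rule_kind *rules, size_t nrules, const char *buf)
{
    int best = -1;
    int best_len = -1;

    for (size_t i = 0; i < nrules; ++i) {
        int len = scan_length(rules[i], buf);
        if (len > best_len) {   /* strict '>' keeps the earlier rule on ties */
            best_len = len;
            best = (int)i;
        }
    }
    return best;
}
```

With the buffer holding only "\n", ".*$" and "\n" both have length 1 in
this model, so whichever is listed first wins.  In the original order
that is ".*$", which consumes nothing; with "\n" listed first the
newline is consumed and the scan moves on.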


		Vern



*** Release-1.0/distribution/flex.skel	Sun Apr 10 21:18:34 1988
--- flex.skel	Fri May  6 22:48:52 1988
***************
*** 28,33 ****
--- 28,36 ----
  YY_DECL
      {
      int yy_n_chars, yy_lp, yy_iii, yy_buf_pos, yy_act;
+ #ifdef FLEX_REJECT_ENABLED
+     int yy_full_match;
+ #endif
  
  %% user's declarations go here
  
***************
*** 41,46 ****
--- 44,55 ----
      goto get_next_token;
  
  do_action:
+ 
+ #ifdef FLEX_REJECT_ENABLED
+     /* remember matched text in case we back up due to trailing context */
+     yy_full_match = yy_c_buf_p;
+ #endif
+ 
      for ( ; ; )
  	{
  	YY_DO_BEFORE_ACTION
***************
*** 55,64 ****
  case YY_NEW_FILE:
  break; /* begin reading from new file */
  
- case YY_DO_DEFAULT:
- YY_DEFAULT_ACTION;
- break;
- 
  case YY_END_TOK:
  return ( YY_END_TOK );
  
--- 64,69 ----
***************
*** 239,252 ****
  		}
  	    }
  
! 	/* if we got this far, then we didn't find any accepting
! 	 * states
! 	 */
! 
! 	/* so that the default applies to the first char read */
! 	++yy_c_buf_p;
! 
! 	yy_act = YY_DO_DEFAULT;
  	}
  	}
  
--- 244,250 ----
  		}
  	    }
  
! 	YY_FATAL_ERROR( "no match in flex scanner - possible NULL in input" );
  	}
  	}
  
*** Release-1.0/distribution/flexskeldef.h	Sun Apr 10 21:07:36 1988
--- flexskeldef.h	Fri May  6 22:51:39 1988
***************
*** 33,38 ****
--- 33,39 ----
  #define REJECT \
          { \
          YY_DO_BEFORE_SCAN; /* undo effects of setting up yytext */ \
+         yy_c_buf_p = yy_full_match; /* restore possibly backed-over text */ \
          ++yy_lp; \
          goto find_rule; \
          }