[comp.sources.bugs] flex bug with |'d patterns and $.

lee@uhccux.UUCP (Greg Lee) (05/15/88)

When patterns associated with several start states are or'd with "|",
a "$" in a pattern may cause an incorrect match by a flex-generated
program.

Here is some sample data:
----------------input data-------------
	none
It's as easy as one, two, three.
	first
It's as easy as one, two, three.
	second
It's as easy as one, two, three.
--------------end data-----------------

And here is a program that works correctly when flex'd:
------------program that works---------

%s	FIRST SECOND
%%

<SECOND>"two" |
<FIRST>"one"  |
"three."		printf("<changed>");

"first"		printf("State 1"); BEGIN FIRST;

"second"	printf("State 2"); BEGIN SECOND;

%%
int main() {  yylex();  return 0;  }
----------end of program that works-----

Given the above input, the above program produces this output:
-------output of program that works-----
	none
It's as easy as one, two, <changed>
	State 1
It's as easy as <changed>, two, <changed>
	State 2
It's as easy as one, <changed>, <changed>
----end of output of program that works-----

But change the program by adding "$" to the "three." pattern, the one
with no start state prefix (start state 0):
---------program that doesn't work---------

%s	FIRST SECOND
%%

<SECOND>"two" |
<FIRST>"one"  |
"three."$		printf("<changed>");

"first"		printf("State 1"); BEGIN FIRST;

"second"	printf("State 2"); BEGIN SECOND;

%%
int main() {  yylex();  return 0;  }
----end of program that doesn't work-------

You'd think the output would be the same, but instead the matches of
"one" and "two" each swallow the next three characters as well:
-----output of program that doesn't work---
	none
It's as easy as one, two, <changed>
	State 1
It's as easy as <changed>wo, <changed>
	State 2
It's as easy as one, <changed>hree.
--end of output of program that doesn't work---

(Lex doesn't understand this sort of construction at all.)

	Greg, lee@uhccux.uhcc.hawaii.edu

vern%lbl-pistachio@LBL-RTSG.ARPA (Vern Paxson) (05/23/88)

Greg Lee writes about flex mismatches regarding connecting multiple
rules with "|" when one of the rules includes the match-end-of-line
"$" meta-character.  This is indeed a bug and is documented as such
in the flex manual entry ("$" is an instance of trailing context).
I'm hoping to have all trailing context restrictions removed with
the next flex release.

		Vern

	Vern Paxson				vern@lbl-csam.arpa
	Real Time Systems			ucbvax!lbl-csam.arpa!vern
	Lawrence Berkeley Laboratory		(415) 486-6411
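
Until the restriction is lifted, one possible workaround is to keep the
"$" rule out of the "|" chain and give it its own copy of the action.
The sketch below is not from either post. It assumes the problem only
arises when a trailing-context rule shares its action with other rules
through "|", which is what the description above suggests but which has
not been checked against every flex version. The %{ ... %} block and
the "int main" form are additions for the sake of a clean compile.

```lex
%{
/* Hypothetical workaround sketch, not from the original posts.
 * The "$" rule gets its own copy of the action instead of sharing
 * one through "|", so no trailing-context rule is joined to another. */
#include <stdio.h>
%}
%s	FIRST SECOND
%%

<SECOND>"two" |
<FIRST>"one"		printf("<changed>");

"three."$		printf("<changed>");	/* same action, repeated */

"first"		printf("State 1"); BEGIN FIRST;

"second"	printf("State 2"); BEGIN SECOND;

%%
int main() {  yylex();  return 0;  }
```

If the assumption holds, this should give the same output as the
program that works, since "three." is always followed by a newline in
the sample data. Link against the flex library (-lfl) to pick up the
default yywrap().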