[comp.emacs] GNU Emacs regular expression matching.

aj@zyx.UUCP (Arndt Jonasson) (04/27/87)

There is a bug in the regular expression code for GNU Emacs version 17.62:
The regular expression

		\(a\(bc\|bcd\)\)+k

doesn't match the string

		abcabcabcdk

which it should. The reason is an optimization in the handling of loops (the
+ and * constructs): the loop's failure point is removed when there is no
chance of its being used, namely when the beginning of the loop expression
and the expression after the loop cannot possibly match the same thing. Here
that is the case ('a' versus 'k'), but the \| construct generates failure
points of its own within the loop, so the wrong failure point is removed and
the match fails.
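
The mechanism can be sketched with a toy backtracking matcher. This is only
an illustration in Python, not the Emacs C code: the program below is
hand-compiled, its layout is an assumption, and the opcode names 'char',
'match', 'on_failure_jump' and 'finalize_jump' are chosen for the sketch.
The 'maybe_finalize_jump' decision is modelled by simply passing in the
opcode it would resolve to: 'jump', or 'finalize_jump', which discards the
topmost failure point before jumping.

```python
import re

def compile_pattern(loop_end):
    """Hand-compiled \\(a\\(bc\\|bcd\\)\\)+k; the layout is an assumption."""
    return [
        ("jump", 2),              # 0: '+' runs the body once before looping
        ("on_failure_jump", 11),  # 1: loop's failure point: leave loop, try 'k'
        ("char", "a"),            # 2
        ("on_failure_jump", 7),   # 3: failure point of \|: try 'bcd' instead
        ("char", "b"),            # 4
        ("char", "c"),            # 5
        ("jump", 10),             # 6: skip the second alternative
        ("char", "b"),            # 7
        ("char", "c"),            # 8
        ("char", "d"),            # 9
        (loop_end, 1),            # 10: end of loop body, back to the loop head
        ("char", "k"),            # 11
        ("match", None),          # 12
    ]

def run(program, text):
    """Return True if the program matches a prefix of text."""
    stack = []                    # failure points: (pc, pos) pairs
    pc, pos = 0, 0
    while True:
        op, arg = program[pc]
        if op == "match":
            return True
        if op == "char":
            if pos < len(text) and text[pos] == arg:
                pc, pos = pc + 1, pos + 1
            elif stack:
                pc, pos = stack.pop()     # backtrack to latest failure point
            else:
                return False
        elif op == "on_failure_jump":
            stack.append((arg, pos))
            pc += 1
        elif op == "jump":
            pc = arg
        elif op == "finalize_jump":
            if stack:                     # guard is a simplification
                stack.pop()               # assumed to be the loop's point
            pc = arg
        else:
            raise ValueError(op)

text = "abcabcabcdk"
print(run(compile_pattern("jump"), text))            # True
print(run(compile_pattern("finalize_jump"), text))   # False
# Cross-check with Python's own engine, using its syntax for the same regexp.
print(re.fullmatch(r"(a(bc|bcd))+k", text) is not None)  # True
```

With 'finalize_jump', the failure point on top of the stack at the end of the
loop body is the one pushed for \|, so the 'bcd' alternative is thrown away
and the last iteration cannot be retried. A string that never needs the
second alternative, such as "abck", still matches in this sketch.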

The quick fix is to always emit a 'jump' instead of a 'maybe_finalize_jump'.
The right fix would be to detect whether any failure points can be generated
within the loop expression and, if not, emit a 'maybe_finalize_jump';
otherwise emit a 'jump'.
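
The second fix might look roughly like the following at compile time. Again a
Python sketch under assumptions, not the Emacs source: the loop body is taken
to be a list of (opcode, argument) pairs, the helper name loop_end_opcode is
hypothetical, and 'on_failure_jump' is assumed to be the only opcode that can
push a failure point.

```python
def loop_end_opcode(body):
    """Choose the opcode that closes a + or * loop over `body`.

    `body` is a list of (opcode, argument) pairs (a hypothetical format).
    """
    # Any on_failure_jump in the body (from \| or a nested loop) means the
    # loop's own failure point may not be the topmost one at loop end.
    pushes_failure_points = any(op == "on_failure_jump" for op, _ in body)
    return "jump" if pushes_failure_points else "maybe_finalize_jump"

# Body of \(a\(bc\|bcd\)\)+ : contains the failure point of \|.
body_with_alternation = [
    ("char", "a"),
    ("on_failure_jump", 7),
    ("char", "b"), ("char", "c"),
    ("jump", 10),
    ("char", "b"), ("char", "c"), ("char", "d"),
]
# Body of \(ab\)+ : no failure points, so the optimization stays safe.
body_plain = [("char", "a"), ("char", "b")]

print(loop_end_opcode(body_with_alternation))  # jump
print(loop_end_opcode(body_plain))             # maybe_finalize_jump
```

This keeps the optimization for simple loops and gives it up only where the
body itself can leave failure points behind.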

This may have been fixed in GNU Emacs version 18. In any case, there may be
enough version 17 users out there for whom this is of interest.
-- 
Arndt Jonasson, ZYX Sweden AB, Styrmansgatan 6, 114 54 Stockholm, Sweden
UUCP: ...!seismo!mcvax!enea!zyx!aj