[comp.bugs.sys5] egrep botch

henry@utzoo.UUCP (Henry Spencer) (10/13/87)

The following bug appears to be present in egrep (the Bell/AT&T one, not
the comp.sources.unix one) on all extant versions of Unix.  Certainly it
is present in V7, 4.2BSD, Sunnix 3.2, and 386 System V Release 3.

	% echo 0 >foo
	% egrep '0.' foo
	% egrep '^0.' foo
	0
	%

Adding the "^" seems to convince egrep that "." can legitimately match
newline.  Changing "." to "[^x]" does not change the behavior.  Some quick
tests suggest that "$" does not produce a similar anomaly.  I can find no
statement in any manual that gives even a shred of legitimacy to this.

Unfortunately, I lack the time to root around in the somewhat obscure
innards of egrep to discover the exact cause and fix.  Anybody out there
feeling ambitious? :-)
-- 
"Mir" means "peace", as in           |  Henry Spencer @ U of Toronto Zoology
"the war is over; we've won".        | {allegra,ihnp4,decvax,utai}!utzoo!henry

ado@elsie.UUCP (Arthur David Olson) (10/23/87)

> The following bug appears to be present in egrep (the Bell/AT&T one, not
> the comp.sources.unix one) on all extant versions of Unix.  Certainly it
> is present in V7, 4.2BSD, Sunnix 3.2, and 386 System V Release 3.
> 
> 	% echo 0 >foo
> 	% egrep '0.' foo
> 	% egrep '^0.' foo
> 	0
> 	%
> 
> Adding the "^" seems to convince egrep that "." can legitimately match
> newline.  Changing "." to "[^x]" does not change the behavior.  Some quick
> tests suggest that "$" does not produce a similar anomaly.

The change below cures the problem, and even "looks right."
Your line numbers will vary.

*** 3.2/egrep.y	Fri Oct 23 16:34:20 1987
--- 3.3/egrep.y	Fri Oct 23 16:34:24 1987
***************
*** 303,309 ****
  					if ((k = name[curpos]) >= 0)
  						if (
  							(k == c)
! 							| (k == DOT)
  							| (k == CCL && member(c, right[curpos], 1))
  							| (k == NCCL && member(c, right[curpos], 0))
  						) {
--- 303,309 ----
  					if ((k = name[curpos]) >= 0)
  						if (
  							(k == c)
! 							| (k == DOT && c != '\n')
  							| (k == CCL && member(c, right[curpos], 1))
  							| (k == NCCL && member(c, right[curpos], 0))
  						) {
--
Bugs is a trademark of Warner Brothers and Volkswagen.
-- 
ado@vax2.nlm.nih.gov	ADO, VAX, and NIH are trademarks of Ampex and DEC.

henry@utzoo.UUCP (Henry Spencer) (11/02/87)

> The change below cures the problem, and even "looks right."

Unfortunately, it doesn't cure the whole problem, since changing "." to
"[^x]" still provokes the bug.
-- 
PS/2: Yesterday's hardware today.    |  Henry Spencer @ U of Toronto Zoology
OS/2: Yesterday's software tomorrow. | {allegra,ihnp4,decvax,utai}!utzoo!henry

ado@elsie.UUCP (Arthur David Olson) (11/09/87)

In article <8872@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
> > The change. . .cures the problem, and even "looks right."
> 
> Unfortunately, it doesn't cure the whole problem, since changing "." to
> "[^x]" still provokes the bug.

True enough.  Newline shouldn't match "."; it also shouldn't match
"[anything]" or "[^anything]".  This can be reflected by moving the newline
test outward so that it guards the DOT, CCL, and NCCL cases alike (converting
the "|" operators into "||" in the process, as an efficiency boost).

Here's the diff against the "original" egrep.y.  As always, the trade secret
status of the code involved precludes a clearer posting.

304,308c304,311
< 						if (
< 							(k == c)
< 							| (k == DOT)
< 							| (k == CCL && member(c, right[curpos], 1))
< 							| (k == NCCL && member(c, right[curpos], 0))
---
> 					    if (
> 						k == c ||
> 						(c != '\n' &&
> 						(k == DOT ||
> 					        (k == CCL &&
> 						member(c, right[curpos], 1)) ||
> 						(k == NCCL &&
> 						member(c, right[curpos], 0))))
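
For readers without the source at hand, the three versions of the test can be
modelled roughly as below.  This is only a sketch, not the egrep.y code: the
token values, member_model() (a stand-in for member(), assumed to return its
third argument when the character is in the set and the negation otherwise),
and the accepts_*() function names are all made up for illustration, and a
plain C string takes the place of right[curpos].

```c
#include <string.h>

/* Node kinds named after the identifiers in the diffs.  The numeric values
   are illustrative only; they just need to lie outside the char range. */
enum { DOT = 258, CCL, NCCL };

/* Hypothetical stand-in for member(): yields torf if c is in set,
   !torf otherwise.  The real routine consults egrep's own tables. */
int member_model(int c, const char *set, int torf)
{
    int found = (c != '\0' && strchr(set, c) != NULL);
    return found ? torf : !torf;
}

/* Test as originally written: DOT accepts anything, newline included. */
int accepts_original(int k, int c, const char *set)
{
    return (k == c)
        | (k == DOT)
        | (k == CCL && member_model(c, set, 1))
        | (k == NCCL && member_model(c, set, 0));
}

/* First change: only DOT is guarded, so NCCL can still accept '\n'. */
int accepts_first_fix(int k, int c, const char *set)
{
    return (k == c)
        | (k == DOT && c != '\n')
        | (k == CCL && member_model(c, set, 1))
        | (k == NCCL && member_model(c, set, 0));
}

/* Second change: the newline test guards DOT, CCL, and NCCL together. */
int accepts_second_fix(int k, int c, const char *set)
{
    return k == c ||
        (c != '\n' &&
         (k == DOT ||
          (k == CCL && member_model(c, set, 1)) ||
          (k == NCCL && member_model(c, set, 0))));
}
```

Under this model, accepts_first_fix(NCCL, '\n', "x") is still nonzero, which
is the "[^x]" case reported above, while accepts_second_fix(NCCL, '\n', "x")
is zero.  An exact match on the character itself (k == c) is left unguarded
in all three versions.
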
-- 
ado@vax2.nlm.nih.gov	ADO, VAX, and NIH are trademarks of Ampex and DEC.