henry@utzoo.UUCP (Henry Spencer) (10/13/87)
The following bug appears to be present in egrep (the Bell/AT&T one, not
the comp.sources.unix one) on all extant versions of Unix.  Certainly it
is present in V7, 4.2BSD, Sunnix 3.2, and 386 System V Release 3.

	% echo 0 >foo
	% egrep '0.' foo
	% egrep '^0.' foo
	0
	%

Adding the "^" seems to convince egrep that "." can legitimately match
newline.  Changing "." to "[^x]" does not change the behavior.  Some quick
tests suggest that "$" does not produce a similar anomaly.

I can find no statement in any manual that gives even a shred of legitimacy
to this.  Unfortunately, I lack the time to root around in the somewhat
obscure innards of egrep to discover the exact cause and fix.  Anybody out
there feeling ambitious? :-)
--
"Mir" means "peace", as in | Henry Spencer @ U of Toronto Zoology
"the war is over; we've won". | {allegra,ihnp4,decvax,utai}!utzoo!henry
ado@elsie.UUCP (Arthur David Olson) (10/23/87)
> The following bug appears to be present in egrep (the Bell/AT&T one, not
> the comp.sources.unix one) on all extant versions of Unix.  Certainly it
> is present in V7, 4.2BSD, Sunnix 3.2, and 386 System V Release 3.
>
>	% echo 0 >foo
>	% egrep '0.' foo
>	% egrep '^0.' foo
>	0
>	%
>
> Adding the "^" seems to convince egrep that "." can legitimately match
> newline.  Changing "." to "[^x]" does not change the behavior.  Some quick
> tests suggest that "$" does not produce a similar anomaly.

The change below cures the problem, and even "looks right."
Your line numbers will vary.

*** 3.2/egrep.y	Fri Oct 23 16:34:20 1987
--- 3.3/egrep.y	Fri Oct 23 16:34:24 1987
***************
*** 303,309 ****
  			if ((k = name[curpos]) >= 0)
  				if (
  					(k == c)
! 					| (k == DOT)
  					| (k == CCL && member(c, right[curpos], 1))
  					| (k == NCCL && member(c, right[curpos], 0))
  				) {
--- 303,309 ----
  			if ((k = name[curpos]) >= 0)
  				if (
  					(k == c)
! 					| (k == DOT && c != '\n')
  					| (k == CCL && member(c, right[curpos], 1))
  					| (k == NCCL && member(c, right[curpos], 0))
  				) {
--
	Bugs is a trademark of Warner Brothers and Volkswagen.
--
	ado@vax2.nlm.nih.gov	ADO, VAX, and NIH are trademarks of Ampex and DEC.
henry@utzoo.UUCP (Henry Spencer) (11/02/87)
> The change below cures the problem, and even "looks right."
Unfortunately, it doesn't cure the whole problem, since changing "." to
"[^x]" still provokes the bug.
--
PS/2: Yesterday's hardware today. | Henry Spencer @ U of Toronto Zoology
OS/2: Yesterday's software tomorrow. | {allegra,ihnp4,decvax,utai}!utzoo!henry
ado@elsie.UUCP (Arthur David Olson) (11/09/87)
In article <8872@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
> > The change. . .cures the problem, and even "looks right."
>
> Unfortunately, it doesn't cure the whole problem, since changing "." to
> "[^x]" still provokes the bug.

True enough.  Newline shouldn't match "."; it also shouldn't match
"[anything]" or "[^anything]".  This can be reflected by moving the test
against newline (converting "|" operators into "||" in the process as an
efficiency boost).  Here's the diff against the "original" egrep.y.
As always, the trade secret status of the code involved precludes a clearer
posting.

304,308c304,311
< 				if (
< 					(k == c)
< 					| (k == DOT)
< 					| (k == CCL && member(c, right[curpos], 1))
< 					| (k == NCCL && member(c, right[curpos], 0))
---
> 				if (
> 					k == c ||
> 					(c != '\n' &&
> 					(k == DOT ||
> 					(k == CCL &&
> 						member(c, right[curpos], 1)) ||
> 					(k == NCCL &&
> 						member(c, right[curpos], 0))))
--
	ado@vax2.nlm.nih.gov	ADO, VAX, and NIH are trademarks of Ampex and DEC.
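
For readers without access to the source, the three versions of the test discussed in this thread can be compared in a small standalone C sketch. This is not the egrep code. The node kinds, struct node, and in_set() are hypothetical stand-ins invented for illustration. In particular, in_set() simply assumes that member(c, set, 1) means "c is in the set" and member(c, set, 0) means "c is not in the set", which is what the diffs suggest but which the thread does not state.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node kinds; the real egrep.y uses its own token values. */
enum kind { LITERAL, DOT, CCL, NCCL };

struct node {
    enum kind kind;
    int ch;            /* the character, for LITERAL nodes */
    const char *set;   /* the listed characters, for CCL and NCCL nodes */
};

/* Assumed stand-in for member(): returns torf if c is listed in set,
 * and !torf otherwise. */
static int in_set(int c, const char *set, int torf)
{
    int found = (c != '\0' && strchr(set, c) != NULL);
    return found ? torf : !torf;
}

/* Original test: "." and "[^...]" both accept newline. */
int match_original(const struct node *n, int c)
{
    return (n->kind == LITERAL && n->ch == c)
        || (n->kind == DOT)
        || (n->kind == CCL && in_set(c, n->set, 1))
        || (n->kind == NCCL && in_set(c, n->set, 0));
}

/* First fix: only "." is barred from matching newline. */
int match_first_fix(const struct node *n, int c)
{
    return (n->kind == LITERAL && n->ch == c)
        || (n->kind == DOT && c != '\n')
        || (n->kind == CCL && in_set(c, n->set, 1))
        || (n->kind == NCCL && in_set(c, n->set, 0));
}

/* Second fix: the newline test guards ".", "[...]" and "[^...]" alike. */
int match_second_fix(const struct node *n, int c)
{
    return (n->kind == LITERAL && n->ch == c)
        || (c != '\n'
            && (n->kind == DOT
                || (n->kind == CCL && in_set(c, n->set, 1))
                || (n->kind == NCCL && in_set(c, n->set, 0))));
}

/* Checks the behavior described in the thread for "." and "[^x]". */
void check_newline_handling(void)
{
    struct node dot   = { DOT,  0, "" };
    struct node not_x = { NCCL, 0, "x" };

    assert(match_original(&dot, '\n'));       /* the reported bug */
    assert(match_original(&not_x, '\n'));     /* same bug via "[^x]" */
    assert(!match_first_fix(&dot, '\n'));     /* "." is cured */
    assert(match_first_fix(&not_x, '\n'));    /* "[^x]" is not */
    assert(!match_second_fix(&dot, '\n'));    /* both are cured */
    assert(!match_second_fix(&not_x, '\n'));
    assert(match_second_fix(&not_x, '0'));    /* ordinary matches survive */
}
```

Under these assumptions, a negated class such as "[^x]" accepts newline in the first fix simply because newline is not among the listed characters, which is why the guard has to sit outside all three alternatives. Calling check_newline_handling() from a small main() exercises each case.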