bink@aplcen.apl.jhu.edu (9704) (10/18/89)
I've noticed the ED s///g command works incorrectly when given the command s/b*a/c/g on the input line aaaaaaaaaaaaaaaaaaaaa -- the result is: cacacacacacacacacacac and it should be: ccccccccccccccccccccc This is incorrect according to ED(1): (.,.)s/RE/replacement/g The substitute command searches each addressed line for an occurrence of the specified RE. In each line in which a match is found, ALL (NON-OVERLAPPED) matched strings are replaced by the replacement if the global replacement indicator g appears after the command. These occurrences ARE non-overlapping. The form s/ab*/c/g works as expected. Both VI and ED behave this way, on at least System V.2 and Ultrix-32 V3.1. Is this a confirmed bug in the RE matcher or the substitute code, and has it been resolved? This sounds trivial, but it did cause real problems. -- Greg Ubben bink@aplcen.apl.jhu.edu ...!uunet!mimsy!aplcen!bink
gwyn@smoke.BRL.MIL (Doug Gwyn) (10/18/89)
In article <3522@aplcen.apl.jhu.edu> bink@aplcen.apl.jhu.edu (9704) writes:
-I've noticed the ED s///g command works incorrectly when given the command
- s/b*a/c/g
-on the input line
- aaaaaaaaaaaaaaaaaaaaa
-Is this a confirmed bug in the RE matcher or the substitute code, and has
-it been resolved? This sounds trivial, but it did cause real problems.
I believe it's a real bug. The corresponding "sam" x/b*a/c/c/ works as
you expected the "ed" command to work.
perry@ccssrv.UUCP (Perry Hutchison) (10/19/89)
In article <3522@aplcen.apl.jhu.edu> bink@aplcen.apl.jhu.edu (9704) writes:
+ I've noticed the ED s///g command works incorrectly when given the command
+ s/b*a/c/g
+ on the input line
+ aaaaaaaaaaaaaaaaaaaaa
+ -- the result is:
+ cacacacacacacacacacac
+ and it should be:
+ ccccccccccccccccccccc
< documentation citation, according to which this is a bug, deleted >
+ Both VI and ED behave this way, on at least System V.2 and Ultrix-32 V3.1.
Both also behave this way on SunOS 3.5, which is a 4.2 BSD derivative.
Anyone care to try it out on a 2bsd or a V7 system?
henry@utzoo.uucp (Henry Spencer) (10/19/89)
In article <741@ccssrv.UUCP> perry@ccssrv.UUCP (Perry Hutchison) writes: >+ s/b*a/c/g >+ on the input line >+ aaaaaaaaaaaaaaaaaaaaa >+ -- the result is: >+ cacacacacacacacacacac >+ Both VI and ED behave this way, on at least System V.2 and Ultrix-32 V3.1. > >Both also behave this way on SunOS 3.5, which is a 4.2 BSD derivative. >Anyone care to try it out on a 2bsd or a V7 system? Using our ed, which although it runs on a Sun is a V6-derived program, I get the right answer. Looks like somebody broke it -- Berkeley, for a guess, but I could be wrong -- and everyone else blindly copied the mistake. (I unfortunately don't have a V7 ed source handy to check; just conceivably it was broken there, although it would surprise me.) -- A bit of tolerance is worth a | Henry Spencer at U of Toronto Zoology megabyte of flaming. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
guy@auspex.auspex.com (Guy Harris) (10/20/89)
>Both also behave this way on SunOS 3.5, which is a 4.2 BSD derivative. >Anyone care to try it out on a 2bsd or a V7 system? "vi", but not "ed", behaves that way on SunOS 4.0, whose "vi" is 4.3BSD-derived and whose "ed" is S5R3-derived.
prp@sei.cmu.edu (Patrick Place) (10/20/89)
To add fuel to the broken regular expressions fire, using a 3B1 - I have no idea where this version of ed was derived from, but I would guess from AT&T System V - ed is again broken in the way indicated by others. I would be interested if Henry Spencer, who has a V6 derived ed, can determine whether he is using the standard regular expression package or if in the derivation path the appropriate routines have been fixed. Pat Place
dmr@alice.UUCP (10/20/89)
The behavior of ed's s/b*a/c/g on aaaaa to yield cacacacacac is doubtless a bug, and quite old; it occurs in the Seventh Edition ed. It relates to handling of null strings. The problem ed is trying to overcome is shown by the command s/b*/c/g on aaaaa where the most desirable result is presumably cacacacacac where each null string before, inside, and after the subject matches /b*/ exactly once. But "each null string" is a bit slippery in practice. Ed generally handles the 'g' option of substitution by iterating the 's' command, with an advancing starting position, until it reaches the end of the subject string. The new starting position is just between the end of the replacement string from the last iteration and the end of the unconsumed part of the subject string (if the r.e. match succeeded), or one character further along in the subject (if the match at this position failed). When the algorithm suggested by this scheme is tried naively, the second example, 's/b*/c/g', blows up; it puts a 'c' at the beginning, then again finds the null string after the inserted 'c' and before the same, initial 'a'; the r.e. didn't consume any of the subject. The remedy for this misbehavior that ed chose was to force * expressions to fail to match empty strings at the beginning of the subject string on the second and subsequent iterations of the 's/.../.../g' processing. (If you have the code, this is the meaning of the 'locs' variable in advance().) So in the 's/b*a/c/g' example, a 'c' is put at the start; on the second iteration, the subject still begins with the first 'a', and the 'b*' fails on this initial 'a'. Then it advances; this time the substitution succeeds, because it is looking for 'b*a' not at the beginning. What you've discovered is that this remedy is also naive. It looks wiser to attack the problem head-on, by permitting the r.e. null string match, but recasting the 's/.../.../g' loop so that if the entire r.e. matches a null string immediately to the right of a string replaced in the preceding iteration, an advance through the subject string is forced. Dennis Ritchie dmr@research.att.com att!research!dmr
stevens@hsi.UUCP (Richard Stevens) (10/20/89)
V7's ed is broken too, so that must have been where it happened, since Henry says V6 did it correctly. Also, the 4.3BSD sed doesn't show this bug. Under SVR3.2 on a 386, ed, vi, *and* sed all show the bug. Richard Stevens Health Systems International, New Haven, CT stevens@hsi.com
peter@ficc.uu.net (Peter da Silva) (10/20/89)
Brian Beattie's Software-Tools-derived 'ed' works fine, so it looks like Kernighan got it right. -- Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation. Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-' "You can tell when a USENET discussion is getting old when one of the 'U` participants drags out Hitler and the Nazis" -- Richard Sexton
henry@utzoo.uucp (Henry Spencer) (10/21/89)
In article <4586@fy.sei.cmu.edu> prp@sei.cmu.edu (Patrick Place) writes: >I would be interested if Henry Spencer, who has a V6 derived ed, >can determine whether he is using the standard regular expression >package or if in the derivation path the appropriate routines have >been fixed. The derivation path can be described, most inadequately, as "long and twisty". The modification log is several hundred lines long. I took a quick look through it and found nothing specifically identified as fixing regular-expression bugs, but there are enough slightly-mysterious references to cleanups and such that it's hard to be sure. It would be rather difficult to recover an original V6 source, and I probably couldn't make it run on a Sun to check. I may be able to give a definitive answer about V7 next week. -- A bit of tolerance is worth a | Henry Spencer at U of Toronto Zoology megabyte of flaming. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
chris@mimsy.umd.edu (Chris Torek) (10/21/89)
In article <10036@alice.UUCP> dmr@alice.UUCP writes: >It looks wiser to attack the problem head-on, by permitting >the r.e. null string match, but recasting the 's/.../.../g' loop >so that if the entire r.e. matches a null string immediately to the >right of a string replaced in the preceding iteration, >an advance through the subject string is forced. Interestingly enough, Emacs (Gosling, v 85 or later) gets this right. It does not go to much effort: the solution is simply `delete the matched text; insert the new text; if the matched text has length zero, advance'. (That is, I think the procedure described above is excessively careful. It should not matter whether the null match is immediately rightward of a preceding replacement.) (I did once have to fix the Gosmacs code that handled replacements of the form (re-query-replace "^" "stuff") which, if you answered `no' to the `replace this?' prompt, would stay in the same place. The solution is the same: move right if the match is empty.) -- `They were supposed to be green.' In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
tso@rocky2 (Daniels Tso (Wiesel)) (10/22/89)
In article <1989Oct20.182045.21541@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >It would be rather difficult to recover an original V6 source, and I >probably couldn't make it run on a Sun to check. I may be able to give >a definitive answer about V7 next week. For what it's worth, ed is broken on PWB 1.2 (The precursor to USG System 3.0 that was released between V6 and V7). Almost everything on PWB 1.2 is V6-like, so I speculate that if V6 ed "works" (as Henry's suggests) and PWB 1.2 and V7 is broken, that the change was made on the research group's UNIX before the PWB people picked it up. I could fetch a V6 ed and diff it with the PWB ed. The number of diffs should be very small... Cheers, Dan Ts'o Dept. Neurobiology 212-570-7671 Rockefeller Univ. ...cmcl2!rna!dan 1230 York Ave. dan@rna.rockefeller.edu NY, NY 10021 tso@rockefeller.edu
arnold@mathcs.emory.edu (Arnold D. Robbins {EUCC}) (10/25/89)
In article <6606@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes: >Brian Beattie's Software-Tools-derived 'ed' works fine, so it looks like >Kernighan got it right. Indeed. The "Georgia Tech 'se' Screen Editor", which is also derived from the Software Tools 'ed', gets it right as well. ('Se' is available in your nearest comp.sources.unix archive.) -- Arnold Robbins -- Emory U. Information Technology Div. | Laundry increases DOMAIN: arnold@emoryu1.cc.emory.edu | exponentially in the UUCP: gatech!emoryu1!arnold PHONE: +1 404 727-7636 | number of children. BITNET: arnold@emoryu1 FAX: +1 404 727-2599 | -- Miriam Hartholz
webb@bass.tcspa.ibm.com (Bill Webb) (10/26/89)
>... > It would be rather difficult to recover an original V6 source, and I > probably couldn't make it run on a Sun to check. I may be able to give > a definitive answer about V7 next week. > -- > A bit of tolerance is worth a | Henry Spencer at U of Toronto Zoology > megabyte of flaming. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu I've just tried it on a Version 6 ed (I knew that PDP 11 simulator would eventually come in handy!) that had only one slight modification over the standard v6 ed (it output a prompt character) and it has the bug. I could probably recompile the original ed.c (dated May 13, 1975) and test that but I expect it has the bug as well, but I'd have recreate the simulated v6 environment to do so which would take a few hours. ---------------------------------------------------------------- The above views are my own, not necessarily those of my employer. Bill Webb (IBM AWD Palo Alto), (415) 855-4457). UUCP: ...!uunet!ibmsupt!webb
tale@pawl.rpi.edu (David C Lawrence) (10/26/89)
To provide one more editor data point in this, GNU Emacs' (replace-regexp "b*a" "c"), which is equivalent to the original example, works correctly. JTYMLTK. -- (setq mail '("tale@pawl.rpi.edu" "tale@itsgw.rpi.edu" "tale@rpitsmts.bitnet"))
eggert@twinsun.com (Paul Eggert) (10/27/89)
tale@pawl.rpi.edu (David C Lawrence) writes:
GNU Emacs' (replace-regexp "b*a" "c"), which is equivalent to the
original example, works correctly.
Ah, but (replace-regexp "a*" "b") does not work correctly in GNU Emacs 18.55;
it misses the empty string just before the end of the buffer. I found this bug
(and posted a bug report) after reading dmr's explanation of the bug that was
avoided in 'ed'.