[comp.unix.wizards] sed command behavior suspect

glen@proexam.UUCP (Glen Brydon) (07/06/89)

Upon installation of the Sun O/S version 4.0.3 we discovered that a sed
editing script stopped working properly. Upon diagnosis we isolated the
problem to the behavior of the `p' suffix to the Substitute command.
There seems to be some sort of interaction between this print option
and a `d' (delete) which follows in the i/o loop. (i.e.

	sed 's/.*/&/p;d'

). The bsd version does print all the lines as expected, but the system V
one does not. Also, the latest bsd version on our sun os does not either.

Please set me straight, but shouldn't this work as expected, and print
all lines?

Thanks for your comments.

maart@cs.vu.nl (Maarten Litmaath) (07/08/89)

glen@proexam.UUCP (Glen Brydon) writes:
\...	sed 's/.*/&/p;d'
\
\). The bsd version does print all the lines as expected, but the system V
\one does not. Also, the latest bsd version on our sun os does not either.

I say it should print every line indeed; apparently the System V version
of sed merely sets a flag "this line is to be printed later", which gets
cleared by the `d' operation. The following does work as expected:

	sed 's/.*/&/w foo
	d'
-- 
   "... a lap-top Cray-2 with builtin    |Maarten Litmaath @ VU Amsterdam:
cold fusion power supply"  (Colin Dente) |maart@cs.vu.nl, mcvax!botter!maart

guy@auspex.auspex.com (Guy Harris) (07/12/89)

From the previous articles in the thread:

 >There seems to be some sort of interaction between this print option
 >and a `d' (delete) which follows in the i/o loop. (i.e.
 >
 >	sed 's/.*/&/p;d'
 >
 >). The bsd version does print all the lines as expected, but the system V
 >one does not. Also, the latest bsd version on our sun os does not either.

and

 >I say it should print every line indeed; apparently the System V version
 >of sed merely sets a flag "this line is to be printed later", which gets
 >cleared by the `d' operation.

And now, the real story.  A couple of places where "sed" will print the
pattern space are:

	1) each time around its main loop;

	2) in an "s" command, if "p" was specified and a replacement was
	   made.

1) is done only if:

	a) "-n" wasn't specified

and

	b) the line wasn't deleted by a "c", "d", "D", "n", or "N"
	   command.

In the S5 version, but not in the BSD version, 2) is done only if "-n"
was specified.

The hypothesis that "the System V version of sed merely sets a flag
'this line is to be printed later', which gets cleared by the `d'
operation" is, roughly, true of *both* versions.  (Actually, there's no
such flag; the main loop always prints the line as long as "-n" wasn't
specified and a "d" or other deletion operation as listed above wasn't
performed.)  The difference is in the way the "p" modifier on the "s"
command is treated: in the BSD version the line is printed due to the
"p" modifier on the "s" command, not due to the main "sed" loop.
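
As a rough illustration of the two print sites and the conditions
listed above, here is a minimal shell sketch.  The function name
"model" and its three arguments are made up for this example, and it
models only the script 's/.*/&/p' with an optional trailing 'd'; it is
not taken from either "sed" source.

```shell
# Hypothetical model of the two print sites; not real sed source.
# Usage: model VARIANT NFLAG DELETE
#   VARIANT is "bsd" or "s5"; NFLAG is 1 if "-n" was given, else 0;
#   DELETE is 1 if the script ends with "d", else 0.
# Lines are read from standard input.
model() {
    variant=$1 nflag=$2 delete=$3
    while IFS= read -r line; do
        # Print site 2): the "p" modifier on "s".  The substitution
        # s/.*/&/ always succeeds, so only the variant and "-n" matter.
        if [ "$variant" = bsd ] || [ "$nflag" -eq 1 ]; then
            printf '%s\n' "$line"
        fi
        # Print site 1): the end of the main loop, skipped if "-n" was
        # given or the line was deleted.
        if [ "$nflag" -eq 0 ] && [ "$delete" -eq 0 ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

With this model, "printf 'a\n' | model bsd 0 1" prints the line once
and "printf 'a\n' | model s5 0 1" prints nothing.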

The net result is that "sed 's/.*/&/p;d'" will not print the line in 1),
since the line is deleted with a "d" command.  In the BSD version, it
will print it in 2), but in the S5 version it will do so only if the
"-n" flag is specified.

Another consequence of this difference is that a command

	sed 's/XXX/YYY/p'

will print each line on which a substitution is made *twice* in the BSD
version, but only once in the S5 version.  It's not clear which way the
history went: either an older AT&T version from which the S5 version
derived didn't include the fix that caused 2) not to care whether "-n"
was specified, or an older AT&T version from which the BSD version was
derived didn't include the fix that caused 2) to care whether "-n" was
specified.  (Even if you think one is correct and the other is
incorrect, that doesn't settle the history of the changes; the person
who made the change may have thought the version you think is correct
is incorrect, and *vice versa*.)  If anybody has information to indicate
why this difference exists, it might be useful....

It turns out that the S5 "lint" command - a shell script - depends on
the S5 behavior, as written; it includes a line

	-*)	OPT=`echo $OPT | sed s/-//p`

The "sed" command in question will print each line twice in the BSD
version, but only once in the S5 version.
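
To find out which behavior a given "sed" has, a quick probe along these
lines should do.  This is only a sketch: the variable name "copies" is
made up, and it assumes the local "sed" shows one of the two behaviors
described above.

```shell
# Count how many copies of a one-line input "s///p" emits without "-n".
# tr strips the padding that some wc implementations add.
copies=$(printf 'x\n' | sed 's/x/y/p' | wc -l | tr -d ' ')
if [ "$copies" -eq 2 ]; then
    echo "two copies: the BSD behavior described above"
else
    echo "one copy: the S5 behavior described above"
fi
```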

Now, this can be easily fixed, I think, by changing the line to

	-*)	OPT=`echo $OPT | sed -n s/-//p`

or

	-*)	OPT=`echo $OPT | sed s/-//`

but AT&T might still balk at adding the fix in the BSD version/removing
the fix in the S5 version, since it could be considered as breaking
existing scripts.  Then again, it might be possible to fix the script
that depends on the BSD behavior, as well; in the case of

	sed 's/<something>/<something else>/p;d'

as the only command, you can fix it by doing

	sed -n 's/<something>/<something else>/p'

to do the substitution and print only the lines on which the
substitution is made.  However, if something more complex is being done,
it may not be fixable.
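
Since both versions agree once "-n" is given or the "p" modifier is
dropped, the rewrites above should behave the same on either one.  A
small sketch with made-up sample input (the strings "foo", "baz", and
"-abc" and the variable names "first" and "second" are illustrative
only):

```shell
# One matching and one non-matching input line; only the line on which
# the substitution was made is printed, and only once.
printf 'foo 1\nbar 2\n' | sed -n 's/foo/baz/p'
# prints "baz 1"

# The two suggested rewrites of the lint line, with a sample option
# string.  printf is used instead of echo here so that a leading "-"
# can never be taken as an option to echo.
OPT=-abc
first=$(printf '%s\n' "$OPT" | sed -n 's/-//p')
second=$(printf '%s\n' "$OPT" | sed 's/-//')
printf '%s %s\n' "$first" "$second"
# prints "abc abc"
```

The two rewrites differ only when no substitution is made: the "-n"
form then prints nothing, while the form without "p" prints the line
unchanged.  In the lint case the "-*)" pattern guarantees a leading
"-", so either works.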

My inclination might be to choose the BSD behavior, except that, for the
reason stated above, in most S5 systems, and the S5 environment on
multi-environment systems, you might be stuck with the S5 behavior
*anyway*.