[net.bugs] Fix to sed

vsh@pixel.UUCP (vsh) (10/18/85)

There is a bug in sed:  when processing files which do not end with a newline
character, the last line is discarded.  For example, execute the following:

	echo aaaaa > foo1 ; echo bbbbb > foo2 ; echo -n ccccc > foo3 
	echo ddddd > foo4 ; echo eeeee > foo5 ; echo -n fffff >> foo5
	cat foo? ; sed '' foo?
	cat foo5 ; sed '' foo5

In both cases, the output of 'sed' should be identical to that of 'cat',
but it is not.
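
One way to get that behavior is for the line reader to record whether each
line actually ended in a newline, and to write the newline back only in that
case.  The patch in this posting does this with a flag called 'lnlflag'.  What
follows is only a minimal standalone sketch of the idea in present-day C, not
the sed code itself; the names read_line and copy_lines are made up for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one line from fp into buf (at most size-1 characters are kept;
 * any excess on an over-long line is silently dropped).
 * *had_newline is set to 1 if the line ended in '\n', 0 if it ended
 * at end-of-file.  Returns 1 if a line was read, 0 if nothing was left.
 */
int read_line(FILE *fp, char *buf, size_t size, int *had_newline)
{
    size_t n = 0;
    int c;
    int got = 0;            /* did we consume any character at all? */

    assert(size > 0);
    *had_newline = 0;
    while ((c = getc(fp)) != EOF) {
        got = 1;
        if (c == '\n') {
            *had_newline = 1;
            break;
        }
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return got;
}

/* Copy in to out line by line, preserving a missing final newline. */
void copy_lines(FILE *in, FILE *out)
{
    char buf[4096];
    int nl;

    while (read_line(in, buf, sizeof buf, &nl)) {
        fputs(buf, out);
        if (nl)             /* only re-emit a newline that was really there */
            putc('\n', out);
    }
}
```

Calling copy_lines(stdin, stdout) on a file whose last line lacks a newline
should reproduce the input byte for byte, as long as no line exceeds the
buffer.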

This shell script creates two 'diff' files, 'd0' and 'd1' (ed scripts of the
kind produced by 'diff -e'), which can be applied to the System V, Release 2
versions of 'sed0.c' and 'sed1.c' to fix this shortcoming.  In addition, the
patched sed will recognize '-' on the command line as representing standard
input.
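
The '-' convention can be sketched on its own as follows.  This is an
illustration using stdio rather than the open()/read() calls in the patch,
and open_operand is a hypothetical name, not something found in sed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a command-line file operand to a stream.
 * The operand "-" means standard input; anything else is opened for
 * reading.  Returns NULL if the file cannot be opened.
 */
FILE *open_operand(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "-") == 0)
        return stdin;
    return fopen(name, "r");
}
```

A caller should take care not to fclose() the stream when it is stdin, much as
the patch avoids closing file descriptor 0.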

These changes have been tested at Pixel and appear to work okay.
Please let me know (by e-mail) if you encounter any problems, or if any
other features should be added.

Steve Harris
: ---------------- bourne shell script starts here --------------------------
:
sed 's/^X//' << ---aaaaaaaaaaaaaa--- > d0
X138a
X	execute();
X.
X134,137c
X	if(eargc <= 0) {
X		eargc = 1;
X		*eargv = "-";
X.
---aaaaaaaaaaaaaa---
sed 's/^X//' << ---bbbbbbbbbbbbbb--- > d1
X549a
X}
X
Xgbuf ()
X{
X	extern int	eargc;
X	extern char	**eargv;
X
X	register int c;
X	static int not_first = 0;
X
X	for (;;) {
X		if ((not_first) && ((c = read (f, ibuf, 512))) > 0)
X			return (c);
X		else {
X			if (not_first && f != 0)
X				close (f);
X			while (eargc-- > 0) {
X				if (! strcmp (*eargv, "-"))
X					f = 0;
X				else if ((f = open (*eargv, 0)) < 0) {
X					fprintf (stderr, "Can't open %s\n", *eargv);
X					eargv++;
X					continue;
X				}
X				not_first = 1;
X				eargv++;
X				break;
X			}
X			if (eargc < 0)
X				return (0);
X		}
X	}
X.
X541,543c
X		else if (c)
X			if (p1 < lbend) {
X				*p1++ = c;
X				lnlflag = 0;
X			}
X.
X538a
X			else {
X				dolflag = 1;
X				break;
X			}
X		if ((c = *p2++) == '\n') {
X			lnlflag++;
X			if (p2 >= ebp)
X				if (c = gbuf()) {
X					p2 = ibuf;
X					ebp = ibuf + c;
X				}
X				else
X					dolflag = 1;
X.
X535a
X	for (;;) {
X		if (p2 >= ebp)
X			if (c = gbuf()) {
X.
X520,534d
X517a
X
X	int gbuf ();
X
X	sflag = 0;	/* BUGFIX, usenet 5/16/85 */
X.
X515a
X
X.
X513,514c
Xchar *gline(addr)
Xchar *addr;
X.
X488c
X			fprintf(ipc->r1.fcode, "%s", linebuf);
X			if (lnlflag)
X				fprintf(ipc->r1.fcode, "\n");
X.
X472c
X					if (lnlflag)
X						putc('\n', stdout);
X.
X450c
X				if (lnlflag)
X					putc('\n', stdout);
X.
X443c
X			if (lnlflag)
X				putc('\n', stdout);
X.
X437c
X			if (lnlflag)
X				putc('\n', stdout);
X.
X426,430c
X			execp = gline(spend);
X.
X422a
X			if (dolflag)
X				break;
X.
X414,418c
X			execp = gline(linebuf);
X.
X409c
X				if (lnlflag)
X					putc('\n', stdout);
X.
X405a
X			if (dolflag)
X				break;
X.
X402c
X			fprintf(stdout, "%s", genbuf);
X			if (lnlflag)
X				fprintf(stdout, "\n");
X.
X395c
X							fprintf(stdout, "%s", genbuf);
X							if (lnlflag)
X								fprintf(stdout, "\n");
X.
X387c
X						fprintf(stdout, "%s", genbuf);
X						if (lnlflag)
X							fprintf(stdout, "\n");
X.
X377c
X								fprintf(stdout, "%s", genbuf);
X								if (lnlflag)
X									fprintf(stdout, "\n");
X.
X299c
X				if (lnlflag)
X					putc('\n', stdout);
X.
X170c
X			if (lnlflag)
X				putc('\n', stdout);
X.
X87,90c
X		if (dolflag)
X			return (0);
X		execp = gline(linebuf);
X.
X80,85d
X70,76d
X62,63c
Xexecute()
X.
X17a
Xint	lnlflag = 1;
X.
X14c
Xint     dolflag = 0;
X.
---bbbbbbbbbbbbbb---
-- 
Steve Harris  
Pixel Systems Inc.; 300 Wildwood St.; Woburn, MA.  01801
(617) 933-7735 x2314 (work)   (617) 664-0099 (home)
{allegra|ihnp4|cbosgd|ima|genrad|amd|harvard}!wjh12!pixel!vsh

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/18/85)

> There is a bug in sed:  when processing files which do not end with a newline
> character, the last line is discarded.

Many UNIX text-file utilities will discard a (necessarily final)
text line that does not end in a newline.  Quite simply, such a
file is not a proper UNIX text file.

kay@warwick.UUCP (Kay Dekker) (10/20/85)

In article <2235@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>Many UNIX text-file utilities will discard a (necessarily final)
>text line that does not end in a newline.  Quite simply, such a
>file is not a proper UNIX text file.

Who says?  Where's the definition of a 'proper' UNIX text file?

Maybe the "many UNIX text-file utilities" could do with fixing: discarding
lines that don't end in a newline seems bogus to me.

						Kay.
-- 
"A boy does not put his hand into his pocket until every other means of
gaining his end has failed."		_Tommy_, by J. M. Barrie.
			
			... mcvax!ukc!warwick!flame!kay

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/22/85)

> >Many UNIX text-file utilities will discard a (necessarily final)
> >text line that does not end in a newline.  Quite simply, such a
> >file is not a proper UNIX text file.
> 
> Who says?  Where's the definition of a 'proper' UNIX text file?

The problem is, there are several interpretations of such a file,
depending on the utility involved.  Perhaps there should be a
well-defined standard interpretation, but there isn't currently.

"A file of text consists simply of a string of characters, with
lines demarcated by the newline character."  -- from "The UNIX
Time-Sharing System" by Ritchie & Thompson

"text file, ASCII file -- a file, the bytes of which are understood
to be in ASCII code"  -- from "Glossary" in "UNIX Time-Sharing
System Programmer's Manual", 8th Ed.

"A text stream is an ordered sequence of bytes composed into lines,
each line consisting of zero or more characters plus a terminating
new-line character.  ...  The sequentially last character read in
from a text stream will, however, always be sequentially the last
character that was earlier written out to the text stream, if that
character was a new-line."  -- from ANSI X3J11/85-045

My personal choice would be similar to Ritchie & Thompson, where
newlines delimit (NOT "terminate") text lines, so that the last
character in a text file would not need to be a newline.  However,
this raises the question of what utilities should do with the
null line at the end of every text file that DOES end with a
newline; this will still be utility-dependent (and should be
documented whenever it is handled differently from other text
lines in the file).
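
The difference between the two readings can be made concrete by counting
lines in a buffer.  This is only a sketch under the two interpretations as
stated above; count_terminated and count_delimited are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Newline as TERMINATOR: a line exists only if it ends in '\n', so the
 * line count equals the number of newlines.  Trailing characters after
 * the last newline do not form a line.
 */
size_t count_terminated(const char *s, size_t len)
{
    size_t i, n = 0;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (s[i] == '\n')
            n++;
    return n;
}

/*
 * Newline as DELIMITER: k newlines separate k+1 lines, any of which
 * may be empty (including the one after a final newline).
 */
size_t count_delimited(const char *s, size_t len)
{
    return count_terminated(s, len) + 1;
}
```

For the four bytes "a\nb\n" the terminator reading gives 2 lines and the
delimiter reading gives 3, the third being the null line mentioned above.
For the three bytes "a\nb" the counts are 1 and 2, so only the delimiter
reading sees the "b".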

X3J11/85-045 botched it anyhow, since they intended that ALL UNIX
files qualify as "text streams" under stdio (vs. "binary streams",
which have to be handled differently on some non-UNIX OSes).

So, how do we establish a standard interpretation for non-newline-
terminated UNIX text files?

(Discussion should move to net.unix.)