vsh@pixel.UUCP (vsh) (10/18/85)
There is a bug in sed: when processing files which do not end with a newline
character, the last line is discarded. For example, execute the following:

	echo aaaaa > foo1 ; echo bbbbb > foo2 ; echo -n ccccc > foo3
	echo ddddd > foo4 ; echo eeeee > foo5 ; echo -n fffff >> foo5
	cat foo? ; sed '' foo?
	cat foo5 ; sed '' foo5

In both cases, the 'sed' should be identical with the 'cat', but is not.

This shell script creates two 'diff' files, 'd0' and 'd1', which can be
applied to the System V, Release 2 versions of 'sed0.c' and 'sed1.c' to
fix this shortcoming. In addition, sed will recognize '-' on the command
line as representing standard input.

These changes have been tested at Pixel and appear to work okay. Please
let me know (by e-mail) if you encounter any problems, or if any other
features should be added.

	Steve Harris

: ---------------- bourne shell script starts here --------------------------
:
sed 's/^X//' << ---aaaaaaaaaaaaaa--- > d0
X138a
X	execute();
X.
X134,137c
X	if(eargc <= 0) {
X		eargc = 1;
X		*eargv = "-";
X.
---aaaaaaaaaaaaaa---
sed 's/^X//' << ---bbbbbbbbbbbbbb--- > d1
X549a
X}
X
Xgbuf ()
X{
X	extern int eargc;
X	extern char **eargv;
X
X	register int c;
X	static int not_first = 0;
X
X	for (;;) {
X		if ((not_first) && ((c = read (f, ibuf, 512))) > 0)
X			return (c);
X		else {
X			if (not_first && f != 0)
X				close (f);
X			while (eargc-- > 0) {
X				if (! strcmp (*eargv, "-"))
X					f = 0;
X				else if ((f = open (*eargv, 0)) < 0) {
X					fprintf (stderr, "Can't open %s\n", *eargv);
X					eargv++;
X					continue;
X				}
X				not_first = 1;
X				eargv++;
X				break;
X			}
X			if (eargc < 0)
X				return (0);
X		}
X	}
X.
X541,543c
X		else if (c)
X			if (p1 < lbend) {
X				*p1++ = c;
X				lnlflag = 0;
X			}
X.
X538a
X			else {
X				dolflag = 1;
X				break;
X			}
X		if ((c = *p2++) == '\n') {
X			lnlflag++;
X			if (p2 >= ebp)
X				if (c = gbuf()) {
X					p2 = ibuf;
X					ebp = ibuf + c;
X				}
X				else
X					dolflag = 1;
X.
X535a
X	for (;;) {
X		if (p2 >= ebp)
X			if (c = gbuf()) {
X.
X520,534d
X517a
X
X	int gbuf ();
X
X	sflag = 0;	/* BUGFIX, usenet 5/16/85 */
X.
X515a
X
X.
X513,514c
Xchar *gline(addr)
Xchar *addr;
X.
X488c
X	fprintf(ipc->r1.fcode, "%s", linebuf);
X	if (lnlflag)
X		fprintf(ipc->r1.fcode, "\n");
X.
X472c
X	if (lnlflag)
X		putc('\n', stdout);
X.
X450c
X	if (lnlflag)
X		putc('\n', stdout);
X.
X443c
X	if (lnlflag)
X		putc('\n', stdout);
X.
X437c
X	if (lnlflag)
X		putc('\n', stdout);
X.
X426,430c
X	execp = gline(spend);
X.
X422a
X	if (dolflag)
X		break;
X.
X414,418c
X	execp = gline(linebuf);
X.
X409c
X	if (lnlflag)
X		putc('\n', stdout);
X.
X405a
X	if (dolflag)
X		break;
X.
X402c
X	fprintf(stdout, "%s", genbuf);
X	if (lnlflag)
X		fprintf(stdout, "\n");
X.
X395c
X	fprintf(stdout, "%s", genbuf);
X	if (lnlflag)
X		fprintf(stdout, "\n");
X.
X387c
X	fprintf(stdout, "%s", genbuf);
X	if (lnlflag)
X		fprintf(stdout, "\n");
X.
X377c
X	fprintf(stdout, "%s", genbuf);
X	if (lnlflag)
X		fprintf(stdout, "\n");
X.
X299c
X	if (lnlflag)
X		putc('\n', stdout);
X.
X170c
X	if (lnlflag)
X		putc('\n', stdout);
X.
X87,90c
X	if (dolflag)
X		return (0);
X	execp = gline(linebuf);
X.
X80,85d
X70,76d
X62,63c
Xexecute()
X.
X17a
Xint lnlflag = 1;
X.
X14c
Xint dolflag = 0;
X.
---bbbbbbbbbbbbbb---
--
Steve Harris
Pixel Systems Inc.; 300 Wildwood St.; Woburn, MA. 01801
(617) 933-7735 x2314 (work)
(617) 664-0099 (home)
{allegra|ihnp4|cbosgd|ima|genrad|amd|harvard}!wjh12!pixel!vsh
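The two ideas in the patch, remembering whether the current line actually ended in a newline (the role of `lnlflag`) and accepting '-' as standard input (as `gbuf()` does), can be sketched in modern stdio terms. This is a hypothetical illustration, not Steve Harris's code: the names `open_input`, `copy_lines` and `copy_args` are invented here, and the sketch uses `getc`/`fwrite` rather than sed's own `read()`-based buffering. It assumes lines longer than 512 bytes may be truncated, loosely mirroring sed's fixed line buffer.

```c
#include <stdio.h>
#include <string.h>

/* "-" names standard input, as in the patch's gbuf(). */
static FILE *open_input(const char *name)
{
    if (strcmp(name, "-") == 0)
        return stdin;
    return fopen(name, "r");
}

/*
 * Copy `in` to `out` one line at a time.  `had_newline` plays the role
 * of lnlflag: a newline is written only if the input line had one, so
 * an unterminated last line is copied instead of discarded.
 * Returns the number of lines copied, counting an unterminated tail.
 */
long copy_lines(FILE *in, FILE *out)
{
    char line[512];              /* fixed buffer: longer lines are truncated */
    long nlines = 0;

    for (;;) {
        size_t len = 0;
        int had_newline = 0;
        int c;

        while ((c = getc(in)) != EOF) {
            if (c == '\n') {
                had_newline = 1;
                break;
            }
            if (len < sizeof line)
                line[len++] = (char)c;
        }
        if (!had_newline && len == 0)
            break;               /* nothing left: clean end of input */

        fwrite(line, 1, len, out);
        if (had_newline)
            putc('\n', out);
        nlines++;
        if (!had_newline)
            break;               /* EOF arrived mid-line: that was the last line */
    }
    return nlines;
}

/*
 * Copy each named file to `out`; with no names, read standard input.
 * Files that cannot be opened are reported and skipped, much as gbuf() does.
 */
long copy_args(int argc, char **argv, FILE *out)
{
    static char dash[] = "-";
    static char *stdin_only[] = { dash };
    long total = 0;

    if (argc <= 0) {             /* no file arguments: default to "-" */
        argc = 1;
        argv = stdin_only;
    }
    for (int i = 0; i < argc; i++) {
        FILE *in = open_input(argv[i]);
        if (in == NULL) {
            fprintf(stderr, "Can't open %s\n", argv[i]);
            continue;
        }
        total += copy_lines(in, out);
        if (in != stdin)
            fclose(in);
    }
    return total;
}
```

Called as `copy_args(argc - 1, argv + 1, stdout)` from a `main()`, the sketch should write the same bytes as `cat foo5` (eeeee, then fffff with no trailing newline), where the unpatched sed discarded fffff. Unlike the patched sed, which reads all its files as one stream, the sketch counts each file's unterminated tail as a separate line; the bytes written are the same either way.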
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/18/85)
> There is a bug in sed: when processing files which do not end with a newline
> character, the last line is discarded.

Many UNIX text-file utilities will discard a (necessarily final) text line
that does not end in a newline. Quite simply, such a file is not a proper
UNIX text file.
kay@warwick.UUCP (Kay Dekker) (10/20/85)
In article <2235@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>Many UNIX text-file utilities will discard a (necessarily final)
>text line that does not end in a newline. Quite simply, such a
>file is not a proper UNIX text file.

Who says? Where's the definition of a 'proper' UNIX text file?

Maybe the "many UNIX text-file utilities" could do with fixing: discarding
lines that don't end in a newline seems bogus to me.

Kay.
--
"A boy does not put his hand into his pocket until every other means of
gaining his end has failed."
		_Tommy_, by J. M. Barrie.
... mcvax!ukc!warwick!flame!kay
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/22/85)
> >Many UNIX text-file utilities will discard a (necessarily final)
> >text line that does not end in a newline. Quite simply, such a
> >file is not a proper UNIX text file.
>
> Who says? Where's the definition of a 'proper' UNIX text file?

The problem is, there are several interpretations of such a file,
depending on the utility involved. Perhaps there should be a well-defined
standard interpretation, but there isn't currently.

	"A file of text consists simply of a string of characters, with
	lines demarcated by the newline character."
		-- from "The UNIX Time-Sharing System" by Ritchie & Thompson

	"text file, ASCII file -- a file, the bytes of which are understood
	to be in ASCII code"
		-- from "Glossary" in "UNIX Time-Sharing System Programmer's
		   Manual", 8th Ed.

	"A text stream is an ordered sequence of bytes composed into lines,
	each line consisting of zero or more characters plus a terminating
	new-line character. ... The sequentially last character read in from
	a text stream will, however, always be sequentially the last character
	that was earlier written out to the text stream, if that character
	was a new-line."
		-- from ANSI X3J11/85-045

My personal choice would be similar to Ritchie & Thompson, where newlines
delimit (NOT "terminate") text lines, so that the last character in a text
file would not need to be a newline. However, this raises the question of
what utilities should do with the null line at the end of every text file
that DOES end with a newline; this will still be utility-dependent (and
should be documented whenever it is handled differently from other text
lines in the file).

X3J11/85-045 botched it anyhow, since they intended that ALL UNIX files
qualify as "text streams" under stdio (vs. "binary streams", which have to
be handled differently on some non-UNIX OSes).

So, how do we establish a standard interpretation for
non-newline-terminated UNIX text files? (Discussion should move to
net.unix.)
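To make the two readings concrete, here is a hypothetical pair of C counters (the function names are invented for illustration) that apply each rule to the same bytes. Under the terminator rule of the X3J11 draft, "a\nb" holds one complete line plus an unterminated tail; under the delimiter rule of Ritchie & Thompson it holds two lines, and "a\nb\n" holds three, the last being the null line described above.

```c
#include <stddef.h>

/*
 * Newline as TERMINATOR (the X3J11 draft's wording): each complete line
 * ends in '\n'.  Returns the number of complete lines and sets *has_tail
 * if bytes follow the last newline (the case the unpatched sed dropped).
 */
size_t count_terminated(const char *s, size_t n, int *has_tail)
{
    size_t lines = 0;

    for (size_t i = 0; i < n; i++)
        if (s[i] == '\n')
            lines++;
    *has_tail = (n > 0 && s[n - 1] != '\n');
    return lines;
}

/*
 * Newline as DELIMITER (the Ritchie & Thompson reading): k newlines
 * separate k + 1 lines, so a file ending in '\n' has an empty ("null")
 * last line, and an empty file holds a single empty line.
 */
size_t count_delimited(const char *s, size_t n)
{
    size_t lines = 1;

    for (size_t i = 0; i < n; i++)
        if (s[i] == '\n')
            lines++;
    return lines;
}
```

The delimiter rule never loses a final unterminated line, but every newline-terminated file then gains the trailing null line, which each utility would have to decide how to treat.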