[net.unix-wizards] 4.2bsd eof flag in stdio

goldfarb@ucf-cs.UUCP (Ben Goldfarb) (11/08/84)

[]

Why did Berkeley change stdio such that typing ^D (or whatever EOF character
one is using) on stdin causes that stream to reflect eof until a clearerr()
is done?  Has this been discussed here before?  If so, I apologize for 
further belaboring the issue.

In any case, what is the correct approach to this problem?  Obviously, we
can't expect the authors of programs that have been distributed with UNIX
since V7 to have provided for Berkeley's change; as it stands I've found
that addbib and learn are both broken because of the continual EOF.  So I
patched
		if (feof(stdin))
			clearerr(stdin);
into both programs.  I'm sure more are affected.  Alternatively, I could have
"fixed" stdio, but how many Berkeley programs make use of this "feature?"
I'd appreciate some net wisdom on the subject.  
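
The two-line patch above could be wrapped in a small helper, sketched here in C. The name `read_reply` and its interface are hypothetical and are not taken from addbib or learn. The idea is only that a prompt-driven program clears the stream's EOF flag once it has noticed it, so that a later prompt can read from the terminal again.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one line of input into buf.
 * Returns 1 if a line was read, 0 if EOF (or an error) was seen.
 * When EOF was the cause, the stream's EOF flag is cleared so that
 * the next call attempts a fresh read instead of reporting EOF forever.
 */
int read_reply(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) != NULL)
        return 1;
    if (feof(in))
        clearerr(in);       /* same effect as the patch quoted above */
    return 0;
}
```

A caller would treat a 0 return as "the user typed ^D at this prompt" and carry on with the next prompt.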

-- 
Ben Goldfarb
University of Central Florida
uucp: {duke,decvax,princeton}!ucf-cs!goldfarb
ARPA: goldfarb.ucf-cs@csnet.relay
csnet: goldfarb@ucf

geoff@desint.UUCP (Geoff Kuenning) (11/13/84)

In article <1697@ucf-cs.UUCP> goldfarb@ucf-cs.UUCP (Ben Goldfarb) writes:

>Why did Berkeley change stdio such that typing ^D (or whatever EOF character
>one is using) on stdin causes that stream to reflect eof until a clearerr()
>is done?  Has this been discussed here before?  If so, I apologize for 
>further belaboring the issue.
>
>In any case, what is the correct approach to this problem?

We did this when I was at DEC because that's the way a file behaves, and it
is frequently easier to write a program to read the EOF twice.  For example:

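	while ((ch = getchar ()) != EOF)	/* a second read of EOF ends the loop */
	    switch (ch)
		{
		case '\\':
		    switch (ch = getchar ())	/* character after the backslash */
			{
			case EOF:
			    break;		/* leaves only the inner switch */
			}
		    break;
		}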

Here, reading the EOF twice is a convenient way to handle the loop exit.
(Yes, there are other ways, notably using a goto.  But in more complex code
this approach may be the cleanest).  I never like assuming that I can unget
an EOF character (although it works on some systems).

One can also make a persuasive argument for the advantages of the other
approach, but I prefer this way because of consistency.
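
For comparison, one of the "other ways" can be sketched without a goto and without reading EOF a second time. This is an illustration only: the function name `count_units` and the choice of counting backslash escapes as single units are assumptions made for the example, not part of the program being described.

```c
#include <stdio.h>

/*
 * Count input "units": an ordinary character is one unit, and a
 * backslash together with the character after it is one unit.
 * If EOF follows a backslash, the loop is left directly, so the
 * result does not depend on whether EOF is sticky.
 */
long count_units(FILE *in)
{
    long units = 0;
    int ch;                     /* int, not char, so EOF is representable */

    while ((ch = getc(in)) != EOF) {
        units++;
        if (ch == '\\' && getc(in) == EOF)
            break;              /* dangling backslash: stop, EOF is not re-read */
    }
    return units;
}
```

In a loop this small the break is easy to arrange; the argument above is that in more deeply nested code, reading EOF again is simpler.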
-- 

	Geoff Kuenning
	First Systems Corporation
	...!ihnp4!trwrb!desint!geoff

shannon@sun.uucp (Bill Shannon) (11/16/84)

Ben Goldfarb writes,
> Why did Berkeley change stdio such that typing ^D (or whatever EOF character
> one is using) on stdin causes that stream to reflect eof until a clearerr()
> is done?  Has this been discussed here before?  If so, I apologize for 
> further belaboring the issue.
> 
> In any case, what is the correct approach to this problem?  Obviously, we
> can't expect the authors of programs that have been distributed with UNIX
> since V7 to have provided for Berkeley's change; as it stands I've found
> that addbib and learn are both broken because of the continual EOF.  So I
> patched
> 		if (feof(stdin))
> 			clearerr(stdin);
> into both programs.  I'm sure more are affected.  Alternatively, I could have
> "fixed" stdio, but how many Berkeley programs make use of this "feature?"
> I'd appreciate some net wisdom on the subject.  

The change was made by Sun and bought back by Berkeley.  I believe this
has been discussed on the net before.  The change actually fixes another
bug.  The bug was that without this change programs using fread on terminals
would never report an EOF condition to the user because internally fread
would just swallow the EOF and return a short record and the next fread
would go on reading past the EOF.  We actually ran into this bug in some
existing program, I forget which one.  Unfortunately, not all the programs
which depended on the old behaviour were fixed.

					Bill Shannon
					Sun Microsystems, Inc.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/18/84)

> The change was made by Sun and bought back by Berkeley.  I believe this
> has been discussed on the net before.  The change actually fixes another
> bug.  The bug was that without this change programs using fread on terminals
> would never report an EOF condition to the user because internally fread
> would just swallow the EOF and return a short record and the next fread
> would go on reading past the EOF.  We actually ran into this bug in some
> existing program, I forget which one.  Unfortunately, not all the programs
> which depended on the old behaviour were fixed.

fread() returns 0 if there are 0 characters left in the terminal
input queue when the ^D is typed.  What would you have it do?

Contrary to popular misconception, ^D is NOT an "EOF" character;
rather, it marks a delimiter for input canonicalization.  If all
previous input has been consumed and a ^D is typed, then read()
returns a count of 0.  This is often interpreted as EOF.  If there
is some uncanonicalized input and ^D is typed, it acts much like
NEWLINE except of course no \n is appended.
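
As a rough sketch of the convention described above, a program using read(2) directly treats a zero count as end of input. The function name `bytes_until_zero_read` is hypothetical. On a terminal in canonical mode, typing 'foo^D' would make one read() return 3, and only a further ^D with nothing pending would produce the zero count that ends the loop.

```c
#include <unistd.h>

/*
 * Count the bytes delivered on fd until read() returns 0 (the
 * conventional end-of-file indication) or fails with -1.
 * Nothing here knows about ^D; only the zero count matters.
 */
long bytes_until_zero_read(int fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    return total;
}
```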

If the 4.2BSD fread() was buggy, it should have been fixed rather
than introducing a significant incompatibility with other STDIOs.

thomas@utah-gr.UUCP (Spencer W. Thomas) (11/19/84)

In article <5867@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>fread() returns 0 if there are 0 characters left in the terminal
>input queue when the ^D is typed.  What would you have it do?
The problem is if you type 'foo^D' with no newline.  You would expect
that this would terminate input reading, but it does not -- you must
type another ^D to finish it off.
>
>Contrary to popular misconception, ^D is NOT an "EOF" character;
>rather, it marks a delimiter for input canonicalization.  If all
>previous input has been consumed and a ^D is typed, then read()
>returns a count of 0.  This is often interpreted as EOF.  If there
>is some uncanonicalized input and ^D is typed, it acts much like
>NEWLINE except of course no \n is appended.
>
This is, of course, a matter of opinion, but all the documentation
states that ^D is the *end-of-file* character.  Perhaps the
documentation (unchanged since my memory) is "buggy"?

>If the 4.2BSD fread() was buggy, it should have been fixed rather
>than introducing a significant incompatibility with other STDIOs.
This bug is in ALL versions of fread (and getchar, and ...) *except*
4.2.

=Spencer

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/19/84)

> The problem is if you type 'foo^D' with no newline.  You would expect
> that this would terminate input reading, but it does not -- you must
> type another ^D to finish it off.

This is just what I expect.  Why should the first ^D terminate input
reading, since the read will return 3 characters at that point?

> This is, of course, a matter of opinion, but all the documentation
> states that ^D is the *end-of-file* character.  Perhaps the
> documentation (unchanged since my memory) is "buggy"?

Yup.  Kernighan & Pike got it right in their book.

> This bug is in ALL versions of fread (and getchar, and ...) *except*
> 4.2.

The UNIX System V Release 2.0 fread() acts as I originally described,
which is what I would expect.  In any case, judging from the number of
times people have had problems caused by this change, it was not a wise
move.

shannon@sun.uucp (Bill Shannon) (11/20/84)

> fread() returns 0 if there are 0 characters left in the terminal
> input queue when the ^D is typed.  What would you have it do?

Try this program on your favorite version of stdio:

#include <stdio.h>

char	buf[256];

main()
{
	register int n;

	while (n = fread(buf, 1, sizeof buf, stdin))
		fwrite(buf, 1, n, stdout);
	printf("got EOF\n");
}

Run it and type (e.g.):

testing 1 2 3
^D
another test

Where ^D is your EOT character.  If the program terminates
when you type ^D then your stdio works properly.  The 4.1
version of stdio would "eat" the ^D and echo the first and
third lines.  It would only terminate if you typed ^D twice
in a row.
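
The difference between the two behaviours can be modelled without a terminal. The sketch below is a simplified model, not the actual stdio source: `struct fake_tty` and `fake_fread` are hypothetical names, each string stands for what one read() on the terminal would return, and an empty string stands for ^D typed at the start of a line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a terminal feeding a stdio stream. */
struct fake_tty {
    const char **records;   /* one entry per read(); "" models ^D */
    size_t nrecords;
    size_t next;            /* index of the next record to deliver */
    int eof_seen;           /* models the stream's EOF flag */
    int sticky;             /* nonzero: refuse to read once eof_seen is set */
};

/*
 * Model of fread(buf, 1, want, stream): keep "reading" until want
 * bytes are collected or a zero-length read is seen.
 * Simplifications: a record longer than the space left is truncated,
 * and running out of records returns instead of blocking.
 */
size_t fake_fread(struct fake_tty *t, char *buf, size_t want)
{
    size_t got = 0;

    if (t->sticky && t->eof_seen)
        return 0;                       /* sticky EOF: no further reads */

    while (got < want && t->next < t->nrecords) {
        const char *rec = t->records[t->next++];
        size_t len = strlen(rec);

        if (len == 0) {                 /* zero-length read: note EOF, stop */
            t->eof_seen = 1;
            break;
        }
        if (len > want - got)
            len = want - got;
        memcpy(buf + got, rec, len);
        got += len;
    }
    return got;
}
```

With the three lines of input shown above, the non-sticky model returns 14 and then 13, so a `while (n = fread(...))` loop never sees a zero; the sticky model returns 14 and then 0.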

> If the 4.2BSD fread() was buggy, it should have been fixed rather
> than introducing a significant incompatibility with other STDIOs.

Making EOF sticky was the fix.  It seemed like the right thing to
do; the incompatibility was unfortunate.  If you have a fix to
fread (filbuf, actually) that both fixes this bug and avoids the
incompatibility then please send it to me and/or post it to the
net.  If this works properly in System V I would be interested to
hear that as well.

					Bill Shannon
					Sun Microsystems, Inc.

Ron Natalie <ron@BRL-TGR> (11/20/84)

Doug:

Looking in your beloved System V manuals you will find under READ(2):

	A value of zero is returned when end-of-file has been reached.
and
	When attempting to read a file associated with a tty that has
	no data currently available ... the read will block until the
	data becomes available.

And then looking at the documentation for the TTY driver, where is it
oh yes, it's called TERMIO and it's in the system administrators manual.
Of course, no ordinary user would ever want to change his terminal modes.

	A line is delimited by a new-line (ASCII LF), an  end-of-file
	(ASCII-EOT), or an end-of-line character.

	EOF - may be used to generate an end-of-file from a terminal.
	Thus if there are no characters waiting, which is to say EOF
	occurred at the beginning of line, zero characters will be
	passed back, which is the standard end-of-file indication.


What this implies is the zero return from TTY reads are END-OF-FILE
and should be treated as such.  It is possible to continue reading
past end of file on some devices such as TTY and Magtape, but that
doesn't mean you shouldn't handle EOF properly.

Fread states
	Fread stops appending bytes if an end-of-file or error
	condition occurs.
Ferror states
	Feof returns non-zero when EOF has previously been detected
	reading the named input stream.
	Clearerr resets the error indicator and EOF indicator to zero.

It is obvious from this, that no distinction is made of EOT chars meaning
anything but the absolute end-of-file on TTY.  If you were attempting to
write a Stdio using the definitions in the manual, you would have to implement
it this way.

You need to stop defining UNIX by whatever bugs AT&T has and penalizing
Berkeley because they have fixed a legitimate bug in the original UNIX
code.

-Ron

Like I'm from the Mystic Valley.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/20/84)

[The actual manual entries were somewhat different and clearer.]

This is not the same as saying that EOT "means" EOF.  Only in certain
contexts does it have that effect, as the TERMIO(7) manual entry says.

> What this implies is the zero return from TTY reads are END-OF-FILE
> and should be treated as such.  It is possible to continue reading
> past end of file on some devices such as TTY and Magtape, but that
> doesn't mean you shouldn't handle EOF properly.

Agreed.

> Fread states
> 	Fread stops appending bytes if an end-of-file or error
> 	condition occurs.

And so it does!  But this is on that call, not necessarily on future
calls.  This feature works as advertised on UNIX System V Release 2.0.

> Ferror states
> 	Feof returns non-zero when EOF has previously been detected
> 	reading the named input stream.
> 	Clearerr resets the error indicator and EOF indicator to zero.

Again, this is the way it does work.  The EOF is "latched" until
cleared, but fread() can read past EOF if there is data there.

I'm all for bugs being fixed, if they are really bugs and not just
different ideas about what should be happening.  Perhaps 4.2BSD and
UNIX System V now agree about EOF behavior; that would be a pleasant
change.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/21/84)

> #include <stdio.h>
> 
> char	buf[256];
> 
> main()
> {
> 	register int n;
> 
> 	while (n = fread(buf, 1, sizeof buf, stdin))
> 		fwrite(buf, 1, n, stdout);
> 	printf("got EOF\n");
> }
> 
> Run it and type (e.g.):
> 
> testing 1 2 3
> ^D
> another test
> 
> Where ^D is your EOT character.  If the program terminates
> when you type ^D then your stdio works properly.  The 4.1
> version of stdio would "eat" the ^D and echo the first and
> third lines.  It would only terminate if you typed ^D twice
> in a row.

Thanks for the example, Bill.  I guess we disagree about what is
expected here.  The "EOFish" nature of the input is reflected in
fread()'s short return count; as expected the 0-length read forces
fread() to return prematurely.  I see no reason for it to "stick"
at EOF, though.  The programmer certainly can tell that he is at
EOF from the short count.  Continuing to read the stream is a
programming error (that happens to work on "ordinary" files,
unless they are being dynamically appended to), and more than a bit
sloppy besides (just like the internals of most UNIX utilities).

I see the argument for the other interpretation; I just don't
agree with it.

geoff@desint.UUCP (Geoff Kuenning) (11/22/84)

>Contrary to popular misconception, ^D is NOT an "EOF" character;
>rather, it marks a delimiter for input canonicalization.  If all
>previous input has been consumed and a ^D is typed, then read()
>returns a count of 0.  This is often interpreted as EOF.  If there
>is some uncanonicalized input and ^D is typed, it acts much like
>NEWLINE except of course no \n is appended.

	-Doug Gwyn

Contrary to popular misconception, neither the design of the Unix kernel nor
its documentation was handed down on stone tablets from on high.  I don't
really care whether Thompson and Ritchie chose to describe the behavior of
the original Unix TTY driver as "EOF" or "canonicalization".  I strongly
suspect that their motivation was to describe the behavior of the code they
actually wrote, and the code was written for convenient implementation.

We need a way to indicate "end of data" to a program reading TTY input.  It is
convenient for programmers to consider "end of file" as "end of data" when
reading file input.  Since redirection of stdin is one of Unix's great
features, it is thus reasonable to simply provide a way for a TTY to indicate
"end of file".  If T&R implemented it sloppily and documented it accurately,
that is no reason for us to slavishly follow their lead.

Once you decide to have ^D truly mean "end of file", it is only reasonable
to make it operate like a true EOF.  That means that multiple reads return
multiple EOF indications, just like a disk.  The original implementation
can be extremely disconcerting--I had a program a few days ago that wanted
two EOF's to terminate.  It tested fine from a file, but "hung" when I
typed in the input and terminated it with ^D.

The fact that some programs have in the past misinterpreted this bug as a
feature and made use of it is unfortunate, but something we will have to
live with.  It is just not that hard to grep for "EOF" and add "clearerr"
calls.  In any case, any program that was doing this was already providing
incompatible behavior between files and TTY's.  That's what you get when
you special-case TTY input :-).
-- 

	Geoff Kuenning
	First Systems Corporation
	...!ihnp4!trwrb!desint!geoff

bsa@ncoast.UUCP (Brandon Allbery) (11/22/84)

TTY(4)              XENIX Programmer's Manual              TTY(4)

		. . .


     EOT  (Control-D) may be used to generate an end of file from
          a terminal.  When an EOT is received, all the charac-
          ters waiting to be read are immediately passed to the
          program, without waiting for a new-line, and the EOT is
          discarded.  Thus if there are no characters waiting,
          which is to say the EOT occurred at the beginning of a
          line, zero characters will be passed back, and this is
          the standard end-of-file indication.

This is in the system manual; I'd suggest both you
and Berkeley look it up (in a V7 manual if necessary).
fread() was NOT designed for terminal I/O.

--bsa
-- 
  Brandon Allbery @ North Coast Xenix  |   the.world!ucbvax!decvax!cwruecmp!
6504 Chestnut Road, Independence, Ohio |       {atvax!}ncoast!{tdi1!}bsa
   (216) 524-1416             \ 44131  | E1439@CSUOHIO.BITNET (friend's acct.)
---------------------------------------+---------------------------------------
Forgive; we just had a system crash & lost a month's worth of work and patches.

kre@mulga.OZ (Robert Elz) (11/23/84)

From Doug Gwyn (in the last referenced article):
| 
| > This is, of course, a matter of opinion, but all the documentation
| > states that ^D is the *end-of-file* character.  Perhaps the
| > documentation (unchanged since my memory) is "buggy"?
| 
| Yup.  Kernighan & Pike got it right in their book.
| 

Rarely does anyone play into my hands quite so nicely.  Now that
we have K&P cited as the absolute authority on this issue, I
will proceed to quote from page 204.

	   Structurally, readslow is identical to cat except that
	it loops instead of quitting when it encounters the current
	end of the input.  It has to use low-level I/O because the
	standard library routines continue to report EOF after the
	first end of file.

This immediately precedes the listing of the "readslow" program,
which is the authorised version of "tail -f" according to the
gospel of St Pike.

I'm not sure which particular version of "the standard library
routines" they were referring to - this was written before 4.2
was released.  I always assumed that V8 had fixed the bug as
well, but I was (not too long ago) told that this was not so.
Would you care to clarify, Rob?

The above inclusion (from previous articles) is, of course,
completely irrelevant to the original discussion under this
subject line.  It makes absolutely no difference what ^D from
the terminal really does, or does not do.  What is important,
is that stdio returns EOF from a getchar(), fread(), scanf()
or whatever.  Not a zero length read, EOF.  (And as EOF is
actually returned to mean a few other things, there is this
nifty macro "feof" that you can use to verify that this
really was "end of file").

I don't think it's at all unreasonable for "end of file" to
be a "sticky" condition,
	Kernighan & Pike got it right in their book.

Finally, it seems that there are two vocal groups of "anti-4.2"
people out there.  There seems to be one group that complains
bitterly about all the "bugs" berkeley introduced, and all the
things that they "broke", and a second group that complains
bitterly about all the "bugs" left in the code, and the things
that weren't done.  What's most amazing is that it seems often
that the most vocal members of each group are the same people.

Rather a double standard - they didn't fix the bugs that make
my life difficult, 'cause I have to fix them to run their code
on my hardware, but they did fix all the bugs I was relying on ...

Can we end this useless discussion now, and allow it to die
until someone else new "discovers" it again (in about a week)?

Robert Elz					decvax!mulga!kre

shannon@sun.uucp (Bill Shannon) (11/24/84)

> Thanks for the example, Bill.  I guess we disagree about what is
> expected here.  The "EOFish" nature of the input is reflected in
> fread()'s short return count; as expected the 0-length read forces
> fread() to return prematurely.  I see no reason for it to "stick"
> at EOF, though.  The programmer certainly can tell that he is at
> EOF from the short count.  Continuing to read the stream is a
> programming error (that happens to work on "ordinary" files,
> unless they are being dynamically appended to), and more than a bit
> sloppy besides (just like the internals of most UNIX utilities).

If you think of fread as the stdio equivalent of read, and you are
prepared to handle input from a terminal, you will not think of a
short return count as cause for alarm.  Certainly the manual page
gave you no reason to think otherwise.

Also, the manual said fread would return NULL on EOF.  I've clearly
presented an example where it did not return NULL on EOF.  We
considered that a bug, in the manual or in the code, and we chose the
code.  The System V Release 2 manual page for fread has been rewritten
so that it corresponds to what the (non-4.2) code actually does.

This is just another example of the inconsistencies between the UNIX
manuals and the code.  One group chose to fix the code ("people have
been programming according to the manual") while another chose to
fix the manual ("no one reads the manuals anyway, the CODE defines
UNIX").

> I see the argument for the other interpretation; I just don't
> agree with it.

The only good argument against the change was compatibility.  That
may be a strong enough reason to change it back, now that AT&T has
clarified the operation of fread.

					Bill Shannon
					Sun Microsystems

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/25/84)

> >Contrary to popular misconception, ^D is NOT an "EOF" character;
> >rather, it marks a delimiter for input canonicalization.
> 
> Contrary to popular misconception, neither the design of the Unix kernel nor
> its documentation was handed down on stone tablets from on high.  ...
> 
> We need a way to indicate "end of data" to a program reading TTY input.  ...

Geoff, I think you missed the point:  ^D (or whatever) from a terminal
DOES act like EOF if there is nothing between the previous delimiter
and this one, since read() will return a count of 0 on that record.
But I have made good use of the more general behavior of ^D in forcing
non-newline terminated input to the reading process.

The only reason repeated reading of an ordinary (disk) file keeps
returning 0 bytes (NOT "EOF"; there is no such thing in UNIX) is
that the file size is static.  If the file is being appended to by
some other process, then continued reading should return data AFTER
the original "end of file".  The same applies to magtape and terminals.
This is not only reasonable, it is quite useful.
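
Under a stdio with sticky EOF, a reader that wants this "keep reading past the old end" behaviour can ask for it explicitly. The sketch below assumes a hypothetical helper named `read_new_data`; it relies only on the standard clearerr() and fread() calls.

```c
#include <stdio.h>

/*
 * Try to read more data from a stream that may already have reported
 * EOF, for example a file that another process is appending to.
 * Clearing the EOF flag first makes stdio attempt a fresh read even
 * where EOF is sticky.  Returns the number of bytes stored in buf;
 * 0 means nothing new was available on this attempt.
 */
size_t read_new_data(FILE *in, char *buf, size_t size)
{
    if (feof(in))
        clearerr(in);
    return fread(buf, 1, size, in);
}
```

A "tail -f" style caller would sleep for a while whenever this returns 0 and then try again.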

I much prefer the thoughtful design of UNIX over the attempts to make
it look "safe and ordinary".

Whatever program you had that required two successive 0-length reads
("EOF" indication, by convention) to detect end of input was simply
WRONG.  (Some old-time Pascal programmers may recognize the problem.)
Instead of trying to change UNIX by reducing its generality, why not
fix the erroneous program.  There is no excuse for such sloppiness.

chris@umcp-cs.UUCP (Chris Torek) (11/26/84)

Doug Gwyn seems to be complaining because 4.2's "sticky EOF" will make
things like

	% cat -u
	foo^D

exit.  Not true!  If you type

	% cat -u
	foo
	^D

(assuming ^D is your EOF character) *then* cat will exit, but for the
former, it will print "foo" and keep reading.  One more ^D (unless
preceded by other text) will cause it to terminate.
-- 
(This line accidently left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

geoff@desint.UUCP (Geoff Kuenning) (11/28/84)

In article <6059@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:

>Geoff, I think you missed the point:  ^D (or whatever) from a terminal
>DOES act like EOF if there is nothing between the previous delimiter
>and this one, since read() will return a count of 0 on that record.
>But I have made good use of the more general behavior of ^D in forcing
>non-newline terminated input to the reading process.

No, I didn't miss the point.  I happen to think that the stdio package
should not embed features of UN*X into its assumptions about operating
systems.  There are lots more operating systems with "hard" EOF's than with
soft ones.  If you want to make use of the general behavior of ^D, write your
program to either use clearerr() or use UNIX I/O.

>The only reason repeated reading of an ordinary (disk) file keeps
>returning 0 bytes (NOT "EOF"; there is no such thing in UNIX) is
>that the file size is static.  If the file is being appended to by
>some other process, then continued reading should return data AFTER
>the original "end of file".  The same applies to magtape and terminals.
>This is not only reasonable, it is quite useful.

As to UNIX having EOF, try looking in read(2).  In any case stdio is not UNIX,
and grepping stdio.h for EOF will succeed.  The features you talk about
(simultaneous read and write, for example) are not available on all OS's.

>I much prefer the thoughtful design of UNIX over the attempts to make
>it look "safe and ordinary".

Good, then use read(2) and write(2).  Stdio is explicitly a compatibility
package.  As such, it *should* be safe and ordinary.

>Whatever program you had that required two successive 0-length reads
>("EOF" indication, by convention) to detect end of input was simply
>WRONG.  (Some old-time Pascal programmers may recognize the problem.)
>Instead of trying to change UNIX by reducing its generality, why not
>fix the erroneous program.  There is no excuse for such sloppiness.

I can do without the snottiness, Doug.  I mentioned in my original posting
that I had already fixed the bug.  My point was that an inconsistency in the
way stdio (*not* UNIX) is implemented caused the program to behave strangely
only when reading from a terminal.

It is unfortunate that this bug stayed in the system for so long that some
people mistook it for a feature.  But that doesn't mean we shouldn't fix it.
-- 

	Geoff Kuenning
	...!ihnp4!trwrb!desint!geoff

bruce@ISM780.UUCP (11/29/84)

>  Bill Shannon says in part:
>  ... Try this program on your favorite version of stdio:
>
>  #include <stdio.h>
>  char    buf[256];
>  main()
>  {
>          register int n;
>          while (n = fread(buf, 1, sizeof buf, stdin))
>                  fwrite(buf, 1, n, stdout);
>          printf("got EOF\n");
>  }
>
>  ... Where ^D is your EOT character.  If the program terminates
>  when you type ^D then your stdio works properly. ...
>
>                                         ... If you have a fix to
>  fread (filbuf, actually) that both fixes this bug and avoids the
>  incompatibility then please send it to me and/or post it to the
>  net.  If this works properly in System V I would be interested to
>  hear that as well.

I believe your test program doesn't produce the desired results because
it's buggy, not stdio. Try the following on a system that doesn't have a
buggered fread(), notice the call to feof I've inserted:

#include <stdio.h>
char    buf[256];
main()
{
	register int n;
	while (!feof(stdin) && (n = fread(buf, 1, sizeof buf, stdin)))
		fwrite(buf, 1, n, stdout);
	printf("got EOF\n");
}

I've tested this on our VAX IS/3 system (System III stdio) and with our
vanilla SystemV stdio. Both versions produced the desired (i.e., correct)
behaviour.

Bruce Adler             {sdcrdcf,uscvax,ucla-vax,vortex}!ism780!bruce
Interactive Systems     decvax!yale-co!ima!bruce

jim@ISM780B.UUCP (11/29/84)

>Try this program on your favorite version of stdio:
>
>#include <stdio.h>
>
>char    buf[256];
>
>main()
>{
>        register int n;
>
>        while (n = fread(buf, 1, sizeof buf, stdin))
>                fwrite(buf, 1, n, stdout);
>        printf("got EOF\n");
>}
>
>Run it and type (e.g.):
>
>testing 1 2 3
>^D
>another test

fread() is not read().  Read() from a terminal is delimited by the newline
character, so that an EOF is always determined by a read that returns 0.
No such guarantee is offered by fread; show me the manual page for fread
that says that 0 is returned upon EOF!  Had you used an fgets or getc loop,
the documentation states that NULL (fgets) or EOF (getc) indicates EOF on
the stream, and you can depend on that.  All you can depend on with
fread is feof().  Thus your program is wrong, and rather than fix it you
broke the library.
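
A copy loop that depends only on feof() and ferror(), and never on the value fread() happens to return, might look like the following sketch. The function name `copy_stream` is hypothetical. Written this way, the loop stops at the first end-of-file indication whether or not the library makes EOF sticky.

```c
#include <stdio.h>

/*
 * Copy in to out until the input stream reports EOF or an error.
 * The return value of fread() is used only as a byte count; the
 * decision to stop is taken from feof()/ferror().
 * Returns the number of bytes copied.
 */
long copy_stream(FILE *in, FILE *out)
{
    char buf[256];
    size_t n;
    long total = 0;

    while (!feof(in) && !ferror(in)) {
        n = fread(buf, 1, sizeof buf, in);
        if (fwrite(buf, 1, n, out) != n)
            break;                      /* output error: give up */
        total += (long)n;
    }
    return total;
}
```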

>It seemed like the right thing to
>do; the incompatibility was unfortunate.

That is a pretty clear statement of BSD philosophy; it causes some problems.

>>fread() returns 0 if there are 0 characters left in the terminal
>>input queue when the ^D is typed.  What would you have it do?
>The problem is if you type 'foo^D' with no newline.  You would expect
>that this would terminate input reading, but it does not -- you must
>type another ^D to finish it off.

As an experienced UNIX user who has read tty(4) [termio(7) in SysV],
I certainly would not expect that.

>>Contrary to popular misconception, ^D is NOT an "EOF" character;
>>rather, it marks a delimiter for input canonicalization.  If all
>>previous input has been consumed and a ^D is typed, then read()
>>returns a count of 0.  This is often interpreted as EOF.  If there
>>is some uncanonicalized input and ^D is typed, it acts much like
>>NEWLINE except of course no \n is appended.
>>
>This is, of course, a matter of opinion, but all the documentation
>states that ^D is the *end-of-file* character.  Perhaps the
>documentation (unchanged since my memory) is "buggy"?

It of course *is not* a matter of opinion, and while the documentation
calls ^D the EOF character, the formal behavior described in the documentation
is less naive than the name:

EOF     (Control-d or ASCII EOT) may be used to generate an end-of-file from
	a terminal.  When received, all the characters waiting to be read are
	immediately passed to the program, without waiting for a new-line, and
	the EOF is discarded.  Thus, if there are no characters waiting, which
	is to say *the EOF occurred at the beginning of a line*, zero
	characters will be passed back, which is the standard end-of-file
	indication.

(That is the >=SysIII text; the BSD text merely says that newline or ^D
terminate a line being read in cooked mode; nothing anywhere says
that simply entering a ^D will cause an end-of-file indication anywhere).

When discussing fine points of documentation, it is more accurate and less
embarrassing to use your eyeballs, not your memory.  When something is
claimed to be a popular misconception, you should not be so arrogant as
to assume that you are not subject to such misconceptions without
verifying it.

>>If the 4.2BSD fread() was buggy, it should have been fixed rather
>>than introducing a significant incompatibility with other STDIOs.
>This bug is in ALL versions of fread (and getchar, and ...) *except*
>4.2.

Do you consider it a bug to be able to read() from a terminal after getting
an end-of-file indication?  The behavior of fread was consistent with the
documentation.  Changing it, whether desirable or not, is a change in
functionality.  A change can only be considered a bug fix if it brings into
line behavior previously out of line with the documentation.

-- Jim Balter, INTERACTIVE Systems (ima!jim)

jim@ISM780B.UUCP (11/29/84)

>It is obvious from this, that no distinction is made of EOT chars meaning
>anything but the absolute end-of-file on TTY.

Quite wrong.  As you quoted, EOT terminates an input line, and *if that line
is empty*, the return value is zero, and the *zero return value from read*
is interpreted as EOF, not receipt of EOT.  But that is not really relevant
to fread.  What is relevant is that the current call to fread stops when it
encounters EOF; NOWHERE DOES IT SAY THAT FREAD RETURNS ZERO UPON END-OF-FILE.
It doesn't unless an EOF is encountered when fread tries to read its first
byte.  Nowhere does it say that EOF latches, any more than it does for read.
It is the nature of UNIX terminals that you can read past the EOF;
that is why fread behaved as it does, totally consistent with the
documentation.  To quote Bill Shannon,

"The bug was that without this change programs using fread on terminals
would never report an EOF condition to the user because internally fread
would just swallow the EOF and return a short record and the next fread
would go on reading past the EOF."

But that is exactly what fread should do: return a short record
(as documented; read returns 0, as documented) and go on reading
past the EOF on the next fread (just as read goes on reading beyond
EOF).  Only improperly written programs that erroneously assume that
fread signals EOF with a zero return value (it doesn't;
it isn't documented to) have the "bug".

read is different from fread because it is delimited by newline, so that
EOT at the beginning of a line always causes a zero return, because the
first character of a line must be the first character read (although not
necessarily vice versa); this simply isn't true of fread.
For routines for which it is true, such as fgets or getc, the return
value can be used to detect EOF.

You cannot show that the SysV fread is wrong by quoting the *read*
documentation, especially without understanding why the two are different.

-- Jim Balter, INTERACTIVE Systems (ima!jim)

guy@rlgvax.UUCP (Guy Harris) (11/30/84)

> It is the nature of UNIX terminals that you can read past the EOF;

It isn't just the nature of UNIX terminals.  Some DEC OSes use the same
behavior; EDT terminates input mode with a ^Z, their EOF character.
Actually, quoting the VMS manuals, "CTRL/Z - Echoes ^Z when CTRL/Z is
typed as a *read terminator*.  *By convention*, CTRL/Z constitutes
end-of-file."  This implies (although it may not be the case) that ^Z
works in VMS exactly like ^D does in UNIX.  This is worth pointing out,
since it was stated in an earlier article that there are more OSes with
"hard" rather than "soft" EOFs.  I hope their EOFs aren't too hard; most
systems I've seen will let you type in a bunch of text as input to a program
and type your favorite EOF character and end input to that program without
ending input to all programs that run from that terminal during that session.
(Admittedly, most systems I've seen are either UNIX or DEC OSes.)

By the way, I saw a later version of "stdio" for 4.2 that looked like
it had the change rescinded; was this the case?  (In which case, a lot of
this discussion is somewhat moot.)

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

ka@cbosgd.UUCP (Kenneth Almquist) (11/30/84)

> The fact that some programs have in the past misinterpreted this bug as a
> feature and made use of it is unfortunate, but something we will have to
> live with.

Arggh!

The 4.2 BSD manual page for getchar states that, "These functions return
the integer constant EOF at end of file...."  Now for a standard UNIX
file, the end of file is the location immediately above the last byte
written.  Thus if getchar returns EOF, something is appended to the
input file, and getchar is called again, getchar should not return EOF
because the file pointer is no longer at end of file.

The fact that the 4.2 BSD implementation of getchar handles EOF
differently not only from all other variants of UNIX, but also from the
way its own documentation says it should handle EOF, is indeed "unfortunate,
but something we will have to live with."
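
The append case above can be sketched in portable C.  This is only an
illustration, not code from any of the stdio versions under discussion:
read_to_eof and read_after_growth are hypothetical names, and the
clearerr() call is there so the second attempt behaves the same whether
or not the underlying stdio latches EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Read characters from fp until getc reports EOF; return how many were read. */
long read_to_eof(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    while (getc(fp) != EOF)
        n++;
    return n;
}

/*
 * Hypothetical helper: run the reader to EOF, append `more` to the same
 * file through a second stream, then read again.  clearerr() resets the
 * end-of-file indicator, so the retry works even where EOF is sticky.
 * Returns the number of characters seen on the second attempt.
 */
long read_after_growth(FILE *reader, FILE *writer, const char *more)
{
    assert(reader != NULL && writer != NULL && more != NULL);
    (void)read_to_eof(reader);   /* reader is now at end of file */
    fputs(more, writer);         /* the file grows behind its back */
    fflush(writer);
    clearerr(reader);            /* harmless where EOF does not latch */
    return read_to_eof(reader);
}
```

With two streams open on one ordinary file, the second read_to_eof call
should return the length of the appended text rather than 0, which is the
behavior the manual-page wording quoted above implies.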

If we may believe Bill Shannon, the whole issue arose because nobody
could figure out how to make a straightforward change to fread.  The
change could have been implemented as follows:

1)  Add a new flag named _EOF_PUSHED_BACK to stdio.h.
2)  When _filbuf is called with this flag set, have it clear the flag
    and return EOF.
3)  Write a new routine called pushback which is just like ungetc except
    that pushback(EOF, fp) should set _EOF_PUSHED_BACK and return.
4)  When fread encounters EOF and it has read at least one item, have it
    call pushback(EOF, fp) before returning.
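
A minimal sketch of the four steps, written as a wrapper around an
ordinary FILE rather than as a patch to stdio itself.  Everything here is
an assumption made for illustration: struct pb_stream and the pb_ names
are hypothetical, pb_getc stands in for _filbuf, the eof_pushed_back
field stands in for the _EOF_PUSHED_BACK flag, and the clearerr() call
models a stdio whose getc does not latch EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: a stdio stream plus the proposed one-shot flag. */
struct pb_stream {
    FILE *fp;
    int eof_pushed_back;   /* plays the role of _EOF_PUSHED_BACK (step 1) */
};

/* Step 2: if an EOF was pushed back, deliver it once and clear the flag. */
int pb_getc(struct pb_stream *s)
{
    int c;

    assert(s != NULL && s->fp != NULL);
    if (s->eof_pushed_back) {
        s->eof_pushed_back = 0;
        return EOF;
    }
    c = getc(s->fp);
    if (c == EOF && feof(s->fp) && !ferror(s->fp))
        clearerr(s->fp);   /* keep the underlying EOF from latching */
    return c;
}

/* Step 3: like ungetc, except that pushing back EOF sets the flag. */
int pb_pushback(int c, struct pb_stream *s)
{
    assert(s != NULL && s->fp != NULL);
    if (c == EOF) {
        s->eof_pushed_back = 1;
        return EOF;        /* here EOF means "done", not "failed" */
    }
    return ungetc(c, s->fp);
}

/* Step 4: a short read re-queues the EOF so the next call returns 0. */
size_t pb_fread(void *ptr, size_t size, size_t nitems, struct pb_stream *s)
{
    unsigned char *out = ptr;
    size_t total, got = 0;
    int c;

    assert(ptr != NULL && size > 0);
    total = size * nitems;
    while (got < total) {
        c = pb_getc(s);
        if (c == EOF) {
            if (got / size >= 1)       /* at least one whole item read */
                pb_pushback(EOF, s);
            break;
        }
        out[got++] = (unsigned char)c;
    }
    return got / size;
}
```

Under these assumptions a request for 7 items against 4 remaining bytes
returns 4, the next call returns 0, and a call after the file has grown
returns the new data, so fread reports EOF once without getc becoming
sticky.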

				Kenneth Almquist

thomas@utah-gr.UUCP (Spencer W. Thomas) (11/30/84)

In article <266@rlgvax.UUCP> guy@rlgvax.UUCP (Guy Harris) writes:
>It isn't just the nature of UNIX terminals.  Some DEC OSes use the same
>behavior; EDT terminates input mode with a ^Z, their EOF character.
>Actually, quoting the VMS manuals, "CTRL/Z - Echoes ^Z when CTRL/Z is
>typed as a *read terminator*.  *By convention*, CTRL/Z constitutes
>end-of-file."  This implies (although it may not be the case) that ^Z
>works in VMS exactly like ^D does in UNIX.  

Well, I don't know about VMS, but in TOPS-20, if you type ^Z, you see
the EOF, even if you type it in the middle of the line (unlike ^D on
Unix).  Personally, no matter what the manual says about "terminating
input" on ^D, and so on, I find that it is very confusing to naive users
that they must SOMETIMES type ^D twice, but other times, typing it once
suffices.  Just because you have gotten used to the behaviour, doesn't
mean it's right.

=Spencer

shannon@sun.uucp (Bill Shannon) (12/02/84)

Jim Balter says,
    "show me the manual page for fread that says that 0 is returned upon EOF!"

Here's an excerpt from the 4.2BSD man page for fread, V7 is identical:

DESCRIPTION
     Fread reads, into a block beginning at ptr, nitems  of  data
     of the type of *ptr from the named input stream.  It returns
     the number of items actually read.

	. . .

DIAGNOSTICS
     Fread and fwrite return 0 upon end of file or error.

He also says,
    "A change can only be considered a bug fix if it brings into
    line behavior previously out of line with the documentation."

Thank you, Jim, for justifying our change.  It seems apparent from your
argument that it was System III/V that did the wrong thing.

					Bill Shannon
					Sun Microsystems, Inc.

lepreau@utah-cs.UUCP (Jay Lepreau) (12/02/84)

Jim@ISM780B states in two separate articles:
> show me the manual page for fread that says that 0 is returned upon EOF! 
> ...
> When discussing fine points of documentation, it is more accurate and less
> embarrassing to use your eyeballs, not your memory....
> you should not be so arrogant as to assume that you are not subject to
> such misconceptions without verifying it.
> ...
> NOWHERE DOES IT SAY THAT FREAD RETURNS ZERO UPON END-OF-FILE.

Taking Jim's own humble advice on the use of eyeballs and arrogance
I found in the v7 manual under fread(3):
	DIAGNOSTICS
		Fread and fwrite return 0 upon end of file or error.
And so does the 4.2 manual, which is derived from 32v which is derived
from v7.  Now, in Sys V (and Sys 3?), rather than change the code to
fix a bug they changed the documentation and removed that sentence.
Sun and UCB chose to fix the code.  Fine.  Arguments can be made both
ways.  (Of course I have strong opinions as to which is preferable.)

However, the so-called issue of whether or not ^D other than at the
beginning of a line should mean EOF is a straw man, and is not at issue
(or shouldn't be, anyway).  In any case it is orthogonal to the issue of
sticky-eof on ttys, and is just muddying the waters.  It's about as
germane and likely to change as adding ^^ or BCD to C.

Jay Lepreau

geoff@desint.UUCP (Geoff Kuenning) (12/03/84)

In article <528@cbosgd.UUCP> ka@cbosgd.UUCP (Kenneth Almquist) writes:

>If we may believe Bill Shannon, the whole issue arose because nobody
>could figure out how to make a straightforward change to fread.

Bill said quite explicitly that the change arose because they wanted to make
the behavior of fread consistent.  I am sure that Bill is capable of coming
up with the push_back_eof algorithm all by his little old self -- if, after
considering the design aspects of the situation, he decides that is the
behavior he wants.

If you intend to write portable software, don't assume you can continue
reading from a terminal after EOF.  For my money, I would much rather pay
a small backwards-compatibility price to achieve a stdio implementation that
was truly portable.

In any case, most programs that expect to get more than one EOF from a
terminal are broken, because you will get different results if you redirect
from a file.  Sure, there are special exceptions like slowread (aka tail -f
aka tra), but let's be honest, folks -- of all the files you access in a
day, how many do you access while they are growing?  Normally, you make use
of existing, non-growing files, and a program expecting two EOF's from a
terminal will always get a null second file if it is redirected.
-- 

	Geoff Kuenning
	...!ihnp4!trwrb!desint!geoff

henry@utzoo.UUCP (Henry Spencer) (12/05/84)

> ...
> Here's an excerpt from the 4.2BSD man page for fread, V7 is identical:
> 
> DIAGNOSTICS
>      Fread and fwrite return 0 upon end of file or error.

Not just a short count, mind you, but 0.

bsa@ncoast.UUCP (Brandon Allbery) (12/05/84)

The Plexus manuals have an entry for a command (I forget the name and
I'm 25 miles or so away from the manuals at the moment :-) that works
like cat except that at EOF it sleeps for some user-specified amount of
time and then tries to read to the next EOF, and so on forever.  This is
for ORDINARY FILES, mind you (i.e. redirected output from make; I'd like
to see that option); if an ordinary file can be so handled, why should
a terminal be any different?  Especially since the terminal works that
way anyway???  (About you DECcies:  I remember a problem on a DEC 20/60
that forced a shutdown because the program was looking for hardware EOF
on a terminal.  I don't expect to EVER see that on a Unix system.  If
that bug exists in TOPS-20, why not other nonsensical bugs?  And I choose
to treat sticky EOF as a bug, given that a terminal doesn't have sticky
EOF at all, in reality.)

I give you 3 choices:

1) inconsistent file handling.  What sticky EOF is in 4.2bsd, what it
is on any system that treats magtape EOFs as not absolute (most, I think)
EXCEPT standard Unix.  And if you do that to Unix, you lose the whole
argument for Unix because files are *no longer* always identical in the
view of the program.  In fact, I don't think the result can be CALLED
Unix.

2) consistent file handling with sticky EOF.  And how do you propose
to make compatible magtapes?

3) consistent file handling with NON-sticky EOF.  What most Unix versions
do.  Thus working nicely with magtapes and terminals; and also useful
in examining dynamic files like the running output of make (or
/usr/spool/uucp/LOGFILE :-)
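
A rough sketch of that kind of command's inner loop, using the read()
system call directly.  This is not the Plexus command (whose name is not
given above); drain and follow are hypothetical names, and the bounded
`rounds` argument is an assumption added so the loop terminates.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Copy everything currently readable from fd to out, stopping when
 * read() returns 0.  Returns the number of bytes copied, or -1 on error.
 * A zero return from read() is not remembered by the descriptor, so
 * calling drain() again later picks up anything appended meanwhile.
 */
long drain(int fd, FILE *out)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    assert(out != NULL);
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        fwrite(buf, 1, (size_t)n, out);
        total += (long)n;
    }
    return n < 0 ? -1 : total;
}

/* Drain, sleep `delay` seconds, and repeat, for `rounds` passes in all. */
long follow(int fd, FILE *out, unsigned delay, int rounds)
{
    long total = 0, n;

    while (rounds-- > 0) {
        n = drain(fd, out);
        if (n < 0)
            return -1;
        total += n;
        fflush(out);
        if (rounds > 0)
            sleep(delay);
    }
    return total;
}
```

Something like follow(0, stdout, 5, 100) would poll standard input a
hundred times at five-second intervals; a real command would presumably
loop until interrupted.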

--bsa
-- 
  Brandon Allbery @ North Coast Xenix  |   the.world!ucbvax!decvax!cwruecmp!
6504 Chestnut Road, Independence, Ohio |       {atvax!}ncoast!{tdi1!}bsa
   (216) 524-1416             \ 44131  | E1439@CSUOHIO.BITNET (friend's acct.)
				       |    BALLBERY (161-7070) on MCI Mail
---------------------------------------+---------------------------------------
	      Keeping the Galaxies safe for Civilization... :-)

ka@cbosgd.UUCP (Kenneth Almquist) (12/08/84)

>>If we may believe Bill Shannon, the whole issue arose because nobody
>>could figure out how to make a straightforward change to fread.
>
>Bill said quite explicitly that the change arose because they wanted to make
>the behavior of fread consistent.

That's what I said he said.

>I am sure that Bill is capable of coming
>up with the push_back_eof algorithm all by his little old self -- if, after
>considering the design aspects of the situation, he decides that is the
>behavior he wants.

You already stated the behavior that Bill wanted:  he wanted to make the
behavior of fread match the description in the manual page.  He did not
want to change the behavior of any other functions.

Of course Bill is capable of coming up with the push_back_eof algorithm
himself, but as it happens he didn't.  He asked in his posting how fread
could be made to correspond to the manual page description of it without
changing getc, and I answered him.  None of this is intended as an attack
on Bill--any programmer is entitled to an occasional slip--but I wonder why
it wasn't caught before the release of 4.2.

>If you intend to write portable software, don't assume you can continue
>reading from a terminal after EOF.

And if I don't want to write software that is portable to anything other
than another UNIX system?  And anyway, I have never heard of a system that
couldn't support reading on a terminal after EOF.  Such a system would be
a bit awkward to use since every time you typed an EOF at your terminal
all programs, including the command processor, would presumably encounter
an EOF indication and you would be logged out.

>For my money, I would much rather pay
>a small backwards-compatibility price to achieve a stdio implementation that
>was truly portable.

Currently, stdio is not truly portable.  Try to implement fseek on a
non-UNIX system some time.  Stdio does hide differences between various
versions of UNIX and I am not suggesting that that should change.

>In any case, most programs that expect to get more than one EOF from a
>terminal are broken, because you will get different results if you redirect
>from a file.

Horrors, EMACS won't work if you redirect its input to a file--I guess
we had better throw it out.  Seriously, differences between UNIX variants
create problems for people.  The idea that "they won't break very many
programs" is not a justification.  Obviously nobody would have raised the
issue if no programs were affected.

I can appreciate Bill Shannon's position on fread, but changing the
functioning of getc is a different issue.
				Kenneth Almquist

chris@umcp-cs.UUCP (Chris Torek) (12/10/84)

> Horrors, EMACS won't work if you redirect its input to a file--I guess
> we had better throw it out.

*Whose* Emacs won't work?
-- 
(This line accidently left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/11/84)

> DIAGNOSTICS
>      Fread and fwrite return 0 upon end of file or error.

Yes, it used to say that.  It failed to clarify what would happen on
a "short read".  Returning 0 in such a case is absolutely incompatible
with the DESIGN of fread(): that you tell it how many items you want
and it returns the number you got.  Clearly the above portion of the
manual was not well thought out, since it led to this discussion.
AT&T has clarified this (and many similar oversights, ambiguities, and
confusion) in the manual; and in this particular case I think they did
it right (since it makes the function design make more sense than the
other interpretation).

geoff@desint.UUCP (12/17/84)

In article <560@cbosgd.UUCP> ka@cbosgd.UUCP (Kenneth Almquist) writes:

>And if I don't want to write software that is portable to anything other
>than another UNIX system?  And anyway, I have never heard of a system that
>couldn't support reading on a terminal after EOF.  Such a system would be
>a bit awkward to use since every time you typed an EOF at your terminal
>all programs, including the command processor, would presumably encounter
>an EOF indication and you would be logged out.

If you want to write non-portable software, use UNIX system calls.  They
handle EOF in the UNIX way.

Just because you haven't heard of an operating system that has hard EOF's
doesn't mean one doesn't exist.  Your presumption about logouts shows
a strong UNIX prejudice.  *Very* few operating systems interpret EOF's to the
command processor as a logout indication.  Furthermore, many operating
systems put the command processor in the kernel, so that an EOF delivered
to a user program is not at all the same as an EOF given to the command
processor.  Indeed, this is frequently part of the reason they have "hard"
EOF's.  (No, I don't like this design either -- shells should be user
processes.  But such systems do exist.)

>Seriously, differences between UNIX variants
>create problems for people.  The idea that "they won't break very many
>programs" is not a justification.  Obviously nobody would have raised the
>issue if no programs were affected.

Yup, catching up with the real world is frequently painful.  Check out the
heat that has risen over 6-character externals in the draft ANSI standard.
But in that case and this one, I would rather bite the bullet and do it the
way that will make life easier in the future.

BTW, I have an editor that is very similar to EMACS, and it does not object
at all if its descriptors are redirected to files.  I added the feature
because I had a need for it.
-- 

	Geoff Kuenning
	...!ihnp4!trwrb!desint!geoff

bsa@ncoast.UUCP (12/18/84)

> Article <6535@brl-tgr.ARPA>, from henry@utzoo.uucp
+----------------
| > Here's an excerpt from the 4.2BSD man page for fread, V7 is identical:
| > 
| > DIAGNOSTICS
| >      Fread and fwrite return 0 upon end of file or error.
| 
| Not just a short count, mind you, but 0.

Which is wrong.  If you request 7 characters and it reads 4 before EOF,
you've either lost 4 characters or gotten 3 garbage characters, depending
on what fread returns and how your program deals with EOF.  Sounds to
me like fread is wrong from square one.
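
For comparison, here is a small sketch of how a caller can keep the
partial data when fread returns the count of items actually read, which
is what ISO C specifies.  The enum and the read_items name are
hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Outcome of one fread call, reported separately from the item count. */
enum read_status { READ_FULL, READ_SHORT_EOF, READ_SHORT_ERROR };

/*
 * Ask for `want` items of `size` bytes each.  *got receives the number
 * of whole items actually stored in buf; the return value says why the
 * count fell short, if it did.  Neither the 4 good characters nor the
 * end-of-file indication is lost.
 */
enum read_status read_items(void *buf, size_t size, size_t want,
                            FILE *fp, size_t *got)
{
    assert(buf != NULL && fp != NULL && got != NULL);
    *got = fread(buf, size, want, fp);
    if (*got == want)
        return READ_FULL;
    return ferror(fp) ? READ_SHORT_ERROR : READ_SHORT_EOF;
}
```

Requesting 7 characters from a file holding 4 gives *got == 4 with
READ_SHORT_EOF; only buf[0] through buf[3] are meaningful, and a second
call on the same ordinary file gives *got == 0.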

--bsa
-- 
  Brandon Allbery @ decvax!cwruecmp!ncoast!bsa (..ncoast!tdi1!bsa business)
6504 Chestnut Road, Independence, Ohio 44131   (216) 524-1416
<<<<<< An equal opportunity employer: I both create and destroy bugs :-) >>>>>>