[comp.unix.wizards] stdio EOF

chris@mimsy.UUCP (Chris Torek) (09/07/88)

In article <8422@smoke.ARPA> gwyn@smoke.ARPA (Doug Gwyn ) writes:
>... In fact [stdio] EOF should not be "sticky"; if more data becomes
>available, as on a terminal, it should be available for subsequent
>reading.  The 4.2BSD implementation broke this but it might be okay
>on 4.3BSD.

I thought this behaviour was added to 4.2BSD to conform to some
existing standard.

What does the dpANS say?  POSIX?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.ARPA (Doug Gwyn ) (09/08/88)

In article <13427@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <8422@smoke.ARPA> gwyn@smoke.ARPA (Doug Gwyn ) writes:
>>... In fact [stdio] EOF should not be "sticky"; if more data becomes
>>available, as on a terminal, it should be available for subsequent
>>reading.  The 4.2BSD implementation broke this but it might be okay
>>on 4.3BSD.
>I thought this behaviour was added to 4.2BSD to conform to some
>existing standard.

No; it was added because Bill Shannon thought it was a good idea.
I noticed it because it broke several interesting applications.

>What does the dpANS say?  POSIX?

Remember that the C dpANS does not address multitasking issues
(where a file can grow due to other concurrent processes), nor
does it specify much about "terminal" device behavior.  I recall
the 4.2BSD sticky-EOF behavior coming up in discussion and not
finding any demurrers when it was labeled "bogus", but I also
doubt that it is explicitly ruled "nonconforming".

I don't remember seeing this specific issue addressed by 1003.1.

ka@june.cs.washington.edu (Kenneth Almquist) (09/11/88)

In article <13427@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <8422@smoke.ARPA> gwyn@smoke.ARPA (Doug Gwyn ) writes:
>> ... In fact [stdio] EOF should not be "sticky"; if more data becomes
>> available, as on a terminal, it should be available for subsequent
>> reading.  The 4.2BSD implementation broke this but it might be okay
>> on 4.3BSD.
> 
> I thought this behaviour was added to 4.2BSD to conform to some
> existing standard.

Berkeley conform to an existing standard?  You must be kidding.

The story I read on the net a few years ago is that Berkeley made this
change to fix a problem with fread.  The problem is that the fread
documentation contradicts itself, stating both that, "fread returns
the number of items actually read," and "fread returns 0 on end of
file or error."  What should fread do when its caller requests three
items, but fread encounters an end of file after reading only two?
The first sentence claims it should return two (the number of items
read), while the second claims it should return zero (because end of
file was encountered).
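
The case in question is easy to reproduce.  What follows is only a
minimal sketch, assuming a present-day hosted C library in which fread
reports the number of items actually read; the function name
short_read_demo is made up for illustration and tmpfile() stands in
for whatever input the caller really has.

```c
#include <stdio.h>

/* Ask twice for three 1-byte items from a stream that holds only two.
 * Stores the two fread return values; returns 0 on success, -1 if the
 * scratch file could not be created. */
int short_read_demo(size_t *first, size_t *second)
{
    char buf[3];
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;

    fputs("ab", fp);                 /* only two items are available */
    rewind(fp);

    *first = fread(buf, 1, 3, fp);   /* short count: end of file hit */
    *second = fread(buf, 1, 3, fp);  /* nothing left to deliver */

    fclose(fp);
    return 0;
}
```

On such a library the first call reports two items and the second
reports zero, so a caller has to treat any count smaller than the one
requested as the signal to check feof() or ferror().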

Berkeley interpreted the documentation as indicating that fread should
return two, but should then return zero on the next call.  The obvious
way to implement this would be to have fread do an ungetc on the EOF
so that the next time it was called it would immediately read an EOF
and return zero.  However, ungetc does not allow an EOF to be pushed
back onto the input.  This deficiency of ungetc is (in my view) the
biggest flaw in the design of the stdio library, and it makes it
impossible to implement scanf correctly, so Berkeley would have done
the world a favor by extending the stdio library to allow EOF to be
pushed back.
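
The restriction on ungetc can be seen directly.  A small sketch, under
the assumption that the library follows the documented rule that an
attempt to push back EOF is rejected and leaves the stream unchanged;
pushback_eof_demo is a hypothetical name, not part of any library.

```c
#include <stdio.h>

/* Try to push EOF back onto a stream that holds the single byte 'x'.
 * Stores what ungetc returned and what the following getc returned;
 * returns 0 on success, -1 if the scratch file could not be created. */
int pushback_eof_demo(int *pushed, int *next)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;

    fputc('x', fp);
    rewind(fp);

    *pushed = ungetc(EOF, fp);   /* rejected: ungetc reports EOF */
    *next = getc(fp);            /* stream untouched: still yields 'x' */

    fclose(fp);
    return 0;
}
```

Because the push is refused, a routine such as fread or scanf that
has already consumed the end-of-file indication has no way to hand it
back to the next reader.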

Instead, they chose a simpler approach:  make getc always return EOF
when the eof or error flags are set.  This approach allowed them to
fix the fread problem by writing only a couple of lines of code, but
it also broke getc.  In 4.2 BSD the behavior of getc is a bug since it
disagrees with the documentation.  In 4.3 BSD, Berkeley modified the
documentation to agree with the code.  ("It's not a bug, it's a feature!")
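
A program that wants the original behavior regardless of which library
it is linked against can clear the flag itself before trying again.
This is a sketch only: it assumes a regular file that grows stands in
for a terminal, read_past_eof is an invented name, and path is a
scratch file name supplied by the caller.

```c
#include <stdio.h>

/* Read a one-byte file to end of file, let the file grow by one byte,
 * then read again.  Returns the character obtained after the growth,
 * or EOF if anything failed.  The scratch file is removed afterwards. */
int read_past_eof(const char *path)
{
    FILE *w = fopen(path, "w");
    FILE *r;
    int c;

    if (w == NULL)
        return EOF;
    fputc('a', w);
    fflush(w);

    r = fopen(path, "r");
    if (r == NULL) {
        fclose(w);
        remove(path);
        return EOF;
    }

    while ((c = getc(r)) != EOF)
        ;                        /* consumes 'a', then sees end of file */

    fputc('b', w);               /* more data becomes available */
    fflush(w);

    /* With a sticky library a bare getc here would return EOF again;
     * clearing the indicators first makes either flavor try a read. */
    clearerr(r);
    c = getc(r);

    fclose(r);
    fclose(w);
    remove(path);
    return c;
}
```

Without the clearerr call the result depends on whether the library
implements the sticky rule, which is exactly the disagreement at hand.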

By the way, AT&T also noticed the contradiction in the fread documentation.
They fixed the documentation so that it clearly reflected the behavior
of the code.  This seems like a better approach since modifying the code
to agree with the documentation doesn't make much sense when the meaning
of the documentation is so unclear.  In any case, AT&T's approach, unlike
Berkeley's, didn't break working code.

> What does the dpANS say?  POSIX?

I don't know, and how they resolve this issue is less important than
that the issue is resolved.  The standard I/O library is supposed to be
*standard*; that's the whole point of it.  There are, however, several
reasons why they should prefer Dennis Ritchie's original definition of
getc over Berkeley's:

1.  Ritchie's definition has seniority.  Berkeley's gratuitous change to
    getc was not made until 4.2 BSD and was not documented until 4.3 BSD.
    All other versions of UN*X use Ritchie's definition.

2.  Aesthetics.  Ritchie's definition can be stated in seven words:  Return
    EOF when at end of file.

3.  Authority.  If anyone's opinion should be respected when setting UN*X
    standards, Ritchie's should be.
					Kenneth Almquist

-- 
And there shall come among you false prophets, who will corrupt my
teachings and teach that EOF should be sticky....

shannon%datsun@Sun.COM (Bill Shannon) (09/14/88)

In article <8459@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes:
> No; it was added because Bill Shannon thought it was a good idea.
> I noticed it because it broke several interesting applications.

What you didn't notice is that it fixed many programs that were
broken.  We weren't able to come up with a fix which maintained
compatibility with all programs which *happened* to work and with the
documentation, and which also fixed the programs that were broken.  If
you have such a fix and haven't already told us about it, please do so.

gwyn@smoke.ARPA (Doug Gwyn ) (09/15/88)

In article <68214@sun.uucp> shannon%datsun@Sun.COM (Bill Shannon) writes:
[re. sticky EOF]
>What you didn't notice is that it fixed many programs that were broken.

It's certainly true that there were several programs found on UNIX systems
that "read the EOF" more than once, and thus would fail miserably on files
other than static fixed-length files, most notably when reading from ttys.
Because "EOF" is not an official UNIX notion (really it is "0 bytes read")
and because this is often a transient condition, I much prefer to fix the
applications that made the bogus assumption and leave the library alone.
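
The "0 bytes read" point can be illustrated below the stdio level.  A
minimal sketch, assuming a POSIX-style system and using a growing
regular file in place of a tty; zero_then_data and its parameters are
hypothetical names chosen for the example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read from an empty file, append one byte through a second
 * descriptor, then read again through the first.  Stores both read
 * counts; returns 0 on success, -1 on failure.  The scratch file
 * named by path is removed afterwards. */
int zero_then_data(const char *path, ssize_t *at_end, ssize_t *after_growth)
{
    char c;
    int rfd;
    int status = 0;
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (wfd < 0)
        return -1;
    rfd = open(path, O_RDONLY);
    if (rfd < 0) {
        close(wfd);
        unlink(path);
        return -1;
    }

    *at_end = read(rfd, &c, 1);            /* nothing there yet */
    if (write(wfd, "x", 1) != 1)
        status = -1;
    *after_growth = read(rfd, &c, 1);      /* same descriptor, new data */

    close(rfd);
    close(wfd);
    unlink(path);
    return status;
}
```

The first read reports zero bytes and the second reports one, with no
intervening reset of any kind, so at this level "end of file" is a
statement about one call rather than a property of the descriptor.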

shannon%datsun@Sun.COM (Bill Shannon) (09/15/88)

In article <8495@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes:
> I much prefer to fix the
> applications that made the bogus assumption and leave the library alone.

Based on the documentation available at the time, it wasn't clear
*which* applications were making bogus assumptions.  All we knew is
that we had a set of applications that were making incompatible
assumptions.  Based on our reading of the documentation that was
available, we made the change that you're complaining about.  Needless
to say, we did not make this change in isolation; we consulted with
several other experienced UNIX developers and they agreed with us.
I'm sorry we didn't consult you.

I'd be happy to let POSIX or ANSI C or whatever tell us what the
right answer is, but at this point your answer is only different.