goldfarb@ucf-cs.UUCP (Ben Goldfarb) (11/08/84)
[]
Why did Berkeley change stdio such that typing ^D (or whatever EOF character
one is using) on stdin causes that stream to reflect eof until a clearerr()
is done? Has this been discussed here before? If so, I apologize for
further belaboring the issue.

In any case, what is the correct approach to this problem? Obviously, we
can't expect the authors of programs that have been distributed with UNIX
since V7 to have provided for Berkeley's change; as it stands I've found
that addbib and learn are both broken because of the continual EOF. So I
patched

    if (feof(stdin))
        clearerr(stdin);

into both programs. I'm sure more are affected. Alternatively, I could have
"fixed" stdio, but how many Berkeley programs make use of this "feature?"
I'd appreciate some net wisdom on the subject.

-- Ben Goldfarb
University of Central Florida
uucp: {duke,decvax,princeton}!ucf-cs!goldfarb
ARPA: goldfarb.ucf-cs@csnet.relay
csnet: goldfarb@ucf
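
[A minimal sketch of where a patch like the one above might sit, not the actual addbib or learn code. The helper name drain_lines and its line-counting behaviour are hypothetical; only feof() and clearerr() come from the post.]

```c
#include <stdio.h>

/* Hypothetical helper: consume lines until end of file, then clear a
 * latched EOF so that a later read on the same stream tries the
 * underlying device again (the effect of the two-line patch). */
int drain_lines(FILE *fp)
{
    char line[256];
    int count = 0;

    /* Counts successful fgets() calls; a line longer than the buffer
     * is counted more than once. */
    while (fgets(line, sizeof line, fp) != NULL)
        count++;

    if (feof(fp))
        clearerr(fp);   /* note: this also clears the error indicator */

    return count;
}
```

On a stdio where EOF latches, a later fgets(line, sizeof line, stdin) after drain_lines(stdin) would go back to the terminal instead of failing at once.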
geoff@desint.UUCP (Geoff Kuenning) (11/13/84)
In article <1697@ucf-cs.UUCP> goldfarb@ucf-cs.UUCP (Ben Goldfarb) writes:
>Why did Berkeley change stdio such that typing ^D (or whatever EOF character
>one is using) on stdin causes that stream to reflect eof until a clearerr()
>is done? Has this been discussed here before? If so, I apologize for
>further belaboring the issue.
>
>In any case, what is the correct approach to this problem?

We did this when I was at DEC because that's the way a file behaves, and it
is frequently easier to write a program to read the EOF twice. For example:

    while ((ch = getchar ()) != EOF)
        switch (ch) {
        case '\\':
            switch (ch = getchar ()) {
            case EOF:
                break;
            }
            break;
        }

Here, reading the EOF twice is a convenient way to handle the loop exit.
(Yes, there are other ways, notably using a goto. But in more complex code
this approach may be the cleanest). I never like assuming that I can unget
an EOF character (although it works on some systems).

One can also make a persuasive argument for the advantages of the other
approach, but I prefer this way because of consistency.
--
Geoff Kuenning
First Systems Corporation
...!ihnp4!trwrb!desint!geoff
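
[A self-contained variant of the loop above, as a sketch only. The function name count_escapes, the FILE * parameter, and the escape counter are assumptions added so the fragment can be compiled and exercised; the control flow is the same nested switch.]

```c
#include <stdio.h>

/* Hypothetical wrapper around the nested-switch loop: counts characters
 * that follow a backslash. If the input ends with a lone backslash, the
 * inner getc() sees EOF and only the inner switch is left; the outer
 * loop then depends on getc() reporting EOF a second time. */
int count_escapes(FILE *fp)
{
    int ch;
    int escapes = 0;

    while ((ch = getc(fp)) != EOF)
        switch (ch) {
        case '\\':
            switch (ch = getc(fp)) {
            case EOF:
                break;          /* exits the inner switch only */
            default:
                escapes++;
                break;
            }
            break;
        }
    return escapes;
}
```

Reading from a disk file, the second EOF always arrives. On a terminal with a non-latching stdio, the outer getc() would instead wait for more input, which is the behaviour the post objects to.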
shannon@sun.uucp (Bill Shannon) (11/16/84)
Ben Goldfarb writes, > Why did Berkeley change stdio such that typing ^D (or whatever EOF character > one is using) on stdin causes that stream to reflect eof until a clearerr() > is done? Has this been discussed here before? If so, I apologize for > further belaboring the issue. > > In any case, what is the correct approach to this problem? Obviously, we > can't expect the authors of programs that have been distributed with UNIX > since V7 to have provided for Berkeley's change; as it stands I've found > that addbib and learn are both broken because of the continual EOF. So I > patched > if (feof(stdin)) > clearerr(stdin); > into both programs. I'm sure more are affected. Alternatively, I could have > "fixed" stdio, but how many Berkeley programs make use of this "feature?" > I'd appreciate some net wisdom on the subject. The change was made by Sun and bought back by Berkeley. I believe this has been discussed on the net before. The change actually fixes another bug. The bug was that without this change programs using fread on terminals would never report an EOF condition to the user because internally fread would just swallow the EOF and return a short record and the next fread would go on reading past the EOF. We actually ran into this bug in some existing program, I forget which one. Unfortunately, not all the programs which depended on the old behaviour were fixed. Bill Shannon Sun Microsystems, Inc.
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/18/84)
> The change was made by Sun and bought back by Berkeley. I believe this > has been discussed on the net before. The change actually fixes another > bug. The bug was that without this change programs using fread on terminals > would never report an EOF condition to the user because internally fread > would just swallow the EOF and return a short record and the next fread > would go on reading past the EOF. We actually ran into this bug in some > existing program, I forget which one. Unfortunately, not all the programs > which depended on the old behaviour were fixed. fread() returns 0 if there are 0 characters left in the terminal input queue when the ^D is typed. What would you have it do? Contrary to popular misconception, ^D is NOT an "EOF" character; rather, it marks a delimiter for input canonicalization. If all previous input has been consumed and a ^D is typed, then read() returns a count of 0. This is often interpreted as EOF. If there is some uncanonicalized input and ^D is typed, it acts much like NEWLINE except of course no \n is appended. If the 4.2BSD fread() was buggy, it should have been fixed rather than introducing a significant incompatibility with other STDIOs.
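
[A minimal sketch of the convention described above, assuming a POSIX system: a zero count from read() is taken as end of input. The function name read_until_zero is hypothetical.]

```c
#include <unistd.h>

/* Hypothetical helper: read from fd until read() returns 0, which by
 * convention is treated as end of file. Returns the total number of
 * bytes read, or -1 if read() reports an error. */
long read_until_zero(int fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    return n < 0 ? -1 : total;
}
```

On a terminal in canonical mode, typing foo^D would hand three bytes to the first read() and the loop would keep going; only a ^D with nothing pending produces the zero count that ends it.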
thomas@utah-gr.UUCP (Spencer W. Thomas) (11/19/84)
In article <5867@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes: >fread() returns 0 if there are 0 characters left in the terminal >input queue when the ^D is typed. What would you have it do? The problem is if you type 'foo^D' with no newline. You would expect that this would terminate input reading, but it does not -- you must type another ^D to finish it off. > >Contrary to popular misconception, ^D is NOT an "EOF" character; >rather, it marks a delimiter for input canonicalization. If all >previous input has been consumed and a ^D is typed, then read() >returns a count of 0. This is often interpreted as EOF. If there >is some uncanonicalized input and ^D is typed, it acts much like >NEWLINE except of course no \n is appended. > This is, of course, a matter of opinion, but all the documentation states that ^D is the *end-of-file* character. Perhaps the documentation (unchanged since my memory) is "buggy"? >If the 4.2BSD fread() was buggy, it should have been fixed rather >than introducing a significant incompatibility with other STDIOs. This bug is in ALL versions of fread (and getchar, and ...) *except* 4.2. =Spencer
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/19/84)
> The problem is if you type 'foo^D' with no newline. You would expect > that this would terminate input reading, but it does not -- you must > type another ^D to finish it off. This is just what I expect. Why should the first ^D terminate input reading, since the read will return 3 characters at that point? > This is, of course, a matter of opinion, but all the documentation > states that ^D is the *end-of-file* character. Perhaps the > documentation (unchanged since my memory) is "buggy"? Yup. Kernighan & Pike got it right in their book. > This bug is in ALL versions of fread (and getchar, and ...) *except* > 4.2. The UNIX System V Release 2.0 fread() acts as I originally described, which is what I would expect. In any case, judging from the number of times people have had problems caused by this change, it was not a wise move.
shannon@sun.uucp (Bill Shannon) (11/20/84)
> fread() returns 0 if there are 0 characters left in the terminal
> input queue when the ^D is typed. What would you have it do?

Try this program on your favorite version of stdio:

    #include <stdio.h>

    char buf[256];

    main()
    {
        register int n;

        while (n = fread(buf, 1, sizeof buf, stdin))
            fwrite(buf, 1, n, stdout);
        printf("got EOF\n");
    }

Run it and type (e.g.):

    testing 1 2 3
    ^D
    another test

Where ^D is your EOT character. If the program terminates
when you type ^D then your stdio works properly. The 4.1
version of stdio would "eat" the ^D and echo the first and
third lines. It would only terminate if you typed ^D twice
in a row.

> If the 4.2BSD fread() was buggy, it should have been fixed rather
> than introducing a significant incompatibility with other STDIOs.

Making EOF sticky was the fix. It seemed like the right thing to
do; the incompatibility was unfortunate. If you have a fix to
fread (filbuf, actually) that both fixes this bug and avoids the
incompatibility then please send it to me and/or post it to the
net. If this works properly in System V I would be interested to
hear that as well.

                    Bill Shannon
                    Sun Microsystems, Inc.
Ron Natalie <ron@BRL-TGR> (11/20/84)
Doug: Looking in your beloved System V manuals you will find under READ(2): A value of zero is returned when end-of-file has been reached. and When attempting to read a file associated with a tty that has no data currently available ... the read will block until the data becomes available. And then looking at the documentation for the TTY driver, where is it oh yes, it's called TERMIO and it's in the system administrators manual. Of course, no ordinary user would ever want to change his terminal modes. A line is delimited by a new-line (ASCII LF), an end-of-file (ASCII-EOT), or an end-of-line character. EOF - may be used to generate an end-of-file from a terminal. Thus if there are no characters waiting, which is to say EOF occurred at the beginning of line, zero characters will be passed back, which is the standard end-of-file indication. What this implies is the zero return from TTY reads are END-OF-FILE and should be treated as such. It is possible to continue reading past end of file on some devices such as TTY and Magtape, but that doesn't mean you shouldn't handle EOF properly. Fread states Fread stops appending bytes if an end-of-file or error condition occurs. Ferror states Feof returns non-zero when EOF has previously been detected reading the named input stream. Clearerr resets the error indicator and EOF indicator to zero. It is obvious from this, that no distinction is made of EOT chars meaning anything but the absolute end-of-file on TTY. If you were attempting to write a Stdio using the definitions in the manual, you would have to implement it this way. You need to stop defining UNIX by whatever bugs AT&T has and penalize Berkeley because they have fixed a legitimate bug in the original UNIX code. -Ron Like I'm from the Mystic Valley.
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/20/84)
[The actual manual entries were somewhat different and clearer.] This is not the same as saying that EOT "means" EOF. Only in certain contexts does it have that effect, as the TERMIO(7) manual entry says. > What this implies is the zero return from TTY reads are END-OF-FILE > and should be treated as such. It is possible to continue reading > past end of file on some devices such as TTY and Magtape, but that > doesn't mean you shouldn't handle EOF properly. Agreed. > Fread states > Fread stops appending bytes if an end-of-file or error > condition occurs. And so it does! But this is on that call, not necessarily on future calls. This feature works as advertised on UNIX System V Release 2.0. > Ferror states > Feof returns non-zero when EOF has previously been detected > reading the named input stream. > Clearerr resets the error indicator and EOF indicator to zero. Again, this is the way it does work. The EOF is "latched" until cleared, but fread() can read past EOF if there is data there. I'm all for bugs being fixed, if they are really bugs and not just different ideas about what should be happening. Perhaps 4.2BSD and UNIX System V now agree about EOF behavior; that would be a pleasant change.
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/21/84)
> #include <stdio.h>
>
> char buf[256];
>
> main()
> {
>     register int n;
>
>     while (n = fread(buf, 1, sizeof buf, stdin))
>         fwrite(buf, 1, n, stdout);
>     printf("got EOF\n");
> }
>
> Run it and type (e.g.):
>
>     testing 1 2 3
>     ^D
>     another test
>
> Where ^D is your EOT character. If the program terminates
> when you type ^D then your stdio works properly. The 4.1
> version of stdio would "eat" the ^D and echo the first and
> third lines. It would only terminate if you typed ^D twice
> in a row.

Thanks for the example, Bill. I guess we disagree about what is
expected here. The "EOFish" nature of the input is reflected in
fread()'s short return count; as expected the 0-length read forces
fread() to return prematurely. I see no reason for it to "stick"
at EOF, though. The programmer certainly can tell that he is at
EOF from the short count. Continuing to read the stream is a
programming error (that happens to work on "ordinary" files,
unless they are being dynamically appended to), and more than a bit
sloppy besides (just like the internals of most UNIX utilities).

I see the argument for the other interpretation; I just don't
agree with it.
geoff@desint.UUCP (Geoff Kuenning) (11/22/84)
>Contrary to popular misconception, ^D is NOT an "EOF" character; >rather, it marks a delimiter for input canonicalization. If all >previous input has been consumed and a ^D is typed, then read() >returns a count of 0. This is often interpreted as EOF. If there >is some uncanonicalized input and ^D is typed, it acts much like >NEWLINE except of course no \n is appended. -Doug Gwyn Contrary to popular misconception, neither the design of the Unix kernel nor its documentation was handed down on stone tablets from on high. I don't really care whether Thompson and Ritchie chose to describe the behavior of the original Unix TTY driver as "EOF" or "canonicalization". I strongly suspect that their motivation was to describe the behavior of the code they actually wrote, and the code was written for convenient implementation. We need a way to indicate "end of data" to a program reading TTY input. It is convenient for programmers to consider "end of file" as "end of data" when reading file input. Since redirection of stdin is one of Unix's great features, it is thus reasonable to simply provide a way for a TTY to indicate "end of file". If T&R implemented it sloppily and documented it accurately, that is no reason for us to slavishly follow their lead. Once you decide to have ^D truly mean "end of file", it is only reasonable to make it operate like a true EOF. That means that multiple reads return multiple EOF indications, just like a disk. The original implementation can be extremely disconcerting--I had a program a few days ago that wanted two EOF's to terminate. It tested fine from a file, but "hung" when I typed in the input and terminated it with ^D. The fact that some programs have in the past misinterpreted this bug as a feature and made use of it is unfortunate, but something we will have to live with. It is just not that hard to grep for "EOF" and add "clearerr" calls. 
In any case, any program that was doing this was already providing incompatible behavior between files and TTY's. That's what you get when you special-case TTY input :-). -- Geoff Kuenning First Systems Corporation ...!ihnp4!trwrb!desint!geoff
bsa@ncoast.UUCP (Brandon Allbery) (11/22/84)
TTY(4)              XENIX Programmer's Manual              TTY(4)

 . . .

     EOT (Control-D) may be used to generate an end of file
     from a terminal. When an EOT is received, all the characters
     waiting to be read are immediately passed to the
     program, without waiting for a new-line, and the EOT is
     discarded. Thus if there are no characters waiting,
     which is to say the EOT occurred at the beginning of a
     line, zero characters will be passed back, and this is
     the standard end-of-file indication.

This is in the system manual; I'd suggest both you and Berkeley look it up
(in a v7 manual if necessary). fread() was NOT designed for terminal I/O.

--bsa
--
  Brandon Allbery @ North Coast Xenix  |   the.world!ucbvax!decvax!cwruecmp!
6504 Chestnut Road, Independence, Ohio |       {atvax!}ncoast!{tdi1!}bsa
   (216) 524-1416             \ 44131  | E1439@CSUOHIO.BITNET (friend's acct.)
---------------------------------------+---------------------------------------
Forgive; we just had a system crash & lost a month's worth of work and patches.
kre@mulga.OZ (Robert Elz) (11/23/84)
From Doug Gwyn (in the last referenced article):
|
| > This is, of course, a matter of opinion, but all the documentation
| > states that ^D is the *end-of-file* character. Perhaps the
| > documentation (unchanged since my memory) is "buggy"?
|
| Yup. Kernighan & Pike got it right in their book.
|

Rarely does anyone play into my hands quite so nicely. Now that we
have K&P cited as the absolute authority on this issue, I will proceed
to quote from page 204.

    Structurally, readslow is identical to cat except that it
    loops instead of quitting when it encounters the current
    end of the input. It has to use low-level I/O because the
    standard library routines continue to report EOF after the
    first end of file.

This immediately precedes the listing of the "readslow" program,
which is the authorised version of "tail -f" according to the
gospel of St Pike.

I'm not sure which particular version of "the standard library routines"
they were referring to - this was written before 4.2 was released.
I always assumed that V8 had fixed the bug as well, but I was
(not too long ago) told that this was not so. Would you care
to clarify, rob?

The above inclusion (from previous articles) is, of course, completely
irrelevant to the original discussion under this subject line.
It makes absolutely no difference what ^D from the terminal really
does, or does not do. What is important is that stdio returns EOF
from a getchar(), fread(), scanf() or whatever. Not a zero length
read, EOF. (And as EOF is actually returned to mean a few other
things, there is this nifty macro "feof" that you can use to verify
that this really was "end of file"). I don't think it's at all
unreasonable for "end of file" to be a "sticky" condition.
Kernighan & Pike got it right in their book.

Finally, it seems that there are two vocal groups of "anti-4.2" people
out there. There seems to be one group that complains bitterly about
all the "bugs" Berkeley introduced, and all the things that they "broke",
and a second group that complains bitterly about all the "bugs" left in
the code, and the things that weren't done. What's most amazing is that
it seems often that the most vocal members of each group are the same
people. Rather a double standard - they didn't fix the bugs that make my
life difficult, 'cause I have to fix them to run their code on my
hardware, but they did fix all the bugs I was relying on ...

Can we end this useless discussion now, and allow it to die until
someone else new "discovers" it again (in about a week)?

Robert Elz
decvax!mulga!kre
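
[A sketch in the spirit of the readslow description quoted above; it is not the listing from the book. It assumes a POSIX system. The name follow_fd and the max_idle and delay parameters are hypothetical, added so the loop can terminate; a real "tail -f" style program would retry forever.]

```c
#include <unistd.h>

/* Hypothetical follower: copy bytes from `in` to `out` with read(2) and
 * write(2). A zero-length read is not treated as final; the loop sleeps
 * `delay` seconds and retries, giving up only after more than `max_idle`
 * zero-length reads in a row. Returns bytes copied, or -1 on error. */
long follow_fd(int in, int out, int max_idle, unsigned delay)
{
    char buf[512];
    long total = 0;
    int idle = 0;
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) >= 0) {
        if (n > 0) {
            if (write(out, buf, (size_t)n) != n)
                return -1;      /* short or failed write */
            total += n;
            idle = 0;           /* new data resets the idle count */
        } else if (++idle > max_idle) {
            break;              /* quiet for too long: stop */
        } else {
            sleep(delay);       /* wait, then read past the old end */
        }
    }
    return n < 0 ? -1 : total;
}
```

Because it bypasses stdio entirely, this loop behaves the same whether or not the library latches EOF.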
shannon@sun.uucp (Bill Shannon) (11/24/84)
> Thanks for the example, Bill. I guess we disagree about what is > expected here. The "EOFish" nature of the input is reflected in > fread()'s short return count; as expected the 0-length read forces > fread() to return prematurely. I see no reason for it to "stick" > at EOF, though. The programmer certainly can tell that he is at > EOF from the short count. Continuing to read the stream is a > programming error (that happens to work on "ordinary" files, > unless they are being dynamically appended to), and more than a bit > sloppy besides (just like the internals of most UNIX utilities). If you think of fread as the stdio equivalent of read, and you are prepared to handle input from a terminal, you will not think of a short return count as cause for alarm. Certainly the manual page gave you no reason to think otherwise. Also, the manual said fread would return NULL on EOF. I've clearly presented an example where it did not return NULL on EOF. We considered that a bug, in the manual or in the code, and we chose the code. The System V Release 2 manual page for fread has been rewritten so that it corresponds to what the (non-4.2) code actually does. This is just another example of the inconsistencies between the UNIX manuals and the code. One group chose to fix the code ("people have been programming according to the manual") while another chose to fix the manual ("no one reads the manuals anyway, the CODE defines UNIX"). > I see the argument for the other interpretation; I just don't > agree with it. The only good argument against the change was compatibility. That may be a strong enough reason to change it back, now that AT&T has clarified the operation of fread. Bill Shannon Sun Microsystems
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/25/84)
> >Contrary to popular misconception, ^D is NOT an "EOF" character; > >rather, it marks a delimiter for input canonicalization. > > Contrary to popular misconception, neither the design of the Unix kernel nor > its documentation was handed down on stone tablets from on high. ... > > We need a way to indicate "end of data" to a program reading TTY input. ... Geoff, I think you missed the point: ^D (or whatever) from a terminal DOES act like EOF if there is nothing between the previous delimiter and this one, since read() will return a count of 0 on that record. But I have made good use of the more general behavior of ^D in forcing non-newline terminated input to the reading process. The only reason repeated reading of an ordinary (disk) file keeps returning 0 bytes (NOT "EOF"; there is no such thing in UNIX) is that the file size is static. If the file is being appended to by some other process, then continued reading should return data AFTER the original "end of file". The same applies to magtape and terminals. This is not only reasonable, it is quite useful. I much prefer the thoughtful design of UNIX over the attempts to make it look "safe and ordinary". Whatever program you had that required two successive 0-length reads ("EOF" indication, by convention) to detect end of input was simply WRONG. (Some old-time Pascal programmers may recognize the problem.) Instead of trying to change UNIX by reducing its generality, why not fix the erroneous program. There is no excuse for such sloppiness.
chris@umcp-cs.UUCP (Chris Torek) (11/26/84)
Doug Gwyn seems to be complaining because 4.2's "sticky EOF" will make
things like

    % cat -u
    foo^D

exit. Not true! If you type

    % cat -u
    foo
    ^D

(assuming ^D is your EOF character) *then* cat will exit, but for the
former, it will print "foo" and keep reading. One more ^D (unless
preceded by other text) will cause it to terminate.
--
(This line accidently left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris@maryland
geoff@desint.UUCP (Geoff Kuenning) (11/28/84)
In article <6059@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:

>Geoff, I think you missed the point: ^D (or whatever) from a terminal
>DOES act like EOF if there is nothing between the previous delimiter
>and this one, since read() will return a count of 0 on that record.
>But I have made good use of the more general behavior of ^D in forcing
>non-newline terminated input to the reading process.

No, I didn't miss the point. I happen to think that the stdio package should
not embed features of UN*X into its assumptions about operating systems.
There are lots more operating systems with "hard" EOF's than with soft ones.
If you want to make use of the general behavior of ^D, write your program to
either use clearerr() or use UNIX I/O.

>The only reason repeated reading of an ordinary (disk) file keeps
>returning 0 bytes (NOT "EOF"; there is no such thing in UNIX) is
>that the file size is static. If the file is being appended to by
>some other process, then continued reading should return data AFTER
>the original "end of file". The same applies to magtape and terminals.
>This is not only reasonable, it is quite useful.

As to UNIX having EOF, try looking in read(2). In any case stdio is not UNIX,
and grepping stdio.h for EOF will succeed. The features you talk about
(simultaneous read and write, for example) are not available on all OS's.

>I much prefer the thoughtful design of UNIX over the attempts to make
>it look "safe and ordinary".

Good, then use read(2) and write(2). Stdio is explicitly a compatibility
package. As such, it *should* be safe and ordinary.

>Whatever program you had that required two successive 0-length reads
>("EOF" indication, by convention) to detect end of input was simply
>WRONG. (Some old-time Pascal programmers may recognize the problem.)
>Instead of trying to change UNIX by reducing its generality, why not
>fix the erroneous program. There is no excuse for such sloppiness.

I can do without the snottiness, Doug. I mentioned in my original posting
that I had already fixed the bug. My point was that an inconsistency in the
way stdio (*not* UNIX) is implemented caused the program to behave strangely
only when reading from a terminal.

It is unfortunate that this bug stayed in the system for so long that some
people mistook it for a feature. But that doesn't mean we shouldn't fix it.
--
Geoff Kuenning
...!ihnp4!trwrb!desint!geoff
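
[A minimal sketch of the clearerr() route mentioned above, for a program that deliberately wants to read past an end-of-file through stdio. The helper name getc_retry is hypothetical; feof(), clearerr() and getc() are standard C.]

```c
#include <stdio.h>

/* Hypothetical helper: like getc(), but if the stream has already
 * latched end-of-file, clear the indicator first so the library asks
 * the underlying file or device for data again. */
int getc_retry(FILE *fp)
{
    if (feof(fp))
        clearerr(fp);   /* also clears the error indicator */
    return getc(fp);
}
```

A program written this way states its intent explicitly: it gets EOF once when the input is exhausted, and sees later data (more typing, or bytes appended by another process) whether or not the library makes EOF sticky.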
bruce@ISM780.UUCP (11/29/84)
> Bill Shannon says in part:
> ... Try this program on your favorite version of stdio:
>
> #include <stdio.h>
> char buf[256];
> main()
> {
>     register int n;
>     while (n = fread(buf, 1, sizeof buf, stdin))
>         fwrite(buf, 1, n, stdout);
>     printf("got EOF\n");
> }
>
> ... Where ^D is your EOT character. If the program terminates
> when you type ^D then your stdio works properly. ...
>
> ... If you have a fix to
> fread (filbuf, actually) that both fixes this bug and avoids the
> incompatibility then please send it to me and/or post it to the
> net. If this works properly in System V I would be interested to
> hear that as well.

I believe your test program doesn't produce the desired results because
it's buggy, not stdio. Try the following on a system that doesn't have
a buggered fread(), notice the call to feof I've inserted:

    #include <stdio.h>
    char buf[256];
    main()
    {
        register int n;
        while (!feof(stdin) && (n = fread(buf, 1, sizeof buf, stdin)))
            fwrite(buf, 1, n, stdout);
        printf("got EOF\n");
    }

I've tested this on our VAX IS/3 system (System III stdio) and with our
vanilla SystemV stdio. Both versions produced the desired (i.e., correct)
behaviour.

Bruce Adler
{sdcrdcf,uscvax,ucla-vax,vortex}!ism780!bruce
Interactive Systems
decvax!yale-co!ima!bruce
jim@ISM780B.UUCP (11/29/84)
>Try this program on your favorite version of stdio: > >#include <stdio.h> > >char buf[256]; > >main() >{ > register int n; > > while (n = fread(buf, 1, sizeof buf, stdin)) > fwrite(buf, 1, n, stdout); > printf("got EOF\n"); >} > >Run it and type (e.g.): > >testing 1 2 3 >^D >another test fread() is not read(). Read() from a terminal is delimited by the newline character, so that an EOF is always determined by a read that returns 0. No such guarantee is offered by fread; show me the manual page for fread that says that 0 is returned upon EOF! Had you used an fgets or getc loop, the documentation states that NULL (fgets) or EOF (getc) indicates EOF on the stream, and you can depend on that. All you can depend on with fread is feof(). Thus your program is wrong, and rather than fix it you broke the library. >It seemed like the right thing to >do; the incompatibility was unfortunate. That is a pretty clear statement of BSD philosophy; it causes some problems. >>fread() returns 0 if there are 0 characters left in the terminal >>input queue when the ^D is typed. What would you have it do? >The problem is if you type 'foo^D' with no newline. You would expect >that this would terminate input reading, but it does not -- you must >type another ^D to finish it off. As an experienced UNIX user who has read tty(4) [termio(7) in SysV], I certainly would not expect that. >>Contrary to popular misconception, ^D is NOT an "EOF" character; >>rather, it marks a delimiter for input canonicalization. If all >>previous input has been consumed and a ^D is typed, then read() >>returns a count of 0. This is often interpreted as EOF. If there >>is some uncanonicalized input and ^D is typed, it acts much like >>NEWLINE except of course no \n is appended. >> >This is, of course, a matter of opinion, but all the documentation >states that ^D is the *end-of-file* character. Perhaps the >documentation (unchanged since my memory) is "buggy"? 
It of course *is not* a matter of opinion, and while the documentation calls ^D the EOF character, the formal behavior described in the documentation is less naive than the name: EOF (Control-d or ASCII EOT) may be used to generate an end-of-file from a terminal. When received, all the characters waiting to be read are immediately passed to the program, without waiting for a new-line, and the EOF is discarded. Thus, if there are no characters waiting, which is to say *the EOF occurred at the beginning of a line*, zero characters will be passed back, which is the standard end-of-file indication. (That is the >=SysIII text; the BSD text merely says that newline or ^D terminate a line being read in cooked mode; nothing anywhere says that simply entering a ^D will cause an end-of-file indication anywhere). When discussing fine points of documentation, it is more accurate and less embarrassing to use your eyeballs, not your memory. When something is claimed to be a popular misconception, you should not be so arrogant as to assume that you are not subject to such misconceptions without verifying it. >>If the 4.2BSD fread() was buggy, it should have been fixed rather >>than introducing a significant incompatibility with other STDIOs. >This bug is in ALL versions of fread (and getchar, and ...) *except* >4.2. Do you consider it a bug to be able to read() from a terminal after getting an end-of-file indication? The behavior of fread was consistent with the documentation. Changing it, whether desirable or not, is a change in functionality. A change can only be considered a bug fix if it brings into line behavior previously out of line with the documentation. -- Jim Balter, INTERACTIVE Systems (ima!jim)
jim@ISM780B.UUCP (11/29/84)
>It is obvious from this, that no distinction is made of EOT chars meaning >anything but the absolute end-of-file on TTY. Quite wrong. As you quoted, EOT terminates an input line, and *if that line is empty*, the return value is zero, and the *zero return value from read* is interpreted as EOF, not receipt of EOT. But that is not really relevant to fread. What is relevant is that the current call to fread stops when it encounters EOF; NOWHERE DOES IT SAY THAT FREAD RETURNS ZERO UPON END-OF-FILE. It doesn't unless an EOF is encountered when fread tries to read its first byte. Nowhere does it say that EOF latches, any more than it does for read. It is the nature of UNIX terminals that you can read past the EOF; that is why fread behaved as it does, totally consistent with the documentation. To quote Bill Shannon, "The bug was that without this change programs using fread on terminals would never report an EOF condition to the user because internally fread would just swallow the EOF and return a short record and the next fread would go on reading past the EOF." But that is exactly what fread should do: return a short record (as documented; read returns 0, as documented) and go on reading past the EOF on the next fread (just as read goes on reading beyond EOF). Only improperly written programs that erroneously assume that fread signals EOF with a zero return value (it doesn't; it isn't documented to) have the "bug". read is different from fread because it is delimited by newline, so that EOT at the beginning of a line always causes a zero return, because the first character of a line must be the first character read (although not necessarily vice versa); this simply isn't true of fread. For routines for which it is true, such as fgets or getc, then the return value can be used to detect EOF. You cannot show that the SysV fread is wrong by quoting the *read* documentation, especially without understanding why the two are different. 
-- Jim Balter, INTERACTIVE Systems (ima!jim)
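
[A sketch of a copy loop that follows the rule argued above: the count returned by fread() says how much data arrived, and feof() or ferror() says why the stream stopped. The function name copy_stream is hypothetical.]

```c
#include <stdio.h>

/* Hypothetical copy loop: write whatever fread() delivered, then ask
 * the stream itself whether it hit end-of-file or an error, instead of
 * waiting for fread() to return 0. Returns bytes copied, or -1. */
long copy_stream(FILE *in, FILE *out)
{
    char buf[256];
    size_t n;
    long total = 0;

    for (;;) {
        n = fread(buf, 1, sizeof buf, in);
        if (n > 0) {
            if (fwrite(buf, 1, n, out) != n)
                return -1;              /* output failed */
            total += (long)n;
        }
        if (feof(in) || ferror(in))
            break;                      /* stop on the first EOF seen */
    }
    return ferror(in) ? -1 : total;
}
```

Because the test comes after each fread() rather than before it, a short final record is still written out, and the loop ends on the first end-of-file indication regardless of whether the library latches it.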
guy@rlgvax.UUCP (Guy Harris) (11/30/84)
> It is the nature of UNIX terminals that you can read past the EOF;
It isn't just the nature of UNIX terminals. Some DEC OSes use the same
behavior; EDT terminates input mode with a ^Z, their EOF character.
Actually, quoting the VMS manuals, "CTRL/Z - Echoes ^Z when CTRL/Z is
typed as a *read terminator*. *By convention*, CTRL/Z constitutes
end-of-file." This implies (although it may not be the case) that ^Z
works in VMS exactly like ^D does in UNIX. This is worth pointing out,
since it was stated in an earlier article that there are more OSes with
"hard" rather than "soft" EOFs. I hope their EOFs aren't too hard; most
systems I've seen will let you type in a bunch of text as input to a program
and type your favorite EOF character and end input to that program without
ending input to all programs that run from that terminal during that session.
(Admittedly, most systems I've seen are either UNIX or DEC OSes.)
By the way, I saw a later version of "stdio" for 4.2 that looked like
it had the change rescinded; was this the case? (In which case, a lot of
this discussion is somewhat moot.)
Guy Harris
{seismo,ihnp4,allegra}!rlgvax!guy
ka@cbosgd.UUCP (Kenneth Almquist) (11/30/84)
> The fact that some programs have in the past misinterpreted this bug as a
> feature and made use of it is unfortunate, but something we will have to
> live with.

Arggh! The 4.2 BSD manual page for getchar states that, "These functions
return the integer constant EOF at end of file...." Now for a standard
UNIX file, the end of file is the location immediately above the last byte
written. Thus if getchar returns EOF, something is appended to the input
file, and getchar is called again, getchar should not return EOF because
the file pointer is no longer at end of file. The fact that the 4.2 BSD
implementation of getchar handles EOF differently not only from all other
variants of UNIX, but also from the way its own documentation says it
should handle EOF, is indeed "unfortunate, but something we will have to
live with."

If we may believe Bill Shannon, the whole issue arose because nobody could
figure out how to make a straightforward change to fread. The change
could have been implemented as follows:

1) Add a new flag named _EOF_PUSHED_BACK to stdio.h.
2) When _filbuf is called with this flag set, have it clear the flag and
   return EOF.
3) Write a new routine called pushback which is just like ungetc except
   that pushback(EOF, fp) should set _EOF_PUSHED_BACK and return.
4) When fread encounters EOF and it has read at least one item, have it
   call pushback(EOF, fp) before returning.

Kenneth Almquist
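The four steps in this post can be sketched on a toy stream rather than on
the real stdio internals. Everything below is a hypothetical illustration,
not the 4.2BSD or System V source: struct toy_stream, toy_getc,
toy_push_back_eof and toy_fread are made-up names, a '\004' byte in the
input string stands in for a ^D typed at a terminal, and only the EOF case
of the proposed pushback routine is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* used only for the EOF macro and NULL */

/* Hypothetical toy stream: a C string simulating terminal input, in
 * which each '\004' byte plays the role of one ^D (a soft EOF). */
struct toy_stream {
    const char *next;      /* next unread byte of the simulated input */
    int eof_pushed_back;   /* step 1: the _EOF_PUSHED_BACK flag */
};

/* Step 2: the _filbuf analogue. A pushed-back EOF is reported exactly
 * once; otherwise EOF is not latched and later input stays readable. */
static int toy_getc(struct toy_stream *s)
{
    assert(s != NULL);
    if (s->eof_pushed_back) {
        s->eof_pushed_back = 0;
        return EOF;
    }
    if (*s->next == '\0')
        return EOF;        /* simulated input exhausted */
    if (*s->next == '\004') {
        s->next++;         /* consume the ^D so reading can continue */
        return EOF;
    }
    return (unsigned char)*s->next++;
}

/* Step 3, EOF case only: remember that an EOF must be reported again.
 * (The ordinary ungetc path for real characters is omitted here.) */
static void toy_push_back_eof(struct toy_stream *s)
{
    s->eof_pushed_back = 1;
}

/* Step 4: if EOF cuts a read short, return the short count now and
 * arrange for the next call to see the EOF and return 0. */
static size_t toy_fread(char *buf, size_t nitems, struct toy_stream *s)
{
    size_t n = 0;
    while (n < nitems) {
        int c = toy_getc(s);
        if (c == EOF) {
            if (n > 0)
                toy_push_back_eof(s);
            break;
        }
        buf[n++] = (char)c;
    }
    return n;
}
```

With the simulated input "abc\004de", successive toy_fread calls asking for
7 items return 3, 0, 2 and 0: each EOF yields a short count followed by a
zero, while toy_getc on its own still reads on past a ^D.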
thomas@utah-gr.UUCP (Spencer W. Thomas) (11/30/84)
In article <266@rlgvax.UUCP> guy@rlgvax.UUCP (Guy Harris) writes:
>It isn't just the nature of UNIX terminals. Some DEC OSes use the same
>behavior; EDT terminates input mode with a ^Z, their EOF character.
>Actually, quoting the VMS manuals, "CTRL/Z - Echoes ^Z when CTRL/Z is
>typed as a *read terminator*. *By convention*, CTRL/Z constitutes
>end-of-file." This implies (although it may not be the case) that ^Z
>works in VMS exactly like ^D does in UNIX.

Well, I don't know about VMS, but in TOPS-20, if you type ^Z, you see the
EOF, even if you type it in the middle of the line (unlike ^D on Unix).

Personally, no matter what the manual says about "terminating input" on
^D, and so on, I find that it is very confusing to naive users that they
must SOMETIMES type ^D twice, but other times, typing it once suffices.
Just because you have gotten used to the behaviour, doesn't mean it's
right.

=Spencer
shannon@sun.uucp (Bill Shannon) (12/02/84)
Jim Balter says, "show me the manual page for fread that says that 0 is
returned upon EOF!" Here's an excerpt from the 4.2BSD man page for fread,
V7 is identical:

DESCRIPTION
     Fread reads, into a block beginning at ptr, nitems of data of
     the type of *ptr from the named input stream. It returns the
     number of items actually read.
     . . .
DIAGNOSTICS
     Fread and fwrite return 0 upon end of file or error.

He also says, "A change can only be considered a bug fix if it brings into
line behavior previously out of line with the documentation." Thank you,
Jim, for justifying our change. It seems apparent from your argument that
it was System III/V that did the wrong thing.

Bill Shannon
Sun Microsystems, Inc.
lepreau@utah-cs.UUCP (Jay Lepreau) (12/02/84)
Jim@ISM780B states in two separate articles:
> show me the manual page for fread that says that 0 is returned upon EOF!
> ...
> When discussing fine points of documentation, it is more accurate and less
> embarrassing to use your eyeballs, not your memory....
> you should not be so arrogant as to assume that you are not subject to
> such misconceptions without verifying it.
> ...
> NOWHERE DOES IT SAY THAT FREAD RETURNS ZERO UPON END-OF-FILE.

Taking Jim's own humble advice on the use of eyeballs and arrogance I
found in the v7 manual under fread(3):

DIAGNOSTICS
     Fread and fwrite return 0 upon end of file or error.

And so does the 4.2 manual, which is derived from 32v which is derived
from v7. Now, in Sys V (and Sys 3?), rather than change the code to fix a
bug they changed the documentation and removed that sentence. Sun and UCB
chose to fix the code. Fine. Arguments can be made both ways. (Of course
I have strong opinions as to which is preferable.)

However, the so-called issue of whether or not ^D other than at the
beginning of a line should mean EOF is a straw man, and is not at issue
(or shouldn't be, anyway). In any case it is orthogonal to the issue of
sticky-eof on ttys, and is just muddying the waters. It's about as
germane and likely to change as adding ^^ or BCD to C.

Jay Lepreau
geoff@desint.UUCP (Geoff Kuenning) (12/03/84)
In article <528@cbosgd.UUCP> ka@cbosgd.UUCP (Kenneth Almquist) writes:
>If we may believe Bill Shannon, the whole issue arose because nobody
>could figure out how to make a straightforward change to fread.

Bill said quite explicitly that the change arose because they wanted to
make the behavior of fread consistent. I am sure that Bill is capable of
coming up with the push_back_eof algorithm all by his little old self --
if, after considering the design aspects of the situation, he decides that
is the behavior he wants.

If you intend to write portable software, don't assume you can continue
reading from a terminal after EOF. For my money, I would much rather pay
a small backwards-compatibility price to achieve a stdio implementation
that was truly portable.

In any case, most programs that expect to get more than one EOF from a
terminal are broken, because you will get different results if you
redirect from a file. Sure, there are special exceptions like slowread
(aka tail -f aka tra), but let's be honest, folks -- of all the files you
access in a day, how many do you access while they are growing? Normally,
you make use of existing, non-growing files, and a program expecting two
EOF's from a terminal will always get a null second file if it is
redirected.
--
Geoff Kuenning
...!ihnp4!trwrb!desint!geoff
henry@utzoo.UUCP (Henry Spencer) (12/05/84)
> ...
> Here's an excerpt from the 4.2BSD man page for fread, V7 is identical:
>
> DIAGNOSTICS
>      Fread and fwrite return 0 upon end of file or error.

Not just a short count, mind you, but 0.
bsa@ncoast.UUCP (Brandon Allbery) (12/05/84)
The Plexus manuals have an entry for a command (I forget the name and
I'm 25 miles or so away from the manuals at the moment :-) that works
like cat except that at EOF it sleeps for some user-specified amount of
time and then tries to read to the next EOF, so on forever. This is
for ORDINARY FILES, mind you (i.e. redirected output from make; I'd like
to see that option); if an ordinary file can be so handled, why should
a terminal be any different? Especially since the terminal works that
way anyway??? (About you DECcies: I remember a problem on a DEC 20/60
that forced a shutdown because the program was looking for hardware EOF
on a terminal. I don't expect to EVER see that on a Unix system. If
that bug exists in TOPS-20, why not other nonsensical bugs? I choose
to treat sticky EOF as a bug, given that a terminal doesn't have a sticky
EOF at all, in reality.)
I give you 3 choices:
1) inconsistent file handling. What sticky EOF is in 4.2bsd, what it
is on any system that treats magtape EOFs as not absolute (most, I think)
EXCEPT standard Unix. And if you do that to Unix, you lose the whole
argument for Unix because files are *no longer* always identical in the
view of the program. In fact, I don't think the result can be CALLED
Unix.
2) consistent file handling with sticky EOF. And how do you propose
to make compatible magtapes?
3) consistent file handling with NON-sticky EOF. What most Unix versions
do. Thus working nicely with magtapes and terminals; and also useful
in examining dynamic files like the running output of make (or
/usr/spool/uucp/LOGFILE :-)
--bsa
--
Brandon Allbery @ North Coast Xenix | the.world!ucbvax!decvax!cwruecmp!
6504 Chestnut Road, Independence, Ohio | {atvax!}ncoast!{tdi1!}bsa
(216) 524-1416 \ 44131 | E1439@CSUOHIO.BITNET (friend's acct.)
| BALLBERY (161-7070) on MCI Mail
---------------------------------------+---------------------------------------
Keeping the Galaxies safe for Civilization... :-)
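The cat-like command described in this post (read to EOF, sleep, try
again) comes down to clearing the stream's EOF indicator between passes.
A minimal sketch in C, assuming a stdio where clearerr() followed by
another read picks up data appended since the last EOF; the function name
drain is made up for illustration and is not the Plexus command.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy everything currently readable from `in`
 * to `out` and return the number of bytes copied. If the loop stopped
 * at end of file, the EOF indicator is cleared so that a later call
 * can pick up data appended in the meantime. An error indicator is
 * deliberately left set for the caller to inspect. */
static long drain(FILE *in, FILE *out)
{
    long copied = 0;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        copied++;
    }
    if (feof(in))
        clearerr(in);   /* the same call Ben Goldfarb patched into addbib */
    return copied;
}
```

A follower in the style of tail -f would call drain(in, stdout), sleep for
the chosen interval, and repeat. Without the clearerr() call, a stdio with
sticky EOF keeps returning EOF even after the file has grown.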
ka@cbosgd.UUCP (Kenneth Almquist) (12/08/84)
>>If we may believe Bill Shannon, the whole issue arose because nobody
>>could figure out how to make a straightforward change to fread.
>
>Bill said quite explicitly that the change arose because they wanted to make
>the behavior of fread consistent.

That's what I said he said.

>I am sure that Bill is capable of coming
>up with the push_back_eof algorithm all by his little old self -- if, after
>considering the design aspects of the situation, he decides that is the
>behavior he wants.

You already stated the behavior that Bill wanted: he wanted to make the
behavior of fread match the description in the manual page. He did not
want to change the behavior of any other functions. Of course Bill is
capable of coming up with the push_back_eof algorithm himself, but as it
happens he didn't. He asked in his posting how fread could be made to
correspond to the manual page description of it without changing getc,
and I answered him. None of this is intended as an attack on Bill--any
programmer is entitled to an occasional slip--but I wonder why it wasn't
caught before the release of 4.2.

>If you intend to write portable software, don't assume you can continue
>reading from a terminal after EOF.

And if I don't want to write software that is portable to anything other
than another UNIX system? And anyway, I have never heard of a system that
couldn't support reading on a terminal after EOF. Such a system would be
a bit awkward to use since every time you typed an EOF at your terminal
all programs, including the command processor, would presumably encounter
an EOF indication and you would be logged out.

>For my money, I would much rather pay
>a small backwards-compatibility price to achieve a stdio implementation that
>was truly portable.

Currently, stdio is not truly portable. Try to implement fseek on a
non-UNIX system some time. Stdio does hide differences between various
versions of UNIX and I am not suggesting that that should change.

>In any case, most programs that expect to get more than one EOF from a
>terminal are broken, because you will get different results if you redirect
>from a file.

Horrors, EMACS won't work if you redirect its input to a file--I guess
we had better throw it out. Seriously, differences between UNIX variants
create problems for people. The idea that "they won't break very many
programs" is not a justification. Obviously nobody would have raised the
issue if no programs were affected. I can appreciate Bill Shannon's
position on fread, but changing the functioning of getc is a different
issue.

Kenneth Almquist
chris@umcp-cs.UUCP (Chris Torek) (12/10/84)
> Horrors, EMACS won't work if you redirect its input to a file--I guess
> we had better throw it out.

*Whose* Emacs won't work?
--
(This line accidently left nonblank.)
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@maryland
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/11/84)
> DIAGNOSTICS
>      Fread and fwrite return 0 upon end of file or error.

Yes, it used to say that. It failed to clarify what would happen on a
"short read". Returning 0 in such a case is absolutely incompatible with
the DESIGN of fread(): that you tell it how many items you want and it
returns the number you got.

Clearly the above portion of the manual was not well thought out, since it
led to this discussion. AT&T has clarified this (and many similar
oversights, ambiguities, and confusion) in the manual; and in this
particular case I think they did it right (since it makes the function
design make more sense than the other interpretation).
geoff@desint.UUCP (12/17/84)
In article <560@cbosgd.UUCP> ka@cbosgd.UUCP (Kenneth Almquist) writes:
>And if I don't want to write software that is portable to anything other
>than another UNIX system? And anyway, I have never heard of a system that
>couldn't support reading on a terminal after EOF. Such a system would be
>a bit awkward to use since every time you typed an EOF at your terminal
>all programs, including the command processor, would presumably encounter
>an EOF indication and you would be logged out.

If you want to write non-portable software, use UNIX system calls. They
handle EOF in the UNIX way.

Just because you haven't heard of an operating system that has hard EOF's
doesn't mean one doesn't exist. Your presumption about logouts shows a
strong UNIX prejudice. *Very* few operating systems interpret EOF's to
the command processor as a logout indication. Furthermore, many operating
systems put the command processor in the kernel, so that an EOF delivered
to a user program is not at all the same as an EOF given to the command
processor. Indeed, this is frequently part of the reason they have "hard"
EOF's. (No, I don't like this design either -- shells should be user
processes. But such systems do exist.)

>Seriously, differences between UNIX variants
>create problems for people. The idea that "they won't break very many
>programs" is not a justification. Obviously nobody would have raised the
>issue if no programs were affected.

Yup, catching up with the real world is frequently painful. Check out the
heat that has risen over 6-character externals in the draft ANSI standard.
But in that case and this one, I would rather bite the bullet and do it
the way that will make life easier in the future.

BTW, I have an editor that is very similar to EMACS, and it does not
object at all if its descriptors are redirected to files. I added the
feature because I had a need for it.
--
Geoff Kuenning
...!ihnp4!trwrb!desint!geoff
bsa@ncoast.UUCP (12/18/84)
> Article <6535@brl-tgr.ARPA>, from henry@utzoo.uucp
+----------------
| > Here's an excerpt from the 4.2BSD man page for fread, V7 is identical:
| >
| > DIAGNOSTICS
| >      Fread and fwrite return 0 upon end of file or error.
|
| Not just a short count, mind you, but 0.

Which is wrong. If you request 7 characters and it reads 4 before EOF,
you've either lost 4 characters or gotten 3 garbage characters, depending
on what fread returns and how your program deals with EOF. Sounds to me
like fread is wrong from square one.

--bsa
--
Brandon Allbery @ decvax!cwruecmp!ncoast!bsa (..ncoast!tdi1!bsa business)
6504 Chestnut Road, Independence, Ohio 44131 (216) 524-1416
<<<<<< An equal opportunity employer: I both create and destroy bugs :-) >>>>>>
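The "request 7, get 4" case can be handled without losing data or using
garbage, provided fread returns the short count (as the C standard library
now specifies) and the caller asks feof/ferror why the read stopped. A
small sketch; read_items and enum read_status are hypothetical names
introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes for one read attempt. */
enum read_status { READ_FULL, READ_SHORT_EOF, READ_ERROR };

/* Ask for `nitems` bytes. Store the count actually delivered in *got
 * and report whether the request was satisfied, cut short by end of
 * file, or cut short by an I/O error. Only the first *got bytes of
 * `buf` are valid afterwards. */
static enum read_status read_items(char *buf, size_t nitems, FILE *fp,
                                   size_t *got)
{
    assert(buf != NULL && fp != NULL && got != NULL);
    *got = fread(buf, 1, nitems, fp);
    if (*got == nitems)
        return READ_FULL;
    if (ferror(fp))
        return READ_ERROR;
    return READ_SHORT_EOF;   /* short (possibly zero) count at EOF */
}
```

On a 4-byte file, a request for 7 bytes yields READ_SHORT_EOF with *got
set to 4, and a second request yields READ_SHORT_EOF with *got set to 0,
so a caller that uses only the first *got bytes neither drops the 4 good
characters nor touches the 3 unfilled ones.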