karl@haddock.UUCP (Karl Heuer) (06/29/87)
main() {
    char buf[5];
    for (;;) printf("%d\n", read(0, buf, 5));
}
If you type *exactly* 5 characters and terminate the read with EOT (which is
not an EOF in this context, in the middle of a line), the first read returns 5
(as it should) and the second returns 0 (instead of waiting for more input).
Tested on 4.3bsd.
gwyn@brl-smoke.UUCP (06/30/87)
In article <648@haddock.UUCP> karl@haddock.isc.com (Karl Heuer) writes:
>If you type *exactly* 5 characters and terminate the read with EOT (which is
>not an EOF in this context, in the middle of a line), the first read returns 5
>(as it should) and the second returns 0 (instead of waiting for more input).

That is correct behavior. In cooked mode, the "EOT" character is a delimiter
that is inserted into the stream along with the others. It is NEVER an "end
of file" character; that is merely a conventional interpretation given to a
delimiter found as the first character of a text line. Your first read got 5
characters, and the second read encountered the delimiter, which stops input
and returns the number of characters found before the delimiter (0 in this
case).
ron@topaz.rutgers.edu (Ron Natalie) (06/30/87)
Excuse me System V breath, if you look on your own beloved operating system
you will see that EOT works the opposite way that it does on system V, that
is, the following code

    main() {
        int count;
        char buf[10];

        do {
            count = read(0, buf, 5);
            printf("\ncount = %d\n", count);
        } while(count);
    }

Does the following on Berkeley UNIX (SUN 3.2):

    % a.out
    a<NL>

    count = 2
    abcde<EOT>
    count = 5

    count = 0
    %

note that it doesn't read the keyboard between the last two printfs.

on both a 3B20 running Sys VR2v3 and a 3B2 running Sys VR3

    % a.out
    a<NL>

    count = 2
    abcde<EOT>
    count = 5

...at this point it waits for you to type more input...

I guess System V is wrong for once :-)

-Ron
kre@munnari.oz (Robert Elz) (07/01/87)
In article <13048@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie):
> Does the following on Berkeley UNIX (SUN 3.2):
...
> note that it doesn't read the keyboard between the last two printfs.

that's wrong.

> on both a 3B20 running Sys VR2v3 and a 3B2 running Sys VR3
...
> ...at this point it waits for you to type more input...

that's right.

Sys V is clearly right here, and bsd is wrong, and it should be fixed.
(And for anyone who doesn't know, I'm hardly a Sys V supporter).

kre
rpw3@amdcad.AMD.COM (Rob Warnock) (07/02/87)
My understanding has always been that <EOT> was a "push" which did not
store data in the stream. By "push" I simply mean "return from the read
with whatever you've got so far". (Under this interpretation, <LF> usually
means "store an <LF> then 'push'".)

The function of <EOF> arises because if you "push" at the beginning of
a line (before data is typed), the "read()" will return zero. But if you
"push" after N characters have been typed, you get N characters.

Therefore, by the "Principle Of [my own] Least Astonishment":

    abcde<EOT>

should return 5 characters, and the next call to "read()" should block.
In this case, System-V does it *right*!

Rob Warnock
Systems Architecture Consultant

UUCP:    {amdcad,fortune,sun,attmail}!redwood!rpw3
ATTmail: !rpw3
DDD:     (415)572-2607
USPS:    627 26th Ave, San Mateo, CA 94403
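The "push" interpretation described above can be modelled in a few lines of
C. What follows is only a sketch under stated assumptions, not kernel code:
the names push_tty, tty_type, tty_read and WOULD_BLOCK are invented for the
illustration, and a read that would block is represented by returning
WOULD_BLOCK instead of actually sleeping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EOT         '\004'  /* control-D */
#define MAX_RECORDS 16
#define MAX_LEN     64
#define WOULD_BLOCK (-1)    /* stand-in for "read() would sleep here" */

/* Hypothetical toy model of a cooked-mode input queue; not kernel code. */
struct push_tty {
    char   rec[MAX_RECORDS][MAX_LEN]; /* completed (pushed) records */
    size_t len[MAX_RECORDS];          /* length of each record, may be 0 */
    size_t head, tail;                /* records head..tail-1 are unread */
    size_t off;                       /* bytes already read from rec[head] */
    char   pending[MAX_LEN];          /* typed but not yet pushed */
    size_t npending;
};

/* "Push": hand the pending characters over as one record; store nothing. */
void push(struct push_tty *t)
{
    assert(t->tail < MAX_RECORDS);
    memcpy(t->rec[t->tail], t->pending, t->npending);
    t->len[t->tail++] = t->npending;
    t->npending = 0;
}

/* Simulate keystrokes: newline is stored and then pushes, EOT only pushes. */
void tty_type(struct push_tty *t, const char *keys)
{
    for (; *keys != '\0'; keys++) {
        if (*keys != EOT) {
            assert(t->npending < MAX_LEN);
            t->pending[t->npending++] = *keys;
        }
        if (*keys == EOT || *keys == '\n')
            push(t);
    }
}

/* Return up to n bytes of the oldest record, or WOULD_BLOCK if none. */
int tty_read(struct push_tty *t, char *buf, size_t n)
{
    size_t avail, take;

    if (t->head == t->tail)
        return WOULD_BLOCK;
    avail = t->len[t->head] - t->off;
    take = n < avail ? n : avail;
    memcpy(buf, t->rec[t->head] + t->off, take);
    t->off += take;
    if (t->off == t->len[t->head]) {  /* record used up; boundary vanishes */
        t->head++;
        t->off = 0;
    }
    return (int)take;
}
```

In this model, typing abcde<EOT> and then calling tty_read(&t, buf, 5) twice
gives 5 followed by WOULD_BLOCK, while a lone <EOT> at the start of a line
creates an empty record that tty_read returns as 0. The struct must be
zeroed before first use.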
ford@crash.CTS.COM (Michael Ditto) (07/03/87)
In article <13048@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
>Excuse me System V breath, if you look on your own beloved operating system
>you will see that EOT works the opposite way that it does on system V, that
>is, the following code
>
>    main() {
>        int count;
>        char buf[10];
>
>        do {
>            count = read(0, buf, 5);
>            printf("\ncount = %d\n", count);
>        } while(count);
>    }
>
> [...]
>
>    % a.out
>    a<NL>
>
>    count = 2
>    abcde<EOT>
>    count = 5
>
>...at this point it waits for you to type more input...
>
>I guess System V is wrong for once :-)
>

No, this is the spec for EOF in termio(7) (SysV's equivalent to tty(4)):

    [When EOF is received] all the characters waiting to be read are
    immediately passed to the program, without waiting for a new-line,
    and the EOF is discarded. Thus, if there are no characters waiting,
    which is to say the EOF occurred at the beginning of a line, zero
    characters will be passed back [...]

This is the way UNIX has always worked, except for Berkeley's versions, and
AT&T still does it this way.

(The above quote from termio(7) is copyright by AT&T, but you can see it
on your SysV system with "man 7 termio". [Lame attempt at disclaimer]).

--
Michael "Ford" Ditto                    -=] Ford [=-
P.O. Box 1721                           ford@crash.CTS.COM
Bonita, CA 92002                        ford%oz@prep.mit.ai.edu
henry@utzoo.UUCP (Henry Spencer) (07/03/87)
> That is correct behavior...
Uh, correct by whose definition, Doug? The original Unix semantics of
EOT were the "push" semantics (as opposed to the "delimiter" semantics you
describe), in which the EOT forces the existing input queue (possibly
zero-length) to be pushed through to the user, and then disappears utterly.
--
Mars must wait -- we have un- Henry Spencer @ U of Toronto Zoology
finished business on the Moon. {allegra,ihnp4,decvax,pyramid}!utzoo!henry
ron@topaz.rutgers.edu (Ron Natalie) (07/04/87)
>I guess System V is wrong for once :-)

    No, this is the spec for EOF in termio(7) (SysV's equivalent to tty(4)):

        [When EOF is received] all the characters waiting to be read are
        immediately passed to the program, without waiting for a new-line,
        and the EOF is discarded. Thus, if there are no characters waiting,
        which is to say the EOF occurred at the beginning of a line, zero
        characters will be passed back [...]

You seem to have missed the fact that I was jeering at Doug Gwyn (Notable
System V proponent) for putting forth this opinion, which is exactly
contrary to what System V does:

    That is correct behavior. In cooked mode, the "EOT" character is a
    delimiter that is inserted into the stream along with the others. It
    is NEVER an "end of file" character; that is merely a conventional
    interpretation given to a delimiter found as the first character of a
    text line. Your first read got 5 characters, and the second read
    encountered the delimiter, which stops input and returns the number of
    characters found before the delimiter (0 in this case).

He's right except that System V associates the delimiter with the
characters before it, out of band. Berkeley places the delimiter (still
an EOF) in band, which causes it not to be noticed if the read size exactly
matches the number of characters queued before the delimiter. In neither
case is the ^D merely discarded; that would imply that

    sleep(10);
    read(0, buf, 10);

with a<EOT>b<EOT>c<NL> typed during the sleep would return "abc\n".

The belief that the EOF should not be treated as in BSD is supported by the
statement earlier in the termio manual page that the read size may be
smaller than the number of characters in the queue, even a single
character, without loss of information.
Thus, this implies that the loop:

    while(1) {
        i = read(0, buf, N);
        if(i == 0) break;
        write(1, buf, i);
    }

will work regardless of the size of N, which is not true on Berkeley as
setting N to 1 will cause any EOT terminated lines to return apparent EOF
indications.

To make BSD work like Sys V you can kludge it by changing tty.c routine
ttread (around line 2191 in mine) where it says

    if(u.u_resid == 0)
        break;

to say something like

    if(u.u_resid == 0) {
        if(                               /* IF there */
            p->c_cc > 0 &&                /* ..are more characters AND */
            (*p->c_cf & 0x377) == eof &&  /* ..the next is EOF AND */
            (t_flags & CBREAK) == 0 &&    /* ..we're in cooked mode AND */
            (ttbreakc(c, tp) == 0)        /* .. last char wasn't break */
        )  getc(tp);    /* Throw away EOF that goes with this data. */
        break;
    }

I don't feel like remaking the kernel now, so I can't tell you if it works.

-Ron
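The copy loop quoted in this post (read up to N bytes, stop on a zero
return, otherwise write them out) can be written as a self-contained
function for experimenting on either kind of system. This is a sketch
assuming POSIX read() and write(); the name copy_until_zero and the 4096-byte
buffer limit are choices made for the illustration, and the retry on EINTR
and the short-write handling are additions not present in the original loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Copy in_fd to out_fd using reads of at most n bytes.
 * Like the original loop, ANY read() that returns 0 ends the copy.
 * Returns the total number of bytes copied, or -1 on an I/O error.
 */
long copy_until_zero(int in_fd, int out_fd, size_t n)
{
    char buf[4096];
    long total = 0;

    assert(n > 0 && n <= sizeof buf);
    for (;;) {
        ssize_t got = read(in_fd, buf, n);
        if (got < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: try again */
            return -1;
        }
        if (got == 0)                   /* taken as end-of-file */
            break;
        for (ssize_t done = 0; done < got; ) {
            ssize_t w = write(out_fd, buf + done, (size_t)(got - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;                  /* handle short writes */
        }
        total += got;
    }
    return total;
}
```

Calling copy_until_zero(0, 1, 1) on a terminal and typing abc<EOT> in the
middle of a line is the experiment at issue: with the System V behavior
described in this thread the loop should keep going, and with the 4BSD
behavior it should stop as if end-of-file had been seen.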
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/05/87)
In article <17345@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
>My understanding has always been that <EOT> was a "push" which did not
>store data in the stream.

At one time, a special "delimiter" marker was inserted into the stream
at that point. Apparently, some UNIXy implementations do it one way
and some another.

I seem to recall that SVR3.0 STREAMS was missing the M_DELIM message type,
so whenever AT&T finally gets the whole character I/O system converted to
STREAMS, they couldn't insert a delimiter if they wanted to (according to
Ron, that would be consistent with current UNIX System V behavior).

Alas, another difference among UNIX variants. What does POSIX have to
say about this?
guy%gorodish@Sun.COM (Guy Harris) (07/05/87)
> At one time, a special "delimiter" marker was inserted into the stream
> at that point. Apparently, some UNIXy implementations do it one way
> and some another.

Non-STREAMS tty drivers generally have a "raw" queue and a "canonical"
queue. Reads in "cooked" mode take place from the "canonical" queue.

In the AT&T drivers, of various flavors (V7, S3, S5), characters accumulate
in the "raw" queue until a "read" is done. If the terminal is in cooked
mode when the "read" is done, the "read" blocks until a line terminator
(newline, EOF, or "secondary end-of-line" character) is received. At that
point, one and only one line is canonicalized (erase/kill processing is
done) and is moved to the "canonical" queue. If the "line" is terminated
by an EOF rather than an end-of-line character, the EOF does NOT appear in
the canonical queue. Thus, the top-level reading code won't see delimiters.

The 4BSD driver(s) move data from the "raw" queue to the "canonical" queue
as soon as a line terminator is received. "Canonicalization" is done on
the fly; for example, as soon as an "erase" character is received, the
character it erases is removed from the "raw" queue. (This makes it easier
to implement more correct handling of the "erase" character - it's easier
for the driver to know what character is being erased, so it can do a
better job of erasing it from the screen - and also makes it easier to
handle a "reprint" character that causes the current queued-up input to be
re-echoed. It also means that erase, kill, etc. characters do NOT count
against the 256-character limit of uncanonicalized characters, but subtract
from that count.)

If the line ended with EOF, the EOF is left in the canonical queue as a
delimiter. It is stripped out when the "read" is done; however, if there
are five characters in the queue, and the "read" asks for five bytes, only
those five characters are looked at. If an EOF follows them, it is left in
the queue and seen by the next "read".
> I seem to recall that SVR3.0 STREAMS was missing the M_DELIM message type,
> so whenever AT&T finally gets the whole character I/O system converted to
> STREAMS, they couldn't insert a delimiter if they wanted to (according to
> Ron, that would be consistent with current UNIX System V behavior).

This is true. STREAMS messages somewhat resemble "mbuf" chains;
delimiters are implicit in the structure of these chains (when you get to
the end of one, you're at the end of a message). A line would be a single
STREAMS message; the EOF would be discarded ASAP, since it is not needed as
a delimiter. As such, any driver based on the S5R3 STREAMS code will give
the "push", rather than the "delimiter" behavior (regardless of whether it
implements "canonicalize at read time" or "canonicalize at input time"
behavior).

The "streams" code described in Dennis Ritchie's paper in the BSTJ (I
have no idea if that implementation is called STREAMS or just
"streams") has a "delimiter" message type. I don't know what sort of
behavior the various V8 "streams"-based (as opposed to S5R3
STREAMS-based) tty drivers provide; Dennis' paper described two drivers,
one giving the 4.1BSD "old" line discipline behavior (which may resemble
V7 behavior) and one giving the 4.1BSD "new" line discipline behavior
(which probably resembles other 4BSD systems).

I agree with most of the people here; the non-4BSD behavior is correct.
When I type ^D, it doesn't mean that I'm putting a ^D into the input queue,
it means I'm terminating a record.

> Alas, another difference among UNIX variants. What does POSIX have to
> say about this?

From the draft of Draft 10 (*sic*) we have here:

    7.1.1.11 Special Characters

    ...

    EOF    ...When received, all the characters waiting to be read are
           immediately passed to the program, without waiting for a
           new-line, and the EOF is discarded. Thus, if there are no
           characters waiting (that is, the EOF occurred at the beginning
           of a line), zero characters shall be passed back, representing
           an end-of-file indication.

    Guy Harris
    {ihnp4, decvax, seismo, decwrl, ...}!sun!guy
    guy@sun.com
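The 4BSD case described in this post, where the EOF stays in the canonical
queue as an in-band delimiter, can be sketched as a toy model. Everything
here is an assumption made for illustration rather than 4BSD source: the
names canq, canq_put, canq_read and WOULD_BLOCK are invented, the queue is a
flat array, and a blocking read is represented by a WOULD_BLOCK return.

```c
#include <assert.h>
#include <stddef.h>

#define EOT         '\004'  /* control-D, kept in the queue as a delimiter */
#define QSIZE       256
#define WOULD_BLOCK (-1)    /* stand-in for "read() would sleep here" */

/* Hypothetical toy model of a 4BSD-style canonical queue; not kernel code. */
struct canq {
    char   q[QSIZE];
    size_t head, tail;      /* unread bytes are q[head] .. q[tail-1] */
};

/* Append completed input; each line ends in '\n' or in an in-band EOT. */
void canq_put(struct canq *c, const char *s)
{
    for (; *s != '\0'; s++) {
        assert(c->tail < QSIZE);
        c->q[c->tail++] = *s;
    }
}

/* Copy up to n bytes; stop after a newline, or at (and strip) an EOT. */
int canq_read(struct canq *c, char *buf, size_t n)
{
    size_t count = 0;

    if (c->head == c->tail)
        return WOULD_BLOCK;
    while (count < n && c->head < c->tail) {
        char ch = c->q[c->head++];
        if (ch == EOT)          /* delimiter: consumed, never delivered */
            break;
        buf[count++] = ch;
        if (ch == '\n')
            break;
    }
    /*
     * If count == n, the loop never looked at q[head], so a trailing
     * EOT stays queued and the next call returns 0.
     */
    return (int)count;
}
```

With "abcde" plus EOT queued, canq_read(&c, buf, 5) returns 5 and the next
call returns 0, whereas canq_read(&c, buf, 6) returns 5 and strips the
delimiter in the same call, so the next call reports WOULD_BLOCK.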
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/05/87)
In article <1325@crash.CTS.COM> ford@crash.CTS.COM (Michael Ditto) writes:
> [When EOF is received] all the characters waiting to be read are
> immediately passed to the program, without waiting for a new-line,
> and the EOF is discarded. ...

This is of course nonsense, because the characters are NOT necessarily
"passed to the program". (What program? Terminal I/O proceeds
asynchronously, and there is no telling in advance which process will
ultimately read the terminal input.) Typical UNIXy terminal handlers
have a "canonical" input queue and a "raw" queue; in "cooked mode"
(ICANON on), characters are passed from the canonical queue to the raw
queue by a canonicalization gnome that "knows" that a newline or an EOF
(also an EOL in System V) delimits a chunk of input (so that the chunk is
immune to a subsequent char-erase or line-kill). In order to keep track
of chunk ("line") boundaries in the absence of a newline, it is
traditional to store a special "delimiter" marker in the input queue.
There is an earlier section of the TERMIO spec that mentions line
delimiters.

The above quotation from the manual (same as in the SVID) is incomplete
(as well as erroneous), in that it does not specify the boundary behavior
of delimiters (i.e., the phenomena Ron reported on).

>This is the way UNIX has always worked, except for Berkeley's versions, ...

I dispute that. It MAY be the way that the USG 3.0 and derivative
terminal handler (the "termio" one) has always worked.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/05/87)
In article <6055@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>..., characters are passed from the canonical queue to the raw queue ...

Oops, I got the queue names backwards (it's been a long time since I've
had to work on the terminal handler). Guy Harris's explanation, which I
hadn't seen when I posted my previous note, looks accurate to me.

>The above quotation from the manual (same as in the SVID) is incomplete
>(as well as erroneous), ...

Ahem, this also applies to Draft 10 of IEEE 1003.1.
"Passed to the program" indeed.
henry@utzoo.UUCP (Henry Spencer) (07/08/87)
> The "streams" code described in Dennis Ritchie's paper in the BSTJ (I
> have no idea if that implementation is called STREAMS or just
> "streams") has a "delimiter" message type.

(Probably just "streams" -- I've never seen Dennis capitalize it that I
recall.)

> I don't know what sort of
> behavior the various V8 "streams"-based (as opposed to S5R3
> STREAMS-based) tty drivers provide...

The tty drivers just put a delimiter message on after they pass a line
through. However, there is some subtlety in the behavior of a stream read
when the count is exactly satisfied that causes a trailing delimiter to be
swallowed. So the "push" behavior is what is provided. Unless I'm much
mistaken, this applies to both tty drivers.
--
Mars must wait -- we have un-           Henry Spencer @ U of Toronto Zoology
finished business on the Moon.          {allegra,ihnp4,decvax,pyramid}!utzoo!henry
thorinn@diku.UUCP (Lars Henrik Mathiesen) (07/09/87)
In article <648@haddock.UUCP> karl@haddock.UUCP (Karl Heuer) writes:
>main() {
>    char buf[5];
>    for (;;) printf("%d\n", read(0, buf, 5));
>}
>If you type *exactly* 5 characters and terminate the read with EOT (which is
>not an EOF in this context, in the middle of a line), the first read returns 5
>(as it should) and the second returns 0 (instead of waiting for more input).
>Tested on 4.3bsd.

I agree that this seems wrong, but look at it this way: If you had tried to
read, say, 6 characters, you would still have got only 5; you could
therefore conclude that the user had typed an EOF. According to the 4.3
tty(4) manual:

    It is not, however, necessary to read a whole line at once; any
    number of characters may be requested in a read, even one, without
    losing information.

But if the next read (after the read( , , 5)) returned some further input,
you would never know that the EOF was there, thus information is lost. This
seems to be the way AT&T systems behave.

If we can agree that the user-interface definition of an EOF indication is
something like "An EOF immediately following a newline or another EOF", AND
if we want this to be the only way to provoke a return of zero characters
from read, the AT&T behaviour is best.

But if we want to be able to detect arbitrary EOFs even when it is not
practical to provide a buffer large enough for any input, the BSD behaviour
is necessary. Regrettably you have to use code like this:

    /*
     * new_canon is a boolean variable that is true if we've just read
     * past a "canonicalization point". Assume that there's no t_brkc.
     */
    int new_canon = 1;
    ...
nextline:
    do {
        if ((n = read(0, buf, BUFSIZ)) < 0)     /* ERROR */
            exit(1);
        if (n == 0 && new_canon)                /* EOF */
            exit(0);
        newline = n && buf[n - 1] == '\n';
        new_canon = newline || n < BUFSIZ;
        /* PROCESS buf */
    } while (new_canon == 0);
    if (!newline)           /* Input was terminated by EOF */
        putchar('\n');
    ...
    goto nextline;
--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      ..mcvax!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.
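The bookkeeping in the fragment above can be isolated into one pure function
that classifies each read() result, which makes the new_canon logic easier
to check by hand. This is a sketch only: the enum, its member names and
classify_read are invented for the illustration, and like the fragment it
assumes there is no t_brkc and that the caller starts with *new_canon set
to 1.

```c
#include <assert.h>
#include <stddef.h>

enum read_kind {
    RK_ERROR,       /* read() failed */
    RK_EOF,         /* 0 bytes right after a canonicalization point */
    RK_DELIMITER,   /* 0 bytes: delimiter left behind by an exactly-full read */
    RK_PARTIAL,     /* buffer filled, the line continues in the next read */
    RK_LINE_NL,     /* data that ends with a newline */
    RK_LINE_EOF     /* short read without newline: ended by an EOF character */
};

/*
 * Classify the result n of read(fd, buf, bufsiz) under the 4BSD behaviour.
 * *new_canon is nonzero when the previous read ended at a canonicalization
 * point; it is updated for the next call.
 */
enum read_kind classify_read(long n, const char *buf, size_t bufsiz,
                             int *new_canon)
{
    int newline;

    assert(buf != NULL && new_canon != NULL && bufsiz > 0);
    if (n < 0)
        return RK_ERROR;
    if (n == 0 && *new_canon)
        return RK_EOF;
    newline = n > 0 && buf[n - 1] == '\n';
    *new_canon = newline || (size_t)n < bufsiz;
    if (n == 0)
        return RK_DELIMITER;
    if (newline)
        return RK_LINE_NL;
    return *new_canon ? RK_LINE_EOF : RK_PARTIAL;
}
```

A caller would treat RK_PARTIAL as "keep reading the same line", RK_LINE_NL,
RK_LINE_EOF and RK_DELIMITER as "line finished", and only RK_EOF as a real
end-of-file indication.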
laman@ncr-sd.UUCP (07/09/87)
In article <13145@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
        :
        :
        :
>
>To make BSD work like Sys V you can kludge it by changing tty.c routine
>ttread (around line 2191 in mine) where it says
>
>    if(u.u_resid == 0)
>        break;
>
>to say something like
>
>    if(u.u_resid == 0) {
>        if(                               /* IF there */
>            p->c_cc > 0 &&                /* are more characters AND */
>            (*p->c_cf & 0x377) == eof &&  /* ..the next is EOF AND */
                          ^ Get rid of the 'x' so you get an octal constant
>            (t_flags & CBREAK) == 0 &&    /* ..we're in cooked mode AND */
>            (ttbreakc(c, tp) == 0)        /* .. last char wasn't break */
>        )  getc(tp);    /* Throw away EOF that goes with this data. */
>        break;
>    }
>
>I don't feel like remaking the kernel now, so I can't tell you if
>it works.
>
>-Ron

Just thought I'd point this out in case someone did want to try this.
Not having access to a BSD kernel, I can't comment on the rest of the code.

Mike Laman
UUCP: {ihnp4,sdcsvax,noscvax,...}!ncr-sd!laman
karl@haddock.UUCP (07/15/87)
In article <3320@diku.UUCP> thorinn@diku.UUCP (Lars Henrik Mathiesen) writes:
>According to the 4.3 tty(4) manual: "... any number of characters may be
>requested in a read ... without losing information." But if the next read
>(after the read( , , 5)) returned some further input, you would never know
>that the EOF was there, thus information is lost.

True. (I don't think that's what they meant by "information", though.)

>But if we want to be able to detect arbitrary EOFs even when it is not
>practical to provide a buffer large enough for any input, the BSD behaviour
>is necessary. Regrettably you have to use code like this: [complicated code
>that uses an extra variable and observes newlines and full buffers].

But the fact is that existing code -- e.g. the guts of getchar() -- does not
do anything of the sort, and therefore will behave as if a real end-of-file
were signalled.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
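To make the getchar() point concrete, here is a miniature buffered reader in
the usual style. It is a sketch, not the source of any real stdio: the
struct inbuf and inbuf_getc names are invented, the 5-byte buffer is chosen
only to match the reads discussed in this thread, and real stdio
implementations differ in detail (for instance in how sticky their
end-of-file flag is).

```c
#include <assert.h>
#include <stdio.h>      /* for the EOF macro and NULL only */
#include <unistd.h>

/* Hypothetical miniature input buffer, standing in for a stdio FILE. */
struct inbuf {
    int     fd;
    char    buf[5];     /* deliberately tiny, to match the 5-byte reads */
    ssize_t len, pos;   /* valid bytes in buf, next byte to hand out */
    int     at_eof;     /* sticky end-of-file flag */
};

/* Return the next input byte, refilling with read() when the buffer is empty. */
int inbuf_getc(struct inbuf *b)
{
    assert(b != NULL);
    if (b->at_eof)
        return EOF;
    if (b->pos >= b->len) {
        b->len = read(b->fd, b->buf, sizeof b->buf);
        b->pos = 0;
        if (b->len <= 0) {      /* a 0-byte read is taken as end-of-file */
            b->len = 0;
            b->at_eof = 1;
            return EOF;
        }
    }
    return (unsigned char)b->buf[b->pos++];
}
```

Nothing in this refill path tracks newlines or full buffers, so a zero
return caused by a leftover delimiter is indistinguishable from one caused
by an EOF typed at the start of a line.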