neeraj@matrix.UUCP (neeraj sangal) (05/03/89)
These are questions related to the call:

    n = write(fd, buf, len);

As documented in write(2), n is the number of bytes written if the call is successful; n is -1 otherwise. I had all along assumed that when write completes successfully, n would equal len. However, at least in SunOS 4.0 this assumption is not true if fd is a descriptor for a TCP/IP socket and the socket has been set to do non-blocking I/O. This raises a number of questions:

1. Can n be less than len if fd is NOT set to non-blocking?
2. Is it true that stream-based "sockets" behave similarly?
3. If this is true then won't a number of existing programs break (especially when they are piped together)?

This approach regarding writes to network file descriptors is quite logical, especially when used in conjunction with select(2). Select can be used to detect when write can be done without blocking; however, there is no way to determine the amount that can be written out without blocking.

The question I have is related to the implementation of select(2) (and poll). When does select return an indication that a file descriptor can be written to? Will the underlying protocol complete select the moment a single byte can be transferred or will it wait before completing select in the hope that a larger buffer space will free up?

There is a need for a signal to indicate when write to a file descriptor can be done (similar to SIGIO which is used to indicate when a read may be done). Is there any reason why this has not been implemented? Is it being planned or considered for a future version?

On a related matter, how does one do an asynchronous connect on a BSD socket? The documentation does mention an errno called EINPROGRESS to indicate that asynchronous connect is in progress. But then how does one find out whether connect completed and whether it succeeded or failed?

Neeraj Sangal
Matrix Computer Systems, Inc.
7 1/2 Harris Rd, Nashua, NH 03062
uunet!matrix!neeraj
(603) 888-7790
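On the last question, one commonly used approach on BSD-derived socket implementations is to make the socket non-blocking, call connect(), and, if it fails with EINPROGRESS, wait with select() until the socket becomes writable, then fetch the outcome with getsockopt(SO_ERROR). The sketch below assumes that behaviour. The names connect_nonblocking and connect_loopback are illustrative only, and the loopback address is there just to keep the example self-contained.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: start connect() on a non-blocking socket and wait
 * up to timeout_sec for the result. Returns 0 on success, -1 with errno
 * set on failure. */
int connect_nonblocking(int sock, const struct sockaddr *addr,
                        socklen_t alen, int timeout_sec)
{
    int flags = fcntl(sock, F_GETFL, 0);
    if (flags < 0 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(sock, addr, alen) == 0)
        return 0;                   /* completed immediately */
    if (errno != EINPROGRESS)
        return -1;                  /* failed immediately */

    fd_set wfds;
    struct timeval tv;
    int r;
    do {                            /* writable means "connect finished" */
        FD_ZERO(&wfds);
        FD_SET(sock, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        r = select(sock + 1, NULL, &wfds, NULL, &tv);
    } while (r < 0 && errno == EINTR);
    if (r < 0)
        return -1;
    if (r == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    /* Finished, but not necessarily successfully: read the pending error. */
    int err = 0;
    socklen_t elen = sizeof err;
    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &elen) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Usage sketch: connect to 127.0.0.1:port, returning the socket or -1. */
int connect_loopback(unsigned short port, int timeout_sec)
{
    struct sockaddr_in sa;
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect_nonblocking(sock, (struct sockaddr *)&sa, sizeof sa,
                            timeout_sec) < 0) {
        int saved = errno;
        close(sock);
        errno = saved;
        return -1;
    }
    return sock;
}
```

The socket is left in non-blocking mode on return, so later writes on it can return short counts as described in the post.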
gwyn@smoke.BRL.MIL (Doug Gwyn) (05/03/89)
In article <103@matrix.UUCP> neeraj@matrix.UUCP (neeraj sangal) writes:
> n = write(fd, buf, len);
> 1. Can n be less than len if fd is NOT set to non-blocking?

Certainly; if only some but not all bytes were transferred, for example due to the system call being interrupted by a signal, then the best thing for write() to report is the number of bytes successfully transferred. (I forget whether IEEE 1003.1 ended up permitting this or not; it was hotly debated.) Robust stdio implementations have to loop on the write() call until all bytes are transferred or an error occurs.

>Will the underlying protocol complete select the moment a single
>byte can be transferred...?

Obviously it should. The details depend on the exact implementation and are complicated by things like the guaranteed pipe atomic write size, etc.
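The write() loop mentioned above can be sketched roughly as follows. The name write_all is a hypothetical helper, and the sketch assumes a blocking descriptor: it advances past any short count and retries when write() fails with EINTR. On a system that reports -1/EINTR after some data has already been transferred, such a retry could duplicate data, so treat this as an illustration rather than a complete answer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes have been
 * accepted or a real error occurs. Returns 0 on success, -1 on error with
 * errno left as write() set it. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: try the same span again */
            return -1;          /* genuine error (including EAGAIN) */
        }
        p += n;                 /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller then checks a single result, for example `if (write_all(fd, buf, len) < 0) perror("write");`, instead of comparing n with len after every call.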
jiii@visdc.UUCP (John E Van Deusen III) (05/04/89)
In article <103@matrix.UUCP> neeraj@matrix.UUCP (neeraj sangal) writes:
> n = write(fd, buf, len);
>
> As documented in write(2), n is the number of bytes written if the
> call is successful; n is -1 otherwise.

I have recently purchased a book called C LANGUAGE INTERFACES that is part of the Prentice Hall, AT&T series. It has a wealth of information about this sort of thing that was completely absent in the documentation that I previously used; the AT&T UNIX SYSTEM V PROGRAMMER'S REFERENCE MANUAL, also published by Prentice Hall.

When writing to pipes and FIFO's there are only so many bytes that will fit at any given time (PIPE_BUF). If more than PIPE_BUF bytes are to be written, and O_NDELAY is clear, the bytes are written piecemeal until the write is satisfied. Because this process is not atomic, it is possible for other processes to interleave output. If this is a problem O_NDELAY can be set, and write(2) will return the number of bytes that could be written contiguously.

If the request is for PIPE_BUF bytes or less, and O_NDELAY is clear, the process will block until the entire request can be accommodated. If O_NDELAY is set, and there is not enough room for the entire write request, zero is returned. Write requests of PIPE_BUF bytes or less are always contiguous.
--
John E Van Deusen III, PO Box 9283, Boise, ID 83707, (208) 343-1865
uunet!visdc!jiii
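The "only so many bytes will fit" behaviour can be observed with a small experiment. This sketch uses the POSIX O_NONBLOCK flag rather than the System V O_NDELAY described above: under O_NONBLOCK a full pipe is reported as -1 with errno EAGAIN, not as a zero return. The helper names set_nonblocking and fill_pipe are made up for illustration, and the 256-byte chunk size is an arbitrary choice kept below the 512-byte POSIX minimum for PIPE_BUF. On many systems the total a pipe holds is larger than PIPE_BUF, which only bounds the size of an atomic write.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write small chunks into a non-blocking pipe until the kernel refuses
 * more. Returns the number of bytes accepted, or -1 on an unexpected
 * error. */
long fill_pipe(int wfd)
{
    char chunk[256] = {0};
    long total = 0;

    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, nothing lost: retry */
        if (n == 0 || errno == EAGAIN || errno == EWOULDBLOCK)
            return total;           /* 0: O_NDELAY style; EAGAIN: POSIX */
        return -1;
    }
}
```

Calling `set_nonblocking(fds[1])` and then `fill_pipe(fds[1])` on a fresh pipe gives the amount the pipe accepted before reporting that it was full.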
len@synthesis.Synthesis.COM (Len Lattanzi) (05/04/89)
In article <10198@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
:In article <103@matrix.UUCP> neeraj@matrix.UUCP (neeraj sangal) writes:
:> n = write(fd, buf, len);
:> 1. Can n be less than len if fd is NOT set to non-blocking?
:
:Certainly; if only some but not all bytes were transferred,
:for example due to the system call being interrupted by a signal,
:then the best thing for write() to report is the number of bytes
:successfully transferred. (I forget whether IEEE 1003.1 ended up
:permitting this or not; it was hotly debated.) Robust stdio
:implementations have to loop on the write() call until all bytes
:are transferred or an error occurs.

Does anyone know for sure about IEEE 1003.1? A (-1,EINTR) return from write is worthless if some bytes were written. And you'll have to depend on your signal handler to not smash errno. Do any of these OS/C library standards define useful schemes for handling system call error returns in a multi-threaded process besides every signal handler doing a save and restore of errno?

-Len
Len Lattanzi (len@Synthesis.com) <{ames,pyramid,decwrl}!mips!synthesis!len>
Synthesis Software Solutions, Inc.
The RISC Software Company
I would have put a disclaimer here but I already posted the article.
jiii@visdc.UUCP (John E Van Deusen III) (05/05/89)
In article <18735@mips.mips.COM> (Len Lattanzi) writes:
> In article <10198@smoke.BRL.MIL> (Doug Gwyn) writes:
>>
>> ... due to the system call being interrupted by a signal, then the
>> best thing for write() to report is the number of bytes successfully
>> transferred.

Unfortunately, if write(2) gets interrupted by a signal, the user is in effect saying, "stop what you are doing, no matter how critical; don't save anything; and immediately jump to a piece of code in my program". It is difficult to see how you could re-enter the function and pick up the pieces. Even if a reliable number of bytes written could be obtained from a partially completed system call, this would result in an ambiguity. A positive return, that is less than the number of bytes requested, is already used to indicate that a limit was reached to the amount of space available on the physical device or pipe.

>
> A (-1,EINTR) return from write is worthless if some bytes were
> written. And you'll have to depend on your signal handler to not
> smash errno. Do any of these OS/C library standards define useful
> schemes for handling system call error returns in a multi-threaded
> process besides every signal handler doing a save and restore of
> errno?

It isn't that errno has been smashed. EINTR simply reflects the fact that the program bombed out of write(2); so as things stand, the system call is in error. You are correct that this information is not very useful if your intention is to continue the program.

As of System V release 3 the sighold and sigrelse functions were added in order to establish critical regions of code. In this way you can be sure that your system calls won't be interrupted.
--
John E Van Deusen III, PO Box 9283, Boise, ID 83707, (208) 343-1865
uunet!visdc!jiii
clive@ixi.UUCP (Clive) (05/05/89)
In article <18735@mips.mips.COM> len@synthesis.synthesis.com (Len Lattanzi) writes:
>In article <10198@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>:In article <103@matrix.UUCP> neeraj@matrix.UUCP (neeraj sangal) writes:
>:> 1. Can n be less than len if fd is NOT set to non-blocking?
>:Certainly; if only some but not all bytes were transferred,
>:for example due to the system call being interrupted by a signal,
>:then the best thing for write() to report is the number of bytes
>:successfully transferred. (I forget whether IEEE 1003.1 ended up
>:permitting this or not; it was hotly debated.)
>Does anyone know for sure about IEEE 1003.1?
>A (-1,EINTR) return from write is worthless if some bytes were written.

From IEEE 1003.1, draft 13 (the one that was adopted), section 6.4.2.2:

[irrelevant text omitted, sometimes saying why]

Upon successful completion, the write() function shall return the number of bytes actually written [...]

If a write() is interrupted by a signal before it writes any data, it shall return -1 with errno set to [EINTR].

If a write() is interrupted by a signal after it successfully writes some data, either it shall return -1 with errno set to [EINTR], or it shall return the number of bytes written. A write() to a pipe or FIFO shall never return with errno set to [EINTR] if it has transferred any data and nbyte is less than or equal to {PIPE_BUF}.

Write requests to a pipe (or FIFO) shall be handled the same as a regular file with the following exceptions:

(1) [... all writes are appends]

(2) Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes [... whatever O_NONBLOCK is set to].

(3) If the O_NONBLOCK flag is clear [on the file descriptor], a write request may cause the process to block, but on normal completion it shall return nbyte.

(4) If the O_NONBLOCK flag is set, [...] the write() function shall not block the process; write requests for {PIPE_BUF} or fewer bytes shall either succeed completely and return nbyte, or return -1 and set errno to [EAGAIN]; a write() request for greater than {PIPE_BUF} bytes shall either transfer what it can and return the number of bytes written, or transfer no data and return -1 with errno set to [EAGAIN].

When attempting to write to a file descriptor (other than a pipe or FIFO) that supports nonblocking writes and cannot accept the data immediately:

(1) If the O_NONBLOCK flag is clear, write() shall block until the data can be accepted.

(2) If the O_NONBLOCK flag is set, write() shall not block the process. If some data can be written without blocking the process, write() shall write what it can and return the number of bytes written. Otherwise it shall return -1 and errno shall be set to [EAGAIN].

Section 2.9.5 requires either that PIPE_BUF be defined in <limits.h>, or, where it varies according to filesystem, that it be determinable by the pathconf() function. It is required to be not less than {_POSIX_PIPE_BUF}. Section 2.9.2 requires <limits.h> to contain code equivalent in effect to:

    #define _POSIX_PIPE_BUF 512

I always ensure that writes to pipes are done in chunks of 512 bytes or less, with techniques (e.g. prefixed byte counts) to cope with interlacing from different processes.
--
Clive D.W. Feather clive@ixi.uucp
IXI Limited ...!mcvax!ukc!acorn!ixi!clive (untested)
+44 223 462 131
frank@ladcgw.UUCP (Frank Mayhar) (05/06/89)
In article <530@visdc.UUCP> jiii@visdc.UUCP (John E Van Deusen III) writes:
>In article <18735@mips.mips.COM> (Len Lattanzi) writes:
>> In article <10198@smoke.BRL.MIL> (Doug Gwyn) writes:
>>> ... due to the system call being interrupted by a signal, then the
>>> best thing for write() to report is the number of bytes successfully
>>> transferred.
>Unfortunately, if write(2) gets interrupted by a signal, the user is in
>effect saying, "stop what you are doing, no matter how critical; don't
>save anything; and immediately jump to a piece of code in my program".
>It is difficult to see how you could re-enter the function and pick up
>the pieces. Even if a reliable number of bytes written could be
>obtained from a partially completed system call, this would result in an
>ambiguity. A positive return, that is less than the number of bytes
>requested, is already used to indicate that a limit was reached to the
>amount of space available on the physical device or pipe.

Pardon my saying so, but, CRAP! I've fought with this piece of Unix wrong-headedness, and I see no good reason for it. At the very least, errno should contain some indication that the operation was interrupted, and the return should be the number of bytes written. Or that information should be retrievable in some fashion (a new system call, perhaps?). There are many good reasons to interrupt a process asynchronously (timer runout, I/O completion, etc.), and doing so should _always_ be recoverable. My personal opinion is that write()s should be atomic, and uninterruptible. read()s may or may not be atomic (I think they should be), but if they are not, it should be possible to easily recover in this case, as well.

>As of System V release 3 the sighold and sigrelse functions were added
>in order to establish critical regions of code. In this way you can be
>sure that your system calls won't be interrupted.

About d*mn time! Note that I'm not flaming you, I'm flaming what I see as a Unix misfeature.
--
Frank Mayhar ..!uunet!ladcgw!frank (soon to be frank@ladc.bull.com)
Frank-Mayhar%ladc@bco-multics.hbi.honeywell.com (until June 1)
Bull HN Los Angeles Development Center
5250 W. Century Blvd., LA, CA 90045 Phone: (213) 216-6241
mcgrath@paris.Berkeley.EDU (Roland McGrath) (05/10/89)
In article <372@ladcgw.UUCP> frank@ladcgw.UUCP (Frank Mayhar) writes:
>>As of System V release 3 the sighold and sigrelse functions were added
>>in order to establish critical regions of code. In this way you can be
>>sure that your system calls won't be interrupted.
>About d*mn time!

So, in release 3, System V has caught up to 4.1 BSD. Impressive.
--
Roland McGrath
Free Software Foundation, Inc.
roland@ai.mit.edu, uunet!ai.mit.edu!roland
Copyright 1989 Roland McGrath, under the GNU General Public License, version 1.
gwyn@smoke.BRL.MIL (Doug Gwyn) (05/10/89)
In article <MCGRATH.89May9190027@paris.Berkeley.EDU> mcgrath@paris.Berkeley.EDU (Roland McGrath) writes:
>So, in release 3, System V has caught up to 4.1 BSD. Impressive.

Yeah, now if only 4BSD would catch up to System V...
scott@dtscp1.UUCP (Scott Barman) (05/12/89)
In article <10248@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <MCGRATH.89May9190027@paris.Berkeley.EDU> mcgrath@paris.Berkeley.EDU (Roland McGrath) writes:
>>So, in release 3, System V has caught up to 4.1 BSD. Impressive.
>Yeah, now if only 4BSD would catch up to System V...

It has... It's called System V Release 4! "for now on, call it HUGE" :-)
--
scott barman
{gatech, emory}!dtscp1!scott
mcgrath@homer.Berkeley.EDU (Roland McGrath) (05/13/89)
In article <10248@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>>So, in release 3, System V has caught up to 4.1 BSD. Impressive.
>Yeah, now if only 4BSD would catch up to System V...

Religious war alert. Crusades imminent. I think we can all make a mutual agreement NOT to start up with this (despite the grand old Usenet tradition).

Anyway, I don't care. The GNU kernel will be orders of magnitude better than either, and even now has none of the nitty-gritty kernel details that plague other systems, through the simple and elegant solution of nonexistence.
--
Roland McGrath
Free Software Foundation, Inc.
roland@ai.mit.edu, uunet!ai.mit.edu!roland
Copyright 1989 Roland McGrath, under the GNU General Public License, version 1.