[comp.unix.wizards] Write calls which do partial writes

neeraj@matrix.UUCP (neeraj sangal) (05/03/89)

    These are questions related to the call:
        n = write(fd, buf, len);

    As documented in write(2), n is the number of bytes written if the call
is successful; n is -1 otherwise. I had all along assumed that when write
completes successfully, n would equal len.  However, at least in SunOS 4.0
this assumption is not true if fd is a descriptor for a TCP/IP socket and the
socket has been set to do non-blocking I/O.

    This raises a number of questions:

    1. Can n be less than len if fd is NOT set to non-blocking?
    2. Is it true that stream-based "sockets" behave similarly?
    3. If this is true then won't a number of existing programs break
       (especially when they are piped together)?


    This approach regarding writes to network file descriptors is quite
logical, especially when used in conjunction with select(2).  Select can be
used to detect when write can be done without blocking; however, there is no
way to determine the amount that can be written out without blocking.
The question I have is related to the implementation of select(2) (and
poll).  When does select return an indication that a file descriptor
can be written to?  Will the underlying protocol complete select the
moment a single byte can be transferred or will it wait before completing
select in the hope that a larger buffer space will free up?


    There is a need for a signal to indicate when write to a file
descriptor can be done (similar to SIGIO which is used to indicate when a
read may be done).  Is there any reason why this has not been implemented?
Is it being planned or considered for a future version?


    On a related matter, how does one do an asynchronous connect on
a BSD socket?  The documentation does mention an errno called
EINPROGRESS to indicate that asynchronous connect is in progress.  But
then how does one find out whether connect completed and whether it
succeeded or failed?



Neeraj Sangal
Matrix Computer Systems, Inc.           7 1/2 Harris Rd, Nashua, NH 03062
uunet!matrix!neeraj                     (603) 888-7790

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/03/89)

In article <103@matrix.UUCP> neeraj@matrix.UUCP (neeraj sangal) writes:
>        n = write(fd, buf, len);
>    1. Can n be less than len if fd is NOT set to non-blocking?

Certainly; if only some but not all bytes were transferred,
for example due to the system call being interrupted by a signal,
then the best thing for write() to report is the number of bytes
successfully transferred.  (I forget whether IEEE 1003.1 ended up
permitting this or not; it was hotly debated.)  Robust stdio
implementations have to loop on the write() call until all bytes
are transferred or an error occurs.
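
Such a loop might look like the following minimal sketch.  It assumes a
blocking descriptor, and write_all is a made-up name, not a standard call.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes have
 * been transferred.  Returns 0 on success, -1 with errno set on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;          /* real error; errno says which */
        }
        p += n;                 /* short write: skip what already went out */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would use write_all(fd, buf, len) wherever it currently assumes
that write() returns len.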

>Will the underlying protocol complete select the moment a single
>byte can be transferred...?

Obviously it should.

The details depend on the exact implementation and are complicated
by things like the guaranteed pipe atomic write size, etc.

jiii@visdc.UUCP (John E Van Deusen III) (05/04/89)

In article <103@matrix.UUCP> neeraj@matrix.UUCP (neeraj sangal) writes:
>        n = write(fd, buf, len);
>
>    As documented in write(2), n is the number of bytes written if the
> call  is successful; n is -1 otherwise.

I have recently purchased a book called C LANGUAGE INTERFACES that is
part of the Prentice Hall, AT&T series.  It has a wealth of information
about this sort of thing that was completely absent in the documentation
that I previously used; the AT&T UNIX SYSTEM V PROGRAMMER'S REFERENCE
MANUAL, also published by Prentice Hall.

When writing to pipes and FIFOs there are only so many bytes that will
fit at any given time (PIPE_BUF).  If more than PIPE_BUF bytes are to be
written, and O_NDELAY is clear, the bytes are written piecemeal until
the write is satisfied.  Because this process is not atomic, it is
possible for other processes to interleave output.  If this is a problem
O_NDELAY can be set, and write(2) will return the number of bytes that
could be written contiguously.

If the request is for PIPE_BUF bytes or less, and O_NDELAY is clear, the
process will block until the entire request can be accommodated.  If
O_NDELAY is set, and there is not enough room for the entire write
request, zero is returned.  Write requests of PIPE_BUF bytes or less
are always contiguous.
--
John E Van Deusen III, PO Box 9283, Boise, ID  83707, (208) 343-1865

uunet!visdc!jiii

len@synthesis.Synthesis.COM (Len Lattanzi) (05/04/89)

In article <10198@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
:In article <103@matrix.UUCP> neeraj@matrix.UUCP (neeraj sangal) writes:
:>        n = write(fd, buf, len);
:>    1. Can n be less than len if fd is NOT set to non-blocking?
:
:Certainly; if only some but not all bytes were transferred,
:for example due to the system call being interrupted by a signal,
:then the best thing for write() to report is the number of bytes
:successfully transferred.  (I forget whether IEEE 1003.1 ended up
:permitting this or not; it was hotly debated.)  Robust stdio
:implementations have to loop on the write() call until all bytes
:are transferred or an error occurs.

Does anyone know for sure about IEEE 1003.1?
A (-1,EINTR) return from write is worthless if some bytes were written.
And you'll have to depend on your signal handler to not smash errno.
Do any of these OS/C library standards define useful schemes for handling
system call error returns in a multi-threaded process besides every
signal handler doing a save and restore of errno?

-Len
 Len Lattanzi (len@Synthesis.com) <{ames,pyramid,decwrl}!mips!synthesis!len>
 Synthesis Software Solutions, Inc. 		The RISC Software Company
I would have put a disclaimer here but I already posted the article.

jiii@visdc.UUCP (John E Van Deusen III) (05/05/89)

In article <18735@mips.mips.COM> (Len Lattanzi) writes:
> In article <10198@smoke.BRL.MIL> (Doug Gwyn) writes:
>>
>> ... due to the system call being interrupted by a signal, then the
>> best thing for write() to report is the number of bytes successfully
>> transferred.

Unfortunately, if write(2) gets interrupted by a signal, the user is in
effect saying, "stop what you are doing, no matter how critical; don't
save anything; and immediately jump to a piece of code in my program".
It is difficult to see how you could re-enter the function and pick up
the pieces.  Even if a reliable number of bytes written could be
obtained from a partially completed system call, this would result in an
ambiguity.  A positive return, that is less than the number of bytes
requested, is already used to indicate that a limit was reached to the
amount of space available on the physical device or pipe.

>
> A (-1,EINTR) return from write is worthless if some bytes were
> written.  And you'll have to depend on your signal handler to not
> smash errno.  Do any of these OS/C library standards define useful
> schemes for handling system call error returns in a multi-threaded
> process besides every signal handler doing a save and restore of
> errno?

It isn't that errno has been smashed.  EINTR simply reflects the fact
that the program bombed out of write(2); so as things stand, the system
call is in error.  You are correct that this information is not very
useful if your intention is to continue the program.

As of System V release 3 the sighold and sigrelse functions were added
in order to establish critical regions of code.  In this way you can be
sure that your system calls won't be interrupted.
--
John E Van Deusen III, PO Box 9283, Boise, ID  83707, (208) 343-1865

uunet!visdc!jiii

clive@ixi.UUCP (Clive) (05/05/89)

In article <18735@mips.mips.COM> len@synthesis.synthesis.com (Len Lattanzi) writes:
>In article <10198@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>:In article <103@matrix.UUCP> neeraj@matrix.UUCP (neeraj sangal) writes:
>:>    1. Can n be less than len if fd is NOT set to non-blocking?
>:Certainly; if only some but not all bytes were transferred,
>:for example due to the system call being interrupted by a signal,
>:then the best thing for write() to report is the number of bytes
>:successfully transferred.  (I forget whether IEEE 1003.1 ended up
>:permitting this or not; it was hotly debated.)
>Does anyone know for sure about IEEE 1003.1?
>A (-1,EINTR) return from write is worthless if some bytes were written.

From IEEE 1003.1, draft 13 (the one that was adopted), section 6.4.2.2:
[irrelevant text omitted, sometimes saying why]

Upon successful completion, the write() function shall return the number of bytes
actually written [...]
If a write() is interrupted by a signal before it writes any data, it shall return
-1 with errno set to [EINTR].
If a write() is interrupted by a signal after it successfully writes some data,
either it shall return -1 with errno set to [EINTR], or it shall return the number
of bytes written. A write() to a pipe or FIFO shall never return with errno set to
[EINTR] if it has transferred any data and nbyte is less than or equal to {PIPE_BUF}.

Write requests to a pipe (or FIFO) shall be handled the same as a regular file with
the following exceptions:
(1) [... all writes are appends]
(2) Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data
from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF}
bytes may have data interleaved, on arbitrary boundaries, with writes by other
processes [... whatever O_NONBLOCK is set to].
(3) If the O_NONBLOCK flag is clear [on the file descriptor], a write request may
cause the process to block, but on normal completion it shall return nbyte.
(4) If the O_NONBLOCK flag is set, [...] the write() function shall not block the
process; write requests for {PIPE_BUF} or fewer bytes shall either succeed completely
and return nbyte, or return -1 and set errno to [EAGAIN]; a write() request for
greater than {PIPE_BUF} bytes shall either transfer what it can and return the
number of bytes written, or transfer no data and return -1 with errno set to [EAGAIN].

When attempting to write to a file descriptor (other than a pipe or FIFO) that
supports nonblocking writes and cannot accept the data immediately:
(1) If the O_NONBLOCK flag is clear, write() shall block until the data can be
accepted.
(2) If the O_NONBLOCK flag is set, write() shall not block the process. If some data
can be written without blocking the process, write() shall write what it can and
return the number of bytes written. Otherwise it shall return -1 and errno shall be
set to [EAGAIN].

Section 2.9.5 requires either that PIPE_BUF be defined in <limits.h>, or, where it
varies according to filesystem, that it be determinable by the pathconf() function.
It is required to be not less than {_POSIX_PIPE_BUF}.

Section 2.9.2 requires <limits.h> to contain code equivalent in effect to:

#define _POSIX_PIPE_BUF 512
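
As an illustration of the run-time lookup, the sketch below uses
fpathconf(), the file-descriptor variant of pathconf().  The helper name
pipe_atomic_limit is invented, and the fallback order is an assumption of
this example rather than something the standard prescribes.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: largest write to the pipe or FIFO behind fd that
 * is guaranteed not to be interleaved with other writers' data. */
long pipe_atomic_limit(int fd)
{
    long n = fpathconf(fd, _PC_PIPE_BUF);

    if (n > 0)
        return n;               /* value for this particular pipe */
#ifdef PIPE_BUF
    return PIPE_BUF;            /* compile-time value from <limits.h> */
#else
    return 512;                 /* _POSIX_PIPE_BUF, the guaranteed minimum */
#endif
}
```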


I always ensure that writes to pipes are done in chunks of 512 bytes or fewer,
and use techniques such as prefixed byte counts to cope with interleaving from
different processes.
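
One possible shape for the prefixed-byte-count technique is sketched below.
The two-byte big-endian length prefix and the names send_record and
MAX_RECORD are choices made for this example, not a description of any
particular program.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MAX_RECORD 512          /* _POSIX_PIPE_BUF: whole frame must fit */

/* Hypothetical helper: send one record as [2-byte length][payload] using
 * a single write(), so the frame is never interleaved with frames from
 * other writers.  Returns 0 on success, -1 with errno set on failure. */
int send_record(int fd, const void *payload, size_t len)
{
    unsigned char frame[MAX_RECORD];
    ssize_t n;

    if (len > MAX_RECORD - 2) {
        errno = EMSGSIZE;       /* too big to send atomically */
        return -1;
    }
    frame[0] = (unsigned char)(len >> 8);       /* length, high byte */
    frame[1] = (unsigned char)(len & 0xff);     /* length, low byte */
    memcpy(frame + 2, payload, len);

    n = write(fd, frame, len + 2);
    if (n < 0)
        return -1;
    if ((size_t)n != len + 2) {
        errno = EIO;            /* short write: frame was split */
        return -1;
    }
    return 0;
}
```

The reader pulls two bytes, decodes the count, and then reads exactly that
many bytes, which keeps records from several writers apart.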

-- 
Clive D.W. Feather           clive@ixi.uucp
IXI Limited                  ...!mcvax!ukc!acorn!ixi!clive (untested)
                             +44 223 462 131

frank@ladcgw.UUCP (Frank Mayhar) (05/06/89)

In article <530@visdc.UUCP> jiii@visdc.UUCP (John E Van Deusen III) writes:
>In article <18735@mips.mips.COM> (Len Lattanzi) writes:
>> In article <10198@smoke.BRL.MIL> (Doug Gwyn) writes:
>>> ... due to the system call being interrupted by a signal, then the
>>> best thing for write() to report is the number of bytes successfully
>>> transferred.
>Unfortunately, if write(2) gets interrupted by a signal, the user is in
>effect saying, "stop what you are doing, no matter how critical; don't
>save anything; and immediately jump to a piece of code in my program".
>It is difficult to see how you could re-enter the function and pick up
>the pieces.  Even if a reliable number of bytes written could be
>obtained from a partially completed system call, this would result in an
>ambiguity.  A positive return, that is less than the number of bytes
>requested, is already used to indicate that a limit was reached to the
>amount of space available on the physical device or pipe.

Pardon my saying so, but, CRAP!  I've fought with this piece of Unix wrong-
headedness, and I see no good reason for it.  At the very least, errno should
contain some indication that the operation was interrupted, and the return
should be the number of bytes written.  Or that information should be 
retrievable in some fashion (a new system call, perhaps?).  There are many
good reasons to interrupt a process asynchronously (timer runout, I/O 
completion, etc.), and doing so should _always_ be recoverable.  My personal
opinion is that write()s should be atomic, and uninterruptible.  read()s
may or may not be atomic (I think they should be), but if they are not, it
should be possible to easily recover in this case, as well.
 
>As of System V release 3 the sighold and sigrelse functions were added
>in order to establish critical regions of code.  In this way you can be
>sure that your system calls won't be interrupted.

About d*mn time!

Note that I'm not flaming you, I'm flaming what I see as a Unix misfeature.
-- 
Frank Mayhar  ..!uunet!ladcgw!frank (soon to be frank@ladc.bull.com)
              Frank-Mayhar%ladc@bco-multics.hbi.honeywell.com (until June 1)
              Bull HN Los Angeles Development Center
              5250 W. Century Blvd., LA, CA  90045  Phone:  (213) 216-6241

mcgrath@paris.Berkeley.EDU (Roland McGrath) (05/10/89)

In article <372@ladcgw.UUCP> frank@ladcgw.UUCP (Frank Mayhar) writes:

   >As of System V release 3 the sighold and sigrelse functions were added
   >in order to establish critical regions of code.  In this way you can be
   >sure that your system calls won't be interrupted.

   About d*mn time!

So, in release 3, System V has caught up to 4.1 BSD.  Impressive.
--
	Roland McGrath
	Free Software Foundation, Inc.
roland@ai.mit.edu, uunet!ai.mit.edu!roland
Copyright 1989 Roland McGrath, under the GNU General Public License, version 1.

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/10/89)

In article <MCGRATH.89May9190027@paris.Berkeley.EDU> mcgrath@paris.Berkeley.EDU (Roland McGrath) writes:
>So, in release 3, System V has caught up to 4.1 BSD.  Impressive.

Yeah, now if only 4BSD would catch up to System V...

scott@dtscp1.UUCP (Scott Barman) (05/12/89)

In article <10248@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <MCGRATH.89May9190027@paris.Berkeley.EDU> mcgrath@paris.Berkeley.EDU (Roland McGrath) writes:
>>So, in release 3, System V has caught up to 4.1 BSD.  Impressive.
>Yeah, now if only 4BSD would catch up to System V...

It has...
It's called System V Release 4!
	"for now on, call it HUGE"	:-)

-- 
scott barman
{gatech, emory}!dtscp1!scott

mcgrath@homer.Berkeley.EDU (Roland McGrath) (05/13/89)

In article <10248@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
   >So, in release 3, System V has caught up to 4.1 BSD.  Impressive.

   Yeah, now if only 4BSD would catch up to System V...

Religious war alert.  Crusades imminent.

I think we can all make a mutual agreement NOT to start up with this
(despite the grand old Usenet tradition).

Anyway, I don't care.  The GNU kernel will be orders of magnitude better
than either, and even now has none of the nitty-gritty kernel details that
plague other systems, through the simple and elegant solution of nonexistence.
--
	Roland McGrath
	Free Software Foundation, Inc.
roland@ai.mit.edu, uunet!ai.mit.edu!roland
Copyright 1989 Roland McGrath, under the GNU General Public License, version 1.