[comp.std.unix] Pipe Write Problems

jsq@usenix.uucp (John Quarterman) (08/25/87)

From: jsq@usenix.uucp (John Quarterman)

       Pipe Write Problems     Page 1 of 11	  IEEE 1003.1 N.116



					       John S. Quarterman
					       Institutional Representative
					       From USENIX to IEEE P1003
					       {uunet,ucbvax,seismo}!usenix!jsq

					       Texas Internet Consulting
					       701 Brazos, Suite 500
					       Austin, Texas 78701-3243
					       +1-512-320-9031
					       jsq@longway.tic.com

					       24 August 1987

       Attention: P1003 Working Group
       Secretary, IEEE Standards Board
       345 East 47th Street
       New York, NY 10017

       Cc: 1003.1 Technical Reviewers:

       Maggie Lee, 2	     Jeff Smits, 6	   Hal Jespersen, Rationale
       +1-408-746-7216	     +1-201-522-6263	   +1-415-420-6400
       ihnp4!amdahl!maggie   ihnp4!attunix!smits   ucbvax!unisoft!hlj

       There are several problems in IEEE Std 1003.1, Draft 11
       regarding writes to a pipe or FIFO.  These problems are
       sufficient to produce a ``no'' ballot from USENIX.  This
       objection includes discussion of the problems, their
       sources, and suggested solutions, including both standard
       and rationale text.


       1.  Problems

       1.1  Ambiguous O_NONBLOCK wording in Draft 11, 6.4.2.2.

       Understanding the case of the triple condition

	  + O_NONBLOCK is set,

	  + and {PIPE_BUF} < nbyte <= {PIPE_MAX},

	  + and 0 < immediately writable < nbyte,

       requires a close reading of Draft 11, 6.4.2.2, page 125,
       lines 224-227:








       $Revision: 3.1 $		  DRAFT	 $Date: 87/08/24 10:54:56 $




	    If the O_NONBLOCK flag is set, write() shall not block
	    the process.  If nbyte > {PIPE_BUF}, and some data can
	    be written without blocking the process, write() shall
	    write what it can and return the number of bytes
	    written.  Otherwise, it shall return -1 and errno shall
	    be set to [EAGAIN].

       It is not immediately obvious what ``Otherwise'' refers to
       (which clause of the condition?).  But in the context of the
       paragraph at lines 217-221 it must refer to the case when
       {PIPE_BUF} < nbyte <= {PIPE_MAX} and no data can be written
       without blocking the process.

       1.2  Nonblocking partial pipe writes are an option in
	    Draft 11.

       According to David Willcox, who was in many of the atomic
       pipe write small groups, the word ``can'' in both uses in
       the preceding quote is meant to refer to what the
       implementation permits.	In other words, the case where
       ``some data can be written'' may refer to there being some
       space free in the pipe, or the case may be null, meaning
       that [EAGAIN] will always be returned when {PIPE_BUF} <
       nbyte <= {PIPE_MAX}, regardless of whether there is free
       space in the pipe or not.  That is, the standard permits the
       implementation to perform partial writes, but does not
       require it to do so.

       Partial writes are not implementation-defined (according to
       the definition in 2.1), because the standard completely
       describes their behavior (or attempts to).  So partial
       writes are an interface implementation option in Draft 11,
       even though they are not properly specified as such by the
       use of the word ``may'' or listing in 2.2.1.2.

       1.3  Incorrect error code?

       If partial writes are not implemented, the error [EAGAIN] is
       not appropriate, because the write will never succeed, no
       matter how many times it is retried.  Better would be
       [EINVAL], which matches the other cases where retrying will
       not help.  However, this argument assumes that {PIPE_BUF} is
       not only the maximum atomic size, but also the maximum
       amount writable on one operation: this may not be so; see
       below.
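
       The distinction drawn here can be seen from the
       application's side.  The following is a minimal sketch, not
       text from Draft 11: the enum and the function name
       interpret_write() are hypothetical, and treating [EINVAL] as
       ``use a smaller request'' assumes the reading argued for in
       this objection.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical advice values; none of these names come from the standard. */
enum write_advice {
    WRITE_DONE,          /* all nbyte bytes were written */
    WRITE_CONTINUE,      /* partial write: send the remainder */
    WRITE_RETRY_LATER,   /* [EAGAIN]: the same request may succeed later */
    WRITE_USE_SMALLER,   /* [EINVAL]: the same request can never succeed */
    WRITE_FAILED         /* any other error */
};

/* ret and err are the return value of write() and the errno it left. */
enum write_advice interpret_write(ssize_t ret, int err, size_t nbyte)
{
    if (ret >= 0)
        return ((size_t)ret == nbyte) ? WRITE_DONE : WRITE_CONTINUE;
    if (err == EAGAIN)
        return WRITE_RETRY_LATER;
    if (err == EINVAL)
        return WRITE_USE_SMALLER;
    return WRITE_FAILED;
}
```

       If [EAGAIN] were returned for a request that can never
       succeed, a caller following this advice would retry forever,
       which is the problem described above.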










       1.4  {PIPE_MAX} with O_NONBLOCK clear.

       Should {PIPE_MAX} apply when O_NONBLOCK is not set?  All of
       Version 7, System V Release 3, 4.2BSD, and 4.3BSD permit
       arbitrarily large values of nbyte when O_NDELAY is not set.
       While it is possible to imagine a system where such a limit
       would be required by the implementation, there seem to be
       none at the moment, so there are probably no applications
       that depend on it.  The enforcement of such a limit would
       make pipes basically different from other things that
       write() can be applied to, requiring extra code in
       applications.  Thus there is no obvious advantage in
       portability for applications.  So {PIPE_MAX} should not be
       applied when O_NONBLOCK is clear.
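
       The behavior described above, a single blocking write() of
       far more than a pipe can hold, can be sketched as follows.
       This assumes a present-day POSIX-like system; the function
       name blocking_pipe_write() is hypothetical, and the sketch
       does not handle a write() cut short by a signal.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes nbyte zero bytes with one blocking write() while a child drains
 * the pipe.  Returns what write() returned, or -1 on a setup failure. */
long blocking_pipe_write(size_t nbyte)
{
    int fds[2];
    char *buf = calloc(1, nbyte);
    if (buf == NULL)
        return -1;
    if (pipe(fds) == -1) {
        free(buf);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        free(buf);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: the single reader */
        char sink[4096];
        close(fds[1]);
        while (read(fds[0], sink, sizeof sink) > 0)
            ;                            /* discard everything until EOF */
        _exit(0);
    }

    close(fds[0]);
    ssize_t ret = write(fds[1], buf, nbyte);   /* O_NONBLOCK is clear */
    close(fds[1]);                       /* lets the reader see end of file */
    free(buf);
    waitpid(pid, NULL, 0);
    return (long)ret;
}
```

       The writer simply sleeps whenever the pipe is full, so the
       application needs no knowledge of any size limit.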


       2.  Sources of the problems.

       There are three basic sources of confusion about the
       behavior of pipes and FIFOs (especially when the non-
       blocking flag is set):

	 1.  It is not clear what the various existing systems do.

	 2.  It is clear that they do many things differently.

	 3.  It is not clear what behavior is important to
	     applications, and thus worth standardizing.

       2.1  Existing systems.

       Some of the following descriptions may not be totally
       accurate, but they should serve to illustrate the point of
       diversity.

	  + Version 7 introduced atomicity of writes to pipes.	The
	    manual page write(2) guarantees that write requests of
	    4096 bytes or less will not be interleaved with writes
	    from any other process.  The purpose of this feature
	    was to allow multiple processes to write to the same
	    pipe while permitting a single reader to parse their
	    data.

	    4096 also happens to be the size of a pipe, and is
	    fixed at compile time (it is not larger because that
	    would have made pipes large files, that is, they would
	    have had indirect blocks).

	    Any amount (that will fit in an int) of data may be
	    requested on a single call to write().





	    Version 7 does not have a non-blocking flag.

	  + The SVID requires atomicity of writes to pipes when the
	    request is of {PIPE_BUF} bytes or less.  This feature
	    may have been introduced from the /usr/group Standard,
	    which had it.

	    There is no maximum write request, regardless of
	    whether O_NDELAY is set.

	    With O_NDELAY set, write requests of less than
	    {PIPE_BUF} bytes either succeed or return zero.  Write
	    requests of more than that may also succeed partially,
	    returning the amount written.

	  + 4.2BSD appears to guarantee atomicity of pipe write
	    requests up to 1024 bytes.	It will return an error for
	    requests for more than 4096 bytes when the O_NDELAY
	    flag is set.  Partial writes are not done.	With the
	    flag clear, any size write request will succeed
	    eventually.

	  + 4.3BSD does not guarantee atomicity of any size pipe
	    write (greater than one byte).  The maximum amount that
	    can be requested will vary dynamically, as will the
	    maximum amount that can be written on a single
	    operation.	With the O_NDELAY flag set, any write of
	    more than one byte may be partial.	UCB CSRG is
	    probably amenable to changing this behavior.

	  + Version 8 does not necessarily measure the maximum
	    amount of data that can be written to a pipe on a given
	    operation in bytes, i.e., it may depend on the number
	    of outstanding write requests.

	    There is no nonblocking flag in Version 8 or Version 9.

       2.2  Useful behavior.

       It is more useful to specify how an application should
       interpret a return value than it is to specify precisely
       when the implementation shall return it.	 I believe this
       observation may be the rope for climbing out of the chronic
       pipe write morass.

       [EAGAIN]	   should mean that retrying later with the same
		   size request may succeed.  The Rationale should
		   recommend actions the application should take in
		   such a case.	 Because some systems dynamically
		   vary their pipe size, what would have succeeded
		   this time on an empty pipe may not succeed next
		   time.  Of course, if the request was for
		   {PIPE_BUF} or fewer bytes, retries shall
		   eventually succeed (unless no reader reads
		   enough from the pipe).  But it is not useful for
		   the standard to attempt to specify for exactly
		   what larger requests [EAGAIN] will be returned,
		   or the probability of success on later retries.
		   After all, if the reader does not read, no
		   retries will succeed.

       [EINVAL]	   should mean that retrying later with the same
		   size request shall never succeed.  But the
		   standard should not require the implementation
		   to always return this error at a fixed limit.
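
       One action an application might take on repeated [EAGAIN]
       results is to reduce the size of its requests.  The
       following is a minimal sketch of such a policy; the
       structure, the function names, and the threshold of three
       are hypothetical and are not recommended values.

```c
#include <stddef.h>

#define EAGAIN_LIMIT 3      /* hypothetical threshold, chosen arbitrarily */

struct chunk_policy {
    size_t   chunk;         /* size of the next write request */
    unsigned deferred;      /* consecutive [EAGAIN] results at this size */
};

/* Call after write() returns -1 with errno set to EAGAIN. */
void note_deferred(struct chunk_policy *p, size_t pipe_buf)
{
    if (p->chunk <= pipe_buf)
        return;             /* already at the atomic size: just retry */
    if (++p->deferred < EAGAIN_LIMIT)
        return;
    p->chunk /= 2;          /* assume the effective pipe size shrank */
    if (p->chunk < pipe_buf)
        p->chunk = pipe_buf;
    p->deferred = 0;
}

/* Call after any write() that transferred data. */
void note_progress(struct chunk_policy *p)
{
    p->deferred = 0;
}
```

       The policy never goes below {PIPE_BUF}, because a request
       of that size is the largest one for which retries are
       expected to succeed eventually.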

       There is no reason for the standard to try to specify what
       happens in every corner case produced by the intersections
       of all the known implementations.  The standard should
       specify behavior that promotes portability of applications
       and that is implementable relatively readily on existing
       systems.	 In addition, the behavior of writes to pipes or
       FIFOs should be made as little different from that of writes
       to other file descriptors as possible.  The main reason for
       making it different at all is that POSIX does not currently
       include any more sophisticated interprocess communication
       facility: for example, given a reliable sequenced datagram
       service, there would be no need to require pipes to be
       atomic.

	 1.  Atomic writes are useful.	The standard should specify
	     that write requests of {PIPE_BUF} or fewer bytes shall
	     be atomic, regardless of whether O_NONBLOCK is set.

	 2.  Write requests of more than {PIPE_BUF} bytes with
	     O_NONBLOCK set are useful.	 A real time data
	     acquisition process might want to write large amounts
	     of data through a pipe to a single processing process,
	     while never blocking.

	 3.  Partial writes are useful, but not useful enough for
	     the standard to require the implementation to include
	     them.  The standard should, however, require portable
	     applications to expect them, since applications should
	     expect them for other kinds of writes anyway.  In
	     other words, partial writes should not be a major
	     option, but merely an implementation-defined detail.
	     Exactly when they occur is not important enough to
	     specify (especially considering that it is not
	     specified for other kinds of writes), except that they
	     are prohibited when nbyte <= {PIPE_BUF} because of the
	     guarantee of atomicity.




	     There is no strong reason for an application to be
	     able to discover at compile or run time whether
	     partial writes are implemented: every application
	     should assume that they may be implemented.
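
       An application that assumes partial writes may occur would
       typically loop until all of its data has been accepted.
       The following is a sketch under stated assumptions: the
       names set_nonblocking() and write_fully() are hypothetical,
       and poll() is a present-day interface that is not part of
       Draft 11.  Data sent this way in pieces is not atomic.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Sets O_NONBLOCK on fd, keeping the other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Writes all len bytes to a descriptor with O_NONBLOCK set, accepting
 * partial writes.  Returns 0 on success and -1 on error. */
int write_fully(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n > 0) {                    /* complete or partial write */
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1 && errno == EAGAIN) {
            /* Deferred: wait until the descriptor is writable again. */
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) == -1 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;      /* other errors; a zero return is treated as failure */
    }
    return 0;
}
```

       The same loop works unchanged whether or not the
       implementation ever performs partial writes, which is why
       the application has no need to discover which is the case.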

       The usefulness of {PIPE_MAX} is slightly dubious, and it
       might be better to eliminate it, instead specifying that
       [EINVAL] may be returned whenever O_NONBLOCK is set and
       nbyte > {PIPE_BUF}.  But let us assume that it is useful.

	 1.  A maximum amount that can be requested without ever
	     producing [EINVAL] is worthwhile.	{PIPE_MAX} could be
	     used for this.  But it should not apply if O_NONBLOCK
	     is not set.

	 2.  {PIPE_MAX} >= {PIPE_BUF}.	Allowing {PIPE_MAX} <
	     {PIPE_BUF} would permit a guaranteed atomic write to
	     return [EINVAL], which is a contradiction.

	 3.  The standard should explicitly permit an
	     implementation to set {PIPE_MAX} = {PIPE_BUF}, simply
	     because there is no reason to prohibit it.	 This would
	     not rule out partial writes, but would mean that
	     applications running on such an implementation should
	     never depend on successful writes with nbyte >
	     {PIPE_BUF}.

	 4.  The standard should permit an implementation to set
	     {PIPE_MAX} = {INT_MAX}, meaning that [EINVAL] will
	     never be returned.	 That is effectively what some
	     implementations do, and there is no reason not to if
	     partial writes are implemented.

	 5.  An implementation could even set all three limits
	     equal: {PIPE_BUF} = {PIPE_MAX} = {INT_MAX}, meaning
	     that [EINVAL] will never be returned, there are no
	     partial writes, and all writes are atomic.

       Finally, this is an interface standard: it should not try to
       specify implementation details, such as the internal
       buffering arrangements of the pipe.  Such phrases as ``it
       shall write as much as it can'' are inappropriate.


       3.  Rewording.

       Here is rewording to account for the implications of the
       above arguments.

       The text and tables below include specifications and
       rationale for {PIPE_MAX}.  But, if the Working Group decides
       to drop {PIPE_MAX}, it can be excised with no ill effects.
       References to it should then also be removed from Draft 11
       2.9.2, page 42, lines 808-810, and 5.7.1.2, page 117, line
       971.

       3.1  Standard.

       Move the definition of {PIPE_MAX} down into the text that
       specifies what happens when O_NONBLOCK is set.  That is,
       first remove Draft 11 6.4.2.2, page 125, lines 215-216:

	    Write requests for greater than {PIPE_MAX} bytes shall
	    result in a return of value of -1 and set errno to
	    [EINVAL].

       Then replace the wording (quoted in 1.1 above) of Draft 11,
       6.4.2.2, page 125, lines 224-227 with this new wording:

	    If the O_NONBLOCK flag is set, write requests shall be
	    handled differently in the following ways: The write()
	    function shall not block the process.  Write requests
	    for {PIPE_BUF} or fewer bytes shall either succeed
	    completely and return nbyte, or return -1 and set errno
	    to [EAGAIN] to indicate that retrying the write() later
	    with the same arguments may succeed.  Write requests
	    for more than {PIPE_BUF} bytes may in addition write
	    some amount of data less than nbyte and return the
	    amount written.  Write requests for more than
	    {PIPE_MAX} bytes may in addition return -1 and set
	    errno to [EINVAL] to indicate that retrying the write()
	    later with the same arguments shall never succeed.
	    {PIPE_MAX} shall be greater than or equal to {PIPE_BUF}
	    and less than or equal to {INT_MAX}.
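
       The outcomes this wording permits with O_NONBLOCK set can
       be modelled as a small function.  This is an illustrative
       sketch, not an implementation of write(): the names are
       hypothetical, nbyte is assumed to be greater than zero, and
       the parameter writable (the amount immediately writable) is
       something no application can actually observe.  For small
       requests the sketch follows the tables proposed for the
       Rationale in 3.2, which show a complete write whenever
       enough space is immediately writable.

```c
#include <stddef.h>

/* Hypothetical outcome flags for a write to a pipe with O_NONBLOCK set. */
enum {
    OUT_COMPLETE = 1,   /* returns nbyte */
    OUT_PARTIAL  = 2,   /* returns more than 0 but less than nbyte */
    OUT_EAGAIN   = 4,   /* returns -1, errno [EAGAIN] */
    OUT_EINVAL   = 8    /* returns -1, errno [EINVAL] */
};

/* Returns the set of outcomes permitted for a request of nbyte bytes
 * when writable bytes could be written without blocking. */
unsigned permitted_outcomes(size_t nbyte, size_t writable,
                            size_t pipe_buf, size_t pipe_max)
{
    unsigned out;

    if (nbyte <= pipe_buf)      /* atomic: all or nothing */
        return writable >= nbyte ? OUT_COMPLETE : OUT_EAGAIN;

    out = OUT_EAGAIN;           /* always permitted above {PIPE_BUF} */
    if (writable > 0)
        out |= OUT_PARTIAL;
    if (writable >= nbyte)
        out |= OUT_COMPLETE;
    if (nbyte > pipe_max)
        out |= OUT_EINVAL;      /* permitted, never required */
    return out;
}
```

       Setting pipe_max equal to pipe_buf, or to {INT_MAX}, gives
       the two extreme cases discussed in 2.2.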

       The beginning of the following paragraph, 6.4.2.2, page 125,
       lines 228-229, is misleading and should be changed from

	    When attempting to write to a file descriptor...

       to

	    When attempting to write to a file descriptor (other
	    than one for a pipe or FIFO)...

       The meaning of [EINVAL] when set by write() as specified in
       6.4.2.4, page 126, lines 260-261, should be changed from

       [EINVAL]	   An attempt was made to write more than
		   {PIPE_MAX} bytes to a pipe or FIFO special file.
       to





       [EINVAL]	   An attempt was made to write to a pipe or FIFO
		   special file with a value of nbyte greater than
		   {PIPE_MAX} and also large enough that the
		   operation shall never succeed if retried.

       3.2  Rationale.

       In the Rationale, remove the editorial note from B.6.4.2,
       Page 240, line 2104, and replace B.6.4.2, Page 240, line
       2105 (``Write to a Pipe'') with:

       [begin replacement]

       An attempt to write to a pipe or FIFO has several major
       characteristics:

       Atomic/non-atomic
	    A write is atomic if the whole amount written in one
	    operation is not interleaved with data from any other
	    process.  This is useful when there are multiple
	    writers sending data to a single reader.  Applications
	    need to know how large a write request can be expected
	    to be performed atomically.	 We call this maximum
	    {PIPE_BUF}.	 The standard does not say whether write
	    requests for more than {PIPE_BUF} bytes will be atomic,
	    but requires that writes of {PIPE_BUF} or fewer bytes
	    shall be atomic.

       Blocking/immediate
	    Blocking is only possible with O_NONBLOCK clear.  If
	    there is enough space for all the data requested to be
	    written immediately, the implementation should do so.
	    Otherwise, the process may block, that is, pause until
	    enough space is available for writing.  The effective
	    size of a pipe or FIFO (the maximum amount that can be
	    written in one operation without blocking) may vary
	    dynamically, depending on the implementation, so it is
	    not possible to specify a fixed value for it.

       Complete/partial/deferred
	    A write request,

		 int fildes, nbyte, ret;
		 char *buf;

		 ret = write(fildes, buf, nbyte);

	    may return

	    complete:	ret = nbyte





	    partial:	ret < nbyte
			This shall never happen if nbyte <=
			{PIPE_BUF}.  If it does happen (with nbyte
			> {PIPE_BUF}), the standard does not
			guarantee atomicity, even if ret <=
			{PIPE_BUF}, because atomicity is guaranteed
			according to the amount requested, not the
			amount written.

	    deferred:	ret = -1, errno = [EAGAIN]
			This error indicates that a later request
			may succeed.  It does not indicate that it
			shall succeed, even if nbyte <= {PIPE_BUF},
			because if no process reads from the pipe
			or FIFO, the write will never succeed.	An
			application could usefully count the number
			of times [EAGAIN] is caused by a particular
			value of nbyte > {PIPE_BUF} and perhaps do
			later writes with a smaller value, on the
			assumption that the effective size of the
			pipe may have decreased.

	    Partial and deferred writes are only possible with
	    O_NONBLOCK set.

       Requestable/invalid
	    If a write request shall never succeed with the value
	    given for nbyte, the request is invalid, and write()
	    shall return -1 with errno set to [EINVAL].	 This is
	    only permitted to happen when nbyte > {PIPE_MAX} and
	    O_NONBLOCK is set, and it is never required to happen.
	    {PIPE_MAX} is not necessarily a minimum on the
	    effective size of a pipe or FIFO; if it says anything
	    about that size, it is that it sometimes varies above
	    {PIPE_MAX}.	 Because {PIPE_MAX} specifies the maximum
	    size write request that shall never cause [EINVAL], it
	    must be greater than or equal to the maximum atomic
	    write size, {PIPE_BUF}.  {PIPE_BUF} and {PIPE_MAX} may
	    be equal, which means that [EINVAL] may be produced by
	    any write of greater than {PIPE_BUF} bytes.	 {PIPE_MAX}
	    may be equal to {INT_MAX}, meaning that [EINVAL] shall
	    never be returned (unless nbyte > {INT_MAX}, when the
	    result is implementation-defined).	All three limits
	    may be equal, meaning that [EINVAL] shall never be
	    returned, no partial writes are done, and all completed
	    writes are atomic.	Applications should be prepared for
	    all these cases.

       The relations of these properties are best shown in tables.






	     ________________________________________________
	    | Write to a Pipe or FIFO with O_NONBLOCK clear.|
	    |_____________|_________________________________|
	    | immediately |				    |
	    |  writable:  |   none	  some	     nbyte  |
	    |_____________|_________________________________|
	    |		  | atomic	atomic	    atomic  |
	    |	nbyte <=  | blocking	blocking   immediate|
	    | {PIPE_BUF}  |   nbyte	 nbyte	     nbyte  |
	    |_____________|_________________________________|
	    |		  | atomic?	atomic?	    atomic? |
	    |	nbyte >	  | blocking	blocking   immediate|
	    | {PIPE_BUF}  |   nbyte	 nbyte	     nbyte  |
	    |_____________|_________________________________|

       If the O_NONBLOCK flag is clear, a write request shall block
       if the amount writable immediately is less than that
       requested.  If the flag is set (by fcntl()), a write request
       shall never block.

	__________________________________________________________
       |       Write to a Pipe or FIFO with O_NONBLOCK set.	 |
       |____________|____________________________________________|
       | immediately|						 |
       |  writable: |	   none		  some		nbyte	 |
       |____________|____________________________________________|
       |   nbyte <= |	   -1,		  -1,		atomic	 |
       | {PIPE_BUF} |	 [EAGAIN]	[EAGAIN]	nbyte	 |
       |____________|____________________________________________|
       |	    |			atomic?	       atomic?	 |
       |	    |			< nbyte	       <=nbyte	 |
       |   nbyte >  |	   -1,		 or -1,		or -1,	 |
       | {PIPE_BUF} |	 [EAGAIN]	[EAGAIN]       [EAGAIN]	 |
       |____________|____________________________________________|
       |	    |			atomic?	       atomic?	 |
       |	    |			< nbyte	       <=nbyte	 |
       |   nbyte >  |	   -1,		 or -1,		or -1,	 |
       | {PIPE_MAX} |	([EAGAIN]      ([EAGAIN]      ([EAGAIN]	 |
       |	    |  or [EINVAL])   or [EINVAL])   or [EINVAL])|
       |____________|____________________________________________|

       There is no way provided for an application to determine
       whether the implementation will ever perform partial writes
       to a pipe or FIFO.  Every application should be prepared to
       handle partial writes when O_NONBLOCK is set and the
       requested amount is greater than {PIPE_BUF}, just as every
       application should be prepared to handle partial writes on
       other kinds of file descriptors.

       Where the standard requires -1 returned and errno set to
       [EAGAIN], most historical implementations return 0 (with the
       O_NDELAY flag set: that flag is the historical predecessor
       of O_NONBLOCK, but is not itself in the standard).  The
       error indications in the standard were chosen so that an
       application can distinguish these cases from end of file.
       While write() cannot receive an indication of end of file,
       read() can, and the Working Group chose to make the two
       functions have similar return values.  Also, some existing
       systems (e.g., Version 8) permit a write of zero bytes to
       mean that the reader should get an end of file indication:
       for those systems, a return value of zero from write()
       indicates a successful write of an end of file indication.
       [end replacement]
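
       The reason given above for preferring -1 with [EAGAIN] over
       a return of zero is easiest to see on the read() side.  The
       following is a minimal sketch, assuming a present-day
       POSIX-like system; the enum and the function name
       read_once() are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>      /* O_NONBLOCK, which the caller sets with fcntl() */
#include <sys/types.h>
#include <unistd.h>

enum read_state {
    READ_DATA,          /* at least one byte was read */
    READ_EOF,           /* returned 0: every writer has closed the pipe */
    READ_NO_DATA_YET,   /* returned -1 with [EAGAIN]: try again later */
    READ_ERROR          /* any other failure */
};

/* One read() on a descriptor that has O_NONBLOCK set. */
enum read_state read_once(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_EOF;
    return errno == EAGAIN ? READ_NO_DATA_YET : READ_ERROR;
}
```

       With the historical O_NDELAY behavior both the second and
       the third case would return 0, and the caller could not
       tell an empty pipe from a closed one.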


				 CONTENTS


       1.  Problems.............................................  1
	   1.1	Ambiguous O_NONBLOCK wording in Draft 11,
		6.4.2.2.........................................  1
	   1.2	Nonblocking partial pipe writes are an option
		in Draft 11.....................................  2
	   1.3	Incorrect error code?...........................  2
	   1.4	{PIPE_MAX} with O_NONBLOCK clear................  3

       2.  Sources of the problems..............................  3
	   2.1	Existing systems................................  3
	   2.2	Useful behavior.................................  4

       3.  Rewording............................................  6
	   3.1	Standard........................................  7
	   3.2	Rationale.......................................  8


Volume-Number: Volume 12, Number 22