jsq@usenix.uucp (John Quarterman) (08/25/87)
From: jsq@usenix.uucp (John Quarterman)
Pipe Write Problems Page 1 of 11 IEEE 1003.1 N.116
John S. Quarterman
Institutional Representative
From USENIX to IEEE P1003
{uunet,ucbvax,seismo}!usenix!jsq
Texas Internet Consulting
701 Brazos, Suite 500
Austin, Texas 78701-3243
+1-512-320-9031
jsq@longway.tic.com
24 August 1987
Attention: P1003 Working Group
Secretary, IEEE Standards Board
345 East 47th Street
New York, NY 10017
Cc: 1003.1 Technical Reviewers:
Maggie Lee, 2              +1-408-746-7216   ihnp4!amdahl!maggie
Jeff Smits, 6              +1-201-522-6263   ihnp4!attunix!smits
Hal Jespersen, Rationale   +1-415-420-6400   ucbvax!unisoft!hlj
There are several problems in IEEE Std 1003.1, Draft 11,
regarding writes to a pipe or FIFO. These problems are
sufficient to produce a no ballot from USENIX. This
objection discusses the problems and their sources and
suggests solutions, including both standard and rationale
text.
1. Problems
1.1 Ambiguous O_NONBLOCK wording in Draft 11, 6.4.2.2.
Understanding the case of the triple condition
+ O_NONBLOCK is set,
+ and {PIPE_BUF} < nbyte <= {PIPE_MAX},
+ and 0 < immediately writable < nbyte,
requires a close reading of Draft 11, 6.4.2.2, page 125,
lines 224-227:
$Revision: 3.1 $ DRAFT $Date: 87/08/24 10:54:56 $
If the O_NONBLOCK flag is set, write() shall not block
the process. If nbyte > {PIPE_BUF}, and some data can
be written without blocking the process, write() shall
write what it can and return the number of bytes
written. Otherwise, it shall return -1 and errno shall
be set to [EAGAIN].
It is not immediately obvious what ``Otherwise'' refers to
(which clause of the condition?). But in the context of the
paragraph at lines 217-221 it must refer to the case when
{PIPE_BUF} < nbyte <= {PIPE_MAX} and no data can be written
without blocking the process.
1.2 Nonblocking partial pipe writes are an option in
Draft 11.
According to David Willcox, who was in many of the atomic
pipe write small groups, the word ``can'' in both uses in
the preceding quote is meant to refer to what the
implementation permits. In other words, the case where
``some data can be written'' may refer to there being some
space free in the pipe, or the case may be null, meaning
that [EAGAIN] will always be returned when {PIPE_BUF} <
nbyte <= {PIPE_MAX}, regardless of whether there is free
space in the pipe or not. Which is to say that the standard
permits the implementation to perform partial writes, but
does not require it to do so.
Partial writes are not implementation-defined (according to
the definition in 2.1), because the standard completely
describes their behavior (or attempts to). So partial
writes are an interface implementation option in Draft 11,
even though they are not properly specified as such by the
use of the word ``may'' or listing in 2.2.1.2.
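To make the two readings concrete, here is a small C model of the Draft 11 rule quoted in 1.1, restricted to a nonblocking write with {PIPE_BUF} < nbyte <= {PIPE_MAX}. It is a sketch for illustration only, not kernel code: the function name, the writable argument (bytes that could be written without blocking), and the partial_writes switch standing for the implementation's choice are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical model (not an implementation) of Draft 11, 6.4.2.2,
 * lines 224-227, for a write with O_NONBLOCK set and
 * {PIPE_BUF} < nbyte <= {PIPE_MAX}.
 *   writable       - bytes that could be written without blocking
 *   partial_writes - nonzero if the implementation chooses to do
 *                    partial writes (the option described in 1.2)
 * Returns the byte count the write would report, or -1 with errno
 * set to EAGAIN.
 */
ssize_t draft11_large_nonblock_write(size_t nbyte, size_t writable,
                                     int partial_writes)
{
    if (partial_writes && writable > 0) {
        /* "write what it can and return the number of bytes written" */
        return (ssize_t)(writable < nbyte ? writable : nbyte);
    }
    /* "Otherwise": nothing is writable, or this is the null case of
     * 1.2, in which a large request is never partly written. */
    errno = EAGAIN;
    return -1;
}
```

With partial_writes zero, the model returns [EAGAIN] even when the pipe has room for the whole request, so no number of retries can succeed; this is the case taken up in 1.3.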
1.3 Incorrect error code?
If partial writes are not implemented, the error [EAGAIN] is
not appropriate, because the write will never succeed, no
matter how many times it is retried. Better would be
[EINVAL], which matches the other cases where retrying will
not help. However, this argument assumes that {PIPE_BUF} is
not only the maximum atomic size, but also the maximum
amount writable on one operation: this may not be so; see
below.
1.4 {PIPE_MAX} with O_NONBLOCK clear.
Should {PIPE_MAX} apply when O_NONBLOCK is not set? All of
Version 7, System V Release 3, 4.2BSD, and 4.3BSD permit
arbitrarily large values of nbyte when O_NDELAY is not set.
While it is possible to imagine a system where such a limit
would be required by the implementation, there seem to be
none at the moment, so there are probably no applications
that depend on it. Enforcing such a limit would make
pipes fundamentally different from other objects to which
write() can be applied, requiring extra code in
applications while giving them no obvious advantage in
portability. So {PIPE_MAX} should not be applied when
O_NONBLOCK is clear.
2. Sources of the problems.
There are three basic sources of confusion about the
behavior of pipes and FIFOs (especially when the
nonblocking flag is set):
1. It is not clear what the various existing systems do.
2. It is clear that they do many things differently.
3. It is not clear what behavior is important to
applications, and thus worth standardizing.
2.1 Existing systems.
Some of the following descriptions may not be totally
accurate, but they should serve to illustrate the point of
diversity.
+ Version 7 introduced atomicity of writes to pipes. The
manual page write(2) guarantees that write requests of
4096 bytes or less will not be interleaved with writes
from any other process. The purpose of this feature
was to allow multiple processes to write to the same
pipe while permitting a single reader to parse their
data.
4096 also happens to be the size of a pipe, and is
fixed at compile time (it is not larger because that
would have made pipes large files, that is, they would
have had indirect blocks).
Any amount of data (that will fit in an int) may be
requested on a single call to write().
Version 7 does not have a nonblocking flag.
+ The SVID requires atomicity of writes to pipes when the
request is of {PIPE_BUF} bytes or less. This feature
may have been adopted from the /usr/group Standard,
which had it.
There is no maximum write request, regardless of
whether O_NDELAY is set.
With O_NDELAY set, write requests of less than
{PIPE_BUF} bytes either succeed or return zero. Write
requests of more than that may also succeed partially,
returning the amount written.
+ 4.2BSD appears to guarantee atomicity of pipe write
requests up to 1024 bytes. It will return an error for
requests for more than 4096 bytes when the O_NDELAY
flag is set. Partial writes are not done. With the
flag clear, any size write request will succeed
eventually.
+ 4.3BSD does not guarantee atomicity of any size pipe
write (greater than one byte). The maximum amount that
can be requested will vary dynamically, as will the
maximum amount that can be written on a single
operation. With the O_NDELAY flag set, any write of
more than one byte may be partial. UCB CSRG is
probably amenable to changing this behavior.
+ In Version 8, the maximum amount of data that can be
written to a pipe in a given operation is not
necessarily measured in bytes; it may depend instead
on the number of outstanding write requests.
There is no nonblocking flag in Version 8 or Version 9.
2.2 Useful behavior.
It is more useful to specify how an application should
interpret a return value than it is to specify precisely
when the implementation shall return it. I believe this
observation may be the rope for climbing out of the chronic
pipe write morass.
[EAGAIN] should mean that retrying later with the same
size request may succeed. The Rationale should
recommend actions the application should take in
such a case. Because some systems dynamically
vary their pipe size, what would have succeeded
this time on an empty pipe may not succeed next
time. Of course, if the request was for
{PIPE_BUF} or fewer bytes, retries shall
eventually succeed (unless no reader reads
enough from the pipe). But it is not useful for
the standard to attempt to specify exactly for
which larger requests [EAGAIN] will be returned,
or the probability of success on later retries.
After all, if the reader does not read, no
retries will succeed.
[EINVAL] should mean that retrying later with the same
size request shall never succeed. But the
standard should not require the implementation
to always return this error at a fixed limit.
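An application can act on these two meanings without knowing exactly when the implementation returns them. A minimal C sketch of that interpretation follows; the enum and the function classify_write are hypothetical names, and errno is passed in explicitly as err so the function is easy to test.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical classification of a write() result under the
 * interpretation proposed here. */
enum write_outcome {
    WRITE_COMPLETE,   /* ret == nbyte */
    WRITE_PARTIAL,    /* 0 <= ret < nbyte: send the rest later */
    WRITE_RETRY,      /* [EAGAIN]: the same request may succeed later */
    WRITE_NEVER,      /* [EINVAL]: the same request shall never succeed */
    WRITE_FAILED      /* any other error */
};

enum write_outcome classify_write(ssize_t ret, int err, size_t nbyte)
{
    if (ret >= 0)
        return (size_t)ret == nbyte ? WRITE_COMPLETE : WRITE_PARTIAL;
    if (err == EAGAIN)
        return WRITE_RETRY;
    if (err == EINVAL)
        return WRITE_NEVER;
    return WRITE_FAILED;
}
```

After WRITE_NEVER, retrying the same request is pointless; the application must send the data in smaller pieces or give up. After WRITE_RETRY, a later attempt may or may not succeed, since that depends on the reader.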
There is no reason for the standard to try to specify what
happens in every corner case produced by the intersections
of all the known implementations. The standard should
specify behavior that promotes portability of applications
and that is implementable relatively readily on existing
systems. In addition, the behavior of writes to pipes or
FIFOs should differ as little as possible from that of
writes to other file descriptors. The main reason for
making it different at all is that POSIX does not currently
include any more sophisticated interprocess communication
facility: for example, given a reliable sequenced datagram
service, there would be no need to require pipes to be
atomic.
1. Atomic writes are useful. The standard should specify
that write requests of {PIPE_BUF} or fewer bytes shall
be atomic, regardless of whether O_NONBLOCK is set.
2. Write requests of more than {PIPE_BUF} bytes with
O_NONBLOCK set are useful. A real-time data
acquisition process might want to write large amounts
of data through a pipe to a single processing process,
while never blocking.
3. Partial writes are useful, but not useful enough for
the standard to require the implementation to include
them. The standard should, however, require portable
applications to expect them, since applications should
expect them for other kinds of writes anyway. In other
words, partial writes should not be a major option but
merely an implementation-defined detail. Exactly when they
occur is not important enough to specify (especially
considering that it is not specified for other kinds
of writes), except that they are prohibited when nbyte
<= {PIPE_BUF} because of the guarantee of atomicity.
There is no strong reason for an application to be
able to discover at compile or run time whether
partial writes are implemented: every application
should assume that they may be implemented.
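Item 3 asks no more of applications than a write loop they need anyway for other file descriptors. A minimal sketch for a descriptor with O_NONBLOCK clear, assuming only POSIX write(); the helper name write_all is hypothetical, and retrying after [EINTR] is a common convention, not something this objection proposes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all nbyte bytes of buf to fd (O_NONBLOCK clear), continuing
 * after partial writes. Returns 0 on success, or -1 with errno set.
 * Atomicity is still guaranteed only for requests of {PIPE_BUF} or
 * fewer bytes; the pieces of a larger request may be interleaved with
 * data from other writers.
 */
int write_all(int fd, const void *buf, size_t nbyte)
{
    const char *p = buf;

    while (nbyte > 0) {
        ssize_t ret = write(fd, p, nbyte);
        if (ret < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return -1;
        }
        p += ret;                  /* partial write: skip what was written */
        nbyte -= (size_t)ret;
    }
    return 0;
}
```

For a request of {PIPE_BUF} or fewer bytes the first write() is complete, so the loop runs only once; the loop matters only for larger requests.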
The usefulness of {PIPE_MAX} is slightly dubious, and it
might be better to eliminate it, instead specifying that
[EINVAL] may be returned whenever O_NONBLOCK is set and
nbyte > {PIPE_BUF}. But let us assume that it is useful.
1. A maximum amount that can be requested without ever
producing [EINVAL] is worthwhile. {PIPE_MAX} could be
used for this. But it should not apply if O_NONBLOCK
is not set.
2. {PIPE_MAX} >= {PIPE_BUF}. Allowing {PIPE_MAX} <
{PIPE_BUF} would permit a guaranteed atomic write to
return [EINVAL], which is a contradiction.
3. The standard should explicitly permit an
implementation to set {PIPE_MAX} = {PIPE_BUF}, simply
because there is no reason to prohibit it. This would
not rule out partial writes, but would mean that
applications running on such an implementation should
never depend on successful writes with nbyte >
{PIPE_BUF}.
4. The standard should permit an implementation to set
{PIPE_MAX} = {INT_MAX}, meaning that [EINVAL] will
never be returned. That is effectively what some
implementations do, and there is no reason not to if
partial writes are implemented.
5. An implementation could even set all three limits
equal: {PIPE_BUF} = {PIPE_MAX} = {INT_MAX}, meaning
that [EINVAL] will never be returned, there are no
partial writes, and all writes are atomic.
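Items 2 through 5 amount to the constraint {PIPE_BUF} <= {PIPE_MAX} <= {INT_MAX}. A minimal sketch of checking it: since {PIPE_MAX} is only the limit discussed in this objection, it appears here as a plain parameter, while the {PIPE_BUF} that applies to an open pipe can be obtained with fpathconf() and _PC_PIPE_BUF. The names pipe_limits_ok and pipe_buf_of are hypothetical.

```c
#include <limits.h>
#include <unistd.h>

/* Nonzero if a (hypothetical) pipe_max is consistent with pipe_buf
 * under the proposal: {PIPE_BUF} <= {PIPE_MAX} <= {INT_MAX}. */
int pipe_limits_ok(long pipe_buf, long pipe_max)
{
    return pipe_buf <= pipe_max && pipe_max <= INT_MAX;
}

/* The {PIPE_BUF} value for the pipe or FIFO open on fd, or -1 on error
 * (fpathconf() also returns -1 if there is no limit). */
long pipe_buf_of(int fd)
{
    return fpathconf(fd, _PC_PIPE_BUF);
}
```

pipe_limits_ok(INT_MAX, INT_MAX) holds, as in item 5, while any {PIPE_MAX} below {PIPE_BUF}, which item 2 rules out, fails the check.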
Finally, this is an interface standard: it should not try to
specify implementation details, such as the internal
buffering arrangements of the pipe. Such phrases as ``it
shall write as much as it can'' are inappropriate.
3. Rewording.
Here is proposed rewording that accounts for the
implications of the above arguments.
The text and tables below include specifications and
rationale for {PIPE_MAX}. But, if the Working Group decides
to drop {PIPE_MAX}, it can be excised with no ill effects.
References to it should then also be removed from Draft 11,
2.9.2, page 42, lines 808-810, and 5.7.1.2, page 117, line
971.
3.1 Standard.
Move the definition of {PIPE_MAX} down into the text that
specifies what happens when O_NONBLOCK is set. That is,
first remove Draft 11 6.4.2.2, page 125, lines 215-216:
Write requests for greater than {PIPE_MAX} bytes shall
result in a return of value of -1 and set errno to
[EINVAL].
Then replace the wording (quoted in 1.1 above) of Draft 11,
6.4.2.2, page 125, lines 224-227 with this new wording:
If the O_NONBLOCK flag is set, write requests shall be
handled differently in the following ways: The write()
function shall not block the process. Write requests
for {PIPE_BUF} or fewer bytes shall either succeed
completely and return nbyte, or return -1 and set errno
to [EAGAIN] to indicate that retrying the write() later
with the same arguments may succeed. Write requests
for more than {PIPE_BUF} bytes may in addition write
some amount of data less than nbyte and return the
amount written. Write requests for more than
{PIPE_MAX} bytes may in addition return -1 and set
errno to [EINVAL] to indicate that retrying the write()
later with the same arguments shall never succeed.
{PIPE_MAX} shall be greater than or equal to {PIPE_BUF}
and less than or equal to {INT_MAX}.
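Read as a rule, the proposed wording makes the set of permitted results depend only on nbyte, {PIPE_BUF}, and {PIPE_MAX}; which permitted result actually occurs is left to the implementation. A minimal C sketch of that reading, with hypothetical names:

```c
#include <stddef.h>

/* Results the proposed wording permits for write() with O_NONBLOCK set.
 * All names are hypothetical. */
enum {
    R_COMPLETE = 1 << 0,  /* returns nbyte */
    R_EAGAIN   = 1 << 1,  /* -1, errno = [EAGAIN] */
    R_PARTIAL  = 1 << 2,  /* returns some amount less than nbyte */
    R_EINVAL   = 1 << 3   /* -1, errno = [EINVAL] */
};

unsigned permitted_results(size_t nbyte, size_t pipe_buf, size_t pipe_max)
{
    unsigned mask = R_COMPLETE | R_EAGAIN;   /* any request: never blocks */

    if (nbyte > pipe_buf)
        mask |= R_PARTIAL;   /* "may in addition write some amount ... less than nbyte" */
    if (nbyte > pipe_max)
        mask |= R_EINVAL;    /* "may in addition return -1 and set errno to [EINVAL]" */
    return mask;
}
```

For example, permitted_results(100, 512, 4096) allows only a complete write or [EAGAIN], so a small request is never split, and with {PIPE_MAX} equal to {INT_MAX} the R_EINVAL bit never appears for an nbyte that fits in an int.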
The beginning of the following paragraph, 6.4.2.2, page 125,
lines 228-229, is misleading and should be changed from
When attempting to write to a file descriptor...
to
When attempting to write to a file descriptor (other
than one for a pipe or FIFO)...
The meaning of [EINVAL] when set by write() as specified in
6.4.2.4, page 126, lines 260-261, should be changed from
[EINVAL] An attempt was made to write more than
{PIPE_MAX} bytes to a pipe or FIFO special file.
to
[EINVAL] An attempt was made to write to a pipe or FIFO
special file with a value of nbyte greater than
{PIPE_MAX} and also large enough that the
operation shall never succeed if retried.
3.2 Rationale.
In the Rationale, remove the editorial note from B.6.4.2,
Page 240, line 2104, and replace B.6.4.2, Page 240, line
2105 (``Write to a Pipe'') with:
[begin replacement]
An attempt to write to a pipe or FIFO has several major
characteristics:
Atomic/non-atomic
A write is atomic if the whole amount written in one
operation is not interleaved with data from any other
process. This is useful when there are multiple
writers sending data to a single reader. Applications
need to know how large a write request can be expected
to be performed atomically. We call this maximum
{PIPE_BUF}. The standard does not say whether write
requests for more than {PIPE_BUF} bytes will be atomic,
but requires that writes of {PIPE_BUF} or fewer bytes
shall be atomic.
Blocking/immediate
Blocking is only possible with O_NONBLOCK clear. If
there is enough space for all the data requested to be
written immediately, the implementation should do so.
Otherwise, the process may block, that is, pause until
enough space is available for writing. The effective
size of a pipe or FIFO (the maximum amount that can be
written in one operation without blocking) may vary
dynamically, depending on the implementation, so it is
not possible to specify a fixed value for it.
Complete/partial/deferred
A write request,
int fildes, nbyte, ret;
char *buf;
ret = write(fildes, buf, nbyte);
may return
complete: ret = nbyte
partial: ret < nbyte
This shall never happen if nbyte <=
{PIPE_BUF}. If it does happen (with nbyte
> {PIPE_BUF}), the standard does not
guarantee atomicity, even if ret <=
{PIPE_BUF}, because atomicity is guaranteed
according to the amount requested, not the
amount written.
deferred: ret = -1, errno = [EAGAIN]
This error indicates that a later request
may succeed. It does not indicate that it
shall succeed, even if nbyte <= {PIPE_BUF},
because if no process reads from the pipe
or FIFO, the write will never succeed. An
application could usefully count the number
of times [EAGAIN] is caused by a particular
value of nbyte > {PIPE_BUF} and perhaps do
later writes with a smaller value, on the
assumption that the effective size of the
pipe may have decreased.
Partial and deferred writes are only possible with
O_NONBLOCK set.
Requestable/invalid
If a write request shall never succeed with the value
given for nbyte, the request is invalid, and write()
shall return -1 with errno set to [EINVAL]. This is
only permitted to happen when nbyte > {PIPE_MAX} and
O_NONBLOCK is set, and it is never required to happen.
{PIPE_MAX} is not necessarily a minimum on the
effective size of a pipe or FIFO; if it says anything
about that size, it is that it sometimes varies above
{PIPE_MAX}. Because {PIPE_MAX} specifies the maximum
size write request that shall never cause [EINVAL], it
must be greater than or equal to the maximum atomic
write size, {PIPE_BUF}. {PIPE_BUF} and {PIPE_MAX} may
be equal, which means that [EINVAL] may be produced by
any write of greater than {PIPE_BUF} bytes. {PIPE_MAX}
may be equal to {INT_MAX}, meaning that [EINVAL] shall
never be returned (unless nbyte > {INT_MAX}, when the
result is implementation-defined). All three limits
may be equal, meaning that [EINVAL] shall never be
returned, no partial writes are done, and all completed
writes are atomic. Applications should be prepared for
all these cases.
The relations of these properties are best shown in tables.
___________________________________________________________
| Write to a Pipe or FIFO with O_NONBLOCK clear.          |
|_____________|___________________________________________|
| immediately |                                           |
| writable:   | none          some          nbyte         |
|_____________|___________________________________________|
|             | atomic        atomic        atomic        |
| nbyte <=    | blocking      blocking      immediate     |
| {PIPE_BUF}  | nbyte         nbyte         nbyte         |
|_____________|___________________________________________|
|             | atomic?       atomic?       atomic?       |
| nbyte >     | blocking      blocking      immediate     |
| {PIPE_BUF}  | nbyte         nbyte         nbyte         |
|_____________|___________________________________________|
If the O_NONBLOCK flag is clear, a write request shall block
if the amount writable immediately is less than that
requested. If the flag is set (by fcntl()), a write request
shall never block.
___________________________________________________________
| Write to a Pipe or FIFO with O_NONBLOCK set.            |
|_____________|___________________________________________|
| immediately |                                           |
| writable:   | none          some          nbyte         |
|_____________|___________________________________________|
| nbyte <=    | -1,           -1,           atomic        |
| {PIPE_BUF}  | [EAGAIN]      [EAGAIN]      nbyte         |
|_____________|___________________________________________|
|             |               atomic?       atomic?       |
|             |               < nbyte       <= nbyte      |
| nbyte >     | -1,           or -1,        or -1,        |
| {PIPE_BUF}  | [EAGAIN]      [EAGAIN]      [EAGAIN]      |
|_____________|___________________________________________|
|             |               atomic?       atomic?       |
|             |               < nbyte       <= nbyte      |
| nbyte >     | -1,           or -1,        or -1,        |
| {PIPE_MAX}  | ([EAGAIN]     ([EAGAIN]     ([EAGAIN]     |
|             | or [EINVAL])  or [EINVAL])  or [EINVAL])  |
|_____________|___________________________________________|
There is no way provided for an application to determine
whether the implementation will ever perform partial writes
to a pipe or FIFO. Every application should be prepared to
handle partial writes when O_NONBLOCK is set and the
requested amount is greater than {PIPE_BUF}, just as every
application should be prepared to handle partial writes on
other kinds of file descriptors.
Where the standard requires -1 returned and errno set to
[EAGAIN], most historical implementations return 0 (with the
O_NDELAY flag set: that flag is the historical predecessor
of O_NONBLOCK, but is not itself in the standard). The
error indications in the standard were chosen so that an
application can distinguish these cases from end of file.
While write() cannot receive an indication of end of file,
read() can, and the Working Group chose to make the two
functions have similar return values. Also, some existing
systems (e.g., Version 8) permit a write of zero bytes to
mean that the reader should get an end of file indication:
for those systems, a return value of zero from write()
indicates a successful write of an end of file indication.
[end replacement]
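The application advice in the Rationale above (expect partial writes, treat [EAGAIN] as retryable and perhaps shrink a large request that keeps being deferred, treat [EINVAL] as final for that size) might be combined in a nonblocking writer like the following. This is a minimal sketch under stated assumptions: the helper names, the threshold of three [EAGAIN] results, and halving toward {PIPE_BUF} are hypothetical choices, not part of the proposal, and pipe_buf would in practice come from fpathconf(fd, _PC_PIPE_BUF).

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd with fcntl(), keeping its other status flags. */
int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical policy: request size to use after a failed write() of
 * cur bytes. Returns 0 if the data cannot be sent at all. */
size_t next_request(size_t cur, size_t pipe_buf, int err, unsigned *deferrals)
{
    if (err == EINVAL)                 /* never succeeds at this size */
        return cur > pipe_buf ? pipe_buf : 0;
    if (err != EAGAIN)
        return 0;                      /* some other error: give up */
    if (cur > pipe_buf && ++*deferrals >= 3) {
        *deferrals = 0;
        cur /= 2;                      /* effective pipe size may have shrunk */
        if (cur < pipe_buf)
            cur = pipe_buf;
    }
    return cur;                        /* retry later at this size */
}

/* One nonblocking attempt to send up to *req of the len bytes in buf.
 * Returns bytes written (possibly partial), 0 if the caller should try
 * again later, or -1 with errno set on a hard error. */
ssize_t try_send(int fd, const char *buf, size_t len, size_t *req,
                 size_t pipe_buf, unsigned *deferrals)
{
    size_t n = *req < len ? *req : len;
    ssize_t ret = write(fd, buf, n);

    if (ret >= 0)
        return ret;                    /* complete or partial */
    *req = next_request(n, pipe_buf, errno, deferrals);
    return *req > 0 ? 0 : -1;          /* 0 means "try again later" */
}
```

A caller advances buf by each positive return and waits before calling try_send again whenever it returns 0. On the historical O_NDELAY systems described above, which return 0 instead of [EAGAIN], write() itself returns 0 and try_send passes that through, so the caller's handling is the same.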
CONTENTS
1. Problems
    1.1 Ambiguous O_NONBLOCK wording in Draft 11, 6.4.2.2
    1.2 Nonblocking partial pipe writes are an option in Draft 11
    1.3 Incorrect error code?
    1.4 {PIPE_MAX} with O_NONBLOCK clear
2. Sources of the problems
    2.1 Existing systems
    2.2 Useful behavior
3. Rewording
    3.1 Standard
    3.2 Rationale
Volume-Number: Volume 12, Number 22