[net.bugs.4bsd] restartable write

ddl@tardis.UUCP (Dan Lanciani) (12/05/85)

	Contrary to popular belief (and the documentation), restartable
write(2)'s to "slow" devices don't really work the way one might hope.
This might not be reason for concern if "slow" devices were limited to
tty's.  (One of the original 3 implementors of the (now) BSD pseudo-job-control
system has pointed out to me that this was the intention.)  Sadly,
recent versions of UNIX have added new "slow" devices.  In 2.9, the only
real problems are sockets ("If you are hacking on the network, you
deserve what you get...") but in 4.{2,3}, even the lowly pipe is a
"slow" device (actually a socket).  Mere mortals often use pipes.
	Digression:  The implementation of restartable system calls
may not be obvious.  It goes something like this:  An IO request to
a "slow" device needs to block because data is not ready or there
is too much data in an output queue.  It sleep()'s at an interruptible
priority and may be awakened by an arbitrary signal.  After one or
more longjmp(yuck)'s, the system call interface (in which I include the
top level of the system "function" called) will notice that it would now
be good to get out of the kernel and let the luser's routine do something.  If
the system call was, for example, a read or a write to one of those
"slow" devices, the following trick is used to allow it to continue
after the signal is processed:
	If (and only if) NO data has been transferred, the luser's
pc is backed up to before the trap instruction and the stack is
set up so that, on return from the signal handler, the system call
will be reexecuted.  This is fine and dandy if NO data was indeed
transferred, as will be the case if a read is being processed or if
a write has initially blocked without doing anything.  However,
consider, if you will, the poor write that is partly finished when
it runs out of clists or what have you.  It cannot be "restarted",
as data would be duplicated, so the write simply returns.  Although
the number of bytes actually written is correctly returned, errno is not
set, and for a caller that does not check the count, the rest of the
bytes it meant to write have simply disappeared.
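
The rule just described can be modelled in a few lines of C.  This is
only a toy model of the decision, not kernel source: on_interrupt(),
struct intr_result and the INTR_* names are all made up for
illustration, and the real work (backing up the pc, fixing the stack)
is reduced to an enum value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; this models the rule, it is not kernel code. */
enum intr_action {
    INTR_RESTART,       /* back up the pc; call is reexecuted after the handler */
    INTR_SHORT_RETURN   /* return the partial count; errno is left alone */
};

struct intr_result {
    enum intr_action action;
    size_t           retval;    /* meaningful only for INTR_SHORT_RETURN */
};

/*
 * requested: byte count the caller passed to read or write
 * resid:     bytes still untransferred when the signal arrived
 */
struct intr_result
on_interrupt(size_t requested, size_t resid)
{
    struct intr_result r;

    assert(resid <= requested);

    if (resid == requested) {
        /* NO data moved: safe to run the whole call again. */
        r.action = INTR_RESTART;
        r.retval = 0;
    } else {
        /* Some data moved: restarting would duplicate it. */
        r.action = INTR_SHORT_RETURN;
        r.retval = requested - resid;
    }
    return r;
}
```

With a 4096-byte write that has moved 3072 bytes when the signal
arrives, the model picks INTR_SHORT_RETURN with a return value of 3072;
the remaining 1024 bytes are the caller's problem.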
	Even this might not be especially disastrous if programs
(and libraries such as stdio) dealt with these situations.  They do not.
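
For what it's worth, "dealing with it" in user code might look
something like the sketch below.  full_write() is a hypothetical
helper, not a libc or stdio routine, and it is written with present-day
POSIX types (ssize_t, size_t) rather than anything specific to
4.{2,3}.  It assumes a blocking descriptor and simply keeps calling
write(2) until every byte has gone out or a real error turns up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * full_write: hypothetical helper, not a library routine.
 * Returns 0 once all len bytes are written, or -1 with errno set.
 */
int
full_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved */
            return -1;          /* a real error; errno says which */
        }
        assert((size_t)n <= left);
        p += n;                 /* short write: skip what went out... */
        left -= (size_t)n;      /* ...and go around for the rest */
    }
    return 0;
}
```

A caller would use full_write(fd, buf, len) wherever it now uses a bare
write() and ignores the count.  The point is only that the returned
count has to be checked and the remainder resubmitted; nothing here
changes what the kernel does.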

Example (4.{2,3}):

% ex BIGFILE | (sleep 30; cat) > foo
1,$p
...wait ten seconds or so...
^Z
Stopped
% fg
q
%
...note that foo doesn't contain all of BIGFILE.  Start to worry.
If this seems rather contrived, consider the effort to reduce to one
line of shell what has probably caused subtle glitches in complicated
C programs...

What happened:  The pipe's (socket's) buffer filled up during ex's
write and ex was blocked.  Some output drained and ex continued but
was again blocked (this can repeat).  Ex caught the TSTP signal and
could not have its (partly completed) write to the pipe restarted.
The pipe lost data.  Nobody noticed.

	Most of this misery stems from my desire to implement a
network file system (I probably wouldn't have looked this closely
from mere curiosity), but the problem is real.  In my case, I wanted
to call so{send, receive}() and be sure to have them return...  preferably
after doing what I asked...  And since I didn't want to alter their
sleeping priorities globally, I couldn't.  Well, one more flag bit,
NOSIG, should fix it real good.  Hack.  Hack.  Hack.

					Dan Lanciani
					ddl@tardis