ddl@tardis.UUCP (Dan Lanciani) (12/05/85)
[] Contrary to popular belief (and the documentation), restartable write(2)'s to "slow" devices don't really work the way one might hope. This might not be reason for concern if "slow" devices were limited to tty's. (One of the original 3 implementors of the (now) BSD pseudo-job-control system has pointed out to me that this was the intention.) Sadly, recent versions of UNIX have more, new "slow" devices. In 2.9, the only real problems are sockets ("If you are hacking on the network, you deserve what you get...") but in 4.{2,3}, even the lowly pipe is a "slow" device (actually a socket). Mere mortals often use pipes.

Digression: The implementation of restartable system calls may not be obvious. It goes something like this:

An IO request to a "slow" device needs to block because data is not ready or there is too much data in an output queue. It sleep()'s at an interruptible priority and may be awakened by an arbitrary signal. After one or more longjmp(yuck)'s, the system call interface (I include the top level of the system "function" called) will notice that it would now be good to get out of the kernel and let the luser's routine do something. If the system call was, for example, a read or a write to one of those "slow" devices, the following trick is used to allow it to continue after the signal is processed: If (and only if) NO data has been transferred, the luser's pc is backed up to before the trap instruction and the stack is set up so that, on return from the signal handler, the system call will be reexecuted.

This is fine and dandy if NO data was indeed transferred, as will be the case if a read is being processed or if a write has initially blocked without doing anything. However, consider, if you will, the poor write that is partly finished when it runs out of clists or what have you. It cannot be "restarted" as data would be duplicated, so the write simply returns.
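
For a partly finished write of the kind just described, the only thing the caller has to go on is the short count that comes back. A minimal sketch of a caller that copes with it, assuming POSIX-style write(2) semantics (a short count on partial transfer, -1 with EINTR when nothing was transferred and the call is not restarted). The name write_all is hypothetical; it is not part of stdio or of any BSD release.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write(2) until all len bytes are
 * accepted or a real error occurs. Returns 0 on success, -1 on error
 * (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;          /* genuine error */
        }
        /* Short count: the write was cut off partway (for example by a
         * signal). Advance past what was taken and go around again. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A program would call write_all(fd, buf, n) wherever it now calls write(fd, buf, n) and ignores the count. This only helps code that actually checks the return value; it does nothing for callers that assume a write either completes or fails.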
Although the number of bytes written is correctly returned, errno is not set, and some of the bytes that were to have been written have simply disappeared. Even this might not be especially disastrous if programs (libraries (stdio)) dealt with these situations. They do not.

Example (4.{2,3}):

    % ex BIGFILE | (sleep 30; cat) > foo
    1,$p
    ...wait ten seconds or so...
    ^Z
    Stopped
    % fg
    q
    %

...note that foo doesn't contain all of BIGFILE. Start to worry. If this seems rather contrived, consider the effort to reduce to one line of shell what has probably caused subtle glitches in complicated C programs...

What happened: The pipe's (socket's) buffer filled up during ex's write and ex was blocked. Some output drained and ex continued but was again blocked (this can repeat). Ex caught the TSTP signal and could not have its (partly completed) write to the pipe restarted. The pipe lost data. Nobody noticed.

Most of this misery stems from my desire to implement a network file system (I probably wouldn't have looked this closely from mere curiosity), but the problem is real. In my case, I wanted to call so{send, receive}() and be sure to have them return... preferably after doing what I asked... And since I didn't want to alter their sleeping priorities globally, I couldn't. Well, one more flag bit, NOSIG, should fix it real good. Hack. Hack. Hack.

				Dan Lanciani
				ddl@tardis