obrien@RAND-UNIX@sri-unix (08/09/82)
Date: Thursday, 22 Jul 1982 17:43-PDT

There is a problem with single-use devices when a process that has one open dies on a signal. It appears that in some cases the close routine is never called, which locks the device until reboot (or until someone mucks about in /dev/kmem with adb, which amounts to the same thing). I believe Berkeley mentioned this but did not have a fix. Does anyone out there know this symptom, and have a fix (or at least an explanation)?

This has occurred in every version of UNIX I've ever seen, from V6 to 4.1BSD. It's particularly annoying when you gradually lose all of the "/dev/imp?" devices for talking to a network. I've also lost the magtape drive on occasion, though not under 4.1. It doesn't happen every time a process dies on a signal, just sometimes. TTY-generated signals do not seem to cause the problem as often as other signals.
thomas (08/09/82)
I've run into this and finally concluded that it was a result of the close routine calling sleep() at an interruptible priority level. If a signal occurs during the sleep, the close is forcibly exited (with a longjmp-style nonlocal jump) and any cleanup following the sleep never occurs. On the other hand, if sleep() is called at a non-interruptible priority level and the awaited event never occurs, there is no way to kill the process.

The best solution I can think of is to sleep at a non-interruptible level, but to arrange a timeout routine that terminates the sleep after some reasonable period. Messy, yes, but device interactions always are.

=Spencer
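
A user-space sketch of the mechanism described above may make it easier to see. It is only a model, not kernel code: setjmp/longjmp stand in for the kernel's nonlocal exit, and every name in it (qsav, dev_locked, sleep_interruptible, sleep_with_timeout, the 10-tick limit) is an assumption made for illustration. The first close routine loses its cleanup when the sleep is abandoned; the second uses a sleep that always returns, as the timeout suggestion would.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf qsav;     /* stands in for the kernel's saved return context */
static int dev_locked;   /* hypothetical exclusive-use flag kept by a driver */

/* Model of sleep() at an interruptible priority: a pending signal
 * abandons the caller by jumping straight back to the saved context. */
static void sleep_interruptible(int signal_pending)
{
    if (signal_pending)
        longjmp(qsav, 1);
}

/* Model of a non-interruptible sleep bounded by a timeout: signals are
 * ignored and the call always returns, either because the event arrived
 * (returns 0) or because max_ticks elapsed first (returns 1). */
static int sleep_with_timeout(int event_at_tick, int max_ticks)
{
    for (int tick = 0; tick < max_ticks; tick++)
        if (tick == event_at_tick)
            return 0;
    return 1;
}

/* Close routine with the bug: the unlock follows an interruptible sleep. */
static void close_buggy(int signal_pending)
{
    assert(dev_locked);                 /* close is only reached on an open device */
    sleep_interruptible(signal_pending);
    dev_locked = 0;                     /* skipped when the sleep is abandoned */
}

/* Close routine with the proposed fix: the sleep always returns. */
static void close_timed(int event_at_tick)
{
    assert(dev_locked);
    (void)sleep_with_timeout(event_at_tick, 10);
    dev_locked = 0;                     /* always reached */
}

/* Run one close and report whether the device is still locked afterwards. */
static int locked_after_buggy_close(int signal_pending)
{
    dev_locked = 1;
    if (setjmp(qsav) == 0)
        close_buggy(signal_pending);
    return dev_locked;
}

static int locked_after_timed_close(int event_at_tick)
{
    dev_locked = 1;
    close_timed(event_at_tick);
    return dev_locked;
}
```

In this model locked_after_buggy_close(1) leaves the device locked, while locked_after_timed_close(-1), where the awaited event never arrives, still releases it once the timeout expires. The cost is the one already noted: the process cannot be killed until the timeout fires.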
dan@Bbn-Unix@sri-unix (08/20/82)
From: Dan Franklin <dan@Bbn-Unix>
Date: 9 Aug 1982 9:10:53 EDT (Monday)

One way I know of that locking might not get undone on a signal is the general UNIX open bug. For any device which sleeps at interruptible priority in its open routine, if a signal comes along while it is sleeping there, control jumps directly back into the trap routine, bypassing any little cleanup like deleting the fd, clearing the lock bit, etc. This problem was alluded to in an earlier message about file descriptors, which suggested the user-level hack of "predicting" the fd before doing the open and then, if the open failed, closing that fd.

A simple solution for the locking problem might be to avoid locking until after the sleep; however, this means that several processes can contend for the device until that time, which might defeat the purpose of locking for your application.

A general solution is to have devices which sleep at interruptible priority in the open routine make a copy of the value of u.u_qsav and set up their own handler (via a "save" into u.u_qsav) just before going to sleep. Then, when a signal interrupts the sleep, the code should clean up as necessary and resume at the saved value of u.u_qsav. I haven't tried it yet, though; in the one place it hurt us, in the network mailer (which would run out of fds), "predicting" the fd was enough.

Dan Franklin
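
The general solution can be modelled in user space as follows. This is a sketch under stated assumptions, not driver code, and like the proposal itself it has not been tried in a kernel. Here setjmp/longjmp stand in for the kernel's save and resume, qsav is a pointer to the current context rather than a copied u.u_qsav value (to stay within portable C), and trap_ctx, dev_locked, and the function names are all hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf trap_ctx;            /* context the trap routine would resume at */
static jmp_buf *qsav = &trap_ctx;   /* stands in for u.u_qsav */
static int dev_locked;              /* hypothetical driver lock bit */

/* Model of an interruptible sleep: a pending signal jumps to *qsav. */
static void sleep_interruptible(int signal_pending)
{
    if (signal_pending)
        longjmp(*qsav, 1);
}

/* Open routine that installs its own handler around the sleep. */
static void open_guarded(int signal_pending)
{
    jmp_buf mine;
    jmp_buf *saved = qsav;          /* remember where the signal would have gone */

    assert(!dev_locked);
    dev_locked = 1;                 /* lock is taken before the sleep */

    if (setjmp(mine) != 0) {
        /* Reached only when the sleep was interrupted. */
        dev_locked = 0;             /* clean up: clear the lock bit */
        qsav = saved;               /* put the original context back */
        longjmp(*saved, 1);         /* resume where the signal was headed */
    }
    qsav = &mine;                   /* our handler now intercepts the jump */
    sleep_interruptible(signal_pending);
    qsav = saved;                   /* normal path: open succeeded, lock is kept */
}

/* Returns the lock state after one open attempt; *interrupted reports
 * whether control came back through the trap context. */
static int locked_after_open(int signal_pending, int *interrupted)
{
    dev_locked = 0;
    qsav = &trap_ctx;
    *interrupted = 0;
    if (setjmp(trap_ctx) == 0)
        open_guarded(signal_pending);
    else
        *interrupted = 1;
    return dev_locked;
}
```

With no signal the open completes and keeps the lock. With a signal pending, control still ends up in the trap context, but only after the lock bit has been cleared, which is the cleanup the plain interruptible sleep would have skipped.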