pearce@tycho.yerkes.uchicago.edu (Eric C. Pearce) (03/11/89)
I'll post this since many questions surrounding this bug have appeared recently.

> The problem: Due to what I understand to be a bug in the socket code, if
> a ^C is sent to the server at an inappropriate time, i.e. when a socket is
> connected, the entire Sun goes into rigor mortis. (At least, it appears
> that this is when it happens.) It also will halt if, while the socket is
> open, I go up to the menu and select "quit."

This is a common bug in Unix domain sockets that I have had to work around for some time now. The bug was fixed in BSD but not in Sun OS 3.5. I have heard claims that it is fixed in 4.0.

A quote from some documentation about our package, from the "Bugs" node:

    If a process that is blocked on an accept(2) while listening for a
    connection on a Unix domain socket is killed, Sun OS 3.5 will hang.
    User processes will not respond, and the test lights on the back of
    the CPU will move very slowly. The only recourse is to abort with
    L1-A, sync the disks, and reboot. The kernel becomes hung in a very
    tight loop while attempting to clean up after the socket. This bug
    can be quite frustrating during development, when clients fail to
    connect and you kill the server. Internet domain sockets do not have
    this problem and may be used if this is a real problem.

Hope this helps.

Our software even has a compile-time flag, ALWAYS_ATTACH_INET, which fakes Unix domain accept(), connect() and listen() with INet domain sockets to "localhost". That way, when we upgrade to a system with reliable Unix domain sockets, we can go back to using them.

While catching the ^C signal will keep the system from hanging, any unexpected death of your server (say, a seg fault) may still crash the machine if Unix domain sockets are used. I would consider this unacceptable and just use the INet domain.

- Ecp.
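
For anyone wanting to try the same trick, here is a rough sketch of how a compile-time switch like ALWAYS_ATTACH_INET might be arranged. It is an illustration only, not code from the package described above: it assumes present-day POSIX headers rather than Sun OS 3.5, and the names `ipc_addr`, `IPC_FAMILY`, `ipc_fill_addr`, `ipc_listen` and `ipc_connect`, along with the path and port arguments, are made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/* Build with -DALWAYS_ATTACH_INET to use loopback TCP instead of AF_UNIX. */
#ifdef ALWAYS_ATTACH_INET
typedef struct sockaddr_in ipc_addr;
#define IPC_FAMILY AF_INET
#else
typedef struct sockaddr_un ipc_addr;
#define IPC_FAMILY AF_UNIX
#endif

/* Hypothetical helper: fill in the address for whichever domain is active. */
static void ipc_fill_addr(ipc_addr *addr, const char *path, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
#ifdef ALWAYS_ATTACH_INET
    (void)path;                                  /* unused in INet mode */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* "localhost" */
#else
    (void)port;                                  /* unused in Unix mode */
    assert(path != NULL);
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
#endif
}

/* Server side: returns a listening socket, or -1 on error. */
int ipc_listen(const char *path, unsigned short port)
{
    ipc_addr addr;
    int fd = socket(IPC_FAMILY, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    ipc_fill_addr(&addr, path, port);
#ifndef ALWAYS_ATTACH_INET
    unlink(path);                                /* remove a stale socket file */
#endif
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: returns a connected socket, or -1 on error. */
int ipc_connect(const char *path, unsigned short port)
{
    ipc_addr addr;
    int fd = socket(IPC_FAMILY, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    ipc_fill_addr(&addr, path, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The rest of the program calls only `ipc_listen()` and `ipc_connect()` (and plain accept() on the returned descriptor), so switching domains is a matter of recompiling with or without the flag.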