clapper@nadc.arpa (09/03/86)
We are running System V Release 2.0 on a VAX 11/780, and we've encountered a very strange problem with handling of the hangup signal. Our software consists of several concurrently running background processes, all of which trap the hangup signal. One of these processes is a "master" process of sorts; it synchronizes clean-up processing after a hangup has occurred. It sends a message to the one remaining process, "telling" it to terminate. It also removes a number of IPC structures and files.

Now we come to the problem. If I send a hangup signal to this master process using "kill -1 ...", clean-up proceeds without a hitch. If I log off the system by entering a <ctl-d> at the shell prompt, clean-up also works perfectly. If, however, I disconnect the line (a hardware hangup, if you will), the master process never finishes its clean-up. Files which should be deleted often aren't. IPC structures which should be removed often remain in the system. It's as though the process were killed before it completed its job.

I've encountered this problem in three contexts:

1. On a direct-connect line, where disconnect is achieved by shutting off the power to the terminal.
2. On a dial-up line, where disconnect is achieved by physically unplugging the telephone line.
3. From a local cable network, which allows one to call a port on the machine, and where disconnect is achieved by an "escape" command to the network controller.

This problem also occurs with Kermit, which doesn't remove its lock-file if the user logs off the system using a "hardware" hangup.

To make matters even more confusing, if I run a small test program which does nothing but trap hangup and write a message to a file when hangup occurs, I cannot replicate the problem. The test program traps the hangup signal no matter how it is generated.
We've speculated that both our software and Kermit are using some other feature of the system which somehow inhibits proper hangup processing, but we have been unable to gather additional information. Our S.A. is also stumped.

Has anyone else encountered this problem? Or does anyone have a clue as to its cause? Our application MUST be able to trap all forms of hangup, so this problem puts us in a bind. I intend to call AT&T support, but based on past experience with them, I don't expect much relief.

Please address any replies directly to the E-Mail address below; I often miss issues of UNIX-WIZARDS. Thanks in advance to anyone who can help.

Brian M. Clapper
clapper@nadc.ARPA
Naval Air Development Center
Warminster, PA
ron@BRL.ARPA (Ron Natalie) (09/04/86)
After receiving a hangup signal, processes should neither try to write anything on the terminal, nor try to open the tty or "/dev/tty" device. To do so will likely cause a hang on lines with modem control.

-Ron
rml@hpfcdc.UUCP (09/04/86)
One possibility that comes to mind is the following scenario:

- "hardware hangup" occurs, sending SIGHUP to process group
- process group leader (possibly login shell) receives SIGHUP and dies
- exit processing sends SIGHUP to process group again
- second SIGHUP arrives while signal is set to SIG_DFL (due to receipt of the first SIGHUP).

If this is the case, there's not much you can do on V.2.

Another (more remote) possibility that occurred to me is that a break is being detected prior to the hangup. In that case you should be able to ignore or catch SIGINT.

You may be able to debug the problem by spawning the application from an extra process which ignores SIGHUP, waits for its child(ren) and looks at the exit status. You could also find the exit status in the process accounting data (see acct(4)). In my first scenario the exit status will be 1 (SIGHUP); in the second 2 (SIGINT).

Bob Lenk
{hplabs, ihnp4}!hpfcla!rml
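
The "extra process" suggested above might look roughly like the following sketch. It is an illustration under stated assumptions, not a tested V.2 program: the function name run_and_get_signal is hypothetical, and waitpid() with the WIFSIGNALED/WTERMSIG macros are POSIX interfaces assumed here. On System V Release 2 one would call wait() and examine the low-order bits of the status word instead.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] as a child while this wrapper ignores SIGHUP.
 * Returns the number of the signal that killed the child
 * (1 for SIGHUP, 2 for SIGINT), 0 if the child exited on its own,
 * or -1 if the child could not be started or waited for. */
int run_and_get_signal(char *const argv[])
{
    int status;
    pid_t pid;

    /* The wrapper must survive the hangup in order to see the status. */
    signal(SIGHUP, SIG_IGN);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* An ignored signal stays ignored across exec, so restore the
         * default; the application then installs its own trap as usual. */
        signal(SIGHUP, SIG_DFL);
        execvp(argv[0], argv);
        _exit(127);               /* exec failed */
    }

    /* Wait for the child, retrying if the wait itself is interrupted. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (WIFSIGNALED(status))
        return WTERMSIG(status);  /* child was killed by this signal */
    return 0;                     /* child called exit() itself */
}
```

Because the line may already be gone when the child dies, the wrapper should record the returned value in a file rather than print it on the terminal.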