[net.unix-wizards] System V and the Hangup Signal

clapper@nadc.arpa (09/03/86)

We are running System V Release 2.0 on a VAX 11/780, and we've encountered
a very strange problem regarding handling of a hangup signal.  Our software
consists of several concurrently running background processes, which trap
the hangup signal.  One of these processes is a "master" process of sorts;
this process synchronizes clean-up processing after hangup has occurred.
It sends a message to the one remaining process, "telling" it to terminate.
Further, it removes a number of IPC structures and files.

Now we come to the problem:  If I send a hangup signal to this master
process using a "kill -1 ...", clean-up proceeds without a hitch.  If
I log off the system by entering a <ctl-d> at the shell prompt, clean-up
also works perfectly.  If, however, I disconnect the line (a hardware
hangup, if you will),  the master process never finishes its clean-up.
Files which should be deleted often aren't.  IPC structures which should
be removed often remain in the system.  It's as though the process were
killed before it completed its job.

I've encountered this problem in three contexts:  
   1. On a direct-connect line, where disconnect is achieved by shutting 
      off the power to the terminal.  
   2. On a dial-up line, where disconnect is achieved by physically 
      unplugging the telephone line.  
   3. From a local cable network, which allows one to call a port on the
      machine, and where disconnect is achieved by an "escape" command to 
      the network controller.

This problem also occurs with Kermit, which doesn't remove its lock-file
if the user logs off the system using a "hardware" hangup.  To make matters
even more confusing, if I run a small test program which does nothing
but trap hangup and write a message to a file when hangup occurs, I
cannot replicate the problem.  The test program traps the hangup signal
no matter how it is generated.  We've speculated that both our software
and Kermit are using some other feature of the system which somehow inhibits
proper hangup processing, but we have been unable to gather additional
information.  Our S.A. is also stumped.

Has anyone else encountered this problem?  Or, does anyone have a clue
as to its cause?  Our application MUST be able to trap all forms of hangup,
so this problem kind of puts us in a bind.  I intend to call AT&T support,
but based on past experience with them, I don't expect much relief.

Please address any replies directly to the E-Mail address below; I often miss
issues of UNIX-WIZARDS.

Thanks in advance to anyone who can help.

Brian M. Clapper                                    clapper@nadc.ARPA
Naval Air Development Center
Warminster, PA

ron@BRL.ARPA (Ron Natalie) (09/04/86)

After receiving a hang up signal, processes should neither try to
write anything on the terminal, nor try to open the tty or "/dev/tty"
device.  To do so will likely cause a hang on lines with modem control.

-Ron

rml@hpfcdc.UUCP (09/04/86)

One possibility that comes to mind is the following scenario:

	- "hardware hangup" occurs, sending SIGHUP to process group

	- process group leader (possibly login shell) receives
	  SIGHUP and dies

	- exit processing sends SIGHUP to process group again

	- second SIGHUP arrives while signal is set to SIG_DFL
	  (due to receipt of the first SIGHUP).

If this is the case, there's not much you can do on V.2.

Another (more remote) possibility that occurred to me is that a break is
being detected prior to the hangup.  In that case you should be able to
ignore or catch SIGINT.

You may be able to debug the problem by spawning the application from an
extra process which ignores SIGHUP, waits for its child(ren) and looks
at the exit status.  You could also find the exit status in the process
accounting data (see acct(4)).  In my first scenario the status will
show termination by signal 1 (SIGHUP); in the second, signal 2 (SIGINT).
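
The extra watcher process could be sketched roughly as follows.  This is
a sketch under assumptions, not a tested V.2 program: spawn_and_report is
a hypothetical name, and waitpid() with the WIFSIGNALED/WTERMSIG macros is
the POSIX interface rather than what V.2 supplied.  Because an ignored
signal stays ignored across exec, the child restores the default SIGHUP
disposition before running the application.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] as a child while the caller ignores SIGHUP, then report how
 * the child ended.  Returns the number of the signal that killed the
 * child, 0 if it exited on its own, or -1 if fork or wait failed.
 */
int spawn_and_report(char *const argv[])
{
    void (*old_hup)(int);
    pid_t pid, r;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    old_hup = signal(SIGHUP, SIG_IGN);  /* the watcher must outlive the hangup */

    pid = fork();
    if (pid < 0) {
        signal(SIGHUP, old_hup);
        return -1;
    }

    if (pid == 0) {
        /* Give the application the disposition it would normally start with. */
        signal(SIGHUP, SIG_DFL);
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    signal(SIGHUP, old_hup);            /* restore the caller's setting */

    if (r < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Called as spawn_and_report(argv + 1) from a small main(), a return value
of 1 would point to the first scenario and 2 to the second.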

			Bob Lenk
			{hplabs, ihnp4}!hpfcla!rml