[comp.unix.wizards] SIGCONT occurs after a SIGTERM

coleman@cam.nist.gov (Sean Sheridan Coleman X5672) (02/12/91)

Please explain to me why a SIGCONT is sent to a process after
SIGTERM is sent to my process. It doesn't compute because TERM
means to terminate the process. I catch SIGCONT because I
do some reconnecting for serial drivers after my process is
stopped with a Ctrl-Z. Below is a piece of the code and
some output from the program.

Here I stop the program with a ^Z and restart it using fg %1.
SIGCONT is sent correctly in this situation.

<deputy /home/central/coleman/real_prog/net.dir/net_log> % net l logfile
^Z Signal caught is 18

Stopped (signal)
<deputy /home/central/coleman/real_prog/net.dir/net_log> % jobs
[1]  + Stopped (signal)     net l logfile
<deputy /home/central/coleman/real_prog/net.dir/net_log> % fg %1
net l logfile
 Signal caught is 19
^C Signal caught is 2



From another window, I used kill -TERM to kill this process.
SIGTERM is received first, but then SIGCONT arrives for no
apparent reason.

<deputy /home/central/coleman/real_prog/net.dir/net_log> % !ne
net l logfile
 Signal caught is 15
 Signal caught is 19
No devices are available to use for logging


Here is the signal handler:

Note: device, device_file, and device_name are globals.

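#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>

/* strip_add_dev_name(), unlock_dev(), get_device() and chk_device()
   are the program's own helpers (not shown). */

void
sig_handler(int sig)
{
	extern int device;
	extern FILE *device_file;
	extern char *device_name;
	char *strip_add_dev_name();

	printf(" Signal caught is %d\n", sig);
	switch (sig)
	{
		case SIGINT:
		case SIGTERM:
			/* release the lock on the line and quit */
			unlock_dev(strip_add_dev_name(ttyname(device), 0));
			exit(1);
		case SIGTSTP:
			/* release the line, then stop (kill(0, ...) hits the whole process group) */
			unlock_dev(strip_add_dev_name(ttyname(device), 0));
			close(device);
			kill(0, SIGSTOP);
			break;
		case SIGCONT:
			/* resumed: reacquire a logging device */
			if (device_file != NULL)
			{
				rewind(device_file);
				device = get_device(device_file);
			}
			else
			{
				if ((device = chk_device(device_name)) < 0)
				{
					printf("No devices are available to use for logging\n");
					exit(1);
				}
			}
			break;
		default:
			break;
	}
}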


Thanks

Sean Coleman
coleman@bldrdoc.gov
NIST
Boulder, CO

richard@locus.com (Richard M. Mathews) (02/12/91)

coleman@cam.nist.gov (Sean Sheridan Coleman X5672) writes:

>Please explain to me why a SIGCONT is sent to a process after
>SIGTERM is sent to my process. It doesn't compute because TERM
>means to terminate the process. I catch SIGCONT because I
>do some reconnecting for serial drivers after my process is
>stopped with a Ctrl-Z. Below is a piece of the code and
>some output from the program.

This is yet another example of C Shell brain damage.  The shell thinks
it is going to do you a favor.  When it sends SIGTERM or SIGHUP it follows
it with a SIGCONT.  The "problem" that the shell is trying to solve is
that a signal sent to a stopped process won't get processed until the
process resumes -- since you apparently wanted the process to die, the
shell sends a SIGCONT just to make sure the process will be able to get
to your signal right away.  Wrong answer.  It doesn't solve the problem
in general for all signals, and it creates about as much confusion as
it tries to avoid.

The solution is don't catch SIGCONT.  Your SIGTSTP handler knows when
the program resumes anyway because the line after the "kill" which
caused it to suspend itself will not be reached until the program is
resumed.  Taking the code you have for SIGCONT and putting it there
is the "normal" way to do things.

Richard M. Mathews			 Freedom for Lithuania
richard@locus.com				Laisve!
lcc!richard@seas.ucla.edu
...!{uunet|ucla-se|turnkey}!lcc!richard

conger@hpcupt1.cup.hp.com (Edward Conger) (02/13/91)

/ hpcupt1:comp.unix.wizards / coleman@cam.nist.gov (Sean Sheridan Coleman X5672) /  9:06 am  Feb 11, 1991 /
>Please explain to me why a SIGCONT is sent to a process after
>SIGTERM is sent to my process. It doesn't compute because TERM
>means to terminate the process.

The distinction is that sending a signal to a process (usually|often)
is implemented by setting a bit in a flag word associated with the
"victim process".  The action of *send*ing the signal doesn't 
terminate the process, rather, it says, "when next you run in the
kernel (either via a system call or a timeslice (usually ~ 1/100 sec)),
you should go handle this signal."  In the case of SIGTERM, the default
behaviour is to TERMinate.

Now suppose the victim process is stopped (by job control, SIGSTOP,
or a debugger): it will NOT see the bit set in the flag word until it
runs again.  The SIGCONT gets it unstopped, and it runs long enough to
terminate.

Your mileage (and implementation) may vary, but this is the general gist of
the problem.


Hope this helps,

-Ed.

===========================================================================
The above is an official statement of MeMyself & I Inc.  It should not
be interpreted to be an official statement of any other likely targets,
including, but not limited to, Hewlett-Packard Co., ACME Rockets, ACME Rubber
Bands, ACME Consolidated Mining Engineering, or the Home for Damaged Coyotes.

src@scuzzy.in-berlin.de (Heiko Blume) (02/14/91)

richard@locus.com (Richard M. Mathews) writes:
>The solution is don't catch SIGCONT.  Your SIGTSTP handler knows when
>the program resumes anyway because the line after the "kill" which
>caused it to suspend itself will not be reached until the program is
>resumed.

which fails miserably when you get the uncatchable SIGSTOP.
*Yes*, I do use SIGSTOP: there are programs that disable
the SIGTSTP feature, and I won't let those go unsuspended.
-- 
      Heiko Blume <-+-> src@scuzzy.in-berlin.de <-+-> (+49 30) 691 88 93
                    public source archive [HST V.42bis]:
        scuzzy Any ACU,f 38400 6919520 gin:--gin: nuucp sword: nuucp
                     uucp scuzzy!/src/README /your/home

bhoughto@pima.intel.com (Blair P. Houghton) (02/14/91)

In article <67880001@hpcupt1.cup.hp.com> conger@hpcupt1.cup.hp.com (Edward Conger) writes:
>The SIGCONT gets it unstopped and it runs long enough to
>terminate.
>Your mileage (and implementation) may vary, but this is the general gist of
>the problem.

Not the least of those variances is that signals may be
queued, so that the SIGCONT may simply be waking the
process up only to watch it go to sleep again (unless the
SIGTERM can somehow butt into the queue).

				--Blair
				  "Dave? Dave's not here..."

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (02/15/91)

As quoted from <7103@fs1.cam.nist.gov> by coleman@cam.nist.gov (Sean Sheridan Coleman X5672):
+---------------
| Please explain to me why a SIGCONT is sent to a process after
| SIGTERM is sent to my process. It doesn't compute because TERM
+---------------

Being suspended, it wouldn't execute the signal handler unless it were
continued.  Also, I think the exit processing in the kernel needs this.
(So why does the SIGCONT handler run after the SIGTERM handler?  Because
signal handlers are invoked in signal-number order.)

++Brandon
(BSD folks feel free to correct me.)
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

torek@elf.ee.lbl.gov (Chris Torek) (02/18/91)

In article <2519@inews.intel.com> bhoughto@pima.intel.com
(Blair P. Houghton) writes:
>Not the least of those variances is that signals may be queued ....

Signals are not queued.

As far as I know, there is only one piece of one Unix manual that
claims otherwise (that being the System V SIGCLD documentation), and it
lies.  Signals are never queued.  System V SIGCLD signals use a
different trick that causes properly-coded wait routines to be called
once per exited child, but which causes improperly-coded wait routines
to recurse indefinitely.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

bhoughto@hopi.intel.com (Blair P. Houghton) (02/19/91)

In article <10007@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>In article <2519@inews.intel.com> bhoughto@pima.intel.com
>(Blair P. Houghton) writes:
>>Not the least of those variances is that signals may be queued ....
>
>Signals are not queued.

Something's stacking them up.

I've run into situations more than once where I've tried to
stop a process and the stop has hung, usually due to
something else's being stuck (an NFS access, e.g.).  I've
sent the stop again, and when the block clears I see the
process stop.  When I tell the process to continue, the
first thing it does is stop itself again.

Who's doing it?  The kernel or csh(1)?  The tty driver?  Or
is it just a matter of a stuck process queue?  I can't
imagine all the kills not being done by the time I've typed
in the command to continue...

				--Blair
				  "It also happens under VMS,
				   but I'll keep mention of that
				   'Fine' system to a minimum..."

torek@elf.ee.lbl.gov (Chris Torek) (02/22/91)

>In article <10007@dog.ee.lbl.gov> I wrote:
>>Signals are not queued.

In article <2588@inews.intel.com> bhoughto@hopi.intel.com
(Blair P. Houghton) writes:
>Something's stacking them up.

Well, not really:

>I've run into situations more than once where I've tried to
>stop a process and the stop has hung, usually due to
>something else's being stuck (an NFS access, e.g.)

I have no idea what `the stop has hung' means.

>I've sent the stop again, and when the block clears I see the
>process stop.  When I tell the process to continue, the
>first thing it does is stop itself again.

>Who's doing it?  The kernel or csh(1)?  The tty driver?  Or
>is it just a matter of a stuck process queue?

The most likely cause is the program itself.  (This also depends on which
stop signal you use.)

A number of programs contain code resembling the following:

	/* broken function to catch SIGTSTP (^Z) */
	void
	catch_stop()
	{
		/* put the terminal modes back */
		clean_tty();
		/* reenable stops */
		signal(SIGTSTP, SIG_DFL);	/* bug */
		/* stop ourselves */
		kill(0, SIGTSTP);		/* bug */
		/* resumes here */
		signal(SIGTSTP, catch_stop);
		dirty_tty();
	}

This particular catcher is full of race conditions.  The most interesting
problem, however, is the

	kill(0, SIGTSTP);

This sends another stop signal to the entire process group.  If there
are two processes in a pipe, both using code like this, they end up
sending each other barrages of stop signals.  With the code shown above,
it is possible (though unlikely) to have two processes in a pipeline
`trade off' stops, so that you run:

	% foo | bar
	^Z
	Stopped
	% fg
	Stopped
	% fg
	Stopped
	% fg
	Stopped
	%

In each case either foo or bar `wins the race', stops both, then when you
foreground either foo or bar wins again, and stops both, and . . . .

This does not happen with only one process, though a different sort of
race can lead to two stops from two different signals, despite the
SIGCONT description below:  If the process takes a SIGTSTP, and then
stops on a SIGSTOP during, e.g., the clean_tty() call above, it can
then send itself a SIGTSTP as soon as it is resumed.

The (4.3++) kernel implementation of signals is simply four bit vectors:

 - signals currently pending	(p_sig)
 - signals currently held	(p_sigmask)
 - signals being caught		(p_sigcatch)
 - signals being ignored	(p_sigignore)

Some of these fields exist only to optimise signal dispatching.  The
most important thing is that a signal is sent (posted) to a process with:

	p->p_sig |= mask;

and a process takes a signal if:

	(p->p_sig & ~p->p_sigmask) != 0

The signal it takes is the lowest-numbered one that is pending.  When a
signal is taken the corresponding bit is removed from p->p_sig.
Typically, the same bit is set in p->p_sigmask, blocking further
delivery of that particular signal.  (The new mask comes about from the
signal mask in the [4.2-style] sigvec or [POSIX-style] sigaction.)

When a SIGCONT signal is sent (not delivered, merely sent!), p->p_sig
has `stopsigmask' (the masks for SIGTSTP, SIGSTOP, SIGTTIN, and SIGTTOU)
removed, so one SIGCONT clears up to four stops.  This can clear stops
that have never been taken.
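A toy user-space model of the bit-vector description above, for illustration only. The struct, `SIGBIT()`, `model_send()` and `model_take()` are hypothetical; only the pending and held vectors are modelled, masks are 32 bits (signals 1..32), and a taken signal is always held, whereas the real new mask comes from sigvec or sigaction. It shows one SIGCONT wiping out several pending stops, and that sending the same signal twice leaves a single bit.

```c
#include <assert.h>
#include <signal.h>
#include <stdint.h>

/* 4.3BSD-style mask for signal m (signals are numbered from 1). */
#define SIGBIT(m) ((uint32_t)1 << ((m) - 1))

/* Toy model of two of the per-process fields (not kernel code). */
struct proc_model {
    uint32_t p_sig;      /* pending */
    uint32_t p_sigmask;  /* held (blocked) */
};

static const uint32_t stopsigmask =
    SIGBIT(SIGSTOP) | SIGBIT(SIGTSTP) | SIGBIT(SIGTTIN) | SIGBIT(SIGTTOU);

/* Send a signal: just set its bit.  Sending SIGCONT also discards every
   pending stop signal, whether or not it was ever taken. */
void model_send(struct proc_model *p, int sig)
{
    assert(sig >= 1 && sig <= 32);
    if (sig == SIGCONT)
        p->p_sig &= ~stopsigmask;
    p->p_sig |= SIGBIT(sig);
}

/* Take the lowest-numbered pending, unheld signal: clear its pending bit
   and hold it.  Returns 0 if nothing can be taken. */
int model_take(struct proc_model *p)
{
    uint32_t ready = p->p_sig & ~p->p_sigmask;

    for (int sig = 1; sig <= 32; sig++) {
        if (ready & SIGBIT(sig)) {
            p->p_sig &= ~SIGBIT(sig);
            p->p_sigmask |= SIGBIT(sig);
            return sig;
        }
    }
    return 0;
}
```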
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov