[comp.unix.internals] Signal Request Not Reaching Child

rmosher () (11/27/90)

A process (P) generates (and can kill) child processes.  Children are
normally given a signal (kill -3) by P and kill themselves after cleaning
up and acknowledging P's request.

The problem is that (very rarely) a child process refuses to die.  No
ack is sent to P.  When in this state, even kill -9 will not kill the
child process.  However, the child process handles terminal i/o and if
the terminal is shut off and on the child can be aborted.  This makes
me think that the child is in a state where it cannot collect a software
signal, perhaps because it is in the middle of a screen write or some
terminal i/o.  Could this be the case?  What else could cause this?
Work around solutions would also be appreciated.  Note: the terminal is
set up in BLOCK mode, if that makes any difference.

Thanks in advance for any suggestions/help.

krader@crg8.sequent.com (Kurtis D. Rader) (11/30/90)

In article <1990Nov26.214748.5977@mdivax1.uucp> mdivax1!rmosher () writes:
%
%A process (P) generates (and can kill) child processes.  Children are
%normally given a signal (kill -3) by P and kill themselves after cleaning
%up and acknowledging P's request.
%
%The problem is that (very rarely) a child process refuses to die.  No
%ack is sent to P.  When in this state, even kill -9 will not kill the
%child process.  However, the child process handles terminal i/o and if
%the terminal is shut off and on the child can be aborted.  This makes
%me think that the child is in a state where it cannot collect a software
%signal, perhaps because it is in the middle of a screen write or some
%terminal i/o.  Could this be the case?  What else could cause this?
%Work around solutions would also be appreciated.  Note: the terminal is
%set up in BLOCK mode, if that makes any difference.

About once every two weeks a customer will call me to report they have
a process that won't respond to a "kill -9".  It always turns out that
the process is in a "fast wait"; i.e., it's waiting for an event at a
non-interruptible priority.  The most common cause is a printer or
terminal that has sent an x-off to the host and was then turned off.
What happens is that the "kill -?" (HUP, QUIT, whatever) causes the
kernel to try closing the serial port.  The close, however, is not
allowed to complete until the pending output is flushed.  This in
turn can't occur because the port has been blocked by the x-off.  At
this point the process is waiting on the vnode for the device at a
priority less than PZERO for an event that won't occur anytime soon
(if ever).  Turning the device back on and doing whatever is necessary
to cause an x-on to be sent clears things up.  If it's a parallel
printer, tape drive or similar device a reboot is usually required.
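
One possible child-side workaround, sketched under assumptions: if the
close hangs because it must drain pending output, the child can discard
that output itself before closing.  The helper name discard_and_close()
is hypothetical, and tcflush() is the POSIX termios call; a 1990 System V
kernel would more likely use the TCFLSH ioctl instead.  Whether this
avoids the hang depends on the tty driver, so treat it as a sketch and
not a guaranteed fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original posts): throw away any
 * queued, untransmitted output on a terminal before closing it, so
 * that close() has nothing left to drain.
 * Returns whatever close() returns: 0 on success, -1 on error.
 */
int discard_and_close(int fd)
{
    if (isatty(fd)) {
        /* Discard output written but not yet sent.  A failure here
         * is ignored on purpose: we still want to attempt the close. */
        (void)tcflush(fd, TCOFLUSH);
    }
    return close(fd);
}
```

The child would call this from its cleanup path in place of a plain
close().  Any output still queued is lost, which is only acceptable if
the screen contents no longer matter at that point.  Note also that
POSIX may send SIGTTOU to a background process that calls tcflush() on
its controlling terminal.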
--
Kurtis D. Rader, Technical Support Engineer    voice:    503/578-3714
Service Hotline, Sequent Computer Systems      fax:      503/578-3731
15450 SW Koll Parkway, M/S UMP2-502            UUCP:     ...uunet!sequent!krader
Beaverton, OR  97006-6063                      internet: krader@sequent.com

hotte@sunrise.in-berlin.de (Horst Laumer) (11/30/90)

In article <1990Nov26.214748.5977@mdivax1.uucp>:


>A process (P) generates (and can kill) child processes.  Children are
>normally given a signal (kill -3) by P and kill themselves after cleaning
>up and acknowledging P's request.

>The problem is that (very rarely) a child process refuses to die.  No
>ack is sent to P.  When in this state, even kill -9 will not kill the
>child process.  However, the child process handles terminal i/o and if
>the terminal is shut off and on the child can be aborted.  This makes
>me think that the child is in a state where it cannot collect a software
>signal, perhaps because it is in the middle of a screen write or some
>terminal i/o.  Could this be the case?  What else could cause this?
>Work around solutions would also be appreciated.  Note: the terminal is
>set up in BLOCK mode, if that makes any difference.

>Thanks in advance for any suggestions/help.

This seems to be a general bug. At least in SysV, when kill(pid,SIGKILL)
doesn't work, this indicates that the recipient is in kernel mode and
executing (better: hanging in) the code of a device driver the kernel
relies on (!). I do not know what the process does, but assuming P knows
the tty of this child, this may be worked around by having P close the
line and send the signal again.
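
The "send the signal again" part can be sketched on the parent's side
roughly as follows.  This is a minimal sketch, not the poster's code:
the names stop_child() and reaped_within() are made up for
illustration, and the grace period is an arbitrary assumption.  It
sends SIGQUIT, polls for the child's exit, escalates to SIGKILL, and
reports when the child is still there afterwards, which is the case
where it is probably stuck inside a driver and P would have to act on
the tty before retrying.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll for roughly `tenths` tenths of a second.
 * Returns 1 if the child was reaped in that time, 0 otherwise. */
static int reaped_within(pid_t pid, int tenths)
{
    struct timespec tenth = { 0, 100000000L };   /* 0.1 s */
    int i;

    for (i = 0; i < tenths; i++) {
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 1;
        nanosleep(&tenth, NULL);
    }
    return waitpid(pid, NULL, WNOHANG) == pid;
}

/*
 * Hypothetical helper: ask the child to quit, then insist.
 * Returns  0 if the child exited after SIGQUIT,
 *          1 if SIGKILL was needed,
 *         -1 if it is still not reaped (possibly in an
 *            uninterruptible sleep, or not our child at all).
 */
int stop_child(pid_t pid, int grace_tenths)
{
    assert(pid > 0 && grace_tenths >= 0);

    if (kill(pid, SIGQUIT) == 0 && reaped_within(pid, grace_tenths))
        return 0;
    if (kill(pid, SIGKILL) == 0 && reaped_within(pid, grace_tenths))
        return 1;
    return -1;
}
```

A return of -1 is the point at which P would try whatever frees the
line (closing it, or getting the terminal to send x-on) and then call
stop_child() again.  Polling with WNOHANG keeps P from blocking on a
child that may never exit.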

hl
-- 
============================================================================
Horst Laumer, Kantstrasse 107, D-1000 Berlin 12 ! Bang-Adress: 
Domain: hotte@sunrise.in-berlin.de              ! Junk-Food for Autorouters
Bang:   ...unido!fub!geminix!sunrise!hotte      ! -- me --