rmosher () (11/27/90)
A process (P) generates (and can kill) child processes. Children are normally sent a signal (kill -3, i.e. SIGQUIT) by P, and they exit after cleaning up and acknowledging P's request. The problem is that, very rarely, a child process refuses to die. No ack is sent to P. When in this state, even kill -9 will not kill the child process. However, the child process handles terminal i/o, and if the terminal is switched off and back on, the child can be aborted. This makes me think that the child is in a state where it cannot collect a software signal, perhaps because it is in the middle of a screen write or some other terminal i/o. Could this be the case? What else could cause this? Workaround solutions would also be appreciated. Note: the terminal is set up in BLOCK mode, if that makes any difference. Thanks in advance for any suggestions/help.
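The quit protocol described above can be sketched in Python under a few assumptions that are ours, not the poster's: the acknowledgement is a single byte written to a pipe (the post does not say how the ack is sent), and `spawn_child`, `stop_child`, `_reap` and the timeout values are hypothetical. The point of the sketch is that P never blocks indefinitely on a child that cannot collect its signal: it waits for the ack with a timeout, escalates to SIGKILL, and then polls instead of blocking in `waitpid()`.

```python
import os
import select
import signal
import time

ACK = b"A"  # hypothetical one-byte acknowledgement sent by the child


def spawn_child():
    """Fork a child that waits for SIGQUIT (kill -3), cleans up, acks, exits."""
    ack_r, ack_w = os.pipe()
    # Block SIGQUIT before fork() so a signal sent early stays pending;
    # the child inherits the mask and collects the signal with sigwait().
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGQUIT})
    pid = os.fork()
    if pid == 0:
        os.close(ack_r)
        # A real child would be doing terminal i/o here; this stand-in
        # just waits for the quit request.
        signal.sigwait({signal.SIGQUIT})
        # ... clean up here ...
        os.write(ack_w, ACK)
        os._exit(0)
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    os.close(ack_w)
    return pid, ack_r


def _reap(pid, timeout):
    """Poll for the child's exit instead of blocking in waitpid() forever."""
    deadline = time.monotonic() + timeout
    while True:
        done, _ = os.waitpid(pid, os.WNOHANG)
        if done:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.05)


def stop_child(pid, ack_r, timeout=2.0):
    """Ask the child to quit; escalate to SIGKILL if no ack arrives in time."""
    try:
        os.kill(pid, signal.SIGQUIT)
        ready, _, _ = select.select([ack_r], [], [], timeout)
        if ready and os.read(ack_r, 1) == ACK and _reap(pid, timeout):
            return "acknowledged"
        os.kill(pid, signal.SIGKILL)
        # A child asleep at a non-interruptible priority may not die even
        # now; report that instead of hanging alongside it.
        return "killed" if _reap(pid, timeout) else "stuck"
    finally:
        os.close(ack_r)


if __name__ == "__main__":
    pid, ack_r = spawn_child()
    print(stop_child(pid, ack_r))  # normally prints "acknowledged"
```

A "stuck" result is the case this thread is about: the parent can detect it and carry on, but, as the replies explain, no signal will remove the child until whatever it is sleeping on in the kernel is released.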
krader@crg8.sequent.com (Kurtis D. Rader) (11/30/90)
In article <1990Nov26.214748.5977@mdivax1.uucp> mdivax1!rmosher () writes:
%
%A process (P) generates (and can kill) child processes. Children are
%normally given a signal (kill -3) by P and kill themselves after cleaning
%up and acknowledging P's request.
%
%The problem is that (very rarely) a child process refuses to die. No
%ack is sent to P. When in this state, even kill -9 will not kill the
%child process. However, the child process handles terminal i/o and if
%the terminal is shut off and on the child can be aborted. This makes
%me think that the child is in a state where it cannot collect a software
%signal, perhaps because it is in the middle of a screen write or some
%terminal i/o. Could this be the case? What else could cause this?
%Work around solutions would also be appreciated. Note: the terminal is
%set up in BLOCK mode, if that makes any difference.
About once every two weeks a customer will call me to report they have
a process that won't respond to a "kill -9". It always turns out that
the process is in a "fast wait"; i.e., it's waiting for an event at a
non-interruptible priority. The most common cause is a printer or
terminal that has sent an x-off to the host and was then turned off.
What happens is that the "kill -?" (HUP, QUIT, whatever) causes the
kernel to try closing the serial port. The close however is not
allowed to complete until the pending output is flushed. This in
turn can't occur because the port has been blocked by the x-off. At
this point the process is waiting on the vnode for the device at a
priority less than PZERO for an event that won't occur anytime soon
(if ever). Turning the device back on and doing whatever is necessary
to cause an x-on to be sent clears things up. If it's a parallel
printer, tape drive or similar device a reboot is usually required.
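Given that explanation, another process can sometimes clear a serial-port hang without power-cycling the terminal: open the same tty and discard the queued output that the close is waiting to drain. Below is a minimal Python sketch using the POSIX `tcflush()` call with `TCOFLUSH` (discard data written but not yet transmitted). The function name `unstick_tty` is hypothetical, whether this actually releases the sleeping process depends on the kernel and the serial driver, and it does nothing for the parallel printer or tape drive cases.

```python
import os
import termios


def unstick_tty(path):
    """Discard output queued on a tty so a pending drain has nothing left.

    Sketch only: whether this lets a close() that is waiting for the
    output to drain complete depends on the kernel and the serial driver.
    """
    # O_NONBLOCK: don't wait for carrier on a modem line.
    # O_NOCTTY: don't make this tty our controlling terminal.
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        termios.tcflush(fd, termios.TCOFLUSH)  # drop output not yet sent
    finally:
        os.close(fd)
```

It would be run (with permission to open the device) against the terminal the stuck process holds open, after which the kill can be retried.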
--
Kurtis D. Rader, Technical Support Engineer voice: 503/578-3714
Service Hotline, Sequent Computer Systems fax: 503/578-3731
15450 SW Koll Parkway, M/S UMP2-502 UUCP: ...uunet!sequent!krader
Beaverton, OR 97006-6063 internet: krader@sequent.com
hotte@sunrise.in-berlin.de (Horst Laumer) (11/30/90)
In article <1990Nov26.214748.5977@mdivax1.uucp>:
>A process (P) generates (and can kill) child processes. Children are
>normally given a signal (kill -3) by P and kill themselves after cleaning
>up and acknowledging P's request.
>The problem is that (very rarely) a child process refuses to die. No
>ack is sent to P. When in this state, even kill -9 will not kill the
>child process. However, the child process handles terminal i/o and if
>the terminal is shut off and on the child can be aborted. This makes
>me think that the child is in a state where it cannot collect a software
>signal, perhaps because it is in the middle of a screen write or some
>terminal i/o. Could this be the case? What else could cause this?
>Work around solutions would also be appreciated. Note: the terminal is
>set up in BLOCK mode, if that makes any difference.
>Thanks in advance for any suggestions/help.

This seems to be a general bug. At least in SysV, when kill(pid, SIGKILL)
doesn't work, it indicates that the recipient is in kernel mode and is
executing (or rather, hanging in) the code of a device driver the kernel
relies on (!). Since I do not know what the process does, I can only
suggest this: assuming P knows the tty of this child, the problem may be
worked around by having P close the line and send the signal again.

hl
--
============================================================================
Horst Laumer, Kantstrasse 107, D-1000 Berlin 12 ! Bang-Adress:
Domain: hotte@sunrise.in-berlin.de              ! Junk-Food for Autorouters
Bang: ...unido!fub!geminix!sunrise!hotte        !            -- me --
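Horst's workaround can be sketched from P's side as follows. "Close the line" is open to interpretation; this sketch assumes it means hanging up the serial line by setting the output speed to B0 (POSIX says the modem control lines are then no longer asserted, which normally disconnects the line) and then restoring the saved settings, roughly a host-side counterpart of switching the terminal off and on. The names `hang_up_line`, `kill_with_retry` and `_reaped`, the hold time and the timeouts are hypothetical, and whether a hang-up releases a process sleeping in the driver depends on the system.

```python
import os
import signal
import termios
import time


def _reaped(pid, timeout):
    """Poll with WNOHANG so P itself never hangs waiting for the child."""
    deadline = time.monotonic() + timeout
    while True:
        done, _ = os.waitpid(pid, os.WNOHANG)
        if done:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.05)


def hang_up_line(tty_path, hold=0.5):
    """Drop the line by setting output speed B0, then restore the settings."""
    fd = os.open(tty_path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        saved = termios.tcgetattr(fd)
        hangup = list(saved)
        hangup[5] = termios.B0  # index 5 is the output speed
        # TCSANOW, not TCSADRAIN: draining would block on the x-off too.
        termios.tcsetattr(fd, termios.TCSANOW, hangup)
        time.sleep(hold)  # arbitrary time to hold the line down
        termios.tcsetattr(fd, termios.TCSANOW, saved)
    finally:
        os.close(fd)


def kill_with_retry(pid, tty_path, timeout=2.0):
    """SIGKILL a child of P; if it does not die, hang up its line and retry."""
    os.kill(pid, signal.SIGKILL)
    if _reaped(pid, timeout):
        return True
    hang_up_line(tty_path)
    os.kill(pid, signal.SIGKILL)  # send the signal again
    return _reaped(pid, timeout)
```

A False result means the child survived both attempts, in which case the terminal-side fix (getting an x-on sent) or a reboot described in the previous reply is what remains.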