[net.unix-wizards] killing zombies

harris@imsvax.UUCP (Harris Reavin) (08/11/85)

(bug food)

     Unfortunately no unix wizard on the net had a solution for zombie
processes that disable ports and refuse to die from a "kill -9". I got
mail from a number of other system administrators who had the same problem
and asked for the solution if I found one short of rebooting. We discovered
locally a way to drive a stake through a zombie's heart that I did not
mention in the previous article. It is better than a reboot but is still a
lot of trouble. We disconnect the cable from the controller of the affected
port, change the ttys entry to turn off the port, reconnect the cable,
and re-edit the ttys file to turn the port on again. This usually works.
I still hope that someone may come up with a better solution that does
not involve crawling around the back of the computer in search of the
proper cable.


-- 
		                        Harris Reavin

UUCP:	{umcp-cs!eneevax || seismo!rlgvax!elsie}!imsvax!harris

chris@umcp-cs.UUCP (Chris Torek) (08/12/85)

The zombies are left by a bug in the device driver.  Using external
methods to get rid of them is just a kludge (though I must admit, a
useful kludge for those without kernel sources).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

root@bu-cs.UUCP (Barry Shein) (08/12/85)

>From: harris@imsvax.UUCP (Harris Reavin)
>Subject: Re: killing zombies
>
>     We disconnect the cable from the controller of the affected
>port, change the ttys entry to turn off the port, reconnect the cable,
>and reedit the ttys file to turn the port on again. This usually works.
>I still hope that someone may come up with a better solution that does
>not involve crawling around the back of the computer in search of the
>proper cable.

Ah, and thus you reveal your 'real' problem: Improperly hooked up
cables (wizards *never* do that :-).

See, my suspicion is that when you gave up the ability to 'hang up' the
line by running 3-wire circuits, you opened the door to (really, closed
off a possible solution to) this and other problems, like getting your
terminal all hung up in RAW mode or some such. If this isn't the case,
then how come unplugging works? All it does is drop the line.

Yes, you're right, kill oughta work and it's a bug, but for security's
sake consider going to 'modem'-controlled wires where possible. Most
terminals support this by asserting some sort of DTR (which may need to be
jumpered to other pins); at any rate, it ensures a hang-up if the terminal
is powered off [yet another security bug with three-wire circuits:
turning off a terminal *should* log you off].

But, don't give up hope, I believe there is still a bug in some drivers
that may leave zombies even if you hang-up!

	-Barry Shein, Boston University

phil@RICE.ARPA (William LeFebvre) (08/12/85)

We have also had the zombie process problem in the past.  And, like
most other sites, we have not found a cure-all way of getting rid of
the culprits.

We have found a method that sometimes works (or used to work---it's
been a while since I have seen it work).  It was noted that some of the
zombies are only waiting for their output queues to drain before going
away.  So, we would turn on the LFLUSHO bit by typing
"stty flusho >/dev/ttyXX &".  Be *SURE* to put it in the background,
because if the command does not succeed, you will have another zombie
process.  This would work sometimes, but not always.
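For readers on a later POSIX system, the stty command above can be sketched in C. This is a minimal sketch, not the original 4.2BSD code: it assumes a termios implementation that exposes FLUSHO in c_lflag (a BSD extension; Linux's termios(3) page marks it as unsupported, so there the bit may do nothing), and the name set_flusho is hypothetical. Opening with O_NONBLOCK only keeps open() from waiting for carrier; a wedged driver can still hang the caller, so the advice to run it in the background still applies.

```c
#define _DEFAULT_SOURCE       /* glibc: expose BSD extras such as FLUSHO */
#include <errno.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Set the FLUSHO ("discard pending output") bit on a terminal line,
 * roughly what "stty flusho >/dev/ttyXX" asks for.
 * Returns 0 on success, -1 with errno set on error. */
int set_flusho(const char *path)
{
#ifdef FLUSHO
    /* O_NONBLOCK: don't sleep in open() waiting for carrier.
     * O_NOCTTY: don't make this line our controlling terminal. */
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct termios t;
    int rc = tcgetattr(fd, &t);
    if (rc == 0) {
        t.c_lflag |= FLUSHO;
        /* TCSANOW applies the change at once; TCSADRAIN would first wait
         * for the (possibly stuck) output queue to drain. */
        rc = tcsetattr(fd, TCSANOW, &t);
    }

    int saved = errno;        /* keep the caller's errno from tc*attr */
    close(fd);
    errno = saved;
    return rc;
#else
    (void)path;
    errno = ENOTSUP;          /* this termios has no FLUSHO bit */
    return -1;
#endif
}
```

A call like set_flusho("/dev/ttyXX") mirrors the stty invocation. TCSANOW is the deliberate choice here, since waiting on a queue that never drains is exactly what leaves the process stuck.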

What we would usually do is try that method, and if it didn't work,
wait until a sufficient number of them accumulated (or until a rather
inactive time of the day) and just reboot the machine.  It is a *very*
annoying terminal driver bug---one that I wish would be permanently
eradicated!

			William LeFebvre
			Department of Computer Science
			Rice University
			<phil@Rice.arpa>
                        or, for the daring: <phil@Rice.edu>

whp@cbnap.UUCP (W. H. Pollock x4575 3S235) (08/13/85)

I missed the original article for this discussion, but I thought I'd give
my two cents worth anyway.

The reason you can't kill a zombie process is that IT IS ALREADY DEAD!!  A
zombie process ONLY takes up a slot in the proc table (and not even a full
entry!) and no other storage (on SVR2 anyway, if my memory serves).  The only
reason for zombies is that there is no other way for a child process to
return an exit status to the parent.
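Wayne's point is easy to see on a modern POSIX system. The C sketch below (function name hypothetical) forks a child that exits at once, waits for it with waitid() and WNOWAIT so that it is left behind as an unreaped zombie, sends it SIGKILL, and then reaps it. waitid() and WNOWAIT come from later POSIX revisions, not SVR2, and are used here only to hold the child in the zombie state.

```c
#define _XOPEN_SOURCE 700   /* expose waitid() and WNOWAIT */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, let it become a zombie, send it
 * SIGKILL, then reap it.  Returns the exit status the parent collects,
 * or -1 on error. */
int zombie_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: die immediately */

    /* Wait until the child has terminated but do NOT reap it (WNOWAIT):
     * it now occupies a process-table slot as a zombie. */
    siginfo_t info;
    if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;

    /* The equivalent of "kill -9" on the zombie: it is already dead,
     * so the signal has nothing to act on. */
    kill(pid, SIGKILL);

    /* Only the parent's wait removes the zombie, and the status it gets
     * back is still the original exit code, not "killed by signal 9". */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as zombie_exit_status(42) should return 42: the SIGKILL aimed at the zombie is lost, and only the parent's waitpid() frees the slot.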

If the zombie is an orphan (i.e., its parent process id is 1), you
can always clean it up by signaling init (via kill) with SIGCLD (which
wakes up init, which then cleans up all orphaned zombies).  This has worked
for me on SVR2 (I think; it was a while ago).
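The cleanup Wayne describes comes down to a non-blocking wait loop in the parent (init, in the case of orphans). Here is a minimal C sketch assuming POSIX waitpid() with WNOHANG; reap_zombies and the demo helper spawn_zombies are hypothetical names, and this is not init's actual source.

```c
#define _XOPEN_SOURCE 700   /* expose waitid() and WNOWAIT for the demo */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking: the loop an
 * init-like parent runs when SIGCHLD (SIGCLD on System V) wakes it.
 * Returns the number of zombies collected. */
int reap_zombies(void)
{
    int reaped = 0;
    int status;

    /* WNOHANG makes waitpid() return 0 instead of sleeping once no exited
     * child is left; -1 (ECHILD) means there are no children at all. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Demo helper (hypothetical): fork n children that exit at once and wait,
 * without reaping them, until each one is a zombie.
 * Returns how many zombies were made. */
int spawn_zombies(int n)
{
    int made = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);
        siginfo_t info;
        if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) == 0)
            made++;
    }
    return made;
}
```

After spawn_zombies(3), a single reap_zombies() call should collect all three. Looping until waitpid() returns 0 matters because several children can exit while only one signal is delivered, since ordinary signals do not queue.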

Wayne Pollock

smb@ulysses.UUCP (Steven Bellovin) (08/13/85)

> If the zombie is an orphan (i.e., its parent process id is 1), you
> can always clean it up by signaling init (via kill) with SIGCLD (which
> wakes up init which then cleans up all orphaned zombies).  This has worked
> for me on SRV2 (I think, it was a while ago).

If the parent process ID is 1, /etc/init should find it by itself; it
certainly does on 4.2bsd, where you don't get signal races.  But the most
common cause of processes refusing to die is when they're hung in a device
close routine -- called by exit() in the kernel.  Even if you could send
it a signal -- which you can't, since all signals are ignored by that point --
it couldn't do anything except re-enter the driver close routine.
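Steve's distinction, between a true zombie and a process stuck in a driver's close routine, can be checked from user space on Linux, which reports each process's scheduler state in /proc/<pid>/stat: 'Z' for a zombie, 'D' for uninterruptible sleep, which is where a process hung in a device close routine would typically sit. This is a Linux-specific sketch; proc_state is a hypothetical name, and pid 0 is taken (by assumption) to mean the calling process.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Linux-specific: return the one-letter state field from /proc/<pid>/stat
 * ('R' running, 'S' sleeping, 'D' uninterruptible sleep, 'Z' zombie, ...),
 * or '?' if it cannot be read.  pid 0 means the calling process. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    if (pid == 0)
        snprintf(path, sizeof path, "/proc/self/stat");
    else
        snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The line looks like "1234 (comm) S ...".  comm may itself contain
     * spaces or ')', so search for the LAST ')' and skip the space after it. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

A process reading its own entry with proc_state(0) sees 'R'. A zombie keeps showing 'Z' however often it is sent SIGKILL, while a 'D' process is the case where neither kill -9 nor reaping helps until the driver lets go.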

anton@ucbvax.ARPA (Jeff Anton) (08/14/85)

In article <368@imsvax.UUCP> harris@imsvax.UUCP (Harris Reavin) writes:
>     Unfortunately no unix wizard on the net had a solution for zombie
>processes that disable ports and refuse to die from a "kill -9". I got

Have you tried 'stty 0 > /dev/ttyhung'?  This usually does the trick with
hung tty lines.
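The 'stty 0' trick sets the line speed to zero, which by long convention means "hang up". POSIX termios keeps that convention: setting the output speed to B0 tells the driver to stop asserting the modem control lines (DTR). Below is a minimal sketch assuming a POSIX termios system; hangup_line is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Hang up a terminal line the way "stty 0 > /dev/ttyXX" does: set its
 * output speed to B0, which tells the driver to drop DTR.
 * Returns 0 on success, -1 with errno set on error. */
int hangup_line(const char *path)
{
    /* O_NONBLOCK keeps open() from waiting for carrier; O_NOCTTY keeps the
     * line from becoming our controlling terminal. */
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct termios t;
    int rc = tcgetattr(fd, &t);
    if (rc == 0) {
        cfsetospeed(&t, B0);          /* speed 0 means "hang up" */
        rc = tcsetattr(fd, TCSANOW, &t);
    }

    int saved = errno;                /* keep the caller's errno from tc*attr */
    close(fd);
    errno = saved;
    return rc;
}
```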
-- 
C knows no bounds.
					Jeff Anton
					U.C.Berkeley
					ucbvax!anton
					anton@berkeley.ARPA

dave@lsuc.UUCP (David Sherman) (08/15/85)

In article <368@imsvax.UUCP> harris@imsvax.UUCP (Harris Reavin) writes:
>     Unfortunately no unix wizard on the net had a solution for zombie
>processes that disable ports and refuse to die from a "kill -9". I got

I have a solution which works for us here. We have a parallel set
of devices called acu* which are the same as tty* except that DCD
need not be present to open the device. (I originally installed this
kernel mod so as to be able to cu out on a modem line which wasn't
supporting carrier detect.)

I usually find that doing a "cu -l /dev/acuNN" and out again
with ~. will clear the problem.
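With POSIX termios, the effect of the acu devices can be approximated in user space without a kernel change: open the line with O_NONBLOCK so that open() does not wait for carrier, set CLOCAL so that the modem status lines are ignored, then switch back to blocking I/O. This is a sketch under those assumptions, not Dave's kernel modification; open_without_carrier is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Open a serial line even when DCD (carrier) is not asserted.
 * Returns a file descriptor in ordinary blocking mode, or -1 with errno
 * set on error. */
int open_without_carrier(const char *path)
{
    /* Without O_NONBLOCK, open() on a modem-controlled line may sleep
     * until carrier appears. */
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct termios t;
    int flags, saved;

    /* CLOCAL: treat the line as a local connection and ignore the
     * modem status lines from now on. */
    if (tcgetattr(fd, &t) < 0)
        goto fail;
    t.c_cflag |= CLOCAL;
    if (tcsetattr(fd, TCSANOW, &t) < 0)
        goto fail;

    /* Carrier no longer matters, so return to blocking reads and writes. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0)
        goto fail;
    return fd;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```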

Dave Sherman
The Law Society of Upper Canada
Toronto
-- 
{  ihnp4!utzoo  pesnta  utcs  hcr  decvax!utcsri  }  !lsuc!dave