[net.unix-wizards] Unkillable processes

mrl@drutx.UUCP (LongoMR) (04/18/85)

[]

I have seen several times in the past years, processes which cannot
be killed on UNIX. I have been told that one of two things can be
happening. One, the process is waiting on a nonexistent wait channel, or
two, the process is running with a negative priority level. This seems
to happen at random on random processes. Has anyone else seen this and,
if so, how is the problem handled?
	Mark Longo		AT&T ISL Denver

gdsd1@homxb.UUCP (M.LAI) (04/18/85)

>	I have seen several times in the past years, processes which cannot
>	be killed on UNIX...

Yep, sure surprised me once when an entire process group shrugged off
a barrage of nines.  The spectacle occurred on a MAXI (Amdahl) running S5R2.
Unfortunately, I did not have the time to pursue the matter.

--
				Neal Nuckolls
				AT&T Bell Laboratories
				Red Hill, Middletown, NJ
				..!{houxa|vax135}!homxb!gdsd1
				(201) 949-9295

seth@megad.UUCP (Seth H Zirin) (04/24/85)

> I have seen several times in the past years, processes which cannot
> be killed on UNIX. I have been told that one of two things can be
> happening. One, the process is waiting on a nonexistent wait channel, or
> two, the process is running with a negative priority level.

A process sleep()ing with a lower (more attractive) priority than the constant
PZERO ignores signals, and cannot be awakened other than with a wakeup() call.
Since kill()ing a process involves sending it a signal, a process is unkillable
if it is sleeping below PZERO.  There are several places in most Unix kernels
where a process will sleep like this.
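
The rule above can be illustrated with a toy model.  This is a minimal
sketch in Python, not kernel code: the names Proc, sleep, wakeup, psignal
and run are stand-ins chosen for illustration, the value given to PZERO
is an assumption (only the comparison against it matters), and real
kernels differ in the details.

```python
# Toy model of signal delivery to sleeping processes.  Not kernel code.
PZERO = 25  # illustrative value only; the comparison is what matters


class Proc:
    def __init__(self, name):
        self.name = name
        self.state = "RUN"      # "RUN", "SLEEP" or "DEAD"
        self.pri = PZERO + 1
        self.wchan = None       # wait channel the process sleeps on
        self.pending = set()    # signals posted but not yet acted on


def sleep(p, chan, pri):
    """Put p to sleep on chan at priority pri."""
    p.state, p.wchan, p.pri = "SLEEP", chan, pri


def run(p):
    """Make p runnable; a pending signal 9 is acted on once it runs."""
    p.state, p.wchan = "RUN", None
    if 9 in p.pending:
        p.state = "DEAD"


def wakeup(procs, chan):
    """Wake every process sleeping on chan, whatever its priority."""
    for p in procs:
        if p.state == "SLEEP" and p.wchan == chan:
            run(p)


def psignal(p, sig):
    """Post sig to p; only a sleep above PZERO is interrupted by it."""
    p.pending.add(sig)
    if p.state == "RUN" or (p.state == "SLEEP" and p.pri > PZERO):
        run(p)


tty_wait = Proc("tty_wait")
sleep(tty_wait, "tty", PZERO + 3)    # interruptible sleep
disk_wait = Proc("disk_wait")
sleep(disk_wait, "buf", PZERO - 5)   # sleeping below PZERO

psignal(tty_wait, 9)
psignal(disk_wait, 9)
print(tty_wait.state, disk_wait.state)   # DEAD SLEEP

wakeup([tty_wait, disk_wait], "buf")     # the awaited event finally happens
print(disk_wait.state)                   # DEAD
```

In the model the signal is not lost: it stays pending and takes effect
as soon as something issues the wakeup() the process was waiting for.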

The solution:

	(1) Trick the process into waking up somehow.
	(2) Use adb on /dev/kmem to change the priority to (PZERO + 1)

Once the process is running or has a higher (less attractive) priority than
PZERO, it will obey a number nine.  Bear in mind that if a process is sleeping
below PZERO there is probably a reason for it, and kludging it awake may
make your kernel nauseous.
-- 
-------------------------------------------------------------------------------
Name:	Seth H Zirin
USmail:	Megadata Corp. 35 Orville Dr., Bohemia, NY 11716
Phone:	516-589-6800 (M-F 9-5 EST)
UUCP:	{decvax, ihnp4}!philabs!sbcs!megad!seth

Keeper of the News for megad

gn@loonam.UUCP (The Unknown one) (04/25/85)

> []
> 
> I have seen several times in the past years, processes which cannot
> be killed on UNIX. I have been told that one of two things can be

  As a sidelight, there was the same problem under Xenix 2.5: a
process would hang and could not be killed short of shutdown.

 This (I understand) was a problem with 2.5's I/O drivers: when a
process had output going to a non-responding device, the process
would hang. This would also happen if a process generated an
error that printed on the console while the console was off.

 This was fixed on Xenix 3.0.

Greg Noel
ihnp4!umn-cs!digi-g!loonam!gn

smb@ulysses.UUCP (Steven Bellovin) (04/28/85)

> > I have seen several times in the past years, processes which cannot
> > be killed on UNIX. I have been told that one of two things can be
> > happening. One, the process is waiting on a nonexistent wait channel, or
> > two, the process is running with a negative priority level.
> 
> A process sleep()ing with a lower (more attractive) priority than the constant
> PZERO ignores signals, and cannot be awakened other than with a wakeup() call.
> Since kill()ing a process involves sending it a signal, a process is unkillable
> if it is sleeping below PZERO.  There are several places in most Unix kernels
> where a process will sleep like this.
> 
> The solution:
> 
> 	(1) Trick the process into waking up somehow.
> 	(2) Use adb on /dev/kmem to change the priority to (PZERO + 1)
> 
> Once the process is running or has a higher (less attractive) priority than
> PZERO, it will obey a number nine.  Bear in mind that if a process is sleeping
> below PZERO there is probably a reason for it, and kludging it awake may
> make your kernel nauseous.

Another common cause of unkillable processes is a device close routine.  When
a process dies, exit() (in the kernel) tries to close all open file descriptors.
If the close routine is waiting for something that will never take place --
and tty drivers are notorious for this -- then the process can't exit.  All
signals, including signal 9, are ignored by this point, but that doesn't really
matter much, since even if you did kill the process it would just recurse through
exit() and call the close routine again...
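
The close-routine case can be sketched the same way.  This is a toy
model in Python, not kernel code: Proc, Device, do_exit and kill9 are
stand-ins for illustration, the "drains" flag and the device names are
hypothetical, and kill9 deliberately models the "even if you did kill
it" case described above rather than the real behavior of ignoring the
signal.

```python
# Toy model of a process stuck in a device close routine during exit().
class Device:
    def __init__(self, name, drains):
        self.name = name
        self.drains = drains    # hypothetical: will the device ever respond?

    def close(self):
        """Return True if close completes, False if it would sleep forever."""
        return self.drains


class Proc:
    def __init__(self, files):
        self.files = dict(files)    # fd -> Device
        self.state = "running"      # "running", "exiting" or "zombie"


def do_exit(p):
    """Close every open descriptor in turn; stall at the first that blocks."""
    p.state = "exiting"
    for fd in sorted(p.files):
        if not p.files[fd].close():
            return                  # asleep inside the driver's close routine
        del p.files[fd]
    p.state = "zombie"


def kill9(p):
    """Suppose signal 9 were honored: all it can do is go through exit again."""
    if p.state in ("running", "exiting"):
        do_exit(p)


p = Proc({0: Device("console", True), 1: Device("tty07", False)})
do_exit(p)
print(p.state, sorted(p.files))     # exiting [1]

kill9(p)                            # reaches the same close routine again
print(p.state, sorted(p.files))     # exiting [1]

p.files[1].drains = True            # the device finally responds
do_exit(p)
print(p.state, sorted(p.files))     # zombie []
```

However many times kill9 is applied, the process makes no progress
until the device itself lets the close routine return.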

greg@ncr-tp.UUCP (Greg Noel) (05/03/85)

In article <119@loonam.UUCP> gn@loonam.UUCP (The Unknown one) writes:
> [A reasonably accurate technical reply (I'm glad of that!).]
> [And then he signs it....]
>Greg Noel  ihnp4!umn-cs!digi-g!loonam!gn

You've GOT to be kidding!  There CAN'T be two of us!
-- 
-- Greg Noel, NCR Torrey Pines       Greg@ncr-tp.UUCP or Greg@nosc.ARPA

root@wlcrjs.UUCP (Randy Suess) (05/05/85)

In article <119@loonam.UUCP> gn@loonam.UUCP (The Unknown one) writes:
>> I have seen several times in the past years, processes which cannot
>> be killed on UNIX. 
>
>  As a sidelight, there was the same problem under Xenix 2.5: a
>process would hang and could not be killed short of shutdown.
>
> This was fixed on Xenix 3.0.

	Nope, it wasn't.  I run a pair of Altos 586s networked together with
3.0b.  The problem still exists: when an I/O device bombs, the process never
terminates and kill -9 pid doesn't work.

	BTW, I just received the Altos Booster Pak (4.2bsd fast file system) and
it's got a major problem.  The umount command causes a panic crash.  Also, with
25000 blocks (512 bytes each) free on 3.0b, the Booster comes up with about 7 megs
free.  Icky...

-- 
.. that's the biz, sweetheart ..
Randy Suess
Chi-Net - Public Access UN*X 
(312) 545 7535 (h) (312) 283 0559 (system)
{ihnp4|ihldt}!wlcrjs!randy