[comp.unix.microport] Trouble killing processes in SysV/AT

wnp@killer.UUCP (Wolf Paul) (04/28/88)

Can anyone enlighten me as to what causes a process to become "immortal"
in System VR2,  or Microport UNIX System V/AT, to be more specific?

I have encountered this a number of times, where it would be impossible
even for root to kill a process; if the parent process of the "immortal"
process is killed, the child attaches itself to init, PID 1.

The only way to get rid of such an immortal process seems to be to reboot,
which is rather drastic.

What causes a process to refuse to die? I thought signal 9 (kill) could
not be intercepted or ignored?

Any comments welcome.

Wolf Paul
-- 
Wolf N. Paul * 3387 Sam Rayburn Run * Carrollton TX 75007 * (214) 306-9101
UUCP:  ihnp4!killer!dcs!wnp                    ESL: 62832882
INTERNET: wnp@DESEES.DAS.NET or wnp@dcs.UUCP   TLX: 910-280-0585 EES PLANO UD

wtr@moss.ATT.COM (04/29/88)

In article <3951@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes:
>Can anyone enlighten me as to what causes a process to become "immortal"
>in System VR2,  or Microport UNIX System V/AT, to be more specific?

basically the way i do it is to have a shell script that runs 
something in the background.  when the shell script puts the
process in the background and goes about its merry own way,
(which usually means exiting back out), the background
process spawned from the script is given the PPID of 1
because its former parent is now dead.
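
A minimal sketch of the reparenting described above, assuming a POSIX
system with fork(), pipe() and nanosleep().  The function name
orphan_was_reparented() is made up for illustration.  On a classic System V
the adopting process is init (PID 1); some newer systems may hand the orphan
to a different reaper process, so the sketch only checks that the parent PID
changed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a "script" that starts a background child and
 * exits at once.  Returns 1 if the background child saw its parent PID
 * change after the script died, 0 if it did not, -1 on error.  The PID of
 * the adopting process is stored in *new_parent.
 */
int orphan_was_reparented(pid_t *new_parent)
{
    int fds[2];
    pid_t script, seen = -1;

    if (pipe(fds) == -1)
        return -1;

    script = fork();
    if (script == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (script == 0) {                       /* the "shell script" */
        pid_t me = getpid();
        pid_t bg = fork();

        if (bg == 0) {                       /* the background job */
            struct timespec nap = { 0, 10 * 1000 * 1000 };  /* 10 ms */
            pid_t now;
            int tries;

            close(fds[0]);
            /* Poll (for up to about 5 s) until the script is gone. */
            for (tries = 0; tries < 500 && getppid() == me; tries++)
                nanosleep(&nap, NULL);
            now = getppid();
            if (write(fds[1], &now, sizeof now) != (ssize_t)sizeof now)
                _exit(1);
            _exit(0);
        }
        _exit(bg == -1 ? 1 : 0);             /* the script exits right away */
    }

    close(fds[1]);
    waitpid(script, NULL, 0);                /* reap the script itself */
    if (read(fds[0], &seen, sizeof seen) != (ssize_t)sizeof seen)
        seen = -1;
    close(fds[0]);

    if (seen == -1)
        return -1;
    *new_parent = seen;
    return seen != script;
}
```

Note that an orphan adopted this way is still an ordinary process; being
owned by PID 1 does not by itself make it unkillable.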

i've used this to great advantage when i want to run a background
job from a terminal, and then logoff and go somewhere else.
(yes, I know about nohup, its priority is too low, and i need
to route my standard output, this way's easier)

>I have encountered this a number of times, where it would be impossible
>even for root to kill a process; if the parent process of the "immortal"
>process is killed, the child attaches itself to init, PID 1.
>
>What causes a process to refuse to die? I thought signal 9 (kill) could
>not be intercepted or ignored?

wait! don't touch that reset button! there is life beyond shells!
you can kill the process (either by root or by the user who started it)
but you MUST kill off not only the process, but also all of its
children too! (mass genocide!! ;-) if you don't, any child process
under the 'immortal script' is given a new PPID of 1 and thus itself
becomes immortal, and may spawn and produce children itself, etc...
whip out your trusty 'kill -9' and gun down those suckers!

note: the massacre outlined above will produce really nasty effects
if the process was any sort of compile, ESPECIALLY a makefile run.

good hunting!

vandys@hpindda.HP.COM (Andy Valencia) (04/29/88)

	A "classic" way to make an unkillable process is to have it
block on an I/O device which isn't going to finish its I/O.  The trick
is that if it sleep()s with a certain priority or above, signals will
unblock it (and thus you get interruptible system calls), but if it's
below, then signals can't get to the process until it unblocks.  Now
all you need is for some I/O operation to get frozen (say, lose an
interrupt, or mishandle it), and you have the unkillable process.

			We are having fun now, ja?
			Andy Valencia
			vandys%hpindda.UUCP@hplabs.hp.com

dave@micropen (David F. Carlson) (04/30/88)

In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
> Can anyone enlighten me as to what causes a process to become "immortal"
> in System VR2,  or Microport UNIX System V/AT, to be more specific?
> 
This "problem" is not a Microport issue at all: it is UNIX all the way.

> I have encountered this a number of times, where it would be impossible
> even for root to kill a process; if the parent process of the "immortal"
> process is killed, the child attaches itself to init, PID 1.
> What causes a process to refuse to die? I thought signal 9 (kill) could
> not be intercepted or ignored?

If you are technically minded and want a real answer read:
	"The Design of the UNIX Operating System" by Maurice Bach.

The quick answer is that any process that is in the kernel with a WCHAN
will not go back to user mode until that channel is awoken.  Who will
awaken it?  Two choices:  a device driver interrupt or a kernel timer
interrupt.  In all likelihood your ill-behaved process is waiting in
a poorly written device driver close().  No close should allow a process
to wait forever on an event that may not come.  Signals (kill -9) are
delivered when a process in kernel mode re-enters user mode.  However,
your process is waiting in kernel mode and won't get those signals until
it's done: NEVER! (or until the long-sought interrupt allows its WCHAN
to go again).




-- 
David F. Carlson, Micropen, Inc.
...!{ames|harvard|rutgers|topaz|...}!rochester!ur-valhalla!micropen!dave

"The faster I go, the behinder I get." --Lewis Carroll

hedrick@athos.rutgers.edu (Charles Hedrick) (04/30/88)

You ask about processes that refuse to die.  (Calling them "immortal"
confers a positive aura that is probably undeserved.  Normally these
processes are in a useless state, and might better be referred to as
members of the "undead".)  Unix, along with many other operating
systems, kills processes by telling them to die.  You probably
envision that kill -9 invokes some code that goes through all the
tables ripping out entries for the process.  Unfortunately, the kernel
isn't organized in such a way that this is possible.  Processes may
own resources, locks, mapped memory, etc.  All of these have to be
released validly before the process can safely be removed from the
system.  Thus a kill starts a surprisingly complex series of events,
some of which are executed in the process' own context.  If the
process is in an inconsistent state, it may be unable to complete
these events, and hang in the process of being killed (or killing
itself).  I've seen this sort of thing happen in many different
versions of Unix (including various Berkeley-based Unices), and
similar things afflicted TOPS-20.  By definition it is caused by
a bug in the kernel, typically some sort of race condition or
deadly embrace.

friedl@vsi.UUCP (Stephen J. Friedl) (04/30/88)

In article <468@micropen>, dave@micropen (David F. Carlson) writes:
< In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
< > Can anyone enlighten me as to what causes a process to become "immortal"
< > in System VR2,  or Microport UNIX System V/AT, to be more specific?
< > 
< This "problem" is not a Microport issue at all: it is UNIX all the way.
< 
< The quick answer is that any process that is in the kernel with a WCHAN
< will not go back to user mode until that channel is awoken.  Who will
< awaken it?  Two choices:  a device driver interrupt or a kernel timer
< interrupt.  In all likelihood your ill-behaved process is waiting in
< a poorly written device driver close().

     There is a third choice.  When a driver calls sleep(), one
of the arguments is a sleeping priority.  In addition to entering
into scheduling considerations, it determines whether or not the
sleep() can be interrupted by a signal.  If this priority is less
than or equal to PZERO (defined in <sys/param.h>), then the
driver can't be interrupted, with the converse being true.

     Different drivers use different priorities.  Example from
the 3B2, where PZERO is 25.  In the tty driver, an open(2) on a
port will block until the carrier detect line is seen by the
hardware.  When the process sleeps on this, its priority is
TTOPRI.  Since TTOPRI is #defined in <sys/tty.h> as 29, this call
is interruptible.

     To demonstrate this, find a port (say, tty11) that has no
cables or processes attached to it.  Assuming you have read
permissions, cat the device and hit DELETE:

        # cat < /dev/tty11
        (hit DELETE)
        /dev/tty11: cannot open
        #

     Because TTOPRI > PZERO, your interrupt is heeded.

     Alas, this is not always the case.  In the floppy block
device open() handler, it sleeps with PRIBIO (#defined in
<sys/param.h> to be 20).  When you try to (say) mount the floppy,
you have to wait for it to succeed or timeout; your interrupt is
ignored because PRIBIO < PZERO.
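
     The rule can be written down as a one-line predicate.  This is
only a sketch: the constant names carry a made-up _3B2 suffix so they do
not clash with the real headers, and the values are the 3B2 ones quoted
above, which may differ on other ports.

```c
/* Values quoted above for the 3B2; assumptions anywhere else. */
enum {
    PZERO_3B2  = 25,
    TTOPRI_3B2 = 29,   /* tty open() waiting for carrier */
    PRIBIO_3B2 = 20    /* block I/O, e.g. the floppy open() */
};

/*
 * Nonzero when a sleep at priority pri can be cut short by a signal,
 * zero when the signal has to wait for a wakeup().
 */
int sleep_is_interruptible(int pri, int pzero)
{
    return pri > pzero;
}
```

     With these numbers, sleep_is_interruptible(TTOPRI_3B2, PZERO_3B2)
is true and sleep_is_interruptible(PRIBIO_3B2, PZERO_3B2) is false,
matching the tty and floppy behavior described above.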

     I would be interested to hear from driver writers who are
more familiar with this: how does one determine whether a sleep
should be interruptible or not?  Why aren't they all this way
(not a plea, just a question)?  The cartridge tape driver on the
3B2 obviously runs at a noninterruptible priority because once I
type a command that deals with it I sometimes have to wait for
the retension pass (usually a couple of  minutes) before the
interrupt is honored :-(.

     A side note: WCHAN is a "wait channel", the address on which
the sleep() awaits awakenment (I just made that word up), and it is
found by the "-l" option to ps.  If you are really industrious,
you can write a program that looks this address up in the /unix
namelist and gives a clue for what the process is waiting.  You
can't always nail it down, as you really need source to get
structure offsets and stuff, but it is instructive to get a clue
whether a program is waiting on disk or a tty or whatever.
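
     The lookup itself is just "find the nearest symbol at or below
the address".  A minimal sketch follows, assuming the symbol names and
addresses have already been read into an array.  The struct, the
function name nearest_symbol(), and the demo table are all made up for
illustration; the addresses are not real kernel values.

```c
#include <stddef.h>

struct sym {
    const char   *name;
    unsigned long addr;
};

/*
 * Return the symbol with the highest address not above wchan and store
 * the distance from it in *offset.  Returns NULL if wchan lies below
 * every symbol in the table.
 */
const struct sym *nearest_symbol(const struct sym *tab, size_t n,
                                 unsigned long wchan,
                                 unsigned long *offset)
{
    const struct sym *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tab[i].addr <= wchan &&
            (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];
    }
    if (best != NULL && offset != NULL)
        *offset = wchan - best->addr;
    return best;
}

/* Invented addresses, for demonstration only. */
static const struct sym demo_syms[] = {
    { "runq", 0x1000UL },
    { "buf",  0x2000UL },
    { "tty",  0x3000UL },
};
```

     Looking up 0x2010 in demo_syms yields "buf" with an offset of
0x10, which would suggest a process waiting on a buffer.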

-- 
Steve Friedl      V-Systems, Inc. (714) 545-6442   Resident 3B2 hacker
friedl@vsi.com      {backbones}!vsi.com!friedl      attmail!vsi!friedl

limes@sun.uucp (Greg Limes) (04/30/88)

In article <468@micropen> dave@micropen (David F. Carlson) writes:
>In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
>> Can anyone enlighten me as to what causes a process to become "immortal"
>> in System VR2,  or Microport UNIX System V/AT, to be more specific?
>> 
>This "problem" is not a Microport issue at all: it is UNIX all the way.
>
>> I have encountered this a number of times, where it would be impossible
>> even for root to kill a process; if the parent process of the "immortal"
>> process is killed, the child attaches itself to init, PID 1.
>> What causes a process to refuse to die? I thought signal 9 (kill) could
>> not be intercepted or ignored?
>
>If you are technically minded and want a real answer read:
>	"The Design of the UNIX Operating System" by Maurice Bach.
>
>The quick answer is that any process that is in the kernel with a WCHAN
>will not go back to user mode until that channel is awoken.  Who will
>awaken it?  Two choices:  a device driver interrupt or a kernel timer
>interrupt.  In all likelihood your ill-behaved process is waiting in
>a poorly written device driver close().  No close should allow a process
>to wait forever on an event that may not come.  Signals (kill -9) are
>delivered when a process in kernel mode re-enters user mode.  However,
>your process is waiting in kernel mode and won't get those signals until
>it's done: NEVER! (or until the long-sought interrupt allows its WCHAN
>to go again).

Close, but not quite. It depends on the priority of the process during
the sleep, and what the driver does with the return value. If the
priority is at or below PZERO, nothing happens, the sleep continues to
sleep, and all is as you note above; this corresponds to what is called
a "short term disk wait", and is usually used for events that are
expected with some high probability to happen quickly, or times where
cleaning up after an abort is so messy that it cannot be faced.

If the sleep priority is above PZERO, the sleep() will return an error
corresponding to "I was interrupted!". The device driver is then counted
on to clean up, abort whatever it was doing (or set up for a later
completion), and report error status if any to its caller. This
situation corresponds to longer term sleeps, like reading from a socket,
tty, or some other "slow" device.

Note that all processes that are sleeping have a WCHAN, that is how they
are woken up; if a signal is delivered to a process, it is woken up
independent of the value of its WCHAN.
-- 
   Greg Limes [limes@sun.com]				frames to /dev/fb

limes@sun.uucp (Greg Limes) (04/30/88)

In article <625@vsi.UUCP> friedl@vsi.UUCP (Stephen J. Friedl) writes:
>
>     I would be interested to hear from driver writers who are
>more familiar with this: how does one determine whether a sleep
>should be interruptible or not?  Why aren't they all this way
>(not a plea, just a question)?  

Sometimes you sleep in places where it is difficult (or impossible) to
clean up after an abort. You previously verified that the controller is
out there, dammit, and it is *going* to respond, and there is just too
much to clean up in an abort, or there is no way to abort the controller,
or maybe you just do not want to spend the time to make abort work in
*this* instance which is (grin) never going to happen anyway ... well,
you get the picture. Sometimes it is just laziness.

>                               The cartridge tape driver on the
>3B2 obviously runs at a noninterruptible priority because once I
>type a command that deals with it I sometimes have to wait for
>the retension pass (usually a couple of  minutes) before the
>interrupt is honored :-(.

Worse yet, on some SCSI-based systems, when the tape drive starts
rewinding, the SCSI bus is locked until the operation completes. Kind of
messy when your swap disk is out there on the SCSI.

>     A side note: WCHAN is a "wait channel", the address on which
>the sleep() awaits awakenment (I just made that word up), and it is
>found by the "-l" option to ps.  If you are really industrious,
>you can write a program that looks this address up in the /unix
>namelist and gives a clue for what the process is waiting.  You
>can't always nail it down, as you really need source to get
>structure offsets and stuff, but it is instructive to get a clue
>whether a program is waiting on disk or a tty or whatever.

some versions of "ps" do this lookup for you ... at least, SunOS 4.0
does now. shocked the heck out of me the first time the WCHANS started
coming up "socket", "pause", "select", and so on. Seems nowadays that
most everything sleeps on "select" on my workstation.
-- 
   Greg Limes [limes@sun.com]				frames to /dev/fb

pjh@mccc.UUCP (Pete Holsberg) (04/30/88)

In article <468@micropen> dave@micropen (David F. Carlson) writes:
...In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
...> Can anyone enlighten me as to what causes a process to become "immortal"
...> in System VR2,  or Microport UNIX System V/AT, to be more specific?
...> 
...This "problem" is not a Microport issue at all: it is UNIX all the way.
...
...> I have encountered this a number of times, where it would be impossible
...> even for root to kill a process; if the parent process of the "immortal"
...> process is killed, the child attaches itself to init, PID 1.
...> What causes a process to refuse to die? I thought signal 9 (kill) could
...> not be intercepted or ignored?
...
...If you are technically minded and want a real answer read:
...	"The Design of the UNIX Operating System" by Maurice Bach.
...
...The quick answer is that any process that is in the kernel with a WCHAN
...will not go back to user mode until that channel is awoken.  Who will
...awaken it?  Two choices:  a device driver interrupt or a kernel timer
...interrupt.  In all likelihood your ill-behaved process is waiting in
...a poorly written device driver close().  No close should allow a process
...to wait forever on an event that may not come.  Signals (kill -9) are
...delivered when a process in kernel mode re-enters user mode.  However,
...your process is waiting in kernel mode and won't get those signals until
...it's done: NEVER! (or until the long-sought interrupt allows its WCHAN
...to go again).

This happens frequently on my 3B2/400 when it gets into a deadly embrace
with my modem:  I cannot kill -9 any of the processes associated with
that port!  It takes toggling the modem's ON/OFF switch to break the
embrace.  Surely there must be a better way!  ??

chris@mimsy.UUCP (Chris Torek) (04/30/88)

In article <51443@sun.uucp> limes@sun.uucp (Greg Limes) writes:
>If the sleep priority is above PZERO, the [signalled] sleep() will return
>an error corresponding to "I was interrupted!".

Unless Sun has made some big kernel changes recently, this is not the
case.  See /sys/sys/kern_synch.c, at the label `psig' in sleep().

Returning an error from sleep would be a viable alternative to `catch'
and `throw' routines, although it would entail more work.  Every driver
that now sleeps interruptibly might read

	while ((foo->status & READY) == 0) {
		if (sleep((caddr_t)foo, PFOO))
			return (EINTR);
	}

and one would be safe in ignoring the (new) return value from sleep
iff the sleep is uninterruptible.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

kjk@pbhyf.PacBell.COM (Ken Keirnan) (05/01/88)

In article <3951@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes:
>Can anyone enlighten me as to what causes a process to become "immortal"
>in System VR2,  or Microport UNIX System V/AT, to be more specific?
>
>I have encountered this a number of times, where it would be impossible
>even for root to kill a process; if the parent process of the "immortal"
>process is killed, the child attaches itself to init, PID 1.
>


The only processes (I can think of) that cannot be killed, even with
signal 9, are *DEFUNCT* processes and processes suspended waiting on I/O.
Since the above case is evidently not a DEFUNCT, I would suspect an I/O
problem.  Since there was no mention of what the process is doing, it's
tough to determine.  Is the process a hung getty?  How about a little more
info.

					Ken Keirnan
-- 

Ken Keirnan - Pacific Bell - {att,bellcore,sun,ames,pyramid}!pacbell!pbhyf!kjk
  San Ramon, California	                    kjk@pbhyf.PacBell.COM

richardh@killer.UUCP (Richard Hargrove) (05/01/88)

In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
> Can anyone enlighten me as to what causes a process to become "immortal"
> in System VR2,  or Microport UNIX System V/AT, to be more specific?
> 
> I have encountered this a number of times, where it would be impossible
> even for root to kill a process; 

Wolf,

While I've never seen this under Microport SYS V/AT, I have seen it under
Intel Xenix 3.4 (SYS III based). Like you, I was amazed when the command

kill -9 pid

executed by root didn't remove the process entry from the ps display. However
repeated invocations of "ps -elf" indicated that the process was always 
inactive and that it had a nice value of 20. Of course this could be subject
to error since my sample rate didn't approach the system's time-slice
quantum ;-). Actually there were two different activations of the same program,
the quite large Intel tool bld386 - both of which had terminated abnormally
due to system errors (ran out of disk space.) Also, there didn't appear to 
be any real system performance degradation (80286-based '(U|Xe)nix' systems 
suffer performance degradation very rapidly as I'm sure you've observed.)

Not having access to source code, I was left to speculate on what I observed.
I came to the conclusion that the actual processes were gone, but some
table or tables maintained by the kernel had been corrupted. I'm assuming
that ps reports only what it finds in the table(s) and that it doesn't check
their validity. As you experienced, rebooting the system cleared up
everything. If my diagnosis is correct, I know of no other way to clear up
the problem, though I would like to know more about what was going on.

richard hargrove
...!{ihnp4 | codas | cbosgd}!killer!richardh
--------------------------------------------

jpayne@cs.rochester.edu (Jonathan Payne) (05/02/88)

In article <3967@killer.UUCP> richardh@killer.UUCP (Richard Hargrove) writes:
>In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
>> Can anyone enlighten me as to what causes a process to become "immortal"
>> in System VR2,  or Microport UNIX System V/AT, to be more specific?
>> 
>> I have encountered this a number of times, where it would be impossible
>> even for root to kill a process; 

>Not having access to source code, I was left to speculate on what I observed.
>I came to the conclusion that the actual processes were gone, but some
>table or tables maintained by the kernel had been corrupted. I'm assuming
>that ps reports only what it finds in the table(s) and that it doesn't check
>their validity. As you experienced, rebooting the system cleared up
>everything. If my diagnosis is correct, I know of no other way to clear up
>the problem, though I would like to know more about what was going on.
>
>richard hargrove
>...!{ihnp4 | codas | cbosgd}!killer!richardh
>--------------------------------------------


I believe the story goes something like this.  The process is sleeping at
a priority that is too high (or low) to be interrupted by a software
interrupt.  That is, while in kernel mode the process did a sleep(chan1,
PRI), but nothing has come along to wake it up (with wakeup(chan1)).
Sending a signal can't wake up a process that is sleeping in this
manner.  I believe something like this happened to me several years ago
when my pty was expecting a ^Q (because somehow it got a ^S ...) and I
got disconnected but it was sleeping in the tty driver waiting for that
^Q.  Software interrupts, I believe, are checked whenever a process is
scheduled for running.  Sending the signal sets some bit in the process
structure, and when the process is next scheduled those bits will be
checked.  The problem is that the process is still sleeping, waiting for
some event, like a ^Q or some other kind of interrupt, and unfortunately
that interrupt may never come for some reason (like I was disconnected
from the pty - this bug is fixed, I think).
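
The story above can be modeled in a few lines.  What follows is a toy
model, not kernel source: every name (toy_proc, toy_sleep, toy_psignal,
toy_wakeup, TOY_PZERO) is made up, and the threshold of 25 is simply the
3B2 PZERO value mentioned elsewhere in this thread.  It shows a posted
signal being recorded as a pending bit, and a sleeper at or below the
threshold staying asleep until wakeup() arrives on its channel.

```c
#include <stddef.h>

#define TOY_PZERO 25            /* assumed threshold, see lead-in */

enum toy_state { TOY_SLEEPING, TOY_RUNNABLE };

struct toy_proc {
    enum toy_state state;
    int            pri;         /* priority passed to sleep() */
    const void    *wchan;       /* channel slept on, NULL if none */
    unsigned long  pending;     /* one bit per posted signal */
};

/* Put the process to sleep on chan at priority pri. */
void toy_sleep(struct toy_proc *p, const void *chan, int pri)
{
    p->wchan = chan;
    p->pri   = pri;
    p->state = TOY_SLEEPING;
}

/* Post a signal: always record it, but only wake interruptible sleepers. */
void toy_psignal(struct toy_proc *p, int sig)
{
    p->pending |= 1UL << (sig - 1);
    if (p->state == TOY_SLEEPING && p->pri > TOY_PZERO) {
        p->wchan = NULL;
        p->state = TOY_RUNNABLE;
    }
}

/* The event arrived: wake the process if it sleeps on this channel. */
void toy_wakeup(struct toy_proc *p, const void *chan)
{
    if (p->state == TOY_SLEEPING && p->wchan == chan) {
        p->wchan = NULL;
        p->state = TOY_RUNNABLE;
    }
}

/* A pending signal can only be acted on once the process can run. */
int toy_signal_deliverable(const struct toy_proc *p, int sig)
{
    return p->state == TOY_RUNNABLE &&
           (p->pending & (1UL << (sig - 1))) != 0;
}
```

In this model a process sleeping at priority 20 keeps signal 9 pending
indefinitely if toy_wakeup() is never called, which is the "immortal"
case; one sleeping at 29 becomes runnable as soon as the signal is posted.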

(I'm pretty sure about this ...)

vrs@littlei.UUCP (vrs) (05/02/88)

In article <9233@sol.ARPA> jpayne@cs.rochester.edu (Jonathan Payne) writes:
>>In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
>>> Can anyone enlighten me as to what causes a process to become "immortal"
>>> in System VR2,  or Microport UNIX System V/AT, to be more specific?
>>> 
>
>I believe the story goes something like this.  The process is sleeping at
>a priority that is too high (or low) to be interrupted by a software
>interrupt.  That is, while in kernel mode the process did a sleep(chan1,
>PRI), but nothing has come along to wake it up (with wakeup(chan1)).

This is nearly always because a device wants to write output and the connection
has been lost.  The driver fails to flush pending output (and/or new output)
after the connection goes down.

There is another scenario worth worrying about during driver design:  even
if the driver sleeps at a low priority (as it does in the usual tty line
discipline), a kill will cause the process to try to exit().  The exit()
will mask off all signals and close all files.  When it closes the device
with the lost connection, it sleeps AGAIN, this time with signals ignored.

We've done a fair bit of work on our Multibus drivers since XENIX 3.4 :-).

rwb@viusys.UUCP (Rick) (05/02/88)

In article <625@vsi.UUCP> friedl@vsi.UUCP (Stephen J. Friedl) writes:
>In article <468@micropen>, dave@micropen (David F. Carlson) writes:
>< In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
>< > Can anyone enlighten me as to what causes a process to become "immortal"
>< > in System VR2,  or Microport UNIX System V/AT, to be more specific?

< much good info on PZERO, sleep(), etc deleted >

>you can write a program that looks this address up in the /unix
>namelist and gives a clue for what the process is waiting. 

	I'm not familiar with what's distributed with Microport, but if
'crash' is included, the command "ds address", where "address" is the WCHAN,
or event address, will return the name and offset from the nearest symbol
to that address, hopefully the name of the sleep queue on which the process 
is sleeping, e.g. "physio +2".  Of course, this still doesn't allow you to
kill the process; as Steve points out, anything sleeping at a priority less
than (greater than?) PZERO will not be awakened to process a signal.  Only
wakeup() will do that . . .

Rick Butland <rwb@viusys>

carlj@hpcvmb.HP (Carl Johnson) (05/02/88)

>In article <3951@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes:
>>Can anyone enlighten me as to what causes a process to become "immortal"
>>in System VR2,  or Microport UNIX System V/AT, to be more specific?
>>
>>I have encountered this a number of times, where it would be impossible
>>even for root to kill a process; if the parent process of the "immortal"
>>process is killed, the child attaches itself to init, PID 1.

>The only processes (I can think of) that cannot be killed, even with
>signal 9, are *DEFUNCT* processes and processes suspended waiting on I/O.
>Since the above case is evidently not a DEFUNCT, I would suspect an I/O
>problem.  Since there was no mention of what the process is doing, it's
>tough to determine.  Is the process a hung getty?  How about a little more
>info.
>
>Ken Keirnan - Pacific Bell - {att,bellcore,sun,ames,pyramid}!pacbell!pbhyf!kjk
>  San Ramon, California	                    kjk@pbhyf.PacBell.COM

Since I have seen what sounds like the same problem, I'll mention what I've
seen.  Every time I have seen it it has been when I have been using more
(or less or pg) as a filter when listing some output.  Since the process
has always been on the output of a pipe, I have always assumed it was
a problem with pipes.  In every case the virtual console will accept and
echo input, but it doesn't respond to it.  The only solution has been to
switch to another virtual console and kill the parent of the process (which
I think has always been a login shell).  This then allows me to switch
back and log in again, and leaves the offending process in the background
with a PPID of 1.  I doubt this is a common problem with other systems,
since without virtual consoles I don't see any way to get control back
short of re-booting.

Carl Johnson - Hewlett-Packard Co. - ...!hplabs!hp-pcd!carlj

root@uwspan.UUCP (Sue Peru Sr.) (05/03/88)

+---- Wolf Paul writes in <3951@killer.UUCP> 
| Can anyone enlighten me as to what causes a process to become "immortal"
| in System VR2,  or Microport UNIX System V/AT, to be more specific?
+----

Microport users out there with the BETA everex tape driver try this:

	find . -print | cpio -ocv | strm -o /dev/rmt0

	with a WRITE PROTECTED tape!

	after the cpio starts spitting out the filenames, strm will
	hang (because of the R/O tape).  You can kill the find, the cpio,
	but not the strm!  Not only can't you kill it, you can't even
	start up another one cuz it has not released  /dev/rmt0.

	I'd like some more info on this before I bother Microport with it -
	is it just me?... :-)

While I'm at it, would y'all send me any lists of bugs you have found - I'd
like to compile a "user" buglist to contrast with the Microport supplied one...

	(mail to plocher@puff.cs.wisc.edu - this is the most reliable mail
	 site I have access to - Internet, uucp, cs-net, and bitnet mail
	 all can find this spot...)

  -John Plocher

	    ---- ...This bears repeating from time to time... ----

These are the automatic mailings available to you.  To have one mailed
to you automatically, send a mail message to

			    microport@uwspan.uucp
				   -or-
		      ...!uwvax!geowhiz!uwspan!microport

with the subject described below.

The Subject: field should be	What will get mailed back to you	Size
----------------------------	--------------------------------	----
"Subject: send info"		Introduction and newsgroup guidelines   (~ 7K)
"Subject: send buglist"		"Up to date" lists of all reported bugs	(~20K)
				( Last modified Feb 1988 )
"Subject: send version"		Modification dates of the above lists	(~ 1K)

If you already have a copy of the bug lists, you should request the version
message to see if you really need a new copy of the bug lists.  (note the
size difference!)

i.e.:  To get a message containing the times that the buglists were last
       updated you need to send a message like this:

	% mail microport@uwspan.uucp
	Subject: Please send the version list
	Thanks
	^D
	%

The body of the message ("Thanks") is ignored.

-- 
Comp.Unix.Microport is now unmoderated!  Use at your own risk :-)

wes@obie.UUCP (Barnacle Wes) (05/03/88)

In article <Apr.29.21.45.24.1988.6900@athos.rutgers.edu>, hedrick@athos.rutgers.edu (Charles Hedrick) writes:
| You ask about processes that refuse to die.  (Calling them "immortal"
| confers a positive aura that is probably undeserved.  Normally these
| processes are in a useless state, and might better be referred to as
| members of the "undead".)

The canonical term for such a process is "zombie."
-- 
    /\              -  "Against Stupidity,  -    {backbones}!
   /\/\  .    /\    -  The Gods Themselves  -  utah-cs!uplherc!
  /    \/ \/\/  \   -   Contend in Vain."   -   sp7040!obie!
 / U i n T e c h \  -       Schiller        -        wes

sarima@gryphon.CTS.COM (Stan Friesen) (05/04/88)

In article <3951@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes:
>Can anyone enlighten me as to what causes a process to become "immortal"
>in System VR2?
>
>I have encountered this a number of times, where it would be impossible
>even for root to kill a process;
>
>The only way to get rid of such an immortal process seems to be to reboot,
>which is rather drastic.
>
>What causes a process to refuse to die? I thought signal 9 (kill) could
>not be intercepted or ignored?
>
	A process that is suspended in the kernel waiting for the completion
of a block I/O request will not continue, even for a signal, until the I/O
has terminated. If a block device fails without generating a failure status
this can result in an immortal process. This is most common with tape drives.
Is there any particular device that is being accessed by all your immortal
processes? If so it may be having hardware problems. 
	There is no way that I know of getting rid of these things short
of rebooting the system. Most of the time these processes are harmless
and can be left alone until you would be rebooting anyway. So, unless
they are causing problems, like locking up your tape drive, don't bother
rebooting. Just try to find the base cause and remove it.
-- 
Sarima Cardolandion			sarima@gryphon.CTS.COM
aka Stanley Friesen			rutgers!marque!gryphon!sarima
					Sherman Oaks, CA

chip@pedsga.UUCP (05/05/88)

In article <3967@killer.UUCP> richardh@killer.UUCP writes:
>In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
>> Can anyone enlighten me as to what causes a process to become "immortal"
>> in System VR2,  or Microport UNIX System V/AT, to be more specific?
>> I have encountered this a number of times, where it would be impossible
>> even for root to kill a process; 
>While I've never seen this under Microport SYS V/AT, I have seen it under
>Intel Xenix 3.4 (SYS III based). Like you, I was amazed when the command
>kill -9 pid
>executed by root didn't remove the process entry from the ps display. However

I have seen this too, on Xelos, Concurrent's port of SVR2.  It seems
that when a process is flow controlled off, no amount of killing by
root would remove the process.  I originally had this problem trying
to get an imagen running over NTS, a LAN in our building.  NTS
has the ability to ignore (pass through) or handle flow control (^S, ^Q).  
I originally had the NTS process flow control (other options were wrong
as well).  When the imagen driver filled the NTS buffer, it would flow 
control the driver.  For reasons unbeknownst to me, it would never flow
control the driver back on.  I was stuck with a process I couldn't kill.
I don't know the kernel software that well, but I guess that even though
signals were arriving for the process, the kernel would not reschedule it.

-- 
         Chip ("My grandmother called me Charles once. ONCE!!") Maurer
     Concurrent Computer Corporation, Tinton Falls, NJ 07724 (201)758-7361
        uucp: {mtune|purdue|rutgers|princeton|encore}!petsd!pedsga!chip
                       arpa: pedsga!chip@UXC.CSO.UIUC.EDU

guy@gorodish.Sun.COM (Guy Harris) (05/07/88)

> | You ask about processes that refuse to die.  (Calling them "immortal"
> | confers a positive aura that is probably undeserved.  Normally these
> | processes are in a useless state, and might better be referred to as
> | members of the "undead".)
> 
> The canonical term for such a process is "zombie."

Wrong.  A "zombie" is a process that has already completed dying, but whose
corpse hasn't been picked up by its parent yet.  The corpse has already been
picked clean (it has no address space, for instance).  This is a misuse of the
term "zombie", but we're stuck with it.

A *live* process that refuses to die, which is what was originally being
discussed, is a different matter.  A very common cause of this is a driver that
blocks for a very long time - possibly forever - with a priority less than or
equal to PZERO.

arosen@eagle.ulowell.edu (MFHorn) (05/07/88)

In article <52288@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>A *live* process that refuses to die, which is what was originally being
>discussed, is a different matter. A very common cause of this is a driver that
>blocks for a very long time - possibly forever - with a priority less than or
>equal to PZERO.

On a Sequent Balance 21K with 6 processors, we recently had a user with a
program that failed to exit properly.  It seemed to get stuck when it tried
to exit.  The annoying thing was each time he ran it, we'd lose one of our
processors (whichever one tried to perform the exit).  Since the process was
in kernel mode, it couldn't receive any signals.  After it was run a few
times, the machine was 6 times slower than usual; we had to reboot.

Would a program that does the following get rid of the process?

1: Gets the process' proc struct from the kernel.
2: Changes fields like the status, priority, cpu usage, wchan, exit status
   and maybe others so the kernel will have good reason to terminate the
   process.
3: Writes the new struct back out (open /dev/mem for write, lseek, write).

If something along these lines would work, it should carry over to most
unixes since they all should have the same or similar fields in the proc
struct.

I've written programs that change a process' proc struct; it's probably
not a good idea (you should be _very_ careful if you try it), but it
does work.  [it can be pretty fun.  "Ok, let's make this vi privileged..."]

I'd like people's opinions before I start trying to create some immortal
processes to nuke.

Andy Rosen           | arosen@hawk.ulowell.edu | "I got this guitar and I
ULowell, Box #3031   | ulowell!arosen          |  learned how to make it
Lowell, Ma 01854     |                         |  talk" -Thunder Road
                   RD in '88 - The way it should be

guy@gorodish.Sun.COM (Guy Harris) (05/07/88)

> On a Sequent Balance 21K with 6 processors, we recently had a user with a
> program that failed to exit properly.  It seemed to get stuck when it tried
> to exit.  ...
> 
> Would a program that does the following get rid of the process?
> <description of how to whack the process table>

Yes, but it also might get rid of your system as well.  As I said, in many
cases this sort of half-dead process is caused by something such as a driver
blocking non-"interruptably" forever while doing a "close".  The driver might
well have a reason why it *didn't* want to be interrupted by a signal; it might
be holding on to some system resource, for example, and be unwilling to be
interrupted without having a chance to release that resource.  Kicking the
process's priority above PZERO, so that you can terminate it with a signal,
might not be a good idea.  (Also, I'm not certain what happens if you send a
signal to a process that's in the middle of "exit" blocked on a "close"; it
might not unjam the process.)

It may be that the process doesn't exit properly due to an OS bug.  If so, you
should try to get it fixed; if, in the interim, you want a workaround and plan
to dink with the process's process table entry, note that this is intrinsically
very dangerous and be prepared to wedge your system well and truly if you try
to do this.

wnp@dcs.UUCP (Wolf N. Paul) (05/08/88)

In article <216@obie.UUCP> wes@obie.UUCP (Barnacle Wes) writes:
 >In article <Apr.29.21.45.24.1988.6900@athos.rutgers.edu>, hedrick@athos.rutgers.edu (Charles Hedrick) writes:
 >| You ask about processes that refuse to die.  (Calling them "immortal"
 >| confers a positive aura that is probably undeserved.  Normally these
 >| processes are in a useless state, and might better be referred to as
 >| members of the "undead".)
 >
 >The canonical term for such a process is "zombie."

I always thought that "zombies" refers to dead processes which have not
been waited for, rather than processes which refuse to die ?!?
-- 
Wolf N. Paul * 3387 Sam Rayburn Run * Carrollton TX 75007 * (214) 306-9101
UUCP:     ihnp4!killer!dcs!wnp                 ESL: 62832882
INTERNET: wnp@DESEES.DAS.NET or wnp@dcs.UUCP   TLX: 910-280-0585 EES PLANO UD

rac@jc3b21.UUCP (Roger A. Cornelius) (05/09/88)

In article <3967@killer.UUCP>, richardh@killer.UUCP (Richard Hargrove) writes:
> In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
> > Can anyone enlighten me as to what causes a process to become "immortal"
> > in System VR2,  or Microport UNIX System V/AT, to be more specific?
> > 
> > I have encountered this a number of times, where it would be impossible
> > even for root to kill a process; 
> 
> Wolf,
> 
> While I've never seen this under Microport SYS V/AT, I have seen it under
> Intel Xenix 3.4 (SYS III based). Like you, I was amazed when the command

We have this problem on our Altos using xenix 3.3a1.  The only way I've found
to free the terminal after this happens is to either re-boot, or null out the
/etc/utmp file (this fools the system into thinking no-one is logged on), so
I can disable and re-enable the terminal.  The latter is better than telling
15-20 users they have to log off while we do a shutdown.  

Does anyone know what problems this could cause (besides 'who' not returning
anything)?  And wouldn't writing a program that removes the entry for the
locked terminal from /etc/utmp work as well?  Or maybe there's a better way?

Roger Cornelius
-- 
                +---------- Roger Cornelius -----------+
                |            (813)347-4399             |
                | ...gatech!codas!usfvax2!jc3b21!rac   |
                +-   ...gatech!usfvax2!jc3b21!rac     -+

mike@turing.UNM.EDU (Michael I. Bushnell) (05/09/88)

In article <6832@swan.ulowell.edu> arosen@hawk.ulowell.edu (MFHorn) writes:

[ Recounts story about "It wouldn't die!" process...]

>Would a program that does the following get rid of the process?
>
>1: Gets the process' proc struct from the kernel.
>2: Changes fields like the status, priority, cpu usage, wchan, exit status
>   and maybe others so the kernel will have good reason to terminate the
>   process.
>3: Writes the new struct back out (open /dev/mem for write, lseek, write).
>
>If something along these lines would work, it should carry over to most
>unixes since they all should have the same or similar fields in the proc
>struct.

Ack! no!

The whole reason for a sleep that cannot be interrupted is because the
process has some kernel data structure locked.  If you fake it to get
the process killed, then the inode, text entry, whatever, will remain
locked, and you can't ever get at it again.  You could, perhaps, make
your program even smarter, and have it figure out just what things were
locked and unlock them, but remember, they may be partially modified,
and fixing them makes this an even more daunting prospect.  The *real*
solution is to fix the bug in the kernel.  Failing that, you are, well,
hosed.



-- 
                N u m q u a m   G l o r i a   D e o 

			Michael I. Bushnell
			HASA - "A" division
14308 Skyline Rd NE				Computer Science Dept.
Albuquerque, NM  87123		OR		Farris Engineering Ctr.
	OR					University of New Mexico
mike@turing.unm.edu				Albuquerque, NM  87131
{ucbvax,gatech}!unmvax!turing.unm.edu!mike

dg@lakart.UUCP (David Goodenough) (05/09/88)

From article <81@dcs.UUCP>, by wnp@dcs.UUCP (Wolf N. Paul):
]In article <216@obie.UUCP> wes@obie.UUCP (Barnacle Wes) writes:
]>In article <Apr.29.21.45.24.1988.6900@athos.rutgers.edu>, hedrick@athos.rutgers.edu (Charles Hedrick) writes:
]>| You ask about processes that refuse to die.  (Calling them "immortal"
]>| confers a positive aura that is probably undeserved.  Normally these
]>| processes are in a useless state, and might better be referred to as
]>| members of the "undead".)
]>
]>The canonical term for such a process is "zombie."
]
]I always thought that "zombies" refers to dead processes which have not
]been waited for, rather than processes which refuse to die ?!?

Singularly appropriate: sort of conjures up images of processes stumbling
round the kernel, with eyes closed, dropping bits of code (read flesh) off
at every step - just like the stereotype zombie in a "B" horror movie :-)

P.S. In response to all those that replied to my questions Re: <defunct>
processes, I discover the solution is simple. After every fclose(fp), where
fp is the FILE * I got from popen, I do a wait(&j), and the zombies go away.
Just like sprinkling holy water on them :-). Thanks to all who replied.
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!adelie!cfisun!lakart!dg	+-+-+ |
						  	  +---+

ewv@zippy.berkeley.edu (Eric Varsanyi) (05/12/88)

In article <468@micropen> dave@micropen (David F. Carlson) writes:
> [...]  No close should allow a process
>to wait forever on a event that may not come.  Signals (kill -9) are
>delivered when a process in kernel mode re-enters user mode.  However,
>your process is waiting in kernel mode and won't get those signals til
>it's done: NEVER! (or until the long sought interrupt allows its WCHAN
>to go again).

This is only true if the driver has slept with a wakeup priority <= PZERO.
If the process sleeps above PZERO a signal will trash the kernel context
and return to the user with an EINTR (Interrupted system call).

ok@quintus.UUCP (Richard A. O'Keefe) (05/14/88)

In article <97@lakart.UUCP>, dg@lakart.UUCP (David Goodenough) writes:
> P.S. In response to all those that replied to my questions Re: <defunct>
> processes, I discover the solution is simple. After every fclose(fp), where
> fp is the FILE * I got from popen, I do a wait(&j), and the zombies go away.
> Just like sprinkling holy water on them :-). Thanks to all who replied.

Surely there is some mistake here?  FILE*s returned by popen() are ONLY
supposed to be closed by pclose().  To quote the manual:
     A stream opened by popen should be closed by pclose, which
     waits for the associated process to terminate and returns
     the exit status of the command.
NEVER close popen()ed files with fclose()!

learn@igloo.UUCP (william vajk) (05/17/88)

In article <379@jc3b21.UUCP>, rac@jc3b21.UUCP (Roger A. Cornelius) writes:
 
> Does anyone know what problems this could cause (besides 'who' not returning
> anything)? 

While not locking a terminal, the following simple shell script creates
an unkillable process (pg) if run in background :


	tail -20 <foo> | pg

I'm sure there are plenty of other examples. This just happens to be one
I found on uport and a 3b2.

Bill Vajk                                                   learn@igloo