wnp@killer.UUCP (Wolf Paul) (04/28/88)
Can anyone enlighten me as to what causes a process to become "immortal" in System VR2, or Microport UNIX System V/AT, to be more specific? I have encountered this a number of times, where it would be impossible even for root to kill a process; if the parent process of the "immortal" process is killed, the child attaches itself to init, PID 1. The only way to get rid of such an immortal process seems to be to reboot, which is rather drastic. What causes a process to refuse to die? I thought signal 9 (kill) could not be intercepted or ignored? Any comments welcome. Wolf Paul -- Wolf N. Paul * 3387 Sam Rayburn Run * Carrollton TX 75007 * (214) 306-9101 UUCP: ihnp4!killer!dcs!wnp ESL: 62832882 INTERNET: wnp@DESEES.DAS.NET or wnp@dcs.UUCP TLX: 910-280-0585 EES PLANO UD
wtr@moss.ATT.COM (04/29/88)
In article <3951@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes:
>Can anyone enlighten me as to what causes a process to become "immortal"
>in System VR2, or Microport UNIX System V/AT, to be more specific?

basically the way i do it is to have a shell script that runs something in the background. when the shell script puts the process in the background and goes about its own merry way (which usually means exiting back out), the background process spawned from the script is given the PPID of 1 because its former parent is now dead.

i've used this to great advantage when i want to run a background job from a terminal, and then log off and go somewhere else. (yes, I know about nohup; its priority is too low, and i need to route my standard output, so this way's easier)

>I have encountered this a number of times, where it would be impossible
>even for root to kill a process; if the parent process of the "immortal"
>process is killed, the child attaches itself to init, PID 1.
>
>What causes a process to refuse to die? I thought signal 9 (kill) could
>not be intercepted or ignored?

wait! don't touch that reset button! there is life beyond shells!

you can kill the process (either as root or as the user who started it) but you MUST kill off not only the process, but also all of its children too! (mass genocide!! ;-) if you don't, any child process under the 'immortal script' is given a new PPID of 1 and thus itself becomes immortal, and may spawn and produce children itself, etc...

whip out your trusty 'kill -9' and gun down those suckers!

note: the massacre outlined above will produce really nasty effects if the process was any sort of compile, ESPECIALLY a makefile run.

good hunting!
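The "kill the children too" advice above can be sketched with modern POSIX calls: rather than hunting descendants one by one, run the job in its own process group and signal the whole group at once. This is only an illustration under stated assumptions. setpgid() and kill() with a negative pid are standard POSIX; the function name kill_whole_group and the pipe used to notice that the grandchild is gone are invented here, and nothing in it is taken from the posters' systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a single kill() on the process group
 * removed both a child and the grandchild it spawned, 0 otherwise. */
int kill_whole_group(void)
{
    int fds[2];
    char c = 'x';

    if (pipe(fds) != 0)
        return 0;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    if (child == 0) {
        close(fds[0]);
        setpgid(0, 0);                  /* lead a new process group */
        pid_t grandchild = fork();      /* grandchild inherits that group */
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0 && write(fds[1], &c, 1) != 1)
            _exit(1);                   /* could not report "I am running" */
        if (grandchild > 0)
            close(fds[1]);              /* only the grandchild keeps the pipe open */
        for (;;)
            pause();                    /* both wait to be killed */
    }

    assert(child > 0);                  /* only the original parent gets here */
    close(fds[1]);
    setpgid(child, child);              /* same call from this side closes the startup race */

    int started = (read(fds[0], &c, 1) == 1);   /* grandchild is up */
    kill(-child, SIGKILL);                      /* negative pid: signal the whole group */
    waitpid(child, NULL, 0);                    /* reap our direct child */
    int gone = (read(fds[0], &c, 1) == 0);      /* EOF: no writer is left alive */
    close(fds[0]);
    return started && gone;
}
```

Note that this only helps with ordinary runaway children. A process stuck in an uninterruptible kernel sleep, as discussed later in the thread, ignores the group signal just as it ignores a direct kill -9.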
vandys@hpindda.HP.COM (Andy Valencia) (04/29/88)
A "classic" way to make an unkillable process is to have it block on an I/O device which isn't going to finish its I/O. The trick is that if it sleep()s with a certain priority or above, signals will unblock it (and thus you get interruptible system calls), but if it's below, then signals can't get to the process until it unblocks. Now all you need is for some I/O operation to get frozen (say, lose an interrupt, or mishandle it), and you have the unkillable process. We are having fun now, ja? Andy Valencia vandys%hpindda.UUCP@hplabs.hp.com
dave@micropen (David F. Carlson) (04/30/88)
In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
> Can anyone enlighten me as to what causes a process to become "immortal"
> in System VR2, or Microport UNIX System V/AT, to be more specific?
>

This "problem" is not a Microport issue at all: it is UNIX all the way.

> I have encountered this a number of times, where it would be impossible
> even for root to kill a process; if the parent process of the "immortal"
> process is killed, the child attaches itself to init, PID 1.
> What causes a process to refuse to die? I thought signal 9 (kill) could
> not be intercepted or ignored?

If you are technically minded and want a real answer, read "The Design of the UNIX Operating System" by Maurice Bach.

The quick answer is that any process that is in the kernel with a WCHAN will not go back to user mode until that channel is awoken. Who will awaken it? Two choices: a device driver interrupt or a kernel timer interrupt. In all likelihood your ill-behaved process is waiting in a poorly written device driver close(). No close should allow a process to wait forever on an event that may not come. Signals (kill -9) are delivered when a process in kernel mode re-enters user mode. However, your process is waiting in kernel mode and won't get those signals until it's done: NEVER! (Or until the long-sought interrupt allows its WCHAN to go again.)

--
David F. Carlson, Micropen, Inc.
...!{ames|harvard|rutgers|topaz|...}!rochester!ur-valhalla!micropen!dave
"The faster I go, the behinder I get." --Lewis Carroll
hedrick@athos.rutgers.edu (Charles Hedrick) (04/30/88)
You ask about processes that refuse to die. (Calling them "immortal" confers a positive aura that is probably undeserved. Normally these processes are in a useless state, and might better be referred to as members of the "undead".) Unix, along with many other operating systems, kills processes by telling them to die. You probably envision that kill -9 invokes some code that goes through all the tables ripping out entries for the process. Unfortunately, the kernel isn't organized in such a way that this is possible. Processes may own resources, locks, mapped memory, etc. All of these have to be released validly before the process can safely be removed from the system. Thus a kill starts a surprisingly complex series of events, some of which are executed in the process' own context. If the process is in an inconsistent state, it may be unable to complete these events, and hang in the process of being killed (or killing itself). I've seen this sort of thing happen in many different versions of Unix (including various Berkeley-based Unices), and similar things afflicted TOPS-20. By definition it is caused by a bug in the kernel, typically some sort of race condition or deadly embrace.
friedl@vsi.UUCP (Stephen J. Friedl) (04/30/88)
In article <468@micropen>, dave@micropen (David F. Carlson) writes:
< In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
< > Can anyone enlighten me as to what causes a process to become "immortal"
< > in System VR2, or Microport UNIX System V/AT, to be more specific?
< >
< This "problem" is not a Microport issue at all: it is UNIX all the way.
<
< The quick answer is that any process that is in the kernel with a WCHAN
< will not go back to user mode until that channel is awoken. Who will
< awaken it? Two choices: a device driver interrupt or a kernel timer
< interrupt. In all likelihood your ill-behaved process is waiting in
< a poorly written device driver close().

There is a third choice. When a driver calls sleep(), one of the arguments is a sleeping priority. In addition to entering into scheduling considerations, it determines whether or not the sleep() can be interrupted by a signal. If this priority is less than or equal to PZERO (defined in <sys/param.h>), then the driver can't be interrupted, with the converse being true. Different drivers use different priorities.

Example from the 3B2, where PZERO is 25. In the tty driver, an open(2) on a port will block until the carrier detect line is seen by the hardware. When the process sleeps on this, its priority is TTOPRI. Since TTOPRI is #defined in <sys/tty.h> as 29, this call is interruptible. To demonstrate this, find a port (say, tty11) that has no cables or processes attached to it. Assuming you have read permissions, cat the device and hit DELETE:

	# cat < /dev/tty11
	(hit DELETE)
	/dev/tty11: cannot open
	#

Because TTOPRI > PZERO, your interrupt is heeded.

Alas, this is not always the case. In the floppy block device open() handler, it sleeps with PRIBIO (#defined in <sys/param.h> to be 20). When you try to (say) mount the floppy, you have to wait for it to succeed or timeout; your interrupt is ignored because PRIBIO < PZERO.

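The rule described above comes down to a single comparison against PZERO. A minimal sketch in C, using the 3B2 values quoted in the post; sleep_is_interruptible is a made-up helper for illustration, not a kernel routine, and the numeric values may well differ on other ports.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as quoted in the post for the 3B2; assume other systems differ. */
#define PZERO  25
#define PRIBIO 20   /* floppy block device open() */
#define TTOPRI 29   /* tty open() waiting for carrier */

/* Hypothetical helper: can a signal cut short a sleep() at this priority? */
bool sleep_is_interruptible(int pri)
{
    assert(pri >= 0);       /* sleep priorities are small non-negative numbers */
    return pri > PZERO;     /* at or below PZERO the signal has to wait */
}
```

With these numbers the tty open comes out interruptible and the floppy open does not, matching the two cases in the post.
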
I would be interested to hear from driver writers who are more familiar with this: how does one determine whether a sleep should be interruptible or not? Why aren't they all this way (not a plea, just a question)? The cartridge tape driver on the 3B2 obviously runs at a noninterruptible priority because once I type a command that deals with it I sometimes have to wait for the retension pass (usually a couple of minutes) before the interrupt is honored :-(.

A side note: WCHAN is a "wait channel", the address on which the sleep() awaits awakenment (I just made that word up), and it is found by the "-l" option to ps. If you are really industrious, you can write a program that looks this address up in the /unix namelist and gives a clue for what the process is waiting. You can't always nail it down, as you really need source to get structure offsets and stuff, but it is instructive to get a clue whether a program is waiting on disk or a tty or whatever.

--
Steve Friedl   V-Systems, Inc. (714) 545-6442   Resident 3B2 hacker
friedl@vsi.com   {backbones}!vsi.com!friedl   attmail!vsi!friedl
limes@sun.uucp (Greg Limes) (04/30/88)
In article <468@micropen> dave@micropen (David F. Carlson) writes: >In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes: >> Can anyone enlighten me as to what causes a process to become "immortal" >> in System VR2, or Microport UNIX System V/AT, to be more specific? >> >This "prblem" is not a Micrport issue at all: it is UNIX all the way. > >> I have encountered this a number of times, where it would be impossible >> even for root to kill a process; if the parent process of the "immortal" >> process is killed, the child attaches itself to init, PID 1. >> What causes a process to refuse to die? I thought signal 9 (kill) could >> not be intercepted or ignored? > >If you are technically minded and want a real answer read: > "The Design of the UNIX Operating System" by Maurice Bach. > >The quick answer is that any process that is in the kernel with a WCHAN >will not go back to user mode until that channel is awoken. Who will >awaken it? Two choices: a device driver interrupt or a kernel timer >interrupt. In all likelihood your ill-behaved process is waiting in >a poorly written device driver close(). No close should allow a process >to wait forever on a event that may not come. Signals (kill -9) are >delivered when a process in kernel mode re-enters user mode. However, >you process is waiting in kernel mode and won't get those signals til >its done: NEVER! (or until the long sought interrupt allows it's WCHAN >to go again. Close, but not quite. It depends on the priority of the process during the sleep, and what the driver does with the return value. If the priority is less than PZERO, nothing happens, the sleep continues to sleep, and all is as you note above; this corresponds to what is called a "short term disk wait", and is usually used for events that are expected with some high probability to happen quickly, or times where cleaning up after an abort is so messy that it cannot be faced. 
If the sleep priority is above PZERO, the sleep() will return an error corresponding to "I was interrupted!". The device driver is then counted on to clean up, abort whatever it was doing (or set up for a later completion), and report error status if any to its caller. This situation corresponds to longer term sleeps, like reading from a socket, tty, or some other "slow" device. Note that all processes that are sleeping have a WCHAN, that is how they are woken up; if a signal is delivered to a process, it is woken up independent of the value of its WCHAN. -- Greg Limes [limes@sun.com] frames to /dev/fb
limes@sun.uucp (Greg Limes) (04/30/88)
In article <625@vsi.UUCP> friedl@vsi.UUCP (Stephen J. Friedl) writes:
>
> I would be interested to hear from driver writers who are
>more familiar with this: how does one determine whether a sleep
>should be interruptible or not? Why aren't they all this way
>(not a plea, just a question)?

Sometimes you sleep in places where it is difficult (or impossible) to clean up after an abort. You previously verified that the controller is out there, dammit, and it is *going* to respond, and there is just too much to clean up in an abort, or there is no way to abort the controller, or maybe you just do not want to spend the time to make abort work in *this* instance which is (grin) never going to happen anyway ... well, you get the picture. Sometimes it is just laziness.

> The cartridge tape driver on the
>3B2 obviously runs at a noninterruptible priority because once I
>type a command that deals with it I sometimes have to wait for
>the retension pass (usually a couple of minutes) before the
>interrupt is honored :-(.

Worse yet, on some SCSI-based systems, when the tape drive starts rewinding, the SCSI bus is locked until the operation completes. Kind of messy when your swap disk is out there on the SCSI.

> A side note: WCHAN is a "wait channel", the address on which
>the sleep() awaits awakenment (I just made that word up), and it is
>found by the "-l" option to ps. If you are really industrious,
>you can write a program that looks this address up in the /unix
>namelist and gives a clue for what the process is waiting. You
>can't always nail it down, as you really need source to get
>structure offsets and stuff, but it is instructive to get a clue
>whether a program is waiting on disk or a tty or whatever.

Some versions of "ps" do this lookup for you ... at least, SunOS 4.0 does now. Shocked the heck out of me the first time the WCHANs started coming up "socket", "pause", "select", and so on. Seems nowadays that most everything sleeps on "select" on my workstation.
-- Greg Limes [limes@sun.com] frames to /dev/fb
pjh@mccc.UUCP (Pete Holsberg) (04/30/88)
In article <468@micropen> dave@micropen (David F. Carlson) writes: ...In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes: ...> Can anyone enlighten me as to what causes a process to become "immortal" ...> in System VR2, or Microport UNIX System V/AT, to be more specific? ...> ...This "prblem" is not a Micrport issue at all: it is UNIX all the way. ... ...> I have encountered this a number of times, where it would be impossible ...> even for root to kill a process; if the parent process of the "immortal" ...> process is killed, the child attaches itself to init, PID 1. ...> What causes a process to refuse to die? I thought signal 9 (kill) could ...> not be intercepted or ignored? ... ...If you are technically minded and want a real answer read: ... "The Design of the UNIX Operating System" by Maurice Bach. ... ...The quick answer is that any process that is in the kernel with a WCHAN ...will not go back to user mode until that channel is awoken. Who will ...awaken it? Two choices: a device driver interrupt or a kernel timer ...interrupt. In all likelihood your ill-behaved process is waiting in ...a poorly written device driver close(). No close should allow a process ...to wait forever on a event that may not come. Signals (kill -9) are ...delivered when a process in kernel mode re-enters user mode. However, ...you process is waiting in kernel mode and won't get those signals til ...its done: NEVER! (or until the long sought interrupt allows it's WCHAN ...to go again. This happens frequently on my 3B2/400 when it gets into a deadly embrace with my modem: I cannot kill -9 any of the processes associated with that port! It takes toggling the modem's ON/OFF switch to break the embrace. Surely there must be a better way! ??
chris@mimsy.UUCP (Chris Torek) (04/30/88)
In article <51443@sun.uucp> limes@sun.uucp (Greg Limes) writes:
>If the sleep priority is above PZERO, the [signalled] sleep() will return
>an error corresponding to "I was interrupted!".

Unless Sun has made some big kernel changes recently, this is not the case. See /sys/sys/kern_synch.c, at the label `psig' in sleep().

Returning an error from sleep would be a viable alternative to `catch' and `throw' routines, although it would entail more work. Every driver that now sleeps interruptably might read

	while ((foo->status & READY) == 0) {
		if (sleep((caddr_t)foo, PFOO))
			return (EINTR);
	}

and one would be safe in ignoring the (new) return value from sleep iff the sleep is uninterruptable.

--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu   Path: uunet!mimsy!chris
kjk@pbhyf.PacBell.COM (Ken Keirnan) (05/01/88)
In article <3951@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes:
>Can anyone enlighten me as to what causes a process to become "immortal"
>in System VR2, or Microport UNIX System V/AT, to be more specific?
>
>I have encountered this a number of times, where it would be impossible
>even for root to kill a process; if the parent process of the "immortal"
>process is killed, the child attaches itself to init, PID 1.
>

The only processes (I can think of) that cannot be killed, even with signal 9, are *DEFUNCT* processes and processes suspended waiting on I/O. Since the above case is evidently not a DEFUNCT, I would suspect an I/O problem. Since there was no mention of what the process is doing, it's tough to determine. Is the process a hung getty? How about a little more info.

Ken Keirnan
--
Ken Keirnan - Pacific Bell - {att,bellcore,sun,ames,pyramid}!pacbell!pbhyf!kjk
San Ramon, California   kjk@pbhyf.PacBell.COM
richardh@killer.UUCP (Richard Hargrove) (05/01/88)
In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
> Can anyone enlighten me as to what causes a process to become "immortal"
> in System VR2, or Microport UNIX System V/AT, to be more specific?
>
> I have encountered this a number of times, where it would be impossible
> even for root to kill a process;

Wolf,

While I've never seen this under Microport SYS V/AT, I have seen it under Intel Xenix 3.4 (SYS III based). Like you, I was amazed when the command

	kill -9 pid

executed by root didn't remove the process entry from the ps display. However, repeated invocations of "ps -elf" indicated that the process was always inactive and that it had a nice value of 20. Of course this could be subject to error since my sample rate didn't approach the system's time-slice quantum ;-). Actually there were two different activations of the same program, the quite large Intel tool bld386, both of which had terminated abnormally due to system errors (ran out of disk space). Also, there didn't appear to be any real system performance degradation (80286-based '(U|Xe)nix' systems suffer performance degradation very rapidly, as I'm sure you've observed).

Not having access to source code, I was left to speculate on what I observed. I came to the conclusion that the actual processes were gone, but some table or tables maintained by the kernel had been corrupted. I'm assuming that ps reports only what it finds in the table(s) and that it doesn't check their validity. As you experienced, rebooting the system cleared up everything. If my diagnosis is correct, I know of no other way to clear up the problem, though I would like to know more about what was going on.

richard hargrove
...!{ihnp4 | codas | cbosgd}!killer!richardh
--------------------------------------------
jpayne@cs.rochester.edu (Jonathan Payne) (05/02/88)
In article <3967@killer.UUCP> richardh@killer.UUCP (Richard Hargrove) writes:
>In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
>> Can anyone enlighten me as to what causes a process to become "immortal"
>> in System VR2, or Microport UNIX System V/AT, to be more specific?
>>
>> I have encountered this a number of times, where it would be impossible
>> even for root to kill a process;
>Not having access to source code, I was left to speculate on what I observed.
>I came to the conclusion that the actual processes were gone, but some
>table or tables maintained by the kernel had been corrupted. I'm assuming
>that ps reports only what it finds in the table(s) and that it doesn't check
>their validity. As you experienced, rebooting the system cleared up
>everything. If my diagnosis is correct, I know of no other way to clear up
>the problem, though I would like to know more about what was going on.
>
>richard hargrove
>...!{ihnp4 | codas | cbosgd}!killer!richardh
>--------------------------------------------

I believe the story goes something like this. The process is sleeping at a priority that is too high (or low) to be interrupted by a software interrupt. That is, while in kernel mode the process did a sleep(chan1, PRI), but nothing has come along to wake it up (with wakeup(chan1)). Sending a signal can't wake up a process that is sleeping in this manner. I believe something like this happened to me several years ago when my pty was expecting a ^Q (because somehow it got a ^S ...) and I got disconnected but it was sleeping in the tty driver waiting for that ^Q.

Software interrupts, I believe, are checked whenever a process is scheduled for running. Sending the signal sets some bit in the process structure, and when the process is next scheduled those bits will be checked. The problem is that the process is still sleeping, waiting for some event, like a ^Q or some other kind of interrupt, and unfortunately that interrupt may never come for some reason (like I was disconnected from the pty - this bug is fixed, I think).

(I'm pretty sure about this ...)
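The mechanism described in this post (a signal only sets a bit, and the bit is looked at when the process next runs) can be mimicked with a toy user-space model. This is a sketch, not kernel source: the structure, its field names and the toy_* functions are all invented for illustration, and the PZERO value is the 3B2 figure quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PZERO   25   /* 3B2 value quoted earlier in the thread */
#define TOY_SIGKILL 9

/* Toy stand-in for a process-table entry; all names are hypothetical. */
struct toy_proc {
    unsigned long pending;  /* one bit per posted signal */
    bool asleep;            /* blocked in a kernel sleep() */
    int sleep_pri;          /* priority that was passed to that sleep() */
};

/* Post a signal: always record it, but wake the sleeper only if its
 * sleep priority makes the sleep interruptible. */
void toy_post_signal(struct toy_proc *p, int sig)
{
    assert(sig >= 1 && sig <= 32);
    p->pending |= 1UL << (sig - 1);
    if (p->asleep && p->sleep_pri > TOY_PZERO)
        p->asleep = false;
}

/* What wakeup(chan) would do for this process once the event arrives. */
void toy_wakeup(struct toy_proc *p)
{
    p->asleep = false;
}

/* The pending bits are examined only once the process runs again. */
bool toy_dies_now(const struct toy_proc *p)
{
    return !p->asleep && (p->pending & (1UL << (TOY_SIGKILL - 1))) != 0;
}
```

In this model a process sleeping at priority 20 keeps its SIGKILL bit set indefinitely and dies only after toy_wakeup() is called, which is the "immortal until the interrupt arrives" behaviour the thread is about.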
vrs@littlei.UUCP (vrs) (05/02/88)
In article <9233@sol.ARPA> jpayne@cs.rochester.edu (Jonathan Payne) writes: >>In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes: >>> Can anyone enlighten me as to what causes a process to become "immortal" >>> in System VR2, or Microport UNIX System V/AT, to be more specific? >>> > >I believe the story goes something like this. The process is sleeping at >a priority that is too high (or low) to be interrupted by a software >interrupt. That is, while in kernel mode the process did a sleep(chan1, >PRI), but nothing has come along to wake it up (with wakeup(chan1)). This is nearly always because a device wants to write output and the connection has been lost. The driver fails to flush pending output (and/or new output) after the connection goes down. There is another scenario worth worrying about during driver design: even if the driver sleeps at a low priority (as it does in the usual tty line discipline), a kill will cause the process to try to exit(). The exit() will mask off all signals and close all files. When it closes the device with the lost connection, it sleeps AGAIN, this time with signals ignored. We've done a fair bit of work on our Multibus drivers since XENIX 3.4 :-).
rwb@viusys.UUCP (Rick) (05/02/88)
In article <625@vsi.UUCP> friedl@vsi.UUCP (Stephen J. Friedl) writes: >In article <468@micropen>, dave@micropen (David F. Carlson) writes: >< In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes: >< > Can anyone enlighten me as to what causes a process to become "immortal" >< > in System VR2, or Microport UNIX System V/AT, to be more specific? < much good info on PZERO, sleep(), etc deleted > >you can write a program that looks this address up in the /unix >namelist and gives a clue for what the process is waiting. I'm not familiar with what's distributed with Microport, but if 'crash' is included, the command "ds address", where "address" is the WCHAN, or event address, will return the name and offset from the nearest symbol to that address, hopefully the name of the sleep queue on which the process is sleeping, e.g. "physio +2". Of course, this still doesn't allow you to kill the process; as Steve points out, anything sleeping at a priority less than (greater than?) PZERO will not be awakened to process a signal. Only wakeup() will do that . . . Rick Butland <rwb@viusys>
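The "name and offset from the nearest symbol" lookup described here is easy to sketch. The following assumes the namelist has already been read into an array of (address, name) pairs; struct sym and nearest_symbol are hypothetical names, and reading the real /unix namelist (for example with nlist) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of one namelist entry. */
struct sym {
    unsigned long addr;
    const char *name;
};

/* Return the symbol with the greatest address not above wchan, or NULL
 * if every symbol lies above it. On success *offset receives the
 * distance from that symbol, as in "physio +2". */
const struct sym *nearest_symbol(const struct sym *tab, size_t n,
                                 unsigned long wchan, unsigned long *offset)
{
    const struct sym *best = NULL;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].addr <= wchan && (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];
    }
    if (best != NULL && offset != NULL)
        *offset = wchan - best->addr;
    return best;
}
```

The table does not need to be sorted for this linear scan. As Steve notes, the result is only a hint: an address inside a large structure resolves to the structure's name plus an offset that means little without source.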
carlj@hpcvmb.HP (Carl Johnson) (05/02/88)
>In article <3951@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes: >>Can anyone enlighten me as to what causes a process to become "immortal" >>in System VR2, or Microport UNIX System V/AT, to be more specific? >> >>I have encountered this a number of times, where it would be impossible >>even for root to kill a process; if the parent process of the "immortal" >>process is killed, the child attaches itself to init, PID 1. >The only processes (I can think of) that cannot be killed, even with >signal 9, are *DEFUNCT* processes and processes suspended waiting on I/O. >Since the above case is evidently not a DEFUNCT, I would suspect an I/O >problem. Since there was no mention of what the process is doing, its >tough to determine. Is the process a hung getty? How about a little more >info. > >Ken Keirnan - Pacific Bell - {att,bellcore,sun,ames,pyramid}!pacbell!pbhyf!kjk > San Ramon, California kjk@pbhyf.PacBell.COM Since I have seen what sounds like the same problem, I'll mention what I've seen. Every time I have seen it it has been when I have been using more (or less or pg) as a filter when listing some output. Since the process has always been on the output of a pipe, I have always assumed it was a problem with pipes. In every case the virtual console will accept and echo input, but it doesn't respond to it. The only solution has been to switch to another virtual console and kill the parent of the process (which I think has always been a login shell). This then allows me to switch back and log in again, and leaves the offending process in the background with a PPID of 1. I doubt this is a common problem with other systems, since without virtual consoles I don't see any way to get control back short of re-booting. Carl Johnson - Hewlett-Packard Co. - ...!hplabs!hp-pcd!carlj
root@uwspan.UUCP (Sue Peru Sr.) (05/03/88)
+---- Wolf Paul writes in <3951@killer.UUCP>
| Can anyone enlighten me as to what causes a process to become "immortal"
| in System VR2, or Microport UNIX System V/AT, to be more specific?
+----

Microport users out there with the BETA everex tape driver try this:

	find . -print | cpio -ocv | strm -o /dev/rmt0

with a WRITE PROTECTED tape! After the cpio starts spitting out the filenames, strm will hang (because of the R/O tape). You can kill the find, the cpio, but not the strm! Not only can't you kill it, you can't even start up another one cuz it has not released /dev/rmt0.

I'd like some more info on this before I bother Microport with it - is it just me?... :-)

While I'm at it, would y'all send me any lists of bugs you have found - I'd like to compile a "user" buglist to contrast with the Microport supplied one... (mail to plocher@puff.cs.wisc.edu - this is the most reliable mail site I have access to - Internet, uucp, cs-net, and bitnet mail all can find this spot...)

	-John Plocher

---- ...This bears repeating from time to time... ----

These are the automatic mailings available to you. To have one mailed to you automatically, send a mail message to

	microport@uwspan.uucp   -or-   ...!uwvax!geowhiz!uwspan!microport

with the subject described below.

The Subject: field should be   What will get mailed back to you          Size
----------------------------   --------------------------------          ----
"Subject: send info"           Introduction and newsgroup guidelines     (~ 7K)
"Subject: send buglist"        "Up to date" lists of all reported bugs   (~20K)
                               ( Last modified Feb 1988 )
"Subject: send version"        Modification dates of the above lists     (~ 1K)

If you already have a copy of the bug lists, you should request the version message to see if you really need a new copy of the bug lists. (note the size difference!)

i.e.: To get a message containing the times that the buglists were last updated you need to send a message like this:

	% mail microport@uwspan.uucp
	Subject: Please send the version list
	Thanks
	^D
	%

The body of the message ("Thanks") is ignored.

--
Comp.Unix.Microport is now unmoderated! Use at your own risk :-)
wes@obie.UUCP (Barnacle Wes) (05/03/88)
In article <Apr.29.21.45.24.1988.6900@athos.rutgers.edu>, hedrick@athos.rutgers.edu (Charles Hedrick) writes:
| You ask about processes that refuse to die. (Calling them "immortal"
| confers a positive aura that is probably undeserved. Normally these
| processes are in a useless state, and might better be referred to as
| members of the "undead".)

The canonical term for such a process is "zombie."

--
     /\              - "Against Stupidity,  - {backbones}!
    /\/\  .    /\    - The Gods Themselves  - utah-cs!uplherc!
   /    \/ \/\/  \   - Contend in Vain."    - sp7040!obie!
  / U i n T e c h \  -       Schiller       - wes
sarima@gryphon.CTS.COM (Stan Friesen) (05/04/88)
In article <3951@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes:
>Can anyone enlighten me as to what causes a process to become "immortal"
>in System VR2?
>
>I have encountered this a number of times, where it would be impossible
>even for root to kill a process;
>
>The only way to get rid of such an immortal process seems to be to reboot,
>which is rather drastic.
>
>What causes a process to refuse to die? I thought signal 9 (kill) could
>not be intercepted or ignored?
>

A process that is suspended in the kernel waiting for the completion of a block I/O request will not continue, even for a signal, until the I/O has terminated. If a block device fails without generating a failure status this can result in an immortal process. This is most common with tape drives. Is there any particular device that is being accessed by all your immortal processes? If so it may be having hardware problems.

There is no way that I know of to get rid of these things short of rebooting the system. Most of the time these processes are harmless and can be left alone until you would be rebooting anyway. So, unless they are causing problems, like locking up your tape drive, don't bother rebooting. Just try to find the base cause and remove it.

--
Sarima Cardolandion      sarima@gryphon.CTS.COM
aka Stanley Friesen      rutgers!marque!gryphon!sarima
Sherman Oaks, CA
chip@pedsga.UUCP (05/05/88)
In article <3967@killer.UUCP> richardh@killer.UUCP writes:
>In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes:
>> Can anyone enlighten me as to what causes a process to become "immortal"
>> in System VR2, or Microport UNIX System V/AT, to be more specific?
>> I have encountered this a number of times, where it would be impossible
>> even for root to kill a process;
>While I've never seen this under Microport SYS V/AT, I have seen it under
>Intel Xenix 3.4 (SYS III based). Like you, I was amazed when the command
>kill -9 pid
>executed by root didn't remove the process entry from the ps display. However

I have seen this too, on Xelos, Concurrent's port of SVR2. It seems that when a process is flow controlled off, no amount of killing by root will remove the process.

I originally had this problem trying to get an imagen running over NTS, a LAN in our building. NTS has the ability to ignore (pass through) or handle flow control (^S, ^Q). I originally had the NTS process flow control (other options were wrong as well). When the imagen driver filled the NTS buffer, it would flow control the driver. For reasons unbeknownst to me, it would never flow control the driver back on. I was stuck with a process I couldn't kill.

I don't know the kernel software that well, but I guess that even though signals were arriving for the process, the kernel would not reschedule it.

--
Chip ("My grandmother called me Charles once. ONCE!!") Maurer
Concurrent Computer Corporation, Tinton Falls, NJ 07724 (201)758-7361
uucp: {mtune|purdue|rutgers|princeton|encore}!petsd!pedsga!chip
arpa: pedsga!chip@UXC.CSO.UIUC.EDU
guy@gorodish.Sun.COM (Guy Harris) (05/07/88)
> | You ask about processes that refuse to die. (Calling them "immortal" > | confers a positive aura that is probably undeserved. Normally these > | processes are in a useless state, and might better be referred to as > | members of the "undead".) > > The canonical term for such a process is "zombie." Wrong. A "zombie" is a process that has already completed dying, but whose corpse hasn't been picked up by its parent yet. The corpse has already been picked clean (it has no address space, for instance). This is a misuse of the term "zombie", but we're stuck with it. A *live* process that refuses to die, which is what was originally being discussed, is a different matter. A very common cause of this is a driver that blocks for a very long time - possibly forever - with a priority less than or equal to PZERO.
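The distinction drawn above can be observed from user space on a POSIX system: a zombie cannot be "killed" because there is nothing left to kill, yet it disappears as soon as its parent collects the exit status. A minimal sketch, assuming POSIX.1-2008 waitid() with WNOWAIT is available; the function name zombie_ignores_kill and the exit status 7 are arbitrary choices for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: create a zombie, send it SIGKILL, then reap it.
 * Returns the child's exit status as seen by wait, or -1 on failure. */
int zombie_ignores_kill(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);                       /* child dies at once */

    /* Wait until the child has exited, but leave the corpse in place. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    assert(info.si_pid == pid);         /* it is now a zombie */

    (void)kill(pid, SIGKILL);           /* has no effect on a process that already died */

    int status;
    if (waitpid(pid, &status, 0) != pid)    /* the parent picks up the corpse */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the child had still been alive when the signal arrived, the status would report death by signal instead of a normal exit. The live-but-stuck processes this thread is really about are the opposite case: they have not exited at all, so there is no status for anyone to collect.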
arosen@eagle.ulowell.edu (MFHorn) (05/07/88)
In article <52288@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes: >A *live* process that refuses to die, which is what was originally being >discussed, is a different matter. A very common cause of this is a driver that >blocks for a very long time - possibly forever - with a priority less than or >equal to PZERO. On a Sequent Balance 21K with 6 processors, we recently had a user with a program that failed to exit properly. It seemed to get stuck when it tried to exit. The annoying thing was that each time he ran it, we'd lose one of our processors (whichever one tried to perform the exit). Since the process was in kernel mode, it couldn't receive any signals. After it was run a few times, the machine was 6 times slower than usual; we had to reboot. Would a program that does the following get rid of the process? 1: Gets the process' proc struct from the kernel. 2: Changes fields like the status, priority, cpu usage, wchan, exit status and maybe others so the kernel will have good reason to terminate the process. 3: Writes the new struct back out (open /dev/mem for write, lseek, write). If something along these lines would work, it should carry over to most unixes since they all should have the same or similar fields in the proc struct. I've written programs that change a process' proc struct; it's probably not a good idea (you should be _very_ careful if you try it), but it does work. [it can be pretty fun. "Ok, let's make this vi privileged..."] I'd like people's opinions before I start trying to create some immortal processes to nuke. Andy Rosen | arosen@hawk.ulowell.edu | "I got this guitar and I ULowell, Box #3031 | ulowell!arosen | learned how to make it Lowell, Ma 01854 | | talk" -Thunder Road RD in '88 - The way it should be
guy@gorodish.Sun.COM (Guy Harris) (05/07/88)
> On a Sequent Balance 21K with 6 processors, we recently had a user with a > program that failed to exit properly. It seemed to get stuck when it tried > to exit. ... > > Would a program that does the following get rid of the process? > <description of how to whack the process table> Yes, but it also might get rid of your system as well. As I said, in many cases this sort of half-dead process is caused by something such as a driver blocking non-"interruptibly" forever while doing a "close". The driver might well have a reason why it *didn't* want to be interrupted by a signal; it might be holding on to some system resource, for example, and be unwilling to be interrupted without having a chance to release that resource. Kicking the process's priority above PZERO, so that you can terminate it with a signal, might not be a good idea. (Also, I'm not certain what happens if you send a signal to a process that's in the middle of "exit" blocked on a "close"; it might not unjam the process.) It may be that the process doesn't exit properly due to an OS bug. If so, you should try to get it fixed; if, in the interim, you want a workaround and plan to dink with the process's process table entry, note that this is intrinsically very dangerous and be prepared to wedge your system well and truly if you try to do this.
wnp@dcs.UUCP (Wolf N. Paul) (05/08/88)
In article <216@obie.UUCP> wes@obie.UUCP (Barnacle Wes) writes: >In article <Apr.29.21.45.24.1988.6900@athos.rutgers.edu>, hedrick@athos.rutgers.edu (Charles Hedrick) writes: >| You ask about processes that refuse to die. (Calling them "immortal" >| confers a positive aura that is probably undeserved. Normally these >| processes are in a useless state, and might better be referred to as >| members of the "undead".) > >The canonical term for such a process is "zombie." I always thought that "zombies" refers to dead processes which have not been waited for, rather than processes which refuse to die ?!? -- Wolf N. Paul * 3387 Sam Rayburn Run * Carrollton TX 75007 * (214) 306-9101 UUCP: ihnp4!killer!dcs!wnp ESL: 62832882 INTERNET: wnp@DESEES.DAS.NET or wnp@dcs.UUCP TLX: 910-280-0585 EES PLANO UD
rac@jc3b21.UUCP (Roger A. Cornelius) (05/09/88)
In article <3967@killer.UUCP>, richardh@killer.UUCP (Richard Hargrove) writes: > In article <3951@killer.UUCP>, wnp@killer.UUCP (Wolf Paul) writes: > > Can anyone enlighten me as to what causes a process to become "immortal" > > in System VR2, or Microport UNIX System V/AT, to be more specific? > > > > I have encountered this a number of times, where it would be impossible > > even for root to kill a process; > > Wolf, > > While I've never seen this under Microport SYS V/AT, I have seen it under > Intel Xenix 3.4 (SYS III based). Like you, I was amazed when the command We have this problem on our Altos using xenix 3.3a1. The only way I've found to free the terminal after this happens is to either re-boot, or null out the /etc/utmp file (this fools the system into thinking no-one is logged on), so I can disable and re-enable the terminal. The latter is better than telling 15-20 users they have to log off while we do a shutdown. Does anyone know what problems this could cause (besides 'who' not returning anything)? And wouldn't writing a program that removes the entry for the locked terminal from /etc/utmp work as well? Or maybe there's a better way? Roger Cornelius -- +---------- Roger Cornelius -----------+ | (813)347-4399 | | ...gatech!codas!usfvax2!jc3b21!rac | +- ...gatech!usfvax2!jc3b21!rac -+
mike@turing.UNM.EDU (Michael I. Bushnell) (05/09/88)
In article <6832@swan.ulowell.edu> arosen@hawk.ulowell.edu (MFHorn) writes: [ Recounts story about "It wouldn't die!" process...] >Would a program that does the following get rid of the process? > >1: Gets the process' proc struct from the kernel. >2: Changes fields like the status, priority, cpu usage, wchan, exit status > and maybe others so the kernel will have good reason to terminate the > process. >3: Writes the new struct back out (open /dev/mem for write, lseek, write). > >If something along these lines would work, it should carry over to most >unixes since they all should have the same or similar fields in the proc >struct. Ack! no! The whole reason for a sleep that cannot be interrupted is that the process has some kernel data structure locked. If you fake it to get the process killed, then the inode, text entry, whatever, will remain locked, and you can't ever get at it again. You could, perhaps, make your program even smarter, and have it figure out just what things were locked and unlock them, but remember, they may be partially modified, and fixing them makes this an even more daunting prospect. The *real* solution is to fix the bug in the kernel. Failing that, you are, well, hosed. -- N u m q u a m G l o r i a D e o Michael I. Bushnell HASA - "A" division 14308 Skyline Rd NE Computer Science Dept. Albuquerque, NM 87123 OR Farris Engineering Ctr. OR University of New Mexico mike@turing.unm.edu Albuquerque, NM 87131 {ucbvax,gatech}!unmvax!turing.unm.edu!mike
dg@lakart.UUCP (David Goodenough) (05/09/88)
From article <81@dcs.UUCP>, by wnp@dcs.UUCP (Wolf N. Paul): ]In article <216@obie.UUCP> wes@obie.UUCP (Barnacle Wes) writes: ]>In article <Apr.29.21.45.24.1988.6900@athos.rutgers.edu>, hedrick@athos.rutgers.edu (Charles Hedrick) writes: ]>| You ask about processes that refuse to die. (Calling them "immortal" ]>| confers a positive aura that is probably undeserved. Normally these ]>| processes are in a useless state, and might better be referred to as ]>| members of the "undead".) ]> ]>The canonical term for such a process is "zombie." ] ]I always thought that "zombies" refers to dead processes which have not ]been waited for, rather than processes which refuse to die ?!? Singularly appropriate: sort of conjures up images of processes stumbling round the kernel, with eyes closed, dropping bits of code (read flesh) off at every step - just like the stereotype zombie in a "B" horror movie :-) P.S. In response to all those that replied to my questions Re: <defunct> processes, I discover the solution is simple. After every fclose(fp), where fp is the FILE * I got from popen, I do a wait(&j), and the zombies go away. Just like sprinkling holy water on them :-). Thanks to all who replied. -- dg@lakart.UUCP - David Goodenough +---+ | +-+-+ ....... !harvard!adelie!cfisun!lakart!dg +-+-+ | +---+
ewv@zippy.berkeley.edu (Eric Varsanyi) (05/12/88)
In article <468@micropen> dave@micropen (David F. Carlson) writes: > [...] No close should allow a process >to wait forever on a event that may not come. Signals (kill -9) are >delivered when a process in kernel mode re-enters user mode. However, >you process is waiting in kernel mode and won't get those signals til >its done: NEVER! (or until the long sought interrupt allows it's WCHAN >to go again. This is only true if the driver has slept with a wakeup priority <= PZERO. If the process sleeps above PZERO a signal will trash the kernel context and return to the user with an EINTR (Interrupted system call).
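Seen from user space, the interruptible case looks roughly like the sketch below. It assumes a POSIX system, and the names on_alarm and interrupted_read_errno are hypothetical. A read() on an empty pipe sleeps interruptibly, so when SIGALRM arrives and its handler was installed without SA_RESTART, the call fails with EINTR. A process sleeping uninterruptibly inside a driver would never get that far.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler only has to exist; its arrival is what interrupts read(). */
static void on_alarm(int sig) { (void)sig; }

/* Block in read() on an empty pipe and let SIGALRM interrupt the call.
 * Returns the errno that read() reported, or -1 if setup failed.
 * (interrupted_read_errno is a hypothetical name.) */
int interrupted_read_errno(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART: do not retry read() */
    if (sigaction(SIGALRM, &sa, &old) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    alarm(1);                        /* deliver SIGALRM in about one second */
    char byte;
    ssize_t n = read(fds[0], &byte, 1);   /* interruptible sleep */
    int err = errno;                 /* save before close() can disturb it */
    assert(n == -1);                 /* nothing ever writes to the pipe */

    sigaction(SIGALRM, &old, NULL);  /* restore the previous disposition */
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

On a system where the pipe read is interruptible, the function should return EINTR after about a second. The same signal sent to a process asleep at an uninterruptible priority stays pending until the sleep ends on its own.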
ok@quintus.UUCP (Richard A. O'Keefe) (05/14/88)
In article <97@lakart.UUCP>, dg@lakart.UUCP (David Goodenough) writes: > P.S. In response to all those that replied to my questions Re: <defunct> > processes, I discover the solution is simple. After every fclose(fp), where > fp is the FILE * I got from popen, I do a wait(&j), and the zombies go away. > Just like sprinkling holy water on them :-). Thanks to all who replied. Surely there is some mistake here? FILE*s returned by popen() are ONLY supposed to be closed by pclose(). To quote the manual: A stream opened by popen should be closed by pclose, which waits for the associated process to terminate and returns the exit status of the command. NEVER close popen()ed files with fclose()!
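A minimal sketch of the popen()/pclose() pairing described in the manual text quoted above, assuming a POSIX system; the helper name count_output_lines is invented for this example. It reads everything the command prints, then closes the stream with pclose(), which waits for the child itself, so no separate wait() is needed and no zombie is left behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Run cmd through popen(), count the lines it writes to stdout, and
 * close the stream with pclose(). Stores the command's exit code in
 * *exit_code (-1 if it did not exit normally).
 * Returns the line count, or -1 on failure.
 * (count_output_lines is a hypothetical helper.) */
long count_output_lines(const char *cmd, int *exit_code)
{
    assert(cmd != NULL && exit_code != NULL);

    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    long lines = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;

    /* pclose(), not fclose(): it closes the pipe AND waits for the child. */
    int status = pclose(fp);
    if (status == -1)
        return -1;

    *exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return lines;
}
```

A caller might write count_output_lines("who", &code) and then inspect code. Because pclose() returns the wait status, the exit code of the command is available without any further call.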
learn@igloo.UUCP (william vajk) (05/17/88)
In article <379@jc3b21.UUCP>, rac@jc3b21.UUCP (Roger A. Cornelius) writes: > Does anyone know what problems this could cause (besides 'who' not returning > anything)? While not locking a terminal, the following simple shell script creates an unkillable process (pg) if run in background : tail -20 <foo> | pg I'm sure there are plenty of other examples. This just happens to be one I found on uport and a 3b2. Bill Vajk learn@igloo