mark@cygnet.UUCP (Mark Quattrocchi) (03/09/90)
Someone out there must have the answer to this little annoyance...

I occasionally get both a <defunct> and an <exiting> process after having tip croak on me. This happens once in a while when I manually try logging into other systems to generate chat scripts. Somehow the remote system sends me some garbage and tip just freaks out. The only way out is to kill my shell and log in again. So now I'm back in and I have two processes left over from the dead tip: one <exiting> on my tty, and a <defunct> one belonging to nobody (these processes exist even before I kill my shell). Even as root I can't kill these damn processes, and it leaves me with an unusable modem until I reboot the system.

BTW this is on a Sun 3/280 using 4.0.3. Any ideas on how to get out of this state would be great.

Mark Quattrocchi
{3comvax|oliveb|hplabs}!cygnet!mark
pratap@hpcllcm.HP.COM (Pratap Subrahmanyam) (03/10/90)
Here's my 2 cents' worth on this topic.

When a process that has a lot of children (like a shell) dies for some reason (for example, due to a kill signal), the OS takes it upon itself to walk down the list of children and reparent them to the root process (or the init process). This works fine in most cases. When a child dies it sends a signal to its parent (I think it's called a "death_of_child_signal"). When the parent receives this signal, it resets the process PID table, after doing several other cleanup operations (like closing open files, pipes, etc.). Now the PID table will not contain an entry for the child process. (This is why ps -ef will not show it.)

However, suppose there is a race condition like this: the child dies soon after the parent is "killed", that is, the child dies before it can be reparented. Then the signal that the child sends out will be lost in space. No process exists to receive it. Hence it will be there, an orphan.

I don't believe that such orphan processes cause any overhead, because even the OS will not know of their existence. This means that the process never gets scheduled again, etc. I'm not sure what happens to the space allocated to the process image. In any case, in this situation, the PID table doesn't get updated. That is why you see <defunct> processes with ps -ef.

If anyone has better answers (or if this one is bogus), please post. I'll be interested.

- Pratap
pratap@hpcllcm.hp.hplabs.com
ray@ctbilbo.UUCP (Ray Ward) (03/10/90)
In article <1805@cygnet.UUCP> mark@cygnet.UUCP (Mark Quattrocchi) writes:
>I occasionally get both a <defunct> and <exiting> process after having
>tip croak on me. [...]
> Even as root I can't kill these damn processes and it
>leaves me with an unusable modem until I reboot the system.

From the symptoms you describe, it seems that a likely explanation is that the driver you are using is sleeping, waiting for a high-level interrupt to wake him up. Unfortunately, the communications have somehow croaked, so the interrupt will never come. The driver has set the interrupt level so high that a normal signal will not be able to break him out of his sleep.

Rebooting is the only general, reliable method I know of to remedy the problem. Perhaps there should be another command to allow su to interrupt sleeping beauty? (As opposed to ad hoc hacking with the kernel...)
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ray Ward Email: uunet!ctbilbo!ray
Voice: (214) 991-8338x226, (800) 331-7032 Fax : (214) 991-8968
=-=-=-=- There _are_ simple answers, just no _easy_ ones. -- R.R. -=-=-=-=
chris@mimsy.umd.edu (Chris Torek) (03/10/90)
In article <6840005@hpcllcm.HP.COM> pratap@hpcllcm.HP.COM (Pratap Subrahmanyam) writes:
>... race condition ... The child dies soon after the parent is "killed",
>that is the child dies before it can be reparented. Then the signal that
>the child sends out, will be lost in space.

That would be a kernel bug. Fortunately, those who wrote the kernel were not that sloppy. When a parent exits, its children are passed over to /etc/init (process 1). If they try to exit while they are moving, nothing happens until they finish moving; then they finish exiting and init wait()s for them. Then they go away.

>That is why you see <defunct> processes with ps -ef.

No. There are two reasons for <defunct> or <exiting> processes: kernel bugs (typically in device drivers), and parent processes that do not wait() for children.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
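The second cause named above, a parent that does not wait() for its child, can be reproduced in a few lines. The following is only a minimal sketch in Python on a modern Unix, not anything posted in the thread: the function name zombie_demo, the exit status 7, and the 0.2 second pause are illustrative assumptions, and the pause is a crude stand-in for real synchronization.

```python
import os
import time


def zombie_demo():
    """Fork a child that exits at once, then show that its process-table
    entry lingers until the parent calls waitpid()."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with status 7

    # Parent: give the child time to exit (crude; fine for a sketch).
    time.sleep(0.2)

    # Signal 0 delivers nothing; it only checks that the PID still exists.
    # This succeeds because the unreaped child still has a table entry.
    os.kill(pid, 0)
    entry_before_wait = True

    # Reap the child: the kernel hands over the status and frees the entry.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status)

    try:
        os.kill(pid, 0)
        entry_after_wait = True
    except ProcessLookupError:
        entry_after_wait = False  # the PID is gone once it has been waited for

    return entry_before_wait, exit_code, entry_after_wait
```

During the pause, ps would normally list the child as <defunct> (or with state Z); it disappears as soon as waitpid() returns.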
cpcahil@virtech.uucp (Conor P. Cahill) (03/11/90)
In article <6840005@hpcllcm.HP.COM> pratap@hpcllcm.HP.COM (Pratap Subrahmanyam) writes:
[long story deleted]
>In any case, in this situation, the PID table, doesn't get updated. That is
>why you see <defunct> processes with ps -ef.

No. <defunct> processes are simply processes that have died but have not yet been waited on by their parent. These processes have an entry in the process table, but no associated data space, etc.

BTW, the reason that they stay around in the process table is so that the process exit status and other such information can be reported to the parent.

Since the processes do not really exist, there is no way to deliver a signal to them, and therefore killing such a process has no effect.

The other "unkillable" processes, those that are stuck somewhere in the kernel (usually, if not always, in device driver code) sleeping with a priority < PZERO, are usually stuck there due to some hardware problem or a device driver bug.

/* Disclaimer - this next part may be me smoking some rope, I can't create
the problem to test it */

I believe that once stuck there they may get changed to a <defunct> by sending a kill -9. However, they still will not go away until the condition that got them stuck is cleared.
--
Conor P. Cahill (703)430-9247 Virtual Technologies, Inc.,
uunet!virtech!cpcahil 46030 Manekin Plaza, Suite 160
Sterling, VA 22170
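The point that killing a <defunct> process has no effect, and that its exit status is held for the parent, can be checked with a short experiment. This is a hedged sketch in Python, not code from the thread: the name kill_zombie_demo and the exit status 3 are made up for illustration, and it assumes a platform where Python provides os.waitid() with WNOWAIT (Linux does).

```python
import os
import signal


def kill_zombie_demo():
    """Send SIGKILL to an already-dead, unreaped child and show that the
    status later reported to the parent is still the original exit code."""
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # child: exit normally with status 3

    # Block until the child has exited, but leave it unreaped (WNOWAIT),
    # so it stays in the process table as a defunct entry.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # The kernel accepts the request, but there is no running process
    # left to receive the signal.
    os.kill(pid, signal.SIGKILL)

    # Reap the child and inspect what the kernel kept for us.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status), os.WEXITSTATUS(status), os.WIFSIGNALED(status)
```

If the SIGKILL had reached a live process, the status would report termination by signal rather than a normal exit. The sketch says nothing about the other case discussed above, a process stuck in an uninterruptible sleep inside a driver, which cannot be reproduced from user code.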
allbery@NCoast.ORG (Brandon S. Allbery) (03/12/90)
As quoted from <6840005@hpcllcm.HP.COM> by pratap@hpcllcm.HP.COM (Pratap Subrahmanyam):
+---------------
| When a child dies it sends a signal to its parent (I think it's called a
| "death_of_child_signal"). When the parent receives this signal, it resets the
| process PID table, after doing several other cleanup operations (like closing
| open files, pipes etc.. ). Now the PID table, will not contain an entry for
| the child process. (This is why ps -ef will not show it).
+---------------

The parent process does not do this; the kernel does.

+---------------
| However, if there is a race condition, like this .. The child dies soon after
| the parent is "killed", that is the child dies before it can be reparented.
+---------------

It will still be reparented, to 1 (init). I don't believe the race condition you describe exists. In any case, this does not explain <defunct>; those processes are trapped in process tear-down because an open file (usually a device, sometimes a socket in buggy TCP/IP implementations) can't be closed. It usually indicates a buggy device driver.

And SIGCLD/SIGCHLD (depending on religious affiliation ;-) is not the trigger for process cleanup; it's part of that cleanup. The kernel sends it, on behalf of the process, to its parent. In System V, it is sent *only if the parent is expecting to receive it*; I suspect BSD is similar, since most processes could care less about child-death signals whatever the system.

+---------------
| If any one has better (or if this is a bogus ) answers, please post. I'll
| be interested.
+---------------

Done. Although I expect Chris Torek will have some words to say on the subject as well. ;-)

++Brandon
--
Brandon S. Allbery (human), allbery@NCoast.ORG (Inet), BALLBERY (MCI Mail)
ALLBERY (Delphi), uunet!cwjcc.cwru.edu!ncoast!allbery (UUCP), B.ALLBERY (GEnie)
BrandonA (A-Online) ("...and a partridge in a pear tree!" ;-)
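The reparenting described above is easy to observe from user code. What follows is a rough sketch in Python, not part of the original exchange: orphan_demo and its timeout parameter are hypothetical names. On the systems discussed in the thread the new parent is init (PID 1); on some modern systems it may instead be a designated "subreaper" process, so the sketch only reports the new parent PID and does not assume it is 1.

```python
import os
import time


def orphan_demo(timeout=5.0):
    """Create a grandchild whose parent exits without wait()ing, and
    report which process the kernel assigns as its new parent."""
    r, w = os.pipe()
    mid = os.fork()
    if mid == 0:
        # Intermediate parent: fork a grandchild, then exit immediately.
        os.close(r)
        me = os.getpid()
        if os.fork() == 0:
            # Grandchild: poll until the kernel has reparented us.
            deadline = time.monotonic() + timeout
            while os.getppid() == me and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(w, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)

    # Original process: reap the intermediate parent, then read the
    # grandchild's report until every write end of the pipe is closed.
    os.close(w)
    os.waitpid(mid, 0)
    data = b""
    while True:
        chunk = os.read(r, 64)
        if not chunk:
            break
        data += chunk
    os.close(r)
    return mid, int(data)
```

The grandchild is never waited for by the original process; once it exits, its adoptive parent is the one expected to reap it, which is why such orphans normally do not linger as <defunct>.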
deastman@pilot.njin.net (Don Eastman) (03/12/90)
Conor P. Cahill writes a lot of useful information and the following speculative comment:

> /* Disclaimer - this next part may be me smoking some rope, I can't create
> the problem to test it */
>
> I believe that once stuck there they may get changed to a <defunct> by
> sending a kill -9. However, they still will not go away until the
> condition that got them stuck is cleared.
>

This appears to me to be exceedingly unlikely. A process becomes <defunct> as a result of going through the exiting sequence, in which kernel resources are relinquished. It is very likely that a device driver stuck at a noninterruptible priority is reliant upon some of these resources. It is also not obvious what benefits accrue from making a special case of SIGKILL in this scenario.

Thoughts?

Don Eastman
deastman@pilot.njin.net or ...!rutgers!pilot!deastman