kthompso@ptolemy.arc.nasa.gov (Kevin Thompson) (11/22/89)
[SunOS 4.0.1, 3/60, usually tcsh but have tried with csh and sh also]

Under what conditions does kill -9 not kill a process? I have some processes that are completely un-killable, I've tried /bin/kill and built-in kill, with just about any accepted signal including -9 and -TERM, with no effect at all. I don't want to reboot (though I'm coming close). And yes I RTFM'ed, no indication that -9 should ever fail.

If it matters, I started these processes with the shell script:

========================================
#!/bin/csh
# Usage: labother machine file1 [file2] ...
set machine = $argv[1]
shift
rsh $machine -n "(cd $cwd; labtests $argv)"
========================================

Where labtests is another shell script, whose meat is a command of the form:

    echo '(test-lab "'$file'")' | nice labyrinth -batch > /dev/null

where labyrinth is a lisp dump.

I've guessed (no I'm no unix wizard):

-- it could have something to do with the way I call rsh, I've since put in I/O redirections to /dev/null, too early to determine if that's more robust, since this problem is sporadic.
-- something to do with the franz lisp dump??? Dubious, I can't kill the csh process either.
-- something to do with the process being nice'd
-- something to do with the process having output put to /dev/null.
-- something weird about our network set-up, our support group is 'in transition'.

Ok, I'm grasping. At any rate, I now have from ps -ux this:

========================================
USER       PID %CPU %MEM   SZ  RSS TT STAT START  TIME COMMAND
kthompso 11847 16.6  5.8   64  440 p2 S    10:39  0:00 /bin/csh -c ps -ux
kthompso 11555  0.0  3.1   64  232 ?  D    19:41  0:00 csh /usr/kthompso
kthompso 11568  0.0  1.9   64  144 ?  D    19:56  0:00 csh /usr/kthompso
kthompso 11580  0.0  1.9   64  144 ?  D    19:56  0:00 csh /usr/kthompso
kthompso 11579  0.0  0.0    0    0 ?  Z    Nov  7 0:00 <defunct>
kthompso 11854  0.0  5.5  136  416 p2 R    10:39  0:00 ps -ux
kthompso 11554  0.0  0.0    0    0 ?  Z    Nov  7 0:00 <defunct>
kthompso 11845  0.0 12.1  424  912 p2 S    10:33  0:10 emacs
kthompso 11536  0.0  0.0    0    0 ?  Z    Nov  7 0:00 <defunct>
kthompso 11561  0.0  0.0    0    0 ?  Z    Nov  7 0:00 <defunct>
kthompso 11543  0.0  3.1   64  232 ?  D    19:41  0:00 csh /usr/kthompso
kthompso 11797  0.0  3.6  128  272 p2 I    09:46  0:04 -tcsh (tcsh)
========================================

and 11555, 11568, 11580, 11543 are completely un-killable. The common feature they have is being 'STAT=D' (disk wait), so maybe that's a problem. I had no problem killing the 'in.rshd' process.

Sorry for the length, I was trying to be complete. Posted or mail replies equally welcome, hope this isn't a common one, I've never seen discussion about it.

Kevin Thompson
--
kthompso@ptolemy.arc.nasa.gov
Sterling Software/Nasa-Ames Research Center
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/23/89)
In article <3046@bayes.ptolemy.arc.nasa.gov> kthompso@ptolemy.arc.nasa.gov (Kevin Thompson) writes:
>Under what conditions does kill -9 not kill a process?

When they're sleeping (in the KERNEL) with priority less than P_ZERO. Many device drivers do not time out when an operation blocks forever; I consider that a bug in those device drivers, although I understand why they tend to be written that way.
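
For anyone wanting to spot such processes quickly, here is a rough sketch (not anything from the posts above) that pulls out the PIDs whose STAT field starts with 'D' from BSD-style `ps -ux` text like the listing quoted earlier in the thread. The function name `uninterruptible_pids` and the `SAMPLE` text are made up for illustration, and the sketch assumes the columns are separated by whitespace, which real ps output does not always guarantee when wide values run together.

```python
def uninterruptible_pids(ps_output):
    """Return PIDs whose STAT column begins with 'D' (uninterruptible wait).

    Assumes BSD-style `ps -ux` text: a header row naming PID and STAT
    columns, followed by whitespace-separated rows (an assumption; wide
    values can run together in real output).
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")

    pids = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= stat_col:
            continue  # skip short or malformed rows
        if fields[stat_col].startswith("D"):
            pids.append(int(fields[pid_col]))
    return pids


# Hypothetical sample, abbreviated from the listing in the original post.
SAMPLE = """\
USER       PID %CPU %MEM   SZ  RSS TT STAT START  TIME COMMAND
kthompso 11555  0.0  3.1   64  232 ?  D    19:41  0:00 csh /usr/kthompso
kthompso 11579  0.0  0.0    0    0 ?  Z    Nov  7 0:00 <defunct>
kthompso 11845  0.0 12.1  424  912 p2 S    10:33  0:10 emacs
kthompso 11543  0.0  3.1   64  232 ?  D    19:41  0:00 csh /usr/kthompso
"""

if __name__ == "__main__":
    print(uninterruptible_pids(SAMPLE))  # prints [11555, 11543]
```

This only identifies the processes; as noted above, sending them signals will have no effect until the kernel-level sleep ends.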