gatesl@mist.cs.orst.edu (Lee Gates) (02/19/91)
As a class project, I am working on a modification to the BSD 4.3 source code to allow one to kill uninterruptible processes. It would seem that we are at a bit of a standstill. Initially, I thought that I could have the kernel raise the priority of the suspect process in the psignal() call. Once set running, the process would release the resources it was sleeping on and exit gracefully, since I would post the kill signal before letting it run again. The others in my group have questioned this, and now I have begun to wonder if it will work. Will the above method cause a race condition resulting from the fact that the process probably assumes that the next time it runs it will have the resource it was sleeping on? And if so, I would appreciate some other suggestions as to how to solve this problem.

thanx
-- 
lee gatesl@prism.cs.orst.edu "having fun watching Oregonians rust"
------------randomly-chosen-drink/quote/simpsons'-quote---------------
"If you choose not to decide you still have made a choice." - Geddy Lee
neil@pio.gid.co.uk (Neil Todd) (02/20/91)
In the referenced article gatesl@mist.cs.orst.edu (Lee Gates) writes:
|
| As a class project, I am working on a modification to the
| BSD 4.3 source code to allow one to kill uninterruptible processes.
(rest deleted)

A while back (~4-5 yrs) Chris Torek (I think) produced a nice little patch to the 4.3 kernel to kill groups of runaway (and rapidly spawning) processes: this was the 'zonk' system call. You could probably gain an insight into your problem by looking at this. The catch is that I don't have access to the machine that I installed the patch on.

Zonk was very useful, especially on student/teaching machines: one could guarantee that some bright spark would experiment with self-spawning processes, and zonk would kill all jobs owned by a particular UID stone dead.

Neil
rock@cbnews.att.com (Y. Rock Lee) (02/21/91)
In article <19065@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
>Just for kicks, imagine some real slow device has been set up to
>do a DMA transfer to some physical address that is held by the
>process which is unkillable. Imagine that you kill that process
>and it exits. Imagine the I/O completes and someone else's
>memory gets trashed. All that and more ...

Please excuse my ignorance of block devices (most of the time I work on character/STREAMS devices). Forcibly awakening a process doing a read/write (DMA transfer) will either give the process a buffer with garbage data or throw away a buffer containing valid data. How can this trash someone else's memory?

Y. Rock Lee, att!cblph!rock rock@cblph.ATT.COM
rock@cbnews.att.com (Y. Rock Lee) (02/21/91)
In article <1991Feb20.232118.11035@odin.diku.dk> thorinn@diku.dk (Lars Henrik Mathiesen) writes:
>The first class you probably shouldn't mess with. If you're lucky,
>removing the sleeping process will only result in the loss of some
>buffer. In worse cases, you get permanently un-openable devices or
>crashes. The real cure for these is to rewrite _each_case_ to sleep at
>interruptible priority and clean up properly (more than a class
>project, I think).

[this is a guess, not an argument] The "permanently un-openable devices" can only happen in the case of open: because the open never completed, the close called during exit cannot clean up correctly. Please correct me if I am missing something.

Y. Rock Lee, att!cblph!rock rock@cblph.ATT.COM
rock@cbnews.att.com (Y. Rock Lee) (02/21/91)
In article <1991Feb19.001941.29928@lynx.CS.ORST.EDU> gatesl@mist.cs.orst.edu (Lee Gates) writes:
> Will the above method cause a race condition resulting from
>the fact that the process probably assumes that the next time it runs
>it will have the resource it was sleeping on? And if so, I would
>appreciate some other suggestions as to how to solve this problem.

Yes, the process will think it has the resource it was sleeping on. But it will be killed and release the resource during its exit before it has a chance to "think". This part looks OK to me. My only concern is that the driver of the particular device the process is waiting for may react crazily when it is misinformed (a good driver should guard against this).

Y. Rock Lee, att!cblph!rock rock@cblph.ATT.COM
torek@elf.ee.lbl.gov (Chris Torek) (02/21/91)
In article <4066@stl.stc.co.uk> "Neil Todd" <neil@pio.gid.co.uk> writes:
>A while back (~4-5 yrs) Chris Torek (I think) produced a nice little
>patch to the 4.3 kernel to kill groups of run away (and rapidly
>spawning) processes - this was the 'zonk' system call.

``Not I,'' said the pig. (Since I just ate half a dozen chocolate chip cookies, I think I qualify. :-) )

Seriously: I never produced this particular bletcherous hack. (I am responsible for a number of other, different bletcherous hacks, but not this one.) If (A) you have SIGSTOP and (B) signals work correctly, the super-user can stop everything, pick out the bad processes, kill them, and then resume everything. (This is a bit tricky to get right, admittedly.)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA Domain: torek@ee.lbl.gov
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/22/91)
In article <10112@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
> If (A) you have SIGSTOP and (B) signals work correctly,
> the super-user can stop everything, pick out the bad processes, kill
> them, and then resume everything. (This is a bit tricky to get right,
> admittedly.)

What's tricky about it?

    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* invoke as, e.g., zonk /bin/csh csh -f; untested */
    int main(int argc, char *argv[], char *envp[])
    {
        if (getuid()) {
            fprintf(stderr, "zonk: fatal: uid not 0\n");
            exit(1);
        }
        if (geteuid()) {
            fprintf(stderr, "zonk: fatal: euid not 0\n");
            exit(2);
        }
        /* check arguments before stopping anything */
        if (argc < 3) {
            fprintf(stderr, "zonk: fatal: usage: zonk path arg0 [args ...]\n");
            exit(3);
        }
        if (setpriority(PRIO_PROCESS, 0, -20))
            fprintf(stderr, "zonk: weird: can't set my priority to -20\n");

        /* pid -1 as root: every process except system processes and
           ourselves; repeated in case something forked in between */
        if (kill(-1, SIGSTOP) == -1)
            perror("zonk: warning: first kill failed");
        if (kill(-1, SIGSTOP) == -1)
            perror("zonk: warning: second kill failed");
        if (kill(-1, SIGSTOP) == -1)
            perror("zonk: warning: good-luck kill failed");

        /* become the shell used to pick off and kill the bad processes */
        for (;;) {
            (void) execve(argv[1], argv + 2, envp);
            perror("zonk: critical: exec failed, will try again");
            sleep(60);
        }
    }

---Dan
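Dan's program covers only the first of the three steps Chris lists (stop everything) and leaves the rest to whoever drives the shell it execs. A minimal sketch of the whole stop / kill / resume sequence, assuming the victims' pids have already been picked out by hand, might look like the C below. `struct zonk_step`, `zonk_plan` and `zonk_run` are hypothetical names, not part of any posted patch; the plan is built separately from the sending so it can be checked before anything is signalled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* One kill(2) call: which pid gets which signal. */
struct zonk_step {
    pid_t pid;
    int sig;
};

/* Hypothetical helper: lay out the stop / kill / resume sequence
   without sending anything. pid -1 means "every process the caller
   may signal". Returns the number of steps written, or 0 if the
   arguments are bad or cap is too small. */
size_t zonk_plan(const pid_t *victims, size_t nvictims, int stop_passes,
                 struct zonk_step *out, size_t cap)
{
    size_t n = 0;

    if (stop_passes < 1 || (size_t)stop_passes + nvictims + 1 > cap)
        return 0;
    /* Repeat the global stop: a process may fork between passes. */
    for (int i = 0; i < stop_passes; i++)
        out[n++] = (struct zonk_step){ -1, SIGSTOP };
    /* SIGKILL is delivered even to a stopped process. */
    for (size_t i = 0; i < nvictims; i++)
        out[n++] = (struct zonk_step){ victims[i], SIGKILL };
    /* Let everything that survived carry on. */
    out[n++] = (struct zonk_step){ -1, SIGCONT };
    return n;
}

/* Carry out a plan with kill(2); returns how many calls failed.
   Only meaningful as root, and only after the plan has been checked. */
int zonk_run(const struct zonk_step *plan, size_t n)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++)
        if (kill(plan[i].pid, plan[i].sig) == -1)
            failures++;
    return failures;
}
```

One caveat with this design: the final `kill(-1, SIGCONT)` also resumes processes that were already stopped before the cycle began (suspended jobs, for example), which is one way the sequence can go subtly wrong.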
jfh@rpp386.cactus.org (John F Haugh II) (02/22/91)
In article <1991Feb21.145705.27763@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Forcibly awaking a process doing read/write (DMA transfer) will either
>give the process a buffer with garbage data or throw away a buffer
>containing valid data. How can this trash someone else's memory?

DMA addresses typically refer to physical memory. The process requesting the DMA transfer is normally locked in memory before the transfer is requested, so that the physical address the controller was told to send the data to remains valid. If the process dies and its physical memory is reallocated (or a page-out or swap-out or ... occurs), that physical address will be allocated to some other process, which isn't expecting to have your DMA transfer sent its way.
-- 
John F. Haugh II UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832 Domain: jfh@rpp386.cactus.org
"I've never written a device driver, but I have written a device driver manual" -- Robert Hartman, IDE Corp.
pat@orac.pgh.pa.us (Pat Barron) (02/22/91)
In article <1991Feb21.152845.29019@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>In article <1991Feb19.001941.29928@lynx.CS.ORST.EDU> gatesl@mist.cs.orst.edu (Lee Gates) writes:
>> Will the above method cause a race condition resulting from
>>the fact that the process probably assumes that the next time it runs
>>it will have the resource it was sleeping on? And if so, I would
>>appreciate some other suggestions as to how to solve this problem.
>
>Yes, the process will think it has the resource it was sleeping on.

Uhh, nope. When a particular event occurs, *all* processes waiting on that event are awakened. By the time you run again, someone else may have snarfed up the resource you were waiting for. This has been the case forever (well, at least since V7): when you come out of a sleep(), you *must* check that the reason you were sleeping is no longer true....

--Pat.
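Pat's point can be shown with a toy user-space model (not kernel code): several processes sleep on one resource, a single wakeup makes all of them runnable, and they then run one at a time. `try_acquire`, `wakeup_all` and `wakeup_all_naive` are hypothetical names for this sketch. With the re-check, only the first process to run gets the resource and the rest go back to sleep; under the assumption Pat is correcting, every woken process believes it owns it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one lockable resource. */
struct resource {
    bool busy;
    int owner;
};

enum pstate { SLEEPING, HAVE_IT };

/* What a woken process should do: look again before using it. */
static enum pstate try_acquire(struct resource *r, int pid)
{
    if (r->busy)
        return SLEEPING;    /* someone else got there first: sleep again */
    r->busy = true;
    r->owner = pid;
    return HAVE_IT;
}

/* wakeup() makes *every* sleeper runnable; they run one at a time
   and each re-checks. Returns how many ended up asleep again. */
int wakeup_all(struct resource *r, enum pstate st[], int n)
{
    int resleep = 0;
    for (int pid = 0; pid < n; pid++) {
        if (st[pid] != SLEEPING)
            continue;
        st[pid] = try_acquire(r, pid);
        if (st[pid] == SLEEPING)
            resleep++;
    }
    return resleep;
}

/* The mistaken assumption: every woken process believes it now holds
   the resource. Returns how many think they own it. */
int wakeup_all_naive(enum pstate st[], int n)
{
    int owners = 0;
    for (int pid = 0; pid < n; pid++)
        if (st[pid] == SLEEPING) {
            st[pid] = HAVE_IT;
            owners++;
        }
    return owners;
}
```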
sarima@tdatirv.UUCP (Stanley Friesen) (02/23/91)
In article <1991Feb21.145705.27763@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Forcibly awaking a process doing read/write (DMA transfer) will either
>give the process a buffer with garbage data or throw away a buffer
>containing valid data. How can this trash someone else's memory?

Scenario:
  forcibly wake up a process waiting on a slow device;
  the process mucks about with the unfilled buffer, and *releases* it;
  the process probably dies due to the signal that woke it up;
  another process acquires the buffer from the free pool (perhaps intending to use it for paging);
  the slow device finally finishes, putting its results into the buffer;
  the new process returns the buffer or (worse) uses it as a new page.

VOILA: corrupted stuff in the new process.
-- 
---------------
uunet!tdatirv!sarima (Stanley Friesen)
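Stanley's scenario can be replayed step by step in a small user-space simulation. It assumes a two-buffer free pool and a device that remembers only *where* to put its data, not whose data it is. `buf_get`, `buf_put`, `start_io` and `io_done` are hypothetical stand-ins, not the 4.3BSD buffer-cache routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUF  2
#define BSIZE 8

/* Toy buffer: owner 0 means "on the free list". */
struct buf {
    char data[BSIZE];
    int owner;
};

static struct buf pool[NBUF];
static struct buf *dma_target;      /* address the device was given */

/* Take the first free buffer for process pid. */
struct buf *buf_get(int pid)
{
    for (int i = 0; i < NBUF; i++)
        if (pool[i].owner == 0) {
            pool[i].owner = pid;
            return &pool[i];
        }
    return NULL;
}

/* Put a buffer back on the free list. */
void buf_put(struct buf *bp)
{
    bp->owner = 0;
}

/* Start a slow transfer: the device only remembers the address. */
void start_io(struct buf *bp)
{
    dma_target = bp;
}

/* The device finishes and writes wherever it was told to,
   whoever owns that buffer now. */
void io_done(const char *bytes)
{
    if (dma_target == NULL)
        return;
    memcpy(dma_target->data, bytes, BSIZE);
    dma_target = NULL;
}
```

Replaying the steps: process 1 calls `buf_get(1)` and `start_io`, is forcibly woken and calls `buf_put`, process 2's `buf_get(2)` returns the same buffer, and `io_done` then overwrites process 2's data.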
jfh@greenber.austin.ibm.com (John F Haugh II) (02/23/91)
In article <1991Feb21.152845.29019@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Yes, the process will think it has the resource it was sleeping on.
>But, it will be killed and release the resource during its exit
>before it has a chance to "think". This part looks OK to me.
>My only concern is that the driver of the particular device which
>the process is waiting for may react crazily when it is misinformed
>(a good driver should guard against this).

This just isn't true. A typical sleep loop looks something like

    while (some_status & some_busy_flag)
        sleep (&some_status, PRI_O_MINE);

    some_status |= some_busy_flag;

If your only concern is getting this process to ignore the setting of "some_busy_flag", you might be doing the right thing, but remember: "some_status" still has "some_busy_flag" set. Killing the process will not clear that bit, and if that bit being set is what is hanging the process, the next process to enter that loop is also going to hang.

What is needed is an exception routine that understands =exactly= what to do to reset the resource to some well-defined state for any possible state the resource may be in.
-- 
John F. Haugh II      | I've Been Moved | MaBellNet: (512) 838-4340
SneakerNet: 809/1D064 | AGAIN !         | VNET: LCCB386 at AUSVMQ
BangNet: ..!cs.utexas.edu!ibmchs!auschs!snowball.austin.ibm.com!jfh (e-i-e-i-o)
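A toy model of John's point, assuming a single `BUSY` bit plus a recorded holder pid; `struct dev`, `dev_lock`, `must_sleep` and `dev_abort` are hypothetical names. Killing the holder leaves the bit set, so the next caller would still have to sleep, until a cleanup routine that knows the resource's state clears it. A real exception routine would also have to deal with the hardware (cancel or wait out the transfer), which this sketch only marks with a comment.

```c
#include <assert.h>
#include <stdbool.h>

#define BUSY 0x1

struct dev {
    int status;   /* flag word, playing the role of some_status */
    int holder;   /* pid that set BUSY, 0 if none */
};

/* Would a process entering the sleep loop have to sleep? */
bool must_sleep(const struct dev *d)
{
    return (d->status & BUSY) != 0;
}

/* What happens once the while loop exits. */
void dev_lock(struct dev *d, int pid)
{
    d->status |= BUSY;
    d->holder = pid;
}

/* Hypothetical exception routine that exit() would need to call when
   the holder is killed. Returns true if it had to clean up. */
bool dev_abort(struct dev *d, int dying_pid)
{
    if (!(d->status & BUSY) || d->holder != dying_pid)
        return false;            /* not held by the dying process */
    /* a real driver would cancel or wait out the I/O here */
    d->status &= ~BUSY;
    d->holder = 0;
    /* ...and then wakeup(&d->status) so the sleepers re-check */
    return true;
}
```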
cks@hawkwind.utcs.toronto.edu (Chris Siebenmann) (02/24/91)
pat@orac.pgh.pa.us (Pat Barron) writes:
| This has been the case forever (well, at least since V7) - when you come
| out of a sleep(), you *must* check that the reason you were sleeping is
| no longer true....

Unless your code is careful to make sure that it is the only one which can consume the particular resource it is sleeping on; consider, for example, a system call tracing facility, where you have a separate process that drains the trace buffers into a disk file.

Also, one common strategy for dealing with the 'other people may have taken this resource already' problem is

    while (!resource)
        sleep(&resource, priority);

If you whack the process and get it to come out of the sleep, and the resource isn't available, the process will just go back to sleep, which isn't particularly helpful for the original poster's purpose.
-- 
V9: the kernel where you can do fgrep <something> */*.[ch] and not get "Arguments too long".
cks@hawkwind.utcs.toronto.edu ...!{utgpu,utzoo,watmath}!utgpu!cks
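Chris's observation, and the difference an interruptible sleep makes, can be sketched with one more toy model. `struct sleeper` and `wait_loop` are hypothetical. With `checks_signals` false the loop behaves like the plain `while (!resource) sleep(...)` above, so each forced wakeup just puts the process back to sleep. With it true, the loop notices the posted kill after waking and backs out, roughly what sleeping at interruptible priority (as Lars suggested) provides, leaving the caller to undo any partial work.

```c
#include <assert.h>
#include <stdbool.h>

struct sleeper {
    bool resource;   /* the condition being waited for */
    bool killed;     /* a kill signal has been posted */
};

enum outcome { GOT_IT, STILL_ASLEEP, BACKED_OUT };

/* Run the waiting loop through `wakeups` forced wakeups that do not
   supply the resource. *sleeps counts the initial sleep plus every
   time the process goes back to sleep. */
enum outcome wait_loop(const struct sleeper *s, int wakeups,
                       bool checks_signals, int *sleeps)
{
    *sleeps = 0;
    while (!s->resource) {
        if (*sleeps > wakeups)      /* no more wakeups: asleep for good */
            return STILL_ASLEEP;
        (*sleeps)++;                /* sleep(...) */
        /* ...a forced wakeup happens here... */
        if (checks_signals && s->killed)
            return BACKED_OUT;      /* e.g. return EINTR to the caller */
    }
    return GOT_IT;
}
```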