[comp.unix.wizards] Help with 4.3 mod to kill uninterruptible procs.

gatesl@mist.cs.orst.edu (Lee Gates) (02/19/91)

	As a class project, I am working on a modification to the 
BSD 4.3 source code to allow one to kill uninterruptible processes.
	
	We seem to be at a bit of a standstill.  Initially, I thought
that I could have the kernel raise the priority of the suspect process
in the psignal() call and set it to run.  Since I would post the kill
signal before letting it run again, the process would then release the
resources it held while sleeping and exit gracefully.  The others in my
group have questioned this, and now I have begun to wonder if it will
work.

	Will the above method cause a race condition resulting from
the fact that the process probably assumes that the next time it runs 
it will have the resource it was sleeping on?  And if so, I would
appreciate some other suggestions as to how to solve this problem.

	thanx
-- lee
gatesl@prism.cs.orst.edu	 "having fun watching Oregonians rust"
------------randomly-chosen-drink/quote/simpsons'-quote---------------
"If you choose not to decide you still have made a choice."
		- Geddy Lee

neil@pio.gid.co.uk (Neil Todd) (02/20/91)

In the referenced article gatesl@mist.cs.orst.edu (Lee Gates) writes:
| 
| 	As a class project, I am working on a modification to the 
| BSD 4.3 source code to allow one to kill uninterruptible processes.

(rest deleted)

A while back (~4-5 yrs) Chris Torek (I think) produced a nice little
patch to the 4.3 kernel to kill groups of runaway (and rapidly
spawning) processes - this was the 'zonk' system call. You could probably
gain an insight into your problem by looking at this. The catch is that
I don't have access to the machine that I installed the patch on.

Zonk was very useful, especially on student/teaching machines, where one
could guarantee that some bright spark would experiment with self-spawning
processes; zonk would kill all jobs owned by a particular UID stone dead.

Neil

rock@cbnews.att.com (Y. Rock Lee) (02/21/91)

In article <19065@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
>Just for kicks, imagine some real slow device has been set up to
>do a DMA transfer to some physical address that is held by the
>process which is unkillable.  Imagine that you kill that process
>and it exits.  Imagine the I/O completes and someone else's
>memory gets trashed.  All that and more ...

Please excuse my ignorance of block devices (most of the time
I work on character/streams devices). 

Forcibly awaking a process doing read/write (DMA transfer) will either
give the process a buffer with garbage data or throw away a buffer
containing valid data. How can this trash someone else's memory?


Y. Rock Lee, att!cblph!rock
	     rock@cblph.ATT.COM

rock@cbnews.att.com (Y. Rock Lee) (02/21/91)

In article <1991Feb20.232118.11035@odin.diku.dk> thorinn@diku.dk (Lars Henrik Mathiesen) writes:
>The first class you probably shouldn't mess with. If you're lucky,
>removing the sleeping process will only result in the loss of some
>buffer. In worse cases, you get permanently un-openable devices or
>crashes. The real cure for these is to rewrite _each_case_ to sleep at
>interruptible priority and clean up properly (more than a class
>project, I think).

[this is a guess, not an argument]

The "permanently un-openable devices" can only happen in the case of open:
because the open never "completed", the close call made during exit cannot
clean up correctly. Please correct me if I am missing something.


Y. Rock Lee, att!cblph!rock
             rock@cblph.ATT.COM

rock@cbnews.att.com (Y. Rock Lee) (02/21/91)

In article <1991Feb19.001941.29928@lynx.CS.ORST.EDU> gatesl@mist.cs.orst.edu (Lee Gates) writes:
>	Will the above method cause a race condition resulting from
>the fact that the process probably assumes that the next time it runs 
>it will have the resource it was sleeping on?  And if so, I would
>appreciate some other suggestions as to how to solve this problem.

Yes, the process will think it has the resource it was sleeping on.
But, it will be killed and release the resource during its exit
before it has a chance to "think". This part looks OK to me.
My only concern is that the driver of the particular device which
the process is waiting for may react crazily when it is misinformed
(a good driver should guard against this).


Y. Rock Lee, att!cblph!rock
             rock@cblph.ATT.COM

torek@elf.ee.lbl.gov (Chris Torek) (02/21/91)

In article <4066@stl.stc.co.uk> "Neil Todd" <neil@pio.gid.co.uk> writes:
>A while back (~4-5 yrs) Chris Torek (I think) produced a nice little
>patch to the 4.3 kernel to kill groups of runaway (and rapidly
>spawning) processes - this was the 'zonk' system call.

``Not I,'' said the pig.  (Since I just ate half a dozen chocolate
chip cookies, I think I qualify. :-) )

Seriously: I never produced this particular bletcherous hack.  (I am
responsible for a number of other, different bletcherous hacks, but
not this one.)  If (A) you have SIGSTOP and (B) signals work correctly,
the super-user can stop everything, pick out the bad processes, kill
them, and then resume everything.  (This is a bit tricky to get right,
admittedly.)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/22/91)

In article <10112@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
> If (A) you have SIGSTOP and (B) signals work correctly,
> the super-user can stop everything, pick out the bad processes, kill
> them, and then resume everything.  (This is a bit tricky to get right,
> admittedly.)

What's tricky about it?

  #include <sys/time.h>
  #include <sys/resource.h>
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* invoke as, e.g., zonk /bin/csh csh -f; untested */
  int main(int argc, char *argv[], char *envp[])
  {
   /* need a program path (argv[1]) and an argv[0] for it (argv[2]) */
   if (argc < 3) { fprintf(stderr,"usage: zonk path argv0 [arg ...]\n"); exit(3); }
   if (getuid()) { fprintf(stderr,"zonk: fatal: uid not 0\n"); exit(1); }
   if (geteuid()) { fprintf(stderr,"zonk: fatal: euid not 0\n"); exit(2); }
   /* run ahead of everything that is about to be stopped */
   if (setpriority(PRIO_PROCESS,0,-20))
     fprintf(stderr,"zonk: weird: can't set my priority to -20\n");
   /* as root, pid -1 signals every process except system processes and
      the sender; repeated, presumably to catch anything forked meanwhile */
   if (kill(-1,SIGSTOP) == -1) perror("zonk: warning: first kill failed");
   if (kill(-1,SIGSTOP) == -1) perror("zonk: warning: second kill failed");
   if (kill(-1,SIGSTOP) == -1) perror("zonk: warning: good-luck kill failed");
   /* exec the rescue program, retrying once a minute if that fails */
   for (;;)
    {
     (void) execve(argv[1],argv + 2,envp);
     perror("zonk: critical: exec failed, will try again");
     sleep(60);
    }
  }

---Dan

jfh@rpp386.cactus.org (John F Haugh II) (02/22/91)

In article <1991Feb21.145705.27763@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Forcibly awaking a process doing read/write (DMA transfer) will either
>give the process a buffer with garbage data or throw away a buffer
>containing valid data. How can this trash someone else's memory?

DMA addresses typically refer to physical memory.  The process requesting
the DMA transfer normally is locked in memory before the transfer is
requested so that the physical address the controller was told to send
the data to will remain valid.  If the process dies and the physical
memory is reallocated (or page out or swap out or ... occurs), that
physical address will be allocated to some other process which isn't
expecting to have your DMA transfer sent its way.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                           Domain: jfh@rpp386.cactus.org
"I've never written a device driver, but I have written a device driver manual"
                -- Robert Hartman, IDE Corp.
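
The DMA hazard described in the post above can be illustrated with a toy
user-space model.  This is only a sketch under stated assumptions: the
frame table, the pids, and every name in it (phys_mem, frame_alloc,
toy_dma_req, dma_demo, ...) are hypothetical and do not come from any
kernel.  The one idea it keeps is that the controller remembers a
physical frame rather than a process, so freeing the frame while the
transfer is pending lets the data land in whoever owns the frame next.

```c
#include <string.h>

#define NFRAMES    4
#define FRAME_SIZE 8
#define NO_OWNER   (-1)

/* Toy "physical memory": a few frames, each owned by at most one pid. */
char phys_mem[NFRAMES][FRAME_SIZE];
int  frame_owner[NFRAMES];

/* What the DMA controller remembers: a physical frame, not a process. */
struct toy_dma_req {
    int frame;
};

void frames_init(void)
{
    memset(phys_mem, 0, sizeof phys_mem);
    for (int i = 0; i < NFRAMES; i++)
        frame_owner[i] = NO_OWNER;
}

/* Hand the first free frame to pid; returns the frame index or -1. */
int frame_alloc(int pid)
{
    for (int i = 0; i < NFRAMES; i++) {
        if (frame_owner[i] == NO_OWNER) {
            frame_owner[i] = pid;
            return i;
        }
    }
    return -1;
}

/* Process exit: every frame the pid owned goes back to the free pool. */
void frames_release(int pid)
{
    for (int i = 0; i < NFRAMES; i++)
        if (frame_owner[i] == pid)
            frame_owner[i] = NO_OWNER;
}

/* The slow device finishes and writes to the frame it was given. */
void dma_complete(const struct toy_dma_req *req)
{
    memcpy(phys_mem[req->frame], "DMADATA", FRAME_SIZE);
}

/* Returns 1 if pid 2's frame was overwritten by pid 1's stale transfer. */
int dma_demo(void)
{
    struct toy_dma_req req;
    int victim;

    frames_init();
    req.frame = frame_alloc(1);   /* pid 1 starts a slow transfer      */
    frames_release(1);            /* pid 1 is killed while it is pending */
    victim = frame_alloc(2);      /* pid 2 is handed the same frame    */
    memcpy(phys_mem[victim], "victim", 7);
    dma_complete(&req);           /* device still targets that frame   */

    return victim == req.frame &&
           memcmp(phys_mem[victim], "DMADATA", FRAME_SIZE) == 0;
}
```

In this model the damage is avoided only if frames_release() is held
off until the pending transfer has completed or been aborted, which is
the role that locking the process in memory plays in the description
above.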

pat@orac.pgh.pa.us (Pat Barron) (02/22/91)

In article <1991Feb21.152845.29019@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>In article <1991Feb19.001941.29928@lynx.CS.ORST.EDU> gatesl@mist.cs.orst.edu (Lee Gates) writes:
>>	Will the above method cause a race condition resulting from
>>the fact that the process probably assumes that the next time it runs 
>>it will have the resource it was sleeping on?  And if so, I would
>>appreciate some other suggestions as to how to solve this problem.
>
>Yes, the process will think it has the resource it was sleeping on.

Uhh, nope.  When a particular event occurs, *all* processes waiting on
that event are awakened.  By the time you run again, someone else may
have snarfed up the resource you were waiting for.

This has been the case forever (well, at least since V7) - when you come
out of a sleep(), you *must* check that the reason you were sleeping is
no longer true....

--Pat.
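
The wake-everyone-then-recheck rule in the post above can be sketched
as a small single-threaded simulation.  It is not kernel code: the
toy_proc table, toy_wakeup(), toy_run() and toy_demo() are hypothetical
names, and "running" a process is reduced to one recheck of the
condition.  The sketch assumes three processes asleep on the same wait
channel when the resource is released.

```c
#include <stddef.h>

#define NPROC 3

enum toy_state { TOY_SLEEPING, TOY_RUNNABLE, TOY_HOLDER };

struct toy_proc {
    enum toy_state state;
    const void *wchan;        /* wait channel, NULL when not asleep */
};

int resource_busy;            /* the condition everyone sleeps on */

/* Like wakeup(chan): every sleeper on chan becomes runnable. */
int toy_wakeup(struct toy_proc *p, int n, const void *chan)
{
    int woken = 0;

    for (int i = 0; i < n; i++) {
        if (p[i].state == TOY_SLEEPING && p[i].wchan == chan) {
            p[i].state = TOY_RUNNABLE;
            p[i].wchan = NULL;
            woken++;
        }
    }
    return woken;
}

/*
 * Let each runnable process run once.  Each one rechecks the condition:
 * the first finds the resource free and takes it, the others find it
 * busy again and go back to sleep.  Returns how many went back to sleep.
 */
int toy_run(struct toy_proc *p, int n, const void *chan)
{
    int resleep = 0;

    for (int i = 0; i < n; i++) {
        if (p[i].state != TOY_RUNNABLE)
            continue;
        if (resource_busy) {
            p[i].state = TOY_SLEEPING;
            p[i].wchan = chan;
            resleep++;
        } else {
            resource_busy = 1;
            p[i].state = TOY_HOLDER;
        }
    }
    return resleep;
}

/* Three sleepers, one release, one wakeup: how many must sleep again? */
int toy_demo(void)
{
    struct toy_proc procs[NPROC];

    for (int i = 0; i < NPROC; i++) {
        procs[i].state = TOY_SLEEPING;
        procs[i].wchan = &resource_busy;
    }
    resource_busy = 0;                        /* holder lets go */
    toy_wakeup(procs, NPROC, &resource_busy);
    return toy_run(procs, NPROC, &resource_busy);
}
```

A process that skipped the recheck in toy_run() would proceed as if it
held the resource even though an earlier-scheduled process had already
taken it.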

sarima@tdatirv.UUCP (Stanley Friesen) (02/23/91)

In article <1991Feb21.145705.27763@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Forcibly awaking a process doing read/write (DMA transfer) will either
>give the process a buffer with garbage data or throw away a buffer
>containing valid data. How can this trash someone else's memory?

Scenario - forcibly wake-up a process waiting on a slow device,
	process mucks about with unfilled buffer, and *releases* it.
	process probably dies due to signal that woke it up.
	another process acquires the buffer from the free pool
		(Perhaps intending to use it for paging)
	slow device finally finishes, putting results into buffer.
	new process returns the buffer or (worse) uses it as a new page.

	VOILA - corrupted stuff in the new process.
-- 
---------------
uunet!tdatirv!sarima				(Stanley Friesen)

jfh@greenber.austin.ibm.com (John F Haugh II) (02/23/91)

In article <1991Feb21.152845.29019@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Yes, the process will think it has the resource it was sleeping on.
>But, it will be killed and release the resource during its exit
>before it has a chance to "think". This part looks OK to me.
>My only concern is that the driver of the particular device which
>the process is waiting for may react crazily when it is misinformed
>(a good driver should guard against this).

This just isn't true.

A typical sleep loop looks something like

	while (some_status & some_busy_flag)
		sleep (&some_status, PRI_O_MINE);

	some_status |= some_busy_flag;

If your only concern is getting this process to ignore the setting
of "some_busy_flag", you might be doing the right thing - but
remember - "some_status" still has the "some_busy_flag" set.  Killing
the process will not get that bit clear and if that bit being set
is what is hanging the process, the next process to enter that loop
is also going to hang.

What is needed is an exception routine that understands =exactly=
what to do to reset the resource to some well-defined state for any
possible state the resource may be in.
-- 
John F. Haugh II      |      I've Been Moved     |    MaBellNet: (512) 838-4340
SneakerNet: 809/1D064 |          AGAIN !         |      VNET: LCCB386 at AUSVMQ
BangNet: ..!cs.utexas.edu!ibmchs!auschs!snowball.austin.ibm.com!jfh (e-i-e-i-o)
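
The stuck-flag point in the post above can be shown with a very small
model.  It is a sketch, not a driver: toy_device, would_sleep(),
kill_waiter(), toy_reset() and busy_demo() are hypothetical names, and
toy_reset() assumes the simplest possible case, in which clearing one
bit is enough to return the resource to a well-defined state.  A real
exception routine would also have to deal with whatever hardware or
buffer state goes with that bit.

```c
#define SOME_BUSY_FLAG 0x01

struct toy_device {
    int status;     /* plays the role of some_status            */
    int waiters;    /* number of processes asleep in the loop   */
};

/* Would a process entering the sleep loop block right now? */
int would_sleep(const struct toy_device *d)
{
    return (d->status & SOME_BUSY_FLAG) != 0;
}

/* Forcibly removing a sleeper: the process goes away, the flag stays. */
void kill_waiter(struct toy_device *d)
{
    if (d->waiters > 0)
        d->waiters--;
}

/* Hypothetical exception routine: put the resource in a known state. */
void toy_reset(struct toy_device *d)
{
    d->status &= ~SOME_BUSY_FLAG;
}

/* Reports whether the next process would hang after each step. */
void busy_demo(int *hangs_after_kill, int *hangs_after_reset)
{
    struct toy_device d = { SOME_BUSY_FLAG, 1 };

    kill_waiter(&d);
    *hangs_after_kill = would_sleep(&d);    /* flag is still set */

    toy_reset(&d);
    *hangs_after_reset = would_sleep(&d);   /* flag is now clear */
}
```

Killing the waiter changes only the waiter count; the next caller of
would_sleep() still blocks until something like toy_reset() runs.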

cks@hawkwind.utcs.toronto.edu (Chris Siebenmann) (02/24/91)

pat@orac.pgh.pa.us (Pat Barron) writes:
| This has been the case forever (well, at least since V7) - when you come
| out of a sleep(), you *must* check that the reason you were sleeping is
| no longer true....

 Unless your code is careful to make sure that it is the only one
which can consume the particular resource which it is sleeping on;
consider, for example, a system call tracing facility, where you have
a separate process that drains the trace buffers into a disk file.
Also, one common strategy for dealing with the 'other people may have
taken this resource already' problem is
	while (!resource)
		sleep(resource, priority);
If you whack the process and get it to come out of the sleep, and the
resource isn't available, the process will just go back to sleep,
which isn't particularly helpful for the original poster's purpose.

--
		V9: the kernel where you can do
			fgrep <something> */*.[ch]
		and not get "Arguments too long".
cks@hawkwind.utcs.toronto.edu	           ...!{utgpu,utzoo,watmath}!utgpu!cks
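
The difference between the plain loop in the post above and a loop that
can actually be abandoned can be sketched as follows.  This is a
user-space model with hypothetical names (toy_waiter, wait_plain,
wait_killable): the sleep itself is reduced to a counter, and
forced_wakeups is the number of times something "whacks" the process
out of its sleep while the resource is still unavailable.

```c
enum wait_result {
    WAIT_GOT_RESOURCE,
    WAIT_STILL_ASLEEP,      /* ran out of wakeups, still waiting  */
    WAIT_INTERRUPTED        /* gave up because a kill was pending */
};

struct toy_waiter {
    int kill_pending;       /* a kill signal has been posted */
    int times_slept;        /* how often it went to sleep    */
};

/* The plain loop: a forced wakeup just leads to another sleep. */
enum wait_result wait_plain(const int *resource, struct toy_waiter *w,
                            int forced_wakeups)
{
    while (!*resource) {
        w->times_slept++;               /* sleep(resource, priority) */
        if (forced_wakeups-- == 0)
            return WAIT_STILL_ASLEEP;   /* nobody wakes us again */
        /* forced wakeup: loop around and recheck the resource */
    }
    return WAIT_GOT_RESOURCE;
}

/* Same loop, but a pending kill is noticed after each wakeup. */
enum wait_result wait_killable(const int *resource, struct toy_waiter *w,
                               int forced_wakeups)
{
    while (!*resource) {
        w->times_slept++;
        if (forced_wakeups-- == 0)
            return WAIT_STILL_ASLEEP;
        if (w->kill_pending)
            return WAIT_INTERRUPTED;    /* caller must clean up and exit */
    }
    return WAIT_GOT_RESOURCE;
}
```

With the resource unavailable, a kill pending, and one forced wakeup,
wait_plain() goes back to sleep a second time while wait_killable()
returns WAIT_INTERRUPTED.  The second form only helps if every caller
is prepared to undo its partial work on that return, which is the
per-case rewrite mentioned earlier in the thread.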