[comp.unix.internals] Help with 4.3 mod to kill uninterruptible procs.

gatesl@mist.cs.orst.edu (Lee Gates) (02/19/91)

	As a class project, I am working on a modification to the 
BSD 4.3 source code to allow one to kill uninterruptible processes.
	
	It would seem that we are at a bit of a standstill.  Initially,
I thought that I could have the kernel raise the priority of the suspect
process in the psignal() call and set it running.  Since I would post the
kill signal before letting it run again, this would allow the process to
release the resources it was sleeping with and exit gracefully.  The
others in my group have questioned this, and now I have begun to wonder
if it will work.

	Will the above method cause a race condition resulting from
the fact that the process probably assumes that the next time it runs 
it will have the resource it was sleeping on?  And if so, I would
appreciate some other suggestions as to how to solve this problem.

	thanx
-- lee
gatesl@prism.cs.orst.edu	 "having fun watching Oregonians rust"
------------randomly-chosen-drink/quote/simpsons'-quote---------------
"If you choose not to decide you still have made a choice."
		- Geddy Lee

neil@pio.gid.co.uk (Neil Todd) (02/20/91)

In the referenced article gatesl@mist.cs.orst.edu (Lee Gates) writes:
| 
| 	As a class project, I am working on a modification to the 
| BSD 4.3 source code to allow one to kill uninterruptible processes.

(rest deleted)

A while back (~4-5 yrs) Chris Torek (I think) produced a nice little
patch to the 4.3 kernel to kill groups of runaway (and rapidly
spawning) processes: this was the 'zonk' system call. You could probably
gain some insight into your problem by looking at it. The catch is that
I don't have access to the machine that I installed the patch on.

Zonk was very useful, especially on student/teaching machines, where one could
guarantee that some bright spark would experiment with self-spawning
processes; zonk would kill all jobs owned by a particular UID stone dead.

Neil

rock@cbnews.att.com (Y. Rock Lee) (02/21/91)

In article <19065@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
>Just for kicks, imagine some real slow device has been set up to
>do a DMA transfer to some physical address that is held by the
>process which is unkillable.  Imagine that you kill that process
>and it exits.  Imagine the I/O completes and someone else's
>memory gets trashed.  All that and more ...

Please excuse my ignorance about block devices (most of the time
I work on character/streams devices). 

Forcibly awakening a process doing a read/write (DMA transfer) will either
give the process a buffer with garbage data or throw away a buffer
containing valid data. How can this trash someone else's memory?

Y. Rock Lee, att!cblph!rock
	     rock@cblph.ATT.COM

rock@cbnews.att.com (Y. Rock Lee) (02/21/91)

In article <1991Feb20.232118.11035@odin.diku.dk> thorinn@diku.dk (Lars Henrik Mathiesen) writes:
>The first class you probably shouldn't mess with. If you're lucky,
>removing the sleeping process will only result in the loss of some
>buffer. In worse cases, you get permanently un-openable devices or
>crashes. The real cure for these is to rewrite _each_case_ to sleep at
>interruptible priority and clean up properly (more than a class
>project, I think).

[this is a guess, not an argument]

The "permanently un-openable devices" can only happen in the case of open:
because the open wasn't "complete", the close call made during exit cannot
clean up correctly. Please correct me if I've missed something.



rock@cbnews.att.com (Y. Rock Lee) (02/21/91)

In article <1991Feb19.001941.29928@lynx.CS.ORST.EDU> gatesl@mist.cs.orst.edu (Lee Gates) writes:
>	Will the above method cause a race condition resulting from
>the fact that the process probably assumes that the next time it runs 
>it will have the resource it was sleeping on?  And if so, I would
>appreciate some other suggestions as to how to solve this problem.

Yes, the process will think it has the resource it was sleeping on.
But it will be killed, and will release the resource during its exit,
before it has a chance to "think". This part looks OK to me.
My only concern is that the driver for the particular device the
process is waiting on may react crazily when it is misinformed
(a good driver should guard against this).


torek@elf.ee.lbl.gov (Chris Torek) (02/21/91)

In article <4066@stl.stc.co.uk> "Neil Todd" <neil@pio.gid.co.uk> writes:
>A while back (~4-5 yrs) Chris Torek (I think) produced a nice little
>patch to the 4.3 kernel to kill groups of run away (and rapidly
>spawning) processes - this was the 'zonk' system call.

``Not I,'' said the pig.  (Since I just ate half a dozen chocolate
chip cookies, I think I qualify. :-) )

Seriously: I never produced this particular bletcherous hack.  (I am
responsible for a number of other, different bletcherous hacks, but
not this one.)  If (A) you have SIGSTOP and (B) signals work correctly,
the super-user can stop everything, pick out the bad processes, kill
them, and then resume everything.  (This is a bit tricky to get right,
admittedly.)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/22/91)

In article <10112@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
> If (A) you have SIGSTOP and (B) signals work correctly,
> the super-user can stop everything, pick out the bad processes, kill
> them, and then resume everything.  (This is a bit tricky to get right,
> admittedly.)

What's tricky about it?

  #include <sys/time.h>
  #include <sys/resource.h>
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* invoke as, e.g., zonk /bin/csh csh -f; untested */
  int main(int argc, char *argv[], char *envp[])
  {
   if (argc < 3) { fprintf(stderr,"usage: zonk path arg0 [arg ...]\n"); exit(3); }
   if (getuid()) { fprintf(stderr,"zonk: fatal: uid not 0\n"); exit(1); }
   if (geteuid()) { fprintf(stderr,"zonk: fatal: euid not 0\n"); exit(2); }
   /* run ahead of everything else while we stop the world */
   if (setpriority(PRIO_PROCESS,0,-20))
     fprintf(stderr,"zonk: weird: can't set my priority to -20\n");
   /* as root, kill(-1,...) signals everything except system processes and
      the sender (BSD semantics); repeat to catch anything forked meanwhile */
   if (kill(-1,SIGSTOP) == -1) perror("zonk: warning: first kill failed");
   if (kill(-1,SIGSTOP) == -1) perror("zonk: warning: second kill failed");
   if (kill(-1,SIGSTOP) == -1) perror("zonk: warning: good-luck kill failed");
   /* become the rescue shell; keep retrying, since nothing else is running */
   for (;;)
    {
     (void) execve(argv[1],argv + 2,envp);
     perror("zonk: critical: exec failed, will try again");
     sleep(60);
    }
  }

---Dan

jfh@rpp386.cactus.org (John F Haugh II) (02/22/91)

In article <1991Feb21.145705.27763@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Forcibly awaking a process doing read/write (DMA transfer) will either
>give the process a buffer with garbage data or throw away a buffer
>containing valid data. How can this trash someone else's memory?

DMA addresses typically refer to physical memory.  The process requesting
the DMA transfer is normally locked in memory before the transfer is
requested, so that the physical address the controller was told to send
the data to remains valid.  If the process dies and the physical
memory is reallocated (or a page-out or swap-out or ... occurs), that
physical address will be allocated to some other process, which isn't
expecting to have your DMA transfer sent its way.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                           Domain: jfh@rpp386.cactus.org
"I've never written a device driver, but I have written a device driver manual"
                -- Robert Hartman, IDE Corp.

pat@orac.pgh.pa.us (Pat Barron) (02/22/91)

In article <1991Feb21.152845.29019@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>In article <1991Feb19.001941.29928@lynx.CS.ORST.EDU> gatesl@mist.cs.orst.edu (Lee Gates) writes:
>>	Will the above method cause a race condition resulting from
>>the fact that the process probably assumes that the next time it runs 
>>it will have the resource it was sleeping on?  And if so, I would
>>appreciate some other suggestions as to how to solve this problem.
>
>Yes, the process will think it has the resource it was sleeping on.

Uhh, nope.  When a particular event occurs, *all* processes waiting on
that event are awakened.  By the time you run again, someone else may
have snarfed up the resource you were waiting for.

This has been the case forever (well, at least since V7): when you come
out of a sleep(), you *must* check that the reason you were sleeping is
no longer true....

--Pat.

sarima@tdatirv.UUCP (Stanley Friesen) (02/23/91)

In article <1991Feb21.145705.27763@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Forcibly awaking a process doing read/write (DMA transfer) will either
>give the process a buffer with garbage data or throw away a buffer
>containing valid data. How can this trash someone else's memory?

Scenario: forcibly wake up a process waiting on a slow device;
	process mucks about with unfilled buffer, and *releases* it.
	process probably dies due to the signal that woke it up.
	another process acquires the buffer from the free pool
		(perhaps intending to use it for paging)
	slow device finally finishes, putting results into buffer.
	new process returns the buffer or (worse) uses it as a new page.

	VOILA: corrupted stuff in the new process.
-- 
---------------
uunet!tdatirv!sarima				(Stanley Friesen)

jfh@greenber.austin.ibm.com (John F Haugh II) (02/23/91)

In article <1991Feb21.152845.29019@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>Yes, the process will think it has the resource it was sleeping on.
>But, it will be killed and release the resource during its exit
>before it has a chance to "think". This part looks OK to me.
>My only concern is that the driver of the particular device which
>the process is waiting for may react crazily when it is misinformed
>(a good driver should guard against this).

This just isn't true.

A typical sleep loop looks something like

	while (some_status & some_busy_flag)
		sleep (&some_status, PRI_O_MINE);

	some_status |= some_busy_flag;

If your only concern is getting this process to ignore the setting
of "some_busy_flag", you might be doing the right thing, but
remember: "some_status" still has "some_busy_flag" set.  Killing
the process will not clear that bit, and if that bit being set
is what is hanging the process, the next process to enter that loop
is also going to hang.

What is needed is an exception routine that understands =exactly=
what to do to reset the resource to some well-defined state for any
possible state the resource may be in.
-- 
John F. Haugh II      |      I've Been Moved     |    MaBellNet: (512) 838-4340
SneakerNet: 809/1D064 |          AGAIN !         |      VNET: LCCB386 at AUSVMQ
BangNet: ..!cs.utexas.edu!ibmchs!auschs!snowball.austin.ibm.com!jfh (e-i-e-i-o)

rock@cbnews.att.com (Y. Rock Lee) (02/23/91)

In article <5558@awdprime.UUCP> jfh@greenber.austin.ibm.com (John F Haugh II) writes:
>In article <1991Feb21.152845.29019@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>>Yes, the process will think it has the resource it was sleeping on.
>>But, it will be killed and release the resource during its exit
>>before it has a chance to "think". This part looks OK to me.
>
>A typical sleep loop looks something like
>
>	while (some_status & some_busy_flag)
>		sleep (&some_status, PRI_O_MINE);
>
>	some_status |= some_busy_flag;

This was what I had in mind (which turned out to be wrong when I double-checked it):

	A signal puts this sleeping process back into the run queue (its 
	priority has been set to higher than PZERO). sleep doesn't return;
	it does a longjmp back to syscall. Before the system call returns,
	it checks if there is a signal. There is. So, it handles the signal
	and exits (no signal handling routine set).

The catch is that the process went to sleep before we changed its priority.
In this case sleep takes a different route and simply returns.
Therefore, we will continue executing the driver code, which may be dangerous!

>What is needed is an exception routine that understands =exactly=
>what to do to reset the resource to some well-defined state for any
>possible state the resource may be in.

That's why an uninterruptible (system) priority is chosen in the first place.
So don't mess with it, IF you can convince your professor to drop
this project :-)  But I guess it is OK to "experiment" in school.

On the other hand, this utility could be very useful. If a process is hung
but cannot be killed (sleeping at an uninterruptible priority), you have two
ways to get rid of it: use this utility or reboot the system. That is,
this utility can be useful, but it is DANGEROUS in general!


gatesl@prism.cs.orst.edu (Lee Gates) (02/23/91)

In article <1991Feb23.025958.9914@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
>
>This was what I had in mind (which turned out to be wrong when I double-checked it):
>
>	A signal puts this sleeping process back into the run queue (its 
>	priority has been set to higher than PZERO). sleep doesn't return;
>	it does a longjmp back to syscall. Before the system call returns,
>	it checks if there is a signal. There is. So, it handles the signal
>	and exits (no signal handling routine set).
>
>The catch is that the process went to sleep before we changed its priority.
>In this case sleep takes a different route and simply returns.
>Therefore, we will continue executing the driver code, which may be dangerous!
>
>>What is needed is an exception routine that understands =exactly=
>>what to do to reset the resource to some well-defined state for any
>>possible state the resource may be in.
>
>That's the reason why system priority is chosen to begin with.
>So, don't mess with it IF you can convince your professor not to do 
>this project, :-)  But, I guess, it is OK to do "experiment" in school.
>
>On the other hand, this utility can be very useful. If a process is hanging 
>but cannot be killed (sleeping at an uninterruptible priority), you have two
>ways to get rid of it: use this utility or reboot the system. That is,
>this utility can be useful, but is DANGEROUS in general!
>
	Since I posted, I have discussed it with him, and we have narrowed
the field considerably.  The only check the modification will do is to see
whether a serial device driver is locked.  If so, it will release the serial
device and kill the process.  Otherwise, it will double-check that it
really should kill the process, then kill it.

	I feel I understand everything much better now, all I have to do
is figure out where in the code to look...  Thanx to all for your help!

	lee

cks@hawkwind.utcs.toronto.edu (Chris Siebenmann) (02/24/91)

pat@orac.pgh.pa.us (Pat Barron) writes:
| This has been the case forever (well, at least since V7) - when you come
| out of a sleep(), you *must* check that the reason you were sleeping is
| no longer true....

 Unless your code is careful to make sure that it is the only one
that can consume the particular resource it is sleeping on;
consider, for example, a system call tracing facility, where you have
a separate process that drains the trace buffers into a disk file.
Also, one common strategy for dealing with the 'other people may have
taken this resource already' problem is
	while (!resource)
		sleep(resource, priority);
If you whack the process and get it to come out of the sleep, and the
resource isn't available, the process will just go back to sleep,
which isn't particularly helpful for the original poster's purpose.

--
		V9: the kernel where you can do
			fgrep <something> */*.[ch]
		and not get "Arguments too long".
cks@hawkwind.utcs.toronto.edu	           ...!{utgpu,utzoo,watmath}!utgpu!cks

tif@doorstop.austin.ibm.com (Paul Chamberlain) (02/25/91)

In article <5558@awdprime.UUCP> jfh@greenber.austin.ibm.com (John F Haugh II) writes:
>A typical sleep loop looks something like
>
>	while (some_status & some_busy_flag)
>		sleep (&some_status, PRI_O_MINE);
>
>	some_status |= some_busy_flag;

John, perhaps it's obvious, but I've seen several places that neglect to do
this right, and I don't want that to become any more widespread.  Unless you
like drivers that hang sometimes, this is the way it should be:

	DISABLE_INTERRUPTS;
	while (some_status & some_busy_flag)
		sleep (&some_status, PRI_O_MINE);
	ENABLE_INTERRUPTS;

	some_status |= some_busy_flag;

Paul Chamberlain | I do NOT speak for IBM.          IBM VNET: PAULCC AT AUSTIN
512/838-9662     | ...!cs.utexas.edu!ibmchs!auschs!doorstop.austin.ibm.com!tif

dbc@cimage.com (David Caswell) (02/26/91)

.In article <1991Feb21.145705.27763@cbnews.att.com> rock@cbnews.att.com (Y. Rock Lee) writes:
.In article <19065@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
.>Just for kicks, imagine some real slow device has been set up to
.>do a DMA transfer to some physical address that is held by the
.>process which is unkillable.  Imagine that you kill that process
.>and it exits.  Imagine the I/O completes and someone else's
.>memory gets trashed.  All that and more ...
.
.Please excuse my ignorance on the block devices (most of the time
.I work on character/streams device). 

It's the character devices that are doing the DMA transfer.  Normal I/O,
even if it is character-at-a-time, is block I/O.

rock@cbnews.att.com (Y. Rock Lee) (02/27/91)

In article <1991Feb25.184853.10487@cimage.com> dbc@dgsi.UUCP (David Caswell) writes:
>It's the character devices that are doing the DMA transfer.  Normal I/O
>even if it is character-at-a-time is block I/O.

I "browsed" through a disk driver over the weekend. Here is what
I've learned. Please comment if you see anything wrong.

Normal block I/O uses the system buffer pool. Inside the disk strategy 
routine, the physical address of the allocated system buffer is given to 
the disk controller for the DMA data transfer. Since the kernel is not pageable,
there is no need to lock this buffer in memory (does BSD have a pageable
kernel?).

The character disk driver, on the other hand, uses the address in
u_base directly. Since this user page may be paged out, the disk
read/write routine goes through a physical I/O function that locks the user
page in memory and uses its physical address for the subsequent DMA transfer.
"Raw disk I/O" is the common term for this driver interface.


mouse@thunder.mcrcim.mcgill.edu (der Mouse) (02/27/91)

In article <5583@awdprime.UUCP>, tif@doorstop.austin.ibm.com (Paul Chamberlain) writes:
> In article <5558@awdprime.UUCP> jfh@greenber.austin.ibm.com (John F Haugh II) writes:
>> A typical sleep loop looks something like

>>	while (some_status & some_busy_flag)
>>		sleep (&some_status, PRI_O_MINE);
>>	some_status |= some_busy_flag;

> John, perhaps it's obvious, but I've seen several places that neglect
> to do this right, and I don't want that to become any more widespread.
> Unless you like drivers that hang sometimes, this is the way it
> should be:

> 	DISABLE_INTERRUPTS;
> 	while (some_status & some_busy_flag)
> 		sleep (&some_status, PRI_O_MINE);
> 	ENABLE_INTERRUPTS;
> 	some_status |= some_busy_flag;

I would tend to move the enable after the bit set.  I think at present
it doesn't make any difference, but only because kernel code can't be
preempted, except in a limited way by interrupts, and the interrupt
handler never sets or clears the busy bit.  If either of these changes,
you'll be glad you have the enable after the setting of the busy bit!

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

jfh@rpp386.cactus.org (John F Haugh II) (02/28/91)

In article <1991Feb27.105436.1554@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
>> 	DISABLE_INTERRUPTS;
>> 	while (some_status & some_busy_flag)
>> 		sleep (&some_status, PRI_O_MINE);
>> 	ENABLE_INTERRUPTS;
>> 	some_status |= some_busy_flag;
>
>I would tend to move the enable after the bit set.  I think at present
>it doesn't make any difference, but only because kernel code can't be
>preempted, except in a limited way by interrupts, and the interrupt
>handler never sets or clears the busy bit.  If either of these changes,
>you'll be glad you have the enable after the setting of the busy bit!

If any of the other bits are modified at interrupt level, you have to
make =any= modification to =any= bit in the status word with interrupts
disabled up to the highest level at which the word is modified.

Consider the execution of a process off interrupt level which is then
interrupted by the device being serviced.  Thread "A" is the non-interrupt
level execution, and thread "B" is the instruction stream executed at
interrupt time.  It isn't a big window, but with Murphy at the controls ...

Thread A				Thread B

wakes up, checks bits and
sees that resource is free

loads the word at 'some_status'
so it can set the busy bit and
write it back

					POING! interrupt occurs and
					device driver loads word at
					'some_status' so it can set
					some bit.  the bit gets set
					and the word gets written
					back

execution resumes after the
interrupt with the original
value of 'some_status' still
in the register it was loaded
into, without the bit set
from the interrupt service
routine

the 'some_busy_flag' is set and
the word written back to
'some_status'.  the action taken
during the interrupt service has
been overwritten.