[net.unix-wizards] deadlock caused by error in sys1.c, exit

johan (04/22/83)

     Lately we found two deadlock  situations  in  the  UNIX
kernel  while  testing our own 68000 system.  Both deadlocks
are caused by a glitch in the exit() routine in the  kernel.
Let me first describe the deadlocks.

Scenario 1

 - process A, actually LPD, with shared text, is 'exec'ed.
   xalloc() is called and a new text segment is created with
   XWRIT on, because the text has not yet been written to
   the swap area.

 - process A forks and creates process B  running  the  same
   code.   x_count and x_ccount are incremented to 2.  There
   is not enough core for process B, so it  is  swapped  out
   decrementing x_ccount to 1.

 - At that very moment another process C wants to grow, but
   this cannot be done in core, so it swaps itself out using
   swbuf1.

 - process A decides to exit, calls xfree() (p_textp in  the
   proc[]  entry  for process A is cleared and x_count drops
   to 1).  xfree() calls xccdec() (x_ccount drops to 0), but
   because  XWRIT is on it starts swapping out the text seg-
   ment using swbuf2.

 - The scheduler needs memory and decides to swap out process
   A. It needs a swap buffer for that, but swbuf1 and swbuf2
   are both busy, so it sets B_WANTED and sleeps waiting for
   swbuf2.

Well, here we are: the swap transfer for the text of process
A is done and process A is made runnable, but it cannot run
because it is being swapped out. The scheduler, however,
cannot run either: it is sleeping on swbuf2, which only
process A can release, and process A will never be swapped
in.
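
The counter bookkeeping in scenario 1 can be replayed with a
small user-space model. This is only a sketch of the steps as
described above: struct text_model, the model_*() functions
and the XWRIT value are made up for illustration and are not
the kernel's code.

```c
#include <assert.h>

#define XWRIT 02        /* illustrative value: text not yet on swap */

/* Simplified stand-in for the kernel's text table entry. */
struct text_model {
    int x_count;        /* processes sharing the text, loaded or not */
    int x_ccount;       /* of those, how many are loaded in core */
    int x_flag;
};

/* exec of a fresh shared-text program: one user, in core, dirty. */
static void model_exec(struct text_model *xp)
{
    xp->x_count = 1;
    xp->x_ccount = 1;
    xp->x_flag = XWRIT;
}

/* fork: the child shares the text and starts out in core. */
static void model_fork(struct text_model *xp)
{
    xp->x_count++;
    xp->x_ccount++;
}

/*
 * Drop one in-core reference.  Returns 1 when the last in-core
 * reference goes away while XWRIT is set, i.e. when the text has
 * to be written to swap (the transfer that occupies swbuf2).
 */
static int model_xccdec(struct text_model *xp)
{
    if (xp->x_ccount == 0)
        return 0;
    if (--xp->x_ccount == 0 && (xp->x_flag & XWRIT)) {
        xp->x_flag &= ~XWRIT;
        return 1;
    }
    return 0;
}

/* Exit path: drop the sharing reference, then the in-core one. */
static int model_xfree(struct text_model *xp)
{
    assert(xp->x_count > 0);
    if (--xp->x_count == 0)
        return 0;       /* last user: text is freed, not written */
    return model_xccdec(xp);
}

/* Replay scenario 1; returns 1 if A's exit triggers a text write. */
static int replay_scenario1(struct text_model *xp)
{
    model_exec(xp);             /* A execs LPD: 1 user, 1 in core */
    model_fork(xp);             /* A forks B:   2 users, 2 in core */
    (void)model_xccdec(xp);     /* B swapped out: x_ccount == 1 */
    return model_xfree(xp);     /* A exits */
}
```

In this model A's exit leaves x_count at 1 and x_ccount at 0,
and it is the still-set XWRIT flag that forces the swap write
during which A can be picked for swap-out.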

     Another scenario goes as follows:

Scenario 2

 - That same process  A  and  its  child  process  B  exist:
   x_count==2, x_ccount==1 since process B is swapped out.

 - Again, process A exits, dropping x_ccount to 0, locking
   that text segment with XLOCK and swapping out the 'dirty'
   text.

 - The scheduler, again, decides to swap out process A,  but
   succeeds  this  time. There is no need for xlock(), since
   the reference to that text segment is cleared in  xfree()
   called by exit().

 - Process B is swapped in by the  scheduler,  but  now  the
   scheduler wants to swap in the text segment of process B.
   So xlock() is called,  finding  the  text  locked,  which
   causes  the  scheduler  to  sleep waiting for XLOCK to be
   cleared.

Here we have our second deadlock: the scheduler waiting  for
process A to clear XLOCK, process A waiting to be swapped in
by the scheduler.
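
Both scenarios come down to the same circular wait, which can
be written down as a tiny wait-for graph. The sketch below is a
model only: the actor names, the waits_for[] array and both
functions are hypothetical and exist just to show the cycle and
how keeping process A in core breaks it.

```c
#include <assert.h>

enum actor { SCHED, PROC_A, NACTORS };
#define NOBODY (-1)

/*
 * waits_for[i] is the actor that must run before actor i can make
 * progress, or NOBODY.  Returns 1 if following the chain from
 * `start` leads back to `start`.
 */
static int in_wait_cycle(const int waits_for[NACTORS], int start)
{
    int cur, steps;

    assert(start >= 0 && start < NACTORS);
    cur = waits_for[start];
    for (steps = 0; steps < NACTORS && cur != NOBODY; steps++) {
        if (cur == start)
            return 1;
        cur = waits_for[cur];
    }
    return 0;
}

/*
 * Build the graph for either scenario.  `slocked` nonzero models a
 * process A that could not be swapped out during xfree().
 */
static int scenario_deadlocks(int slocked)
{
    int waits_for[NACTORS];

    /* The scheduler sleeps until A releases what it holds:
     * swbuf2 in scenario 1, XLOCK in scenario 2. */
    waits_for[SCHED] = PROC_A;

    /* A can only continue once the scheduler swaps it back in,
     * unless it stayed in core in the first place. */
    waits_for[PROC_A] = slocked ? NOBODY : SCHED;

    return in_wait_cycle(waits_for, SCHED);
}
```

Here scenario_deadlocks(0) reports the cycle and
scenario_deadlocks(1) does not: the scheduler may still have to
wait for process A, but process A no longer waits for the
scheduler.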

     In my opinion the problem is  caused  by  the  code  in
exit().  The process should be SLOCK'ed during xfree().  The
code should be:

        p->p_flag |= SLOCK;     /* keep the scheduler from swapping us out */
        xfree();                /* may sleep while the text goes to swap */
        p->p_flag &= ~SLOCK;
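
Why this helps can be shown with a small model of the test the
scheduler applies when it looks for a process to swap out. This
is a sketch under the assumption that the scheduler skips any
process with SLOCK set, which is what the fix relies on. The
flag values, struct proc_model and both functions are
illustrative, not copied from the kernel.

```c
#include <assert.h>

/* Illustrative flag values for the model. */
#define SLOAD 01        /* process is in core */
#define SSYS  02        /* system process, never swapped */
#define SLOCK 04        /* process must not be swapped out */

struct proc_model {
    int p_flag;
};

/* Assumed scheduler test: loaded, not a system process, not locked. */
static int may_swap_out(const struct proc_model *p)
{
    return (p->p_flag & (SSYS | SLOCK | SLOAD)) == SLOAD;
}

/*
 * Returns what the scheduler would conclude about the exiting
 * process if it ran while xfree() was asleep on the swap transfer.
 */
static int swappable_during_xfree(struct proc_model *p, int apply_fix)
{
    int seen;

    assert(p->p_flag & SLOAD);  /* an exiting process is in core */
    if (apply_fix)
        p->p_flag |= SLOCK;
    seen = may_swap_out(p);     /* the window in which xfree() sleeps */
    if (apply_fix)
        p->p_flag &= ~SLOCK;
    return seen;
}
```

Without the fix the model reports the exiting process as a
swap-out candidate during xfree(); with it the process is
skipped and its flags are unchanged afterwards.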

This problem exists in UNIX-V7, BSD 2.X and SYSTEM-III, at
least for the PDP-11.


				Johan W. Stevenson
				Philips, S&I, PMDS
				decvax!mcvax!philmds!johan