[comp.os.minix] Fatal error in MM and pseudofix

ast@cs.vu.nl (Andy Tanenbaum) (06/01/88)

Sort of by accident I just discovered a fatal error in MINIX.  Try this:

  sync
  cp /usr/bin/sleep x
  chmem =60000 x
  for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  do x 10 &
  done

The shell keeps forking off 'x' until it can fork no more, and then stops.
Within 10 seconds all the children exit, but because the last command to the
shell was &, the shell is reading tty, not doing a wait.  As a result, all
the children have become zombies and are still tying up resources (memory
and proc table slots).  Since the kernel is told that processes are gone at
the moment they become zombies (line 5802 in the book), the F1 key does not
show them any more.
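
For comparison, here is a minimal sketch of the same kind of loop followed by
an explicit wait, so the shell collects its children instead of going back to
read the tty.  It assumes a POSIX-style shell with a builtin 'wait' and
arithmetic expansion (I have not checked it against the MINIX shell), and
'sleep 1' is only a stand-in for the 'x 10' jobs above.

```shell
# Start a few short-lived background jobs, then reap them explicitly.
# 'sleep 1' is a stand-in for 'x 10' (an assumption, to keep the run short).
started=0
for i in 1 2 3
do
  sleep 1 &
  started=$((started + 1))
done

# 'wait' with no arguments blocks until every background child has exited
# and been collected, so none of them is left behind as a zombie.
wait
status=$?
echo "started $started jobs, wait returned $status"
```

Since 'wait' is a builtin in such shells, it needs no fork of its own, which
is what matters once the proc table is full.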

Now type

  sync

You will see that the shell can't fork and the system is totally hung.  To
dehang the shell, type

  exec sync

This causes the shell to exec instead of fork.  The exec succeeds, and sync
runs and exits.  At this point the zombies are orphans, and are inherited
by the shell's parent, init, which is doing a wait, and which cleans them
all up.  This gets the system back to normal.
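
The reason exec works where a plain command does not is that it replaces the
current process instead of creating a new one, so no free proc table slot is
needed.  A small sketch that makes this visible, assuming a POSIX-style 'sh'
and 'sed' are available: a shell prints its PID, then execs a second shell
that prints its own PID.  The variable names are just for illustration.

```shell
# The outer sh prints its PID, then execs an inner sh that prints $$ too.
# exec replaces the process image without forking, so both PIDs should match.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# Split the two lines of output into separate variables.
first=$(echo "$pids" | sed -n 1p)
second=$(echo "$pids" | sed -n 2p)
echo "before exec: $first, after exec: $second"
```

Note that the command substitution used here to capture the output does fork,
so this is only a demonstration of exec keeping the same process, not
something you could run on a system that is already out of slots.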

One fix helps with part of the problem: have zombies release their memory
when they exit, not at cleanup time.  This can be accomplished by moving
lines 5874 - 5878 to just after line 5802.  You also have to change 'child'
to 'rmp' and declare 's' at the top of mm_exit.  Now you won't have the
situation where the shell fails to fork due to lack of memory.  But it
still fails due to lack of proc table slots.  That bug is hard to fix, but
in practice it doesn't happen except in test programs.  You can always make
NR_PROCS larger if you want.

Andy Tanenbaum (ast@cs.vu.nl)