[comp.os.minix] Bug in lib/system.c - New deadlocks in PC Minix

rtregn@faui44.informatik.uni-erlangen.de (Robert Regn) (11/30/88)

I have found a bug in lib/system.c (V1.3b, the latest version).
When fork() fails, system() calls exit(1) and so terminates the entire
calling process. That is very bad for editors with modified buffers!

*** 1.3/b.fertig/lib/system.c	Tue Oct  4 15:06:02 1988
--- /tmp/system.c	Wed Nov 30 12:08:13 1988
***************
*** 12,18 ****
      }
  
      /* Check to see if fork failed. */
!     if (procid < 0) exit(1);
  
      while ( (waitstat = wait(&retstat)) != procid && waitstat != -1 ) ;
      if (waitstat == -1) retstat = -1;
--- 12,18 ----
      }
  
      /* Check to see if fork failed. */
!     if (procid < 0) return (-1);
  
      while ( (waitstat = wait(&retstat)) != procid && waitstat != -1 ) ;
      if (waitstat == -1) retstat = -1;


==========================================================
I very often work with Minix connected through tty1
(remote login from a Sun).
If two or more processes write simultaneously to tty1, one of them hangs
on FS. Trying to kill it hangs the system.

Try (logged in over tty1):
		ps -lax &
		ls
or:		make &
		ls -l	(repeat until make prints something)
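
The shell recipes above boil down to two processes writing to the same
terminal at once. A minimal C sketch of that situation follows. The names
write_lines() and two_writers() are hypothetical, and the program has not
been run on MINIX 1.3; on a system with the deadlock, the expectation is
only that the call may never return.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write `count` copies of `msg` to fd; return 0 on success, -1 on error. */
static int write_lines(int fd, const char *msg, int count)
{
    size_t len = strlen(msg);
    int i;

    for (i = 0; i < count; i++)
        if (write(fd, msg, len) != (ssize_t) len)
            return -1;
    return 0;
}

/*
 * Hypothetical reproducer: parent and child both write to the same fd
 * with no synchronisation. Returns 0 if both writers finished cleanly.
 */
int two_writers(int fd, int count)
{
    pid_t pid, w;
    int status = 0;
    int parent_ok;

    pid = fork();
    if (pid < 0)
        return -1;                      /* could not even start the test */

    if (pid == 0)                       /* child: second writer */
        _exit(write_lines(fd, "child\n", count) == 0 ? 0 : 1);

    parent_ok = (write_lines(fd, "parent\n", count) == 0);

    /* Same wait loop as in lib/system.c. */
    while ((w = wait(&status)) != pid && w != -1)
        ;
    if (w == -1)
        return -1;

    return (parent_ok && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        ? 0 : -1;
}
```

Calling two_writers(1, 100) while logged in over tty1 would be the
equivalent of "ps -lax &" followed by "ls".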

Does anyone else have this problem? Are there fixes?

		Robert Regn							rtregn@faui32.uucp

tholm@uvicctr.UUCP (Terrence W. Holm) (12/06/88)

In article <753@faui10.informatik.uni-erlangen.de> rtregn@faui44.informatik.uni-erlangen.de (Robert Regn) writes:
>
>I have detected a bug in lib/system.c (V1.3b == latest)
>A failing fork exits the entire process - very bad for editors
>with modified buffers !


A patch was already posted as EFTH MINIX report #52 in October.
That patch also disabled SIGINT/SIGQUIT in the parent.
Note that MINIX-ST users must use execl( ..., (char *) 0 ).
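
For readers without report #52 at hand, the three points (return -1 when
fork fails, ignore SIGINT/SIGQUIT in the parent while waiting, terminate
the execl() argument list with a cast null pointer) can be sketched roughly
as below. This is not the posted patch: the name my_system(), the /bin/sh
path and the 127 exit code are assumptions. The cast presumably matters on
MINIX-ST because int and pointers differ in size there, which is also an
assumption on my part.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* my_system: hypothetical name, a sketch and not the MINIX library code. */
int my_system(const char *cmd)
{
    void (*old_int)(int);
    void (*old_quit)(int);
    pid_t pid, w;
    int status = -1;

    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed: tell the caller, no exit() */

    if (pid == 0) {
        /* Child: the argument list ends with a cast null pointer. */
        execl("/bin/sh", "sh", "-c", cmd, (char *) 0);
        _exit(127);             /* exec failed: only the child terminates */
    }

    /* Parent: ignore keyboard signals while the command runs. */
    old_int = signal(SIGINT, SIG_IGN);
    old_quit = signal(SIGQUIT, SIG_IGN);

    while ((w = wait(&status)) != pid && w != -1)
        ;
    if (w == -1)
        status = -1;

    /* Restore whatever handlers the caller had installed. */
    signal(SIGINT, old_int);
    signal(SIGQUIT, old_quit);
    return status;
}
```

An editor can then test the result of my_system("ls") for -1 and keep its
buffers instead of vanishing.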



>I am working with Minix very often connected with tty1
>(remote login from a Sun ).
>If two or more processes write simultaneously on tty1, one of them hangs
>on FS. Trying to kill it hangs the system.
>


Yes, the problem has been around for a while. It is a real pain in
the axx. I have been trying to get some non-MINIX work done, so I
haven't looked at it for a few weeks.


			Terrence W. Holm
			  uunet!uw-beaver!uvicctr!tholm
			  tholm%uvunix.bitnet
			  tholm%sirius.UVic.ca@relay.ubc.ca

brucee@runx.ips.oz (Bruce Evans) (12/17/88)

>In <568@uvicctr.UUCP> tholm@uvicctr.UUCP (Terrence W. Holm) writes:
>In article <753@faui10.informatik.uni-erlangen.de> rtregn@faui44.informatik.uni-erlangen.de (Robert Regn) writes:
>>
>>I have detected a bug in lib/system.c (V1.3b == latest)
>>A failing fork exits the entire process - very bad for editors
>>with modified buffers !
>
>
>A patch was already posted as EFTH MINIX report #52 in October.

The same bug (call to exit()) is in popen.c. Its effect can be seen by
running cdiff when memory is low. popen() is supposed to return 0 on failure,
but exits if execl() fails. cdiff checks the error return properly but never
gets it.

popen() is cross-referenced in the EFTH report but not patched. There was a
discussion by the Atari users about popen() failing to link because it called
the wrong version of exit()/_exit(). I don't see how one of these could be
missing, but the proper fix is to call neither.
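
To make "call neither" concrete, here is a rough sketch of a popen() whose
calling process always gets a null pointer back on failure and never exits.
It is not the MINIX popen.c: the name my_popen() and the /bin/sh path are
assumptions, and there is no matching pclose(). The forked child still has
to stop somehow if execl() fails; using _exit(127) there is my assumption,
not something settled in this thread.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen() under strict ISO C modes */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* my_popen: hypothetical name; supports "r" and "w" modes only. */
FILE *my_popen(const char *cmd, const char *mode)
{
    int fds[2];
    int reading, mine, theirs, target;
    pid_t pid;
    FILE *fp;

    if (cmd == NULL || mode == NULL || (mode[0] != 'r' && mode[0] != 'w'))
        return NULL;
    reading = (mode[0] == 'r');

    if (pipe(fds) < 0)
        return NULL;                     /* failure is reported, not fatal */

    mine   = reading ? fds[0] : fds[1];  /* end kept by the caller */
    theirs = reading ? fds[1] : fds[0];  /* end handed to the command */
    target = reading ? 1 : 0;            /* command's stdout or stdin */

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return NULL;                     /* the caller survives a failed fork */
    }

    if (pid == 0) {
        close(mine);
        if (theirs != target) {
            dup2(theirs, target);
            close(theirs);
        }
        execl("/bin/sh", "sh", "-c", cmd, (char *) 0);
        _exit(127);                      /* child only; see the note above */
    }

    close(theirs);
    fp = fdopen(mine, reading ? "r" : "w");
    if (fp == NULL)
        close(mine);
    return fp;
}
```

With this shape a caller such as cdiff, which already checks the return
value, would actually see the failure.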

A failed execl() may destroy something important, which would excuse the
forced exit(). I have seen cc crash badly trying to recover in the same
situation. This is probably just from the allocation of MAX_ISTACK_BYTES
on the stack in execve() without a stack check.

Someone should try to get rid of this exec limit. The difficulty is that
dynamic allocation is needed and malloc() is impractical. Perhaps alloca()
could be used in lib/exec.c and the stack could be rebuilt in mm/exec.c
without copying through mm. Then only individual commands would need big
stacks for big arguments.

>>If two or more processes write simultaneously on tty1, one of them hangs
>>on FS. Trying to kill it hangs the system.
>
>Yes, the problem has been around for a while. It is a real pain in

I have been doing a lot of work on TTY recently. There were plenty of old
bugs in it (mainly deadlocks and races). However, I thought this deadlock
was fixed in 1.3b by refusing the write request in tty.c. It doesn't seem
to be present on my system (but "cat file &" often produces no output
when the shell just wins the race to write). The restriction to one writer
is more fundamental. Hannam's driver queues i/o internally but I think FS
should do it (much like suspension). The bug is masked for the console
in 1.3c because output suspension is not done. FS is left hanging so no
second write can get through it to TTY.

Bruce Evans
Internet: brucee@runx.ips.oz.au    UUCP: uunet!runx.ips.oz.au!brucee