[comp.os.minix] MINIX suspected bug

yoshiya@titisa.is.titech.ac.jp (UM8101 yoshiya eiji) (09/19/89)

  I think I found a rather serious bug in MINIX as published in the Prentice-
Hall book.  I would appreciate any reactions from the minix community.
The problem is as follows:

  Suppose that, while a process A waits for a tty READ, another process B 
issues a tty READ on the same terminal.  Now if you hit the return key 
before the two system calls are completed, the MINIX system hangs. 

  If you press the function key F1 after the hang to look at the process
status, you get

	fs : waiting for a message from tty.

  A more detailed description of the sequence of events is as follows:

  1. When the process A issues a READ for a tty, a READ message is sent to fs
     by sendrec.  Fs receives this message at line 9042.

  2. Fs processes this message, and sends a READ message to the tty driver by
     sendrec at line 12356.

  3. The tty driver receives this message at line 3509.  Assuming that there
     is no pending keyboard input, it builds a SUSPEND message, see line 3830,
     and sends it to fs at line 3806.

  4. Tty waits for its next message at line 3509, whereupon fs restarts.

  5. Having received a SUSPEND message at line 12356, fs goes to line 9042, 
     and waits there for another message without sending a reply to 
     process A.

  6. Now suppose that the process B issues a READ for tty.  It sends a READ
     message to fs by sendrec.  Fs receives this message at line 9042.

  7. Further suppose that the return key is hit while fs is still processing
     this READ message.  An interrupt occurs, and keyboard() is called. 

  8. keyboard() uses interrupt() to send a CHAR_INT message to the tty driver
     at line 4167.

  9. At the end of the interrupt processing, the tty driver restarts, taking
     precedence over fs. 

 10. The tty task receives the CHAR_INT message at line 3509, and sends a
     REVIVE message to fs at line 3571.  Here, fs is NOT waiting for a
     reply.  Therefore, the tty task suspends.

 11. Now fs is allowed to run.  It sends a READ message to the tty task by
     sendrec at line 12356.  The kernel queues this message by mini_send() 
     at line 1957, but when mini_rec() is called subsequently, the message
     is overwritten at line 2055 by the REVIVE message, which was sent 
     previously from the tty task.  The tty task now becomes ready, and
     restarts.
  
 12. The tty task tries to receive another message from fs at line 3509,
     but, as noted above, the message has been overwritten by a REVIVE
     message.  Not expecting one, tty selects the default clause, line 
     3518, of the switch statement, and sends an EINVAL message to fs there.
     At this moment, however, fs is not waiting for a message.  Therefore,
     tty suspends. 

 13. Fs restarts, receives this REVIVE message from the tty task, and 
     revives process A at line 12361.

 14. The processing of the READ request issued by A is now complete.  Fs
     then tries to obtain a reply from the tty task to the message sent 
     previously in step 11, and fs receives the EINVAL message.  But the tty
     task becomes ready, so fs suspends at line 12362, and the tty task
     restarts.

 15. The tty task tries to receive a message by receive at line 3509, and
     suspends.  Fs restarts, receives an EINVAL message from tty at line 
     12362, and attempts to revive this invalid process.  The revival
     attempt fails, whereupon fs makes another attempt to obtain a reply 
     from the tty task.  However, process B's READ message has been lost.
     Therefore, the tty task never replies.  The system hangs.
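
  The overwrite in steps 10 to 12 can be reproduced in a toy model.  The
sketch below is NOT the MINIX source: the names toy_send, toy_receive,
toy_sendrec and the struct layouts are made up for illustration.  It assumes
only what the steps above describe, namely that a blocked sender's message
stays in the sender's own buffer, and that sendrec uses one buffer for both
the request and the reply.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not MINIX code: all names here are hypothetical. */
enum msg_type { NONE, READ, REVIVE };

struct message { enum msg_type type; };

struct proc {
    struct message *messbuf;      /* buffer of a message not yet delivered */
    struct proc *waiting_sender;  /* one-slot queue: who is blocked on us  */
    int sending;                  /* nonzero while blocked in a send       */
};

/* The destination is assumed busy, so nothing is copied: only a pointer
 * to the caller's buffer is remembered. */
static void toy_send(struct proc *caller, struct proc *dest,
                     struct message *m)
{
    assert(dest->waiting_sender == NULL);   /* toy queue holds one sender */
    caller->messbuf = m;
    caller->sending = 1;
    dest->waiting_sender = caller;
}

/* Copy the blocked sender's message into the caller's buffer.
 * Returns 1 if a message was delivered, 0 if nobody was waiting. */
static int toy_receive(struct proc *caller, struct message *m)
{
    struct proc *src = caller->waiting_sender;

    if (src == NULL)
        return 0;
    *m = *src->messbuf;
    src->sending = 0;
    caller->waiting_sender = NULL;
    return 1;
}

/* Send, then receive into the SAME buffer.  If the request is still
 * undelivered, the incoming message overwrites it. */
static void toy_sendrec(struct proc *caller, struct proc *dest,
                        struct message *m)
{
    toy_send(caller, dest, m);
    toy_receive(caller, m);
}

/* Replays steps 10 to 12 and returns what tty finally receives from fs. */
enum msg_type replay_steps_10_to_12(void)
{
    struct proc fs  = { NULL, NULL, 0 };
    struct proc tty = { NULL, NULL, 0 };
    struct message tty_out = { REVIVE };
    struct message fs_buf  = { READ };
    struct message tty_in  = { NONE };

    toy_send(&tty, &fs, &tty_out);    /* step 10: tty blocks on REVIVE   */
    toy_sendrec(&fs, &tty, &fs_buf);  /* step 11: READ queued, then lost */
    toy_receive(&tty, &tty_in);       /* step 12: tty reads fs's buffer  */
    return tty_in.type;
}
```

  In this model replay_steps_10_to_12() returns REVIVE rather than READ,
which is the situation that sends tty into the default clause in step 12.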

  Other situations where the system may hang up are as follows:

  a. Fs receiving an UNPAUSE message from mm.  The sequence of events 
     could be similar to the above.

  b. Fs sending a SIGPIPE message to mm.  As a demonstration, the following
     command causes the system to hang:

     $ (cat /dev/ram & cat /dev/ram) | sleep 10

  To sum up, in the MINIX system, in any situation where a message is sent
without a corresponding reply message, the system could hang up.  For 
example, keyboard interrupts and signals cause this situation.

  Note that the problem does not occur in the communication between the kernel
and the mm, because the function inform() ensures that the signal messages are
sent to mm only when mm is free.
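
  The idea behind inform(), as described above, is to hold a notification
back until the receiver is free.  A minimal sketch of that idea follows.
It is not the real inform(): the structs and the functions try_inform,
raise_notification and server_becomes_free are hypothetical, and the
"receiving" flag is my stand-in for "the server is blocked in receive".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of "send only when the receiver is free". */
struct server {
    bool receiving;   /* true while the server is blocked in receive */
    int  delivered;   /* notifications actually handed to the server */
};

struct notifier {
    int pending;      /* notifications raised but not yet delivered */
};

/* Hand over one pending notification if the server is free.
 * Never blocks.  Returns 1 if something was delivered, else 0. */
int try_inform(struct notifier *n, struct server *s)
{
    assert(n->pending >= 0);
    if (n->pending == 0 || !s->receiving)
        return 0;
    n->pending--;
    s->delivered++;
    s->receiving = false;   /* the server is now busy handling it */
    return 1;
}

/* Called when an event occurs (a signal, for instance). */
void raise_notification(struct notifier *n, struct server *s)
{
    n->pending++;
    try_inform(n, s);
}

/* Called whenever the server goes back to waiting for a message. */
void server_becomes_free(struct notifier *n, struct server *s)
{
    s->receiving = true;
    try_inform(n, s);
}
```

  Because the sender never blocks on a busy receiver, the two sides cannot
end up waiting for each other as fs and tty do in the sequence above.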

  Two solutions to this problem come to mind:

  1. Buffering messages; or
  2. Using a function similar to inform() between the fs and the kernel, and
     between the fs and the mm.

However, neither of these seems to conform to the MINIX philosophy.  Solution 
1 needs more than the rendezvous principle.  Solution 2 precludes a file
system on a remote machine.
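
  For concreteness, solution 1 could look something like the sketch below:
a small fixed-size mailbox per receiver, so that a message is copied when it
is sent instead of staying in the sender's buffer.  Everything here
(MAILBOX_SLOTS, struct mailbox, mailbox_post, mailbox_fetch) is hypothetical
and only illustrates the idea; it says nothing about how such buffering
would fit into the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define MAILBOX_SLOTS 4   /* arbitrary size chosen for the sketch */

struct message { int source; int type; };

/* Fixed-size FIFO of messages waiting for one receiver. */
struct mailbox {
    struct message slot[MAILBOX_SLOTS];
    int head;    /* index of the oldest queued message */
    int count;   /* number of queued messages          */
};

void mailbox_init(struct mailbox *mb)
{
    mb->head = 0;
    mb->count = 0;
}

/* Copy the message into the mailbox.  Returns false when full,
 * in which case the caller must retry or report an error. */
bool mailbox_post(struct mailbox *mb, struct message m)
{
    assert(mb->count >= 0 && mb->count <= MAILBOX_SLOTS);
    if (mb->count == MAILBOX_SLOTS)
        return false;
    mb->slot[(mb->head + mb->count) % MAILBOX_SLOTS] = m;
    mb->count++;
    return true;
}

/* Remove the oldest message.  Returns false when the mailbox is empty. */
bool mailbox_fetch(struct mailbox *mb, struct message *out)
{
    if (mb->count == 0)
        return false;
    *out = mb->slot[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_SLOTS;
    mb->count--;
    return true;
}
```

  With a copy made at send time, a later receive into the sender's buffer
can no longer destroy a request that has not been delivered yet.  The cost
is exactly the one noted above: this is buffering, not pure rendezvous.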

							Eiji Yoshiya.



          proc A       proc B        fs           tty          H/W
            |
            |                        main
            |            1          (recv)
            +------------------------>+
	  read                        |
	(sendrec)                     |
	                              |      2
	                              +----------->+
                                   dev_read        |
                                   (sendrec)       |
                                             3     |
                                            /------+ send
                                           / 4     | (SUSPEND)
	                              +<--/........+
                               5      |           main
                         +............+          (recv)
                         |           main
                         |          (recv)
                         |     6
                         +----------->+
                       read           |
                     (sendrec)        |
                                      |            7
                                      *.........................+ K/B INT
                                                                |
                                                          8     |
                                                         /------+ send
                                                        / 9     |  (CHAR_INT)
                                                   +<--/........+
                                             10    |          restart
                                      *........../-+ send
                                      |      11 /    (REVIVE)
                                      +---\..../...+
                                  dev_read \ 12    |
                                  (sendrec) \----->+ main
                                            /      | (recv)
                                           / 13    |
                                      +<--/....../-+ send
                                      |         /    (EINVAL)
                                      |      14
                                      +............+
                                     dev    /      |
                                    (recv) /       |
                                          /  15    |
                                      +<-/.........+
                                      |           main
                                      +          (recv)
                                     dev
                                    (recv)

                                   HANG UP

					( ----> message flow)
					( ..... process switch)

---------
yoshiya%is.titech.ac.jp@relay.cc.u-tokyo.ac.jp
Eiji Yoshiya.	Dept. of Info. Sci. Tokyo Inst. of Tech.