[comp.os.minix] Bug fix to signal handling within fs

arthur@warwick.UUCP (John Vaudin) (06/02/87)

We have discovered a couple of bugs in the Minix file system, in the
'do_unpause' function in the file 'pipe.c'.  Do_unpause has to get the
minor device number of the tty which the suspended process (the one
which is to be unpaused) is suspended on.  In the distribution, the code
looks like this:    (line numbers from the book -- page 645)

10591   if (task != XPIPE) {
10592         f = get_filp(rfp->fp_fd);
10593         dev = f->filp_ino->izone[0];    /* device on which.........

There are two bugs (at least!) in line 10592; Firstly, the saved value of
the suspended processes file descriptor must be right-shifted by eight to
get rid of 'fs_call' which was saved with it (line 10499).  For an example
of the correct usage see line 10525.  Secondly, the 'get_filp' function
returns the (filp *) corresponding to the given file descriptor in the
CURRENT process, not the suspended process it is trying to unpause!  A
simple solution is to copy the 'get_filp' code (there is not very much of
it) into 'do_unpause', and substitute 'rfp' (the suspended process pointer)
for 'fp' (the current process pointer).

Thus, line 10592 should be replaced by

              fild = rfp->fp_fd>>8;            /* declare int fild at top */
              if (fild < 0 || fild >= NR_FDS)  /* and do some error handling */
                                               /* here... */
              f = rfp->fp_filp[fild];

Also in do_exit on line 12093 there is a bug in the code in the distribution
but which is not in the book. The code on line 12110 that goes

	if(fp->fp_suspended == SUSPENDED){
		if(fp->fp_tack==XPIPE) susp_count--;
		pro=exitee;
		do_unpause();
		^^^^^^^^^^^^^
		fp->fp_suspended=NOT_SUSPENDED;
	}

contains a call to unpause() to clear up any pending input if the process
was SUSPENDED on a tty read or pipe read. Unfortunately the code on line
12106 has already closed all the file descriptors so un_pause() will not
be able to find the device to clear. The fix is to move the above section
of code to line 12104 BEFORE the file descriptors are closed.

I'm afraid I do not know what symptoms this bug causes. On our ns32000 
version of MINIX it causes the filesystem to memory fault, but I guess
8088's don't do that sort of thing not having MMU's (smug :-)
 
John Vaudin             arthur@warwick.UUCP
Tim Bissell             donald@warwick.UUCP

rmtodd@uokmax.UUCP (06/04/87)

In article <539@ubu.warwick.UUCP>, arthur@warwick.UUCP (John Vaudin) writes:
> We have discovered a couple of bugs in the Minix file system, in the
> 'do_unpause' function in the file 'pipe.c'.  Do_unpause has to get the
Just spent the past two days tracking down the same bug in fs--your patch
is essentially identical to mine.
> minor device number of the tty which the suspended process (the one
> which is to be unpaused) is suspended on.  In the distribution, the code
> looks like this:    (line numbers from the book -- page 645)
> 
> 10591   if (task != XPIPE) {
> 10592         f = get_filp(rfp->fp_fd);
> 10593         dev = f->filp_ino->izone[0];    /* device on which.........
> 
> There are two bugs (at least!) in line 10592; Firstly, the saved value of
> the suspended processes file descriptor must be right-shifted by eight to
> get rid of 'fs_call' which was saved with it (line 10499).  For an example
> of the correct usage see line 10525.  Secondly, the 'get_filp' function
> returns the (filp *) corresponding to the given file descriptor in the
> CURRENT process, not the suspended process it is trying to unpause!  A
For those interested, the current process is always MM (MM unpausing
the process because of signal).  Since MM almost never has files open,
you usually get a NIL_FILP. 
> simple solution is to copy the 'get_filp' code (there is not very much of
> it) into 'do_unpause', and substitute 'rfp' (the suspended process pointer)
> for 'fp' (the current process pointer).
What I did was just set fp = rfp; and left the get_filp call
as-is.  Either solution will work.
I also anded the fild with 255 after the right shift; strictly speaking this
shouldn't ever be needed, but alas, Aztec C v3.40 on occasion produces code
for right shift by 8 that gets the sign-extend BACKWARDS! (It only happens
when the compiler puts the operand in bx; this compiler bug bit me on another
location in pipe.c; I've seen it nowhere else before.)
I also added a line after the
	f=get_filp(fild); 
line to check if the pointer returned is NIL_FILP and panic if it is.  A
little paranoia that helped me find the other bug you talk about in
do_exit.
Anybody know of any more places where the Book deviates seriously from the
disks?
> I'm afraid I do not know what symptoms this bug causes. On our ns32000 
> version of MINIX it causes the filesystem to memory fault, but I guess
> 8088's don't do that sort of thing not having MMU's (smug :-)
On fs as compiled under MINIX, thru sheer good luck it doesn't cause too
much problem--occasionaly fs crashes or acts weird when you send a signal,
but not often.  The luck occurs in the null pointer NIL_FILP that gets
assigned to f and is dereferenced thus:
      dev = f->filp_ino->izone[0];
If I remember correctly, the structure offsets of filp_ino and izone[0]
are 4 and 14, respectively.  The first dereference (f->filp_ino) thus reads
location 0+4, which, since the MINIX cc isn't split I&D, is in the group
of zero words at the beginning of the CODE segment.  Thus f->filp_ino==0.
The second dereference also yields a zero, or at least a value whose lower
byte is zero.  Since we are almost always paused on tty0 (minor device 0),
everything works usually as expected.  (I'd put debugging code in the
system and the dev value returned from the buggy code was almost always
0).  I have had the fs crash when I hit DEL, though it doesn't happen often.
  When you try, as I did, to compile fs on a foreign compiler (Aztec's in
my case), and get a split I&D version, things are different.  Now the
lookup of f->filp_ino retrieves location 0+4 from the DATA segment, which
holds the size of some part of INIT.  This makes a very poor address.  The
MINIX-compiled version crashes on rare occasions; it only took two DELs
to bring the Aztec-compiled version to its knees. 
--------------------------------------------------------------------------
Richard Todd
USSnail:820 Annie Court,Norman OK 73069
UUCP: {allegra!cbosgd|ihnp4}!okstate!uokmax!rmtodd