[comp.os.minix] Bug report

va@jolnet.ORPK.IL.US (Vincent Archer) (12/13/89)

Hello *.

I've been studying Minix on my ST computer for a few months now, and have
found some bugs I'd like to report (and correct on the fly :-))

1) Defining a tty device not supported by the current kernel and using it
   results in a system hang. (I believe I've seen somebody complaining about
   this one some time ago, on the PC version...)

The problem is in kernel/tty.c, tty_task()

the line:
         tp = &tty_struc[tty_mess.TTY_LINE];

should be replaced by:
         if (tty_mess.TTY_LINE >= NR_TTYS) tty_mess.m_type = 999;
         else tp = &tty_struc[tty_mess.TTY_LINE];

More robust, isn't it? Now defining, say, /dev/tty19 and using it reports an
EINVAL error.
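
A minimal stand-alone sketch of the same idea, outside the kernel: validate the
line number before indexing the table, and answer EINVAL otherwise. The names
tty_table, lookup_tty() and handle_request() and the table size are
illustrative assumptions, not the Minix source; the sketch also rejects
negative line numbers, which the one-line patch above does not.

```c
#include <errno.h>
#include <stddef.h>

#define NR_TTYS 2                 /* assumed table size, for illustration */

struct tty { int line; };         /* stand-in for the real tty structure */
static struct tty tty_table[NR_TTYS];

/* Return the slot for a line number, or NULL if it is out of range. */
struct tty *lookup_tty(int line)
{
    if (line < 0 || line >= NR_TTYS) return NULL;
    return &tty_table[line];
}

/* Hypothetical dispatcher: refuse the request instead of indexing
 * past the end of the table. Returns 0 on success, EINVAL otherwise. */
int handle_request(int line)
{
    struct tty *tp = lookup_tty(line);
    if (tp == NULL) return EINVAL;
    tp->line = line;              /* placeholder for the real work */
    return 0;
}
```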

2) In a similar way, defining a device with a wrong major number, and trying
   to use it results in a nasty and unpleasant panic message (and system hang
   too :-))

The solution is in fs/device.c, dev_open()

The lines:

         major = (dev >> MAJOR) & BYTE;
         if (major == 0 || major >= max_major) return(EINVAL);

should be inserted before:

         find_dev(dev);

no sense trying a device you can't open. A more robust way would be to remove
the panic in find_dev() and return an EINVAL. If that's done, the various
calls to find_dev() should check this, and then report the error (if any) to
the caller.
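
Here is a rough sketch of that more robust variant, as a stand-alone program
rather than a patch. The function names find_dev_checked() and
dev_open_checked(), the value of max_major, and the bit layout (major number in
the high byte) are assumptions made for the example.

```c
#include <errno.h>

#define MAJOR 8                   /* assumed: major number is the high byte */
#define BYTE  0xFF

static int max_major = 4;         /* assumed size of the device table */

/* Like find_dev(), but reports a bad major number instead of panicking.
 * On success stores the major and minor numbers and returns 0. */
int find_dev_checked(unsigned dev, int *major, int *minor)
{
    int maj = (int)((dev >> MAJOR) & BYTE);

    if (maj == 0 || maj >= max_major) return EINVAL;
    *major = maj;
    *minor = (int)(dev & BYTE);
    return 0;
}

/* Every caller checks the result and hands the error back up. */
int dev_open_checked(unsigned dev)
{
    int major, minor;
    int r = find_dev_checked(dev, &major, &minor);

    if (r != 0) return r;
    /* the real code would message the driver task here */
    return 0;
}
```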

3) Once the above correction is done, trying cat >/dev/baddev fails, but a
   cat >>/dev/baddev still panics!

this leads to an investigation in fs/open.c, where you find that, in the
do_open() system call,

the line:
         dev_open((dev_nr) rip->i_zone[0], (int)bits);

would be better as:
         if ((r = dev_open((dev_nr) rip->i_zone[0], (int)bits)) != OK) {
              put_inode(rip);
              return(r);
         }

anyway, this is (nearly) how it's done in do_creat(). No sense having such a
different behaviour between two similar system calls.
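
The point of the put_inode() in that patch is that the inode was already
grabbed by the time dev_open() fails, so it has to be given back before
returning. A toy model of that cleanup path, with a bare reference count and
stub versions of put_inode() and dev_open() (stand-ins written for this
example, not the FS code):

```c
#include <errno.h>

struct inode { int i_count; unsigned i_dev; };

/* Stub: drop one reference. */
static void put_inode(struct inode *rip)
{
    if (rip->i_count > 0) rip->i_count--;
}

/* Stub: pretend device 0 is the bad one. */
static int dev_open(unsigned dev)
{
    return dev == 0 ? EINVAL : 0;
}

/* Hypothetical open path for a special file. */
int open_special(struct inode *rip)
{
    int r;

    rip->i_count++;               /* stands in for the inode lookup */
    r = dev_open(rip->i_dev);
    if (r != 0) {
        put_inode(rip);           /* release what we grabbed ... */
        return r;                 /* ... and report the error */
    }
    return 0;
}
```

Without the put_inode() on the error path, each failed open would leave one
extra reference on the inode.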

4) In <424@eds1.UUCP>, jdm1@eds1.UUCP (Jon McCown) reported problems trying to
   cat /dev/null into a file. Well, on 1.1-ST, it does not fill the disk;
   it merely displays "invalid errno". The errno is 104, E_EOF, reported by
   the memory driver when trying to read /dev/null.
   This one is not a bug, it's merely a disagreement over a fine point: How
   should /dev/null behave? Report EOF in the usual way, saying that 0 bytes
   were read, or report an error, saying that NO bytes may be read... If, like
   me, you prefer the first method, here's a quick fix:

In kernel/memory.c, the do_mem() function says:

    if (device == NULL_DEV)
          return(m_ptr->m_type == DISK_READ ? EOF : m_ptr->COUNT);

you simply replace the EOF constant by 0, and you've won. Still, I don't know
the effects of this modification on other commands and/or system calls. A
better way of doing this would be to alter fs's read_write() so that an EOF
reported becomes a 0 bytes read, but I've not done this (yet).
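
For what it's worth, the read_write() alternative could look roughly like the
sketch below: the driver keeps returning its EOF code, and the layer above
turns it into "0 bytes read". This is untested against the real FS; DRIVER_EOF,
null_driver() and fs_read_write() are made-up names, and the value -104 is only
an assumption based on the errno shown above.

```c
#define DRIVER_EOF (-104)         /* assumed value of the driver's EOF code */

/* Stub for the null device: reads hit EOF, writes swallow everything. */
static int null_driver(int is_read, int count)
{
    return is_read ? DRIVER_EOF : count;
}

/* Hypothetical FS-side wrapper: an EOF from the driver on a read
 * becomes an ordinary zero-byte read; everything else is passed on. */
int fs_read_write(int is_read, int count)
{
    int r = null_driver(is_read, count);

    if (is_read && r == DRIVER_EOF) return 0;
    return r;
}
```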


I'm currently hunting another bug in piping, which causes an xd (link to od,
behaves as od -hx) piped into more to stop in the middle of the output, with
both processes waiting for FS. I'm investigating potential trouble in do_close(),
where shared file descriptors (i.e. filp->filp_count > 1) seem to be handled
in a straightforward manner: invalidate cache entries, call dev_close(),
free any process waiting on the pipe (yes?), and THEN check whether or not some
other user file descriptor still uses this slot. Strange, strange...

+-----------------------------------------------------------------------------+
| Vincent Archer                        | Email: va@jolnet.orpk.il.us         |
|                                       | Primenet: archer@SEGIN4.Segin.FR    |
|                                       | X400: NAME=SERVEUR,ORG=SEGIN        |
| "Hunt the bugs, lest they haunt you!" |       ADMD=ATLAS,COUNTRY=FR         |
+-----------------------------------------------------------------------------+

archer%segin4.segin.fr@prime.com (Vincent Archer) (03/23/90)

    I've been out of touch for a while, so I don't know whether or not the
1.5 version does this. Try the following program (logged in as root):

# cat test.c
main()
{
extern char **environ;   /* execle() wants an environment after the null */
if (chroot("/usr/archer/test")) printf("No chroot!\n");
execle("/bin/ls", "ls", "/", (char *)0, environ);
}
# ls
test
test.c
test.o
# test
test
test.c
test.o
#

Uh? What's the matter? Where on earth did execle find a /bin/ls, if / is
the directory shown above? (it is, since ls / displays it!).
The test above works (erroneously) on a 1.1 ST version. Fix is in fs/stadir.c

    in do_chroot(), before the put_inode(fp->workdir);, you add

    put_inode(fp->rootdir);
    fp->rootdir = ch_dir ? get_inode(ROOT_DEV, ROOT_INODE) : rfp->rootdir;
    dup_inode(fp->rootdir);
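
As far as I can tell, the idea is that when FS takes over another process's
directory context, it must copy the root directory as well as the working
directory, keeping the inode reference counts balanced. A toy model of the
rfp case only, with bare counters; adopt_dirs() and the stub put_inode() and
dup_inode() are written for this example and are not the FS routines:

```c
struct inode { int i_count; };
struct fproc { struct inode *rootdir; struct inode *workdir; };

static void put_inode(struct inode *ip) { ip->i_count--; }  /* drop a reference */
static void dup_inode(struct inode *ip) { ip->i_count++; }  /* take a reference */

/* Make fp use the same root and working directory as rfp. */
void adopt_dirs(struct fproc *fp, const struct fproc *rfp)
{
    put_inode(fp->rootdir);       /* the step the patch adds: root too */
    fp->rootdir = rfp->rootdir;
    dup_inode(fp->rootdir);

    put_inode(fp->workdir);
    fp->workdir = rfp->workdir;
    dup_inode(fp->workdir);
}
```

With only the workdir half, a lookup of "/bin/ls" would still start from the
old root, which matches what the test program shows.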


    Another (harmless) bug is the kill function, that does a magnificent
sig=11 when used on a process that you can't send signals to. In kill.c, you
add the declaration:

    extern char *itoa();

and everything works fine! Otherwise, any error code trashes the process. It
did work on PCs, because pointers were 16-bit integers, but fails on the Atari
because ints and pointers are not the same size! Damn these 8086 and
their small segments! :-)
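
To illustrate: without the declaration the compiler assumes itoa() returns an
int, so the caller only keeps an int's worth of the returned pointer. The
sketch below simulates that with fixed-width types. The 16-bit int and 32-bit
pointer widths are my assumption about the ST compiler, and
as_seen_by_caller() is just a name made up for the example.

```c
#include <stdint.h>

typedef uint32_t st_pointer;      /* models a 32-bit 68000 address */

/* What the caller ends up with when a pointer return value is
 * squeezed through an (assumed) 16-bit int: the high half is lost. */
st_pointer as_seen_by_caller(st_pointer returned)
{
    return (st_pointer)(uint16_t)returned;
}
```

An address below 64K survives the trip, which is the PC case; anything above
it comes back pointing somewhere else, hence the sig=11.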