toddb@tekcrl.UUCP (Todd Brunhoff) (01/24/86)
Wayne Power writes:
>1. RFS doesn't seem to care about x bits or file type.
>
>	% /remote_host/etc/passwd
>
>   ...tries to run the password file as a shell script, even though its
>   mode is 644.  Even more off the wall...
>
>	% /remote_host/etc
>
>   ...does the same thing.

Yep.  A bug.  Basically, the shell is starting up the interpreter,
probably because the access check claims the mode is right.  The bug is
probably in the server in s_access() in serversyscall.c.

>2. Intentional circumfornication like...
>
>	% cd /host1/host2
>
>   ...hangs the server on 'host1'.

RFS is designed not to allow hopping hosts like this; it should just
fail, but instead it causes the server on 'host1' to hang.  I think the
server is stuck in syscall() in /usr/sys/machine/trap.c, retrying the
system call over and over.  The fix is probably in remote/rmt_general.c
around line 147:

	/*
	 * finally, if the user has turned off remote access for himself,
	 * then just return.
	 */
	p = u.u_procp;
	if (p->p_flag & SNOREMOTE)
		return(FALSE);		<<<< this should be TRUE?
	u.u_error = 0;

>3. If a remote host goes down, it seems that every process that chdir'ed
>   to or opened a file on that machine's file system must die before it
>   can be unmounted.  This gets in the way of a machine crashing and
>   coming back without disrupting the rest of the participating machines.

This was listed in the bugs at the end of the installation doc.  Whenever
a connection goes down, there may be processes still up ``using'' that
connection.  However, if there are no open files or remote chdir(2)'s,
the kernel could just as well restart the connection without waiting for
those processes to die.  The fix would probably go in the kernel routine
rmt_getconnection(): if there are indeed no open files or chdir's, then
before returning the open socket, just test for

	so->so_state & SS_CANTSENDMORE

If it is set, clean up the connection and start a new one.
If there are open files or current directories, then things get more
complicated.  You could traverse the file table, find any descriptors
pointing to the dead host, and assign them to some "invalid" entry in
remote_info[]; that way, the file descriptors will fail gracefully.
Another possibility is to just ignore the old remote file descriptors
and fix the server to handle them.  The directories are not quite so
bad... the user would just suddenly find himself in ``/'' when the
connection is restarted.
----------
Another problem related to #1 above is that sometimes real live binary
files on a remote host fail in the kernel execve(), so the shell starts
to interpret the file, and you get many occurrences of:

	gobbledygook: not found

The interpretation of the file is the same problem as above, but the
real binary failing in execve() is another.  The way it works is that
the client kernel sends a request to the server; the server opens the
binary, reads the a.out header, and sends it back to the client, then
waits for one of two things: 1) a request for the entire binary to be
sent, or 2) a "forget it" message saying that the client doesn't want
the binary.  For whatever reason, the server is not getting the "forget
it" message, so all the file descriptors get used up, and then NOTHING
will exec (and then the shell starts interpreting binaries).  This
problem has only shown up with ATT's ksh (korn shell).

Thank you all for the great feedback, and special thanks to Terry
Laskodi (here at TEKLABS) for many great ideas for the design of RFS.
For many hours he listened to me babble on about the design, making
great suggestions.  Again, I cannot spend any time on RFS, but I will
repost bug reports sent to me with comments attached about suggested
fixes.  Naturally, I would also like to see the fixes if any of you
have the time.

Todd Brunhoff
{decvax,ucbvax}!tektronix!crl!toddb
toddb%crl@tektronix.csnet