toddb@tekcrl.UUCP (Todd Brunhoff) (01/24/86)
Wayne Power writes:
>1. RFS doesn't seem to care about x bits or file type.
>
>	% /remote_host/etc/passwd
>
>   ...tries to run the password file as a shell script, even though its
>   mode is 644.  Even more off the wall...
>
>	% /remote_host/etc
>
>   ...does the same thing.

Yep.  A bug.  Basically, the shell is starting up the interpreter,
probably because the access check claims the mode is right.  The bug is
probably in the server in s_access() in serversyscall.c.

>2. Intentional circumfornication like...
>
>	% cd /host1/host2
>
>   ...hangs the server on 'host1'.

RFS is designed not to allow hopping hosts like this; it should just
fail, but instead it causes the server on 'host1' to hang.  I think the
server is stuck in syscall() in /usr/sys/machine/trap.c, retrying the
system call over and over.  The fix is probably in remote/rmt_general.c
around line 147:

	/*
	 * finally, if the user has turned off remote access for himself,
	 * then just return.
	 */
	p = u.u_procp;
	if (p->p_flag & SNOREMOTE)
		return(FALSE);		<<<< this should be TRUE?
	u.u_error = 0;

>3. If a remote host goes down, it seems that every process that chdir'ed
>   to or opened a file on that machine's file system must die before it
>   can be unmounted.  This gets in the way of a machine crashing and
>   coming back without disrupting the rest of the participating machines.

This was listed in the bugs at the end of the installation doc.  Whenever
a connection goes down, there may be processes still up ``using'' that
connection.  However, if there are no open files or remote chdir(2)'s,
the kernel could just as well restart the connection without waiting for
those processes to die.  The fix would probably go in the kernel routine
rmt_getconnection(): if there are indeed no open files or chdir's, then
before returning the open socket, just test for

	so->so_state & SS_CANTSENDMORE

If it is set, clean up the connection and start a new one.
If there are open files or current directories, then things get more
complicated.  You could traverse the file table, find any descriptors
pointing to the dead host, and assign them to some "invalid" entry in
remote_info[]; that way, the file descriptors will fail gracefully.
Another possibility is to just ignore the old remote file descriptors
and fix the server to handle them.  The directories are not quite so
bad... the user would just suddenly find himself in ``/'' when the
connection is restarted.
----------
Another problem related to #1 above is that sometimes real live binary
files on a remote host fail in the kernel execve(), so the shell starts
to interpret the file, and you get many occurrences of:

	gobbledygook: not found

The interpretation of the file is the same problem as above, but the
real binary failing in execve() is another.  The way it works is that
the client kernel sends a request to the server; the server opens the
binary, reads the a.out header, and sends it back to the client, then
waits for one of two things: 1) a request for the entire binary to be
sent, or 2) a "forget it" message saying that the client doesn't want
the binary.  For whatever reason, the server is not getting the "forget
it" message, so all the file descriptors get used up, and then NOTHING
will exec (and then the shell starts interpreting binaries).  This
problem has only shown up with ATT's ksh (korn shell).

Thank you all for the great feedback, and special thanks to Terry
Laskodi (here at TEKLABS) for many great ideas for the design of RFS.
For many hours he listened to me babble on about the design, making
great suggestions.  Again, I cannot spend any time on RFS, but I will
repost bug reports sent to me with comments attached about suggested
fixes.  Naturally, I would also like to see the fixes if any of you
have the time.

Todd Brunhoff
{decvax,ucbvax}!tektronix!crl!toddb
toddb%crl@tektronix.csnet