liam@cs.qmc.ac.uk (William Roberts) (09/29/88)
There follows a (-ms) troff document proposing a third type of mount, part way between soft mounts and hard mounts. FLAME ON - I sent this to the appropriate Sun Vice-Pres who eventually passed it to someone else who said "very interesting" and presumably binned it. "Listen to our customers" indeed... FLAME OFF Any comments, similar experiences etc?
-------------------------------------------------------------
.NH
Too Soft or Too Hard?
.LP
We experience a number of problems at QMC which lead us to propose a third form of NFS mounting: a
.I medium
mount which has semantics midway between soft mounting and hard mounting.
.NH 2
Problems with Soft Mounting
.LP
It is wrong to assert that information which is never written to is unimportant. We have file servers providing manual pages etc., but we also have file servers providing binaries: binaries are read-only, but it is a serious problem if, after 3 hours of computation, a read error makes a single page of the binary unavailable. For this reason we have to hard mount all of our file servers.
.NH 2
Problems with Hard Mounting
.LP
The key problem with hard-mounting binaries is the undesirable interaction with programs such as csh, which scan all of the path directories to construct a hash table of executable names. If one of those directories is on a hard-mounted file server which happens to be down, csh will hang indefinitely until the server comes back up. As this normally happens during login, the machine is rendered unusable and drastic action (such as rebooting) is necessary to lose the corresponding mount table entry.
.NH 3
Failure Semantics
.LP
The behaviour of hard- and soft-mounted file systems differs only after a specified number of attempts at some operation have failed. The behaviour is then:
.IP "Soft Mount"
Print "\c
.I "NFS server X not responding, giving up"
" and abort the operation with a suitable error code.
.IP "Hard Mount"
Print "\c
.I "NFS server X not responding, still trying"
" and start a new series of attempts.
.LP
To counter the problem with csh in particular, we suggest adding a third type of mount whose behaviour lies between the two, neither giving up as easily as soft mounts nor persisting as doggedly as hard mounts. Its behaviour would be the same as that of soft and hard mounts up to the point at which the retry limit is reached (as above).
.IP "Medium Mount"
Ping the server using an ICMP echo packet (maybe several). If the server kernel appears to be alive and does reply, then print "\c
.I "NFS server X not responding but still alive, still trying"
" and start a new series of attempts. If the server does not respond, then print "\c
.I "NFS server X appears to be down, giving up"
" and abort in the same way as for a soft mount.
.LP
This approach would overcome an additional obstacle to soft mounts at QMC: the occasional overflow of the UDP socket receive queue used by the nfs daemons on a very busy filestore. As an example, our local SmallTalk implementation writes its image as two 1-Megabyte writes. This causes the file server to be pounded with 8K UDP packets containing data to be written. As the UDP receive queue can hold only 9000 bytes, one outstanding write effectively clogs the server and causes the server kernel to discard further requests. This can happen on any server, and we almost always get one or more "NFS server X not responding, still trying" messages while writing a SmallTalk image. Using a medium mount as proposed above would enable us to mount the server in a way which wouldn't jam the client if the server is down, but also wouldn't trash these images just because of temporary server overload.
.LP
A specific suggestion for SunOS 4.0 is that the nfsd program use the appropriate 4.3BSD ioctl to increase its socket receive buffer size, though a limit based on the number of packets would also be useful in relating the number of outstanding requests to the number of nfsd server processes. While on the subject of these ioctls, it is possible to get into a race condition with the mount daemon (the user-level RPC code handles timeouts with different semantics), so the mount daemon could usefully crank its receive buffer down to something small (4-5 requests, say).
--
William Roberts                 ARPA: liam@cs.qmc.ac.uk (gw: cs.ucl.edu)
Queen Mary College              UUCP: liam@qmc-cs.UUCP
LONDON, UK                      Tel:  01-975 5250