[comp.protocols.nfs] Suggested 3rd type of NFS mount

liam@cs.qmc.ac.uk (William Roberts) (09/29/88)

There follows a (-ms) troff document proposing a third type of
mount, part way between soft mounts and hard mounts.

FLAME ON - I sent this to the appropriate Sun Vice-Pres who
eventually passed it to someone else who said "very
interesting" and presumably binned it. "Listen to our
customers" indeed...
FLAME OFF

Any comments, similar experiences etc?
-------------------------------------------------------------
.NH
Too Soft or Too Hard?
.LP
We experience a number of problems at QMC which lead us to propose a
third form of NFS mounting: a
.I medium
mount which has semantics midway between soft mounting and hard
mounting.
.NH 2
Problems with Soft Mounting
.LP
It is wrong to assert that information which is never written to is
unimportant. We have file servers providing manual pages etc., but we
also
have file servers providing binaries: binaries are read-only, but it is
a serious problem if after 3 hours of computation a read error makes a
single page of the binary unavailable. For this reason we have to hard
mount all of our file servers.
.NH 2
Problems with Hard Mounting
.LP
The key problem with hard-mounting binaries is the undesirable
interaction with programs such as csh, which scan all of the path
directories to construct a hash table of executable names. If one of
those directories is on a hard mounted file server which happens to be
down, csh will hang indefinitely until the server comes back up. As this
normally happens during login, the machine is rendered unusable and
drastic action (such as rebooting) is necessary to lose the
corresponding mount table entry.
.NH 3
Failure Semantics
.LP
The behaviour of hard and soft mounted file systems differs only after
a specified number of attempts at some operation have failed. The
behaviour is then:
.IP "Soft Mount"
Print "\c
.I "NFS Server X not responding, giving up" "
and abort the operation
with a suitable error code.
.IP "Hard Mount"
Print "\c
.I "NFS Server X not responding, still trying" "
and start a new series of attempts.
.LP
To counter the problem with csh in particular, we suggest adding a third
type of mount whose behaviour is between the two, neither giving up as
easily as soft mounts, nor persisting as doggedly as hard mounts. Its
behaviour would be the same as soft and hard up to the point at which
the retry limit is reached (as above).
.IP "Medium Mount"
Ping the server using an ICMP echo packet (maybe several). If the server
kernel appears to be alive and does reply, then print "\c
.I "NFS server X not responding but still alive, still trying" "
and start a new series of
attempts. If the server does not respond then print "\c
.I "NFS server X appears to be down, giving up" "
and abort in the same way as for a soft mount.
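.LP
The decision taken once the retry limit is exhausted can be sketched as
follows. This is only an illustration of the proposed policy, not kernel
code: the reachability probe is passed in as a function, since a real
implementation would send ICMP echo requests from the kernel.

```python
SOFT, HARD, MEDIUM = "soft", "hard", "medium"

def on_retry_limit(mount_type, server, is_alive):
    """Decide what to do once the NFS retry limit is exhausted.

    is_alive: callable returning True if the server kernel answers
    an ICMP echo (hypothetical probe, implementation not shown).
    Returns "abort" (fail the operation) or "retry" (new series
    of attempts).
    """
    if mount_type == SOFT:
        print("NFS server %s not responding, giving up" % server)
        return "abort"    # fail with a suitable error code
    if mount_type == HARD:
        print("NFS server %s not responding, still trying" % server)
        return "retry"    # start a new series of attempts
    # Medium mount: ping the server first, then decide.
    if is_alive(server):
        print("NFS server %s not responding but still alive, "
              "still trying" % server)
        return "retry"    # server is merely overloaded
    print("NFS server %s appears to be down, giving up" % server)
    return "abort"        # behave like a soft mount
```

The one extra branch is the whole proposal: a medium mount only gives up
when the ping also fails, so a live-but-overloaded server is retried.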
.LP
This approach would overcome an additional obstacle to soft mounts at
QMC, the occasional overflow of the UDP socket receive queue used by the
nfs daemons on a very busy filestore. As an example, our local SmallTalk
implementation writes its image as two 1 Megabyte writes. This causes
the file server to be pounded with 8K UDP packets containing data to be
written. As the UDP receive queue can hold only 9000 bytes, one
outstanding write effectively clogs the server and causes the server
kernel to discard further requests. This can happen on any server, and
we almost always get one or more "NFS server X not responding, still
trying" messages while writing a SmallTalk image. Using a medium mount
as proposed above would enable us to mount the server in a way which
wouldn't jam the client if the server is down, but also won't trash
these images just because of temporary server overload.
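.LP
The arithmetic behind the overflow, using the figures quoted above, is
worth spelling out:

```python
# Figures from the text: each 1 Megabyte SmallTalk image write is
# split into 8K NFS write requests carried in UDP packets, while
# the server's UDP socket receive queue holds only 9000 bytes.
WRITE_SIZE = 1024 * 1024      # one image write
REQUEST_SIZE = 8 * 1024       # 8K UDP packets
RCVBUF = 9000                 # UDP receive queue capacity

requests_per_write = WRITE_SIZE // REQUEST_SIZE   # 128 requests
requests_buffered = RCVBUF // REQUEST_SIZE        # only 1 fits
```

With 128 requests in flight and room to queue just one, a single
outstanding write fills the queue and the kernel discards the rest.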
.LP
A specific suggestion for SunOS 4.0 is that the nfsd program use the
appropriate BSD
4.3 ioctl to increase its socket receive buffer size, though a
limit based on the number of packets would also be useful in relating
the number of outstanding requests to the number of nfsd server
processes. While on the subject of these ioctls, it is possible
to get into a race condition with the mount daemon (the
user-level RPC code has different semantics for handling
timeouts), so the mount daemon could usefully shrink its receive
buffer down to something small (say 4-5 requests).
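.LP
The "appropriate BSD 4.3 ioctl" here is the SO_RCVBUF socket option, set
via setsockopt(). A minimal sketch, in Python rather than the C that
nfsd itself would use, and on a modern kernel rather than SunOS 4.0:

```python
import socket

# Open a UDP socket, as nfsd does for NFS requests.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Ask for room for, say, sixteen 8K write requests instead of the
# default of a few thousand bytes.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 * 8 * 1024)

# The kernel may round or clamp the value (Linux, for example,
# doubles it and caps it at rmem_max), so read back what was
# actually granted.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```

A byte limit is what the socket API offers; the packet-count limit
suggested above would need kernel support, since SO_RCVBUF counts
buffered bytes, not requests.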

-- 

William Roberts         ARPA: liam@cs.qmc.ac.uk  (gw: cs.ucl.edu)
Queen Mary College      UUCP: liam@qmc-cs.UUCP
LONDON, UK              Tel:  01-975 5250