[comp.protocols.tcp-ip] RFS vs NFS

hedrick@TOPAZ.RUTGERS.EDU.UUCP (08/17/87)

The distinction involved in "statelessness" is how much "state
information" is kept around by file servers about who is using them.
If the protocol requires the server to keep control information about
each user, then if the server goes down and is rebooted, the clients
will not be able to continue.  State information is also needed in
order to remember who has what file locked.  When it is said that NFS
is stateless, what this means is that in principle a given request can
be processed without any reference to any previous requests, or any
side effects created by such previous requests.  The advantage of this
approach is that clients and servers can crash and come back up,
or be rebooted, and no special action is needed to keep the network
file system consistent.

The previous discussion points out that this distinction is at best a
slippery one.  What a client sees most
certainly does depend upon the state of the file system: that's the
whole point of doing a remote access, after all.  As somebody
mentioned, if you dump and restore a file system, all your files have
new inode numbers, and all the clients are going to have to restart
whatever they are doing.  (NFS will detect this situation if you do
things correctly.  However, recovery will require you to kill all the
processes that are using the file system remotely, umount it, and
then mount it again.)  However, except for things that completely
rebuild a file system, NFS in practice does recover from crashes and
reboots.

The only operational problem we have seen with NFS is that when a
server is down, its clients have a tendency to behave badly in
various ways, even when what they are doing does not really require
that particular server.  This does not seem to have anything to do
with the NFS protocol per se, but is an implementation issue.  (In
general that situation is improving release by release.)  Those of us
around here have taken to typing "df &", just to avoid having our
jobs hang if any servers are down.  (To be fair, this is largely
because of machines running out-of-date NFS implementations.  On
a Sun, you can eventually get out of df in this situation, though it
is still annoying.)
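
To make the stateless/stateful distinction concrete, here is a minimal
sketch in Python.  It is a toy model, not the NFS wire protocol: the class
names, the "fh1" handle, and the reboot() helper are all made up for
illustration.  The stateless server is handed everything it needs in each
request; the stateful one keeps a per-client session table in memory, which
is exactly what a reboot throws away.

```python
# Toy model only: names and data are hypothetical, not taken from NFS.

class StatelessServer:
    """Each read names the file handle, offset, and count explicitly."""

    def __init__(self, files):
        self.files = files  # handle -> bytes; stands in for the disk

    def read(self, handle, offset, count):
        # Nothing from any earlier request is consulted here.
        return self.files[handle][offset:offset + count]


class StatefulServer:
    """The server remembers each client's open file and position."""

    def __init__(self, files):
        self.files = files
        self.sessions = {}  # session id -> [handle, position]; memory only
        self.next_id = 0

    def open(self, handle):
        self.next_id += 1
        self.sessions[self.next_id] = [handle, 0]
        return self.next_id

    def read(self, session, count):
        handle, pos = self.sessions[session]  # KeyError once state is lost
        data = self.files[handle][pos:pos + count]
        self.sessions[session][1] = pos + len(data)
        return data


def reboot(server):
    """Simulate crash and reboot: the 'disk' survives, memory does not."""
    return type(server)(server.files)
```

After reboot(), a client of the stateless server simply repeats its request
and gets an answer.  A client of the stateful server is holding a session id
that the server no longer knows about, so it cannot continue without some
recovery step.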

File locking is handled by a separate lock daemon, which is not part
of NFS, though it is implemented using similar mechanisms.  File
locking obviously involves state information.  The server has to
remember across crashes that the file is locked.  It also has to have
a way to get rid of the lock if the client crashes.  Sun uses two
daemons, lockd and statd, to keep track of things.  Statd seems to be
responsible for keeping track of what machines are up, and
reestablishing connections after a crash and reload.  It uses a set of
directories on disk that list the hosts it is to monitor and the hosts
it is to notify when it comes up after a crash.  As with NFS, there
don't seem to be any administrative tools involved in maintaining or
restoring consistency to the locking system.  However, I do recall one
time when we had to do some manual cleanup after permanently removing
a machine.  I suspect statd had been told to keep track of the
machine.  There was no big damage, just an occasional console error
message we wanted to get rid of.
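
The bookkeeping that locking forces on a server can be sketched roughly as
follows.  This is a guess at the general shape of the problem, not a
description of how lockd and statd actually work: LockTable and its methods
are hypothetical names, and the table lives only in memory, whereas a real
server would also need something on disk to survive its own crash.

```python
# Hypothetical sketch of lock state; not Sun's lockd/statd implementation.

class LockTable:
    def __init__(self):
        self.locks = {}         # file name -> host holding the lock
        self.monitored = set()  # hosts whose crashes we need to hear about

    def lock(self, host, name):
        """Grant the lock unless a different host already holds it."""
        holder = self.locks.get(name)
        if holder is not None and holder != host:
            return False
        self.locks[name] = host
        self.monitored.add(host)  # start watching this client
        return True

    def unlock(self, host, name):
        """Release the lock, but only if this host is the holder."""
        if self.locks.get(name) == host:
            del self.locks[name]

    def host_rebooted(self, host):
        """A monitored client crashed and came back: drop its stale locks."""
        stale = [n for n, h in self.locks.items() if h == host]
        for name in stale:
            del self.locks[name]
        self.monitored.discard(host)
```

Note that a host stays in the monitored set until it reports back.  If a
machine is removed for good, nothing in this sketch ever clears its entry,
which is the same sort of leftover as the one described above.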

There are no administrative issues involved with maintaining NFS per
se.  However you certainly do want to think carefully about what
machines mount what disks, and what attributes you will use (e.g.
readonly and whether root access is preserved across the network).
Our primary planning problem is setting things up so that not every
machine has to mount every other disk in the world.

I make no comparisons with other remote file system protocols, since I
don't know them.  You can imagine protocols that would keep internal
state information about every file that is open, and which would leave
the world in a horribly inconsistent state after a crash and reload.
But whether any such things actually exist, I couldn't say.  The only
bad examples Sun pointed to back when they introduced NFS had to do
with file locking.  But locking is a special situation, where even Sun
maintains state information.