hedrick@TOPAZ.RUTGERS.EDU.UUCP (08/17/87)
The distinction involved in "statelessness" is how much state information file servers keep about who is using them. If the protocol requires the server to keep control information about each user, then when the server goes down and is rebooted, the clients will not be able to continue. State information is also needed to remember who has which file locked. When it is said that NFS is stateless, what this means is that in principle a given request can be processed without reference to any previous requests, or to any side effects created by them. The advantage of this approach is that clients and servers can crash and come back up, or be rebooted, at arbitrary times, and no special action is needed to keep the network file system consistent.

The previous discussion points out that this distinction is at best a slippery one. What a client sees most certainly does depend upon the state of the file system: that's the whole point of doing a remote access, after all. As somebody mentioned, if you dump and restore a file system, all your files have new inode numbers, and all the clients are going to have to restart whatever they are doing. (This situation will be detected by NFS if you do things correctly. However, recovery will require you to kill all the processes that are using the file system remotely, umount it, and then mount it again.)

However, except for things that completely rebuild a file system, NFS in practice does recover from crashes and reboots. The only operational problems we have seen with it are that when a server is down, its clients have a tendency to behave badly in various ways, even when what they are doing does not really require that particular server. This does not seem to have anything to do with the NFS protocol per se, but is an implementation issue. (In general that situation is improving release by release.)
People around here have gotten into the habit of typing "df &", just to avoid having our jobs hang if any servers are down. (To be fair, this is largely because of machines running out-of-date NFS implementations. On a Sun, you can eventually get out of df in this situation, though it is still annoying.)

File locking is handled by a separate lock daemon, which is not part of NFS, though it is implemented using similar mechanisms. File locking obviously involves state information. The server has to remember, across crashes, that the file is locked. It also has to have a way to get rid of the lock if the client crashes. Sun uses two daemons, lockd and statd, to keep track of things. Statd seems to be responsible for keeping track of what machines are up, and reestablishing connections after a crash and reload. It uses a set of disk directories that list the hosts it is to monitor and the hosts it is to notify when it comes up after a crash. As with NFS itself, there don't seem to be any administrative tools involved in maintaining or restoring consistency to the locking system. However, I do recall one time when we had to do some manual cleanup after permanently removing a machine. I suspect statd had been told to keep track of the machine. There was no big damage, just an occasional console error message we wanted to get rid of.

There are no administrative issues involved with maintaining NFS per se. However, you certainly do want to think carefully about which machines mount which disks, and what attributes you will use (e.g. readonly, and whether root access is preserved across the network). Our primary planning problem is setting things up so that we don't have every machine having to mount every other disk in the world.

I make no comparisons with other remote file system protocols, since I don't know them.
You can imagine protocols that would keep internal state information about every file that is open, and which would leave the world in a horribly inconsistent state after a crash and reload. But whether any such things actually exist, I couldn't say. The only bad examples Sun pointed to back when they introduced NFS had to do with file locking. But locking is a special situation, where even Sun maintains state information.
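
To make the stateless/stateful distinction concrete, here is a toy sketch in Python. It is not the NFS protocol or anybody's actual implementation: the class names (Disk, StatelessServer, StatefulServer, StaleHandle) are made up for illustration, and a bare inode number stands in for a file handle, whereas real NFS file handles are opaque to the client and carry more than that. The assumption is only that a stateless request names the file, offset, and length every time, while a stateful server remembers an open-file table in memory.

```python
class StaleHandle(Exception):
    """Raised when a handle no longer names a file on the disk."""


class Disk:
    """Toy persistent storage; it outlives any one server process."""

    def __init__(self):
        self.files = {}          # inode number -> file contents
        self.next_inode = 1

    def create(self, data):
        inode = self.next_inode
        self.next_inode += 1
        self.files[inode] = data
        return inode

    def dump_and_restore(self):
        """Rebuild the file system: same contents, new inode numbers."""
        old = self.files
        self.files = {}
        for data in old.values():
            self.create(data)


class StatelessServer:
    """Keeps nothing between requests; each request is self-describing."""

    def __init__(self, disk):
        self.disk = disk

    def read(self, handle, offset, count):
        if handle not in self.disk.files:
            raise StaleHandle(handle)
        return self.disk.files[handle][offset:offset + count]


class StatefulServer:
    """Keeps an in-memory open-file table, which a crash wipes out."""

    def __init__(self, disk):
        self.disk = disk
        self.open_files = {}     # descriptor -> [inode, current position]
        self.next_fd = 0

    def open(self, inode):
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = [inode, 0]
        return fd

    def read(self, fd, count):
        inode, pos = self.open_files[fd]   # KeyError if the table was lost
        data = self.disk.files[inode][pos:pos + count]
        self.open_files[fd][1] = pos + len(data)
        return data


disk = Disk()
handle = disk.create(b"hello, world")

# Stateless: the server is replaced mid-stream and the client never notices.
server = StatelessServer(disk)
first = server.read(handle, 0, 5)
server = StatelessServer(disk)             # simulated crash and reboot
second = server.read(handle, 7, 5)

# Stateful: the reboot loses the open-file table, so the old descriptor fails.
stateful = StatefulServer(disk)
fd = stateful.open(handle)
stateful.read(fd, 5)
stateful = StatefulServer(disk)            # simulated crash and reboot
try:
    stateful.read(fd, 5)
    stateful_survived = True
except KeyError:
    stateful_survived = False

# Rebuilding the file system invalidates even the stateless client's handle.
disk.dump_and_restore()
try:
    server.read(handle, 0, 5)
    stale_detected = False
except StaleHandle:
    stale_detected = True
```

The last step is the slippery part mentioned earlier: the stateless server carries no memory of its clients, but the handle still depends on the state of the file system, so a dump and restore shows up as a stale handle rather than being recovered from silently.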