trt@rti.UUCP (Thomas Truscott) (01/22/89)
> I became a firm believer in NFS's stateless server approach the very first
> time I saw a server crash and reboot. The client applications using it just
> froze. When the server came back to life, they picked up and continued as
> if nothing had happened.

But stateful approaches do this too! It has nothing to do with the server (which crashed, after all); it has to do with whether or not the *client* can recover the server's state. Many people are firm believers in a stateful client approach, though perhaps they are not aware of it.

Okay, what about servers, should they be stateful too? Of course! UNIX semantics demand it (and so do those of VMS, MS-DOS, and every other operating system I can think of). An obvious example is a remote "mkdir" (or unlink, exclusive open, etc.) which succeeds, but the ACK is dropped and then the retry fails. Another is access to special devices, such as autodialers and tape drives.

What happens when a stateful server crashes and reboots? The (stateful) client applications freeze, and when the server comes up they reestablish their state and continue as if nothing had happened. Of course, reestablishing an ordinary file's state is much easier than reestablishing that of a modem or graphics device, so applications using the latter will probably fail. But that is better than using stateless services, which do not allow such things in the first place! Keep in mind that crashes are rare, even on NFS systems (whose users seem to talk about crashes a lot).

With a stateful server, correct UNIX (or perhaps MVS, AOS, whatever) semantics are easy, natural, and the norm. Recovering state lost due to a crash is, however, only an approximation, and is called "error recovery". With stateless servers, approximation is the whole idea.

While on the subject of crash recovery, please note that fault-tolerant systems provide it by replicating their state, not by placing it in a single basket.

So how does NFS survive with stateless servers?
The first line of defense is that the network rarely drops a packet.

The second line of defense is to introduce server state (oops!) in the form of retry caches, and of course the infamous "inode generation numbers". This works well except under heavy load or when the system crashes.

The third line of defense is to introduce a suite of "network services" that provide stateful support for a few "most wanted" features lacking in NFS, such as file locking. (I think vendors have to pay SUN extra $$ to get them.)

The fourth line of defense is that "NFS is not UNIX" and that this is a small price to pay for crash recovery and for operation in non-UNIX environments.

Well, it has always seemed to me that "NFS is not UNIX" because "UNIX is not stateless", that stateless servers are not necessary for crash recovery, and that stateless servers hinder operation in non-UNIX environments.

Please consider the theorem: There is nothing a stateless system can do that cannot be done by a stateful system. The proof is left as an exercise for the reader.

Tom Truscott
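The dropped-ACK "mkdir" case and the retry cache described in the post above can be illustrated with a small simulation. This is only a rough sketch under stated assumptions: the `Server` class, the `mkdir_with_lost_ack` helper, and the `(client_id, xid)` cache key are hypothetical names invented for illustration, and none of it reflects the actual NFS or RPC implementation. It shows a retry failing with no cache, succeeding with a cache, and failing again when a reboot wipes the cache between the request and the retry.

```python
class Server:
    """Toy file server; all names here are illustrative assumptions."""

    def __init__(self, use_retry_cache):
        self.dirs = set()          # stands in for on-disk directories
        self.use_retry_cache = use_retry_cache
        self.replies = {}          # in-memory cache: (client_id, xid) -> reply

    def mkdir(self, client_id, xid, path):
        key = (client_id, xid)
        # A retransmitted request gets the cached reply instead of re-executing.
        if self.use_retry_cache and key in self.replies:
            return self.replies[key]
        if path in self.dirs:
            reply = "EEXIST"
        else:
            self.dirs.add(path)
            reply = "OK"
        if self.use_retry_cache:
            self.replies[key] = reply
        return reply

    def crash_and_reboot(self):
        # Directories survive on disk; the in-memory retry cache does not.
        self.replies.clear()


def mkdir_with_lost_ack(server, path, crash_between=False):
    """Send mkdir, pretend the reply was dropped, then retry the same request."""
    client_id, xid = "client-1", 42
    server.mkdir(client_id, xid, path)   # succeeds, but the ACK is "lost"
    if crash_between:
        server.crash_and_reboot()
    return server.mkdir(client_id, xid, path)   # what the client finally sees


if __name__ == "__main__":
    print(mkdir_with_lost_ack(Server(use_retry_cache=False), "/a"))
    print(mkdir_with_lost_ack(Server(use_retry_cache=True), "/a"))
    print(mkdir_with_lost_ack(Server(use_retry_cache=True), "/a", crash_between=True))
```

In this sketch the retry cache is itself server state, and losing it in a crash brings back the spurious failure, which is the point the post makes about the second line of defense.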
guy@auspex.UUCP (Guy Harris) (01/24/89)
>The third line of defense is to introduce a suite of "network services"
>that provide stateful support for a few "most wanted" features
>lacking in NFS such as file locking.
>(I think vendors have to pay SUN extra $$ to get them.)

No. The lock daemon, and other ONC services, come with the NFS tape, at least for the latest ONC/NFS source tape....