liam@cs.qmc.ac.uk (William Roberts) (12/05/88)
The NFS protocol requests are largely idempotent, so retries do not need to be distinguished from first attempts (though unlink is not idempotent and so needs a cache of recent attempts). However, for tuning NFS it would be very useful to know how many retries your servers were processing, in order to experiment with more nfsds, fewer biods or longer timeouts.

All RPC requests are marked with an identifier (xid) so that incoming replies can be matched with requests. Currently most NFS clients require that the reply xid exactly match the request xid, and don't change the xid on retries, but there is no compelling reason for this to be so. I suggest the following way of marking retries:

1) All original xids have most & least significant byte = zero
2) All retry xids have most and least significant byte = ones

The reason for using both the most and least significant bytes rather than just a single bit is that I want to detect these on the server, and there is no reason why the client should waste time putting its xids into network order - this means that the least significant bit (bit 0) might turn up as bit 24 on some systems.

Does anyone think this is a good idea? Do you want to be able to identify retries on the server?
--
William Roberts      ARPA: liam@cs.qmc.ac.uk (gw: cs.ucl.edu)
Queen Mary College   UUCP: liam@qmc-cs.UUCP
LONDON, UK           Tel:  01-975 5250
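[A minimal sketch of the marking scheme proposed above, assuming 32-bit xids; the function names are illustrative, not from any real client. Because both end bytes of the word carry the mark, a byte-swapped xid still tests the same way on the server.]

```python
# Sketch of the proposed retry-marking scheme for 32-bit RPC xids.
# Original xids keep their top and bottom bytes zero; a retry sets
# both bytes to all-ones.  Marking both ends of the word lets a
# server recognise a retry regardless of the sender's byte order.

def make_xid(seq):
    """Build an original xid: 16 bits of sequence number in the
    middle, most and least significant bytes zero."""
    return (seq & 0xFFFF) << 8

def mark_retry(xid):
    """Turn an original xid into its retry form by setting the
    most and least significant bytes to all ones."""
    return xid | 0xFF0000FF

def is_retry(xid):
    """Server-side test: both end bytes all ones marks a retry.
    Works even if the 32-bit word arrived byte-swapped, since the
    end bytes merely swap with each other."""
    return (xid & 0xFF0000FF) == 0xFF0000FF

def byteswap32(x):
    """Simulate a client that did not convert to network order."""
    return ((x & 0xFF) << 24 | (x & 0xFF00) << 8 |
            (x >> 8) & 0xFF00 | (x >> 24) & 0xFF)
```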
jim@cs.strath.ac.uk (Jim Reid) (12/06/88)
In article <773@sequent.cs.qmc.ac.uk> liam@cs.qmc.ac.uk (William Roberts) writes:

>... Suggesting a scheme for determining retried NFS request packets.
>1) All original xids have most & least significant byte = zero
>2) All retry xids have most and least significant byte = ones
>Does anyone think this is a good idea? Do you want to be able
>to identify retries on the server?

I don't think this is a bad idea, but I don't think it would help much. The real problem with NFS is the absence of almost any flow control at any stage of the protocol stack - ISO Reference Model or ARPA Reference Model, call it what you like. The NFS protocol should *somewhere* provide a means for clients and servers to say "slow down, you're going too fast for me". Where this should go is a religious issue - my preference would be for something at the UDP/transport level. Naturally, any notion of flow control further complicates the 'statelessness' of an NFS server.

I see that Version 3 of the NFS protocol has now been published. I've not had a chance to look at it in detail yet. It does appear to go some way towards dealing with this problem. Here's one of the error codes (what a glorious name!) defined in the new edition of the protocol:

	NFSERR_JUKEBOX = 30
		Slow down, buddy. The server has detected a
		retransmission of a request that is already in progress.

As far as I can see, the spec. doesn't say what a client should do (or not do) when it gets this error returned. Worse, there doesn't appear to be any way that a client can ask the server to slow down, though an overloaded NFS client should not be as much of a problem as an overloaded server.

To get back to William's question, I don't think his idea will need to see the light of day. To properly adopt it, the NFS protocol would need to be redefined, and Sun have just done that. The new version of NFS has something about flow control. [Though it appears at first glance that clients are free to ignore "slow down" messages from the servers.]
Hopefully the new protocol will go some way to making NFS more usable in a genuinely heterogeneous environment. Administrators should just have to fire up the NFS service and let the protocol figure out for itself just how fast or slow it should go. [If TCP can do it, why not NFS?] It is just not reasonable for users or system administrators to have to tinker with things like the numbers of biod and nfsd processes, or to experiment with the increasingly baroque options that are kludged into NFS mounts.

		Jim
--
ARPA:	jim%cs.strath.ac.uk@ucl-cs.arpa, jim@cs.strath.ac.uk
UUCP:	jim@strath-cs.uucp, ...!uunet!mcvax!ukc!strath-cs!jim
JANET:	jim@uk.ac.strath.cs

"JANET domain ordering is swapped around so's there'd be some use for rev(1)!"
barmar@think.COM (Barry Margolin) (12/09/88)
In article <1287@stracs.cs.strath.ac.uk> jim@cs.strath.ac.uk writes:

>Worse, there doesn't appear to be any way that a client can ask the
>server to slow down, though an overloaded NFS client should not be
>as much of a problem as an overloaded server.

A client should never be overloaded. NFS is based on RPC, so a server never sends unrequested data. If a client is overloaded, it is because it is sending requests faster than it can handle the responses. This might happen because it sends several requests in sequence, without waiting for each to get a response, in an attempt to increase throughput. If it is sending too many requests this way, and the server is much faster than the client, it may indeed get responses back faster than it can handle. But instead of asking the server to slow down, the client can simply slow down its requests; instead of asking for 10 blocks of a file at a time, it could ask for only five.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
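[The self-throttling described above can be sketched as a bounded read-ahead window that the client shrinks when it falls behind. The class name, the backlog trigger and the halving policy are all invented for illustration, not taken from any real NFS client.]

```python
# Sketch of client-side self-throttling: bound the number of requests
# in flight, and halve that bound when replies queue up faster than
# the client can process them -- no "slow down" message to the server.

class ReadAhead:
    def __init__(self, window=10):
        self.window = window          # max requests in flight
        self.outstanding = 0

    def can_send(self):
        return self.outstanding < self.window

    def sent(self):
        self.outstanding += 1

    def received(self, backlog):
        """backlog = replies queued but not yet processed.  If the
        client is falling behind, shrink its own request window
        instead of asking the server to slow down."""
        self.outstanding -= 1
        if backlog > self.window:
            self.window = max(1, self.window // 2)
```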
cs@kanawha.Sun.COM (Carl Smith) (12/09/88)
> All RPC requests are marked with an identifier (xid) so that
> incoming replies can be matched with requests. Currently most
> NFS clients require that the reply xid exactly match the
> request xid, and don't change the xid on retries, but there is
> no compelling reason for this to be so.

Although I can imagine situations in which a client might partition its XID space to encode things like retransmissions, I'd be most surprised if it weren't true that ALL clients require an RPC reply XID to exactly match the request XID.

Also, it's not true that the XID isn't changed on retries. In ports derived from NFSSRC, the XID is (wrongly) changed when the RPC level times out and returns to the NFS caller, which then retries. We've recently fixed this in SunOS. The philosophy behind it is that it's more important to have correct behavior than to keep good statistics. :-)

Since most servers keep a cache of recent successful non-idempotent transactions (including the RPC XIDs associated with those transactions), we'd like to make it easy for them to detect retransmissions, and they may do that only by doing bit-for-bit comparisons on RPC XIDs (after all, an RPC server doesn't know anything of the XID space partitioning its clients may or may not be using). Changing the XIDs on retries only makes their job more difficult.

> I suggest the following way of marking retries:
> 1) All original xids have most & least significant byte = zero
> 2) All retry xids have most and least significant byte = ones
>
> The reason for using both the most and least significant bytes
> rather than just a single bit is that I want to detect these on
> the server, and there is no reason why the client should waste
> time putting its xids into network order - this means that the
> least significant bit (bit 0) might turn up as bit 24 on some
> systems.
>
> Does anyone think this is a good idea? Do you want to be able
> to identify retries on the server?
This still won't allow you to tell how many retransmissions are occurring, and that may be interesting. Moreover, the simple detection that an RPC request is a retransmission is useless to the server.

Let's use your unlink example. Suppose an NFSPROC_REMOVE operation is retried, that the server sees the retransmission but not the original request, and that the file doesn't exist. To return an error (NFSERR_NOENT) would be appropriate if the server knew that the file had never existed. To return no error (NFSERR_OK) would be appropriate if the server knew that the file had existed and had been removed by the original request. To know only that the request is a retransmission doesn't help in the least.

			Carl
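[The duplicate-reply cache alluded to above can be sketched as follows: the server remembers the reply to each recent non-idempotent request, keyed by client and xid, and replays it on a retransmission. This is what makes a retried NFSPROC_REMOVE return the same answer as the original, which a bare "this is a retry" flag could not do. All names and the cache structure are illustrative.]

```python
# Sketch of a server-side duplicate request cache for non-idempotent
# operations.  A retransmission (same client, same xid) replays the
# saved reply rather than re-executing the remove.

NFSERR_OK, NFSERR_NOENT = 0, 2      # illustrative status values

class Server:
    def __init__(self):
        self.files = {"hello.txt"}
        self.reply_cache = {}        # (client, xid) -> saved reply

    def remove(self, client, xid, name):
        key = (client, xid)
        if key in self.reply_cache:  # retransmission: replay old reply
            return self.reply_cache[key]
        status = NFSERR_OK if name in self.files else NFSERR_NOENT
        self.files.discard(name)
        self.reply_cache[key] = status   # remember for retries
        return status
```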
cs@kanawha.Sun.COM (Carl Smith) (12/09/88)
> Here's one of the error codes (what a glorious name!) defined in the
> new edition of the protocol:
>
>	NFSERR_JUKEBOX = 30
>		Slow down, buddy. The server has detected a retransmission
>		of a request that is already in progress.

The error is named after a machine made by Epoch, which uses WORMs for backing store for normal disks, and which they call a jukebox. The delays one encounters when having to fetch data from a CD motivated it.

> To get back to William's question, I don't think his idea will need to
> see the light of day. To properly adopt it, the NFS protocol would need
> to be redefined and Sun have just done that.

Actually, the previous suggestion was about detecting RPC retries. The connection with NFS is illusory.

> Administrators should just have to fire up the NFS service and let the
> protocol figure out for itself just how fast or slow it should go.
> [If TCP can do it, why not NFS?]

Right. And it now does exactly that. The NFS client code in SunOS 4.1 does dynamic adjustment of retransmission times and sizes of read, readdir, and write requests. We've been able to mount NFS file systems over the ARPANET and do operations without once seeing the dreaded ``NFS server foo not responding'' messages.

			Carl
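[Dynamic retransmission-time adjustment of the sort mentioned above can be sketched with a smoothed round-trip estimator in the style of TCP. This is an illustration of the idea only, not SunOS 4.1's actual algorithm; the constants are invented.]

```python
# Sketch of an adaptive retransmission timeout: fold each measured
# round-trip time into a smoothed estimate, derive the timeout from
# it, and back off exponentially when a request goes unanswered.

class RetransTimer:
    def __init__(self, initial_rto=1.0):
        self.srtt = None         # smoothed round-trip time (seconds)
        self.rto = initial_rto   # current retransmission timeout

    def sample(self, rtt):
        """Fold a measured round-trip time into the estimate."""
        if self.srtt is None:
            self.srtt = rtt
        else:
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.rto = 2.0 * self.srtt     # simple safety margin

    def timed_out(self):
        """Exponential backoff when a request goes unanswered."""
        self.rto = min(self.rto * 2, 60.0)
```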
mre@beatnix.UUCP (Mike Eisler) (12/10/88)
In article <773@sequent.cs.qmc.ac.uk> liam@cs.qmc.ac.uk (William Roberts) writes:
...
>However, for tuning NFS it would be very useful to know how
>many retries your servers were processing in order to
>experiment with more nfsds, fewer biods or longer timeouts.
>All RPC requests are marked with an identifier (xid) so that
>incoming replies can be matched with requests. Currently most
>NFS clients require that the reply xid exactly match the
>request xid, and don't change the xid on retries, but there is
>no compelling reason for this to be so. I suggest the following

Well, there is a compelling reason to somehow bind retries to requests, and matching all or part of the transaction ids (xids) is one way. You want to do this so that old response packets from a previous operation don't appear on the client's UDP/IP port and produce a bad response to a new request.

>way of marking retries:
>1) All original xids have most & least significant byte = zero
>2) All retry xids have most and least significant byte = ones
>The reason for using both the most and least significant bytes
>rather than just a single bit is that I want to detect these on
>the server, and there is no reason why the client should waste
>time putting its xids into network order - this means that the
>least significant bit (bit 0) might turn up as bit 24 on some
>systems.

You've lost me here. If the client doesn't put the xid into network order, then how does the server know where the logical most and least significant bits are? The client and server have no idea about each other's byte ordering.

>Does anyone think this is a good idea? Do you want to be able
>to identify retries on the server?

It would be nice to keep statistics about retries on the server, especially on a per-client basis. (Note that in most NFS implementations, the client keeps these kinds of statistics, but not on a per-server basis.)
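[The stale-reply problem described above can be sketched as follows: the client stamps each request with a fresh xid, reuses the same xid on retries, and drops any arriving datagram whose xid doesn't match a request still pending. The class and method names are illustrative.]

```python
# Sketch of client-side xid matching: a late reply to an earlier,
# already-answered call carries an xid no longer pending, so it is
# silently dropped instead of being taken as the answer to a new call.

class RpcClient:
    def __init__(self):
        self.next_xid = 1
        self.pending = {}          # xid -> outstanding request

    def send(self, request):
        xid = self.next_xid
        self.next_xid += 1
        self.pending[xid] = request
        return xid                 # retries reuse this same xid

    def deliver(self, xid, reply):
        """Called for each datagram read from the UDP port."""
        if xid not in self.pending:
            return None            # stale or duplicate reply: drop it
        del self.pending[xid]
        return reply
```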
Your method of using the xid amounts to the server unilaterally enforcing a format on a client-generated cookie. This violates the current NFS spec, if not the RPC spec. If you would like to see this in the next NFS protocol spec, you ought to contact the NFS people at Sun. The email address for sending comments on the next protocol (version 3) is sun!cs, or cs@sun.com.

>William Roberts      ARPA: liam@cs.qmc.ac.uk (gw: cs.ucl.edu)
>Queen Mary College   UUCP: liam@qmc-cs.UUCP
>LONDON, UK           Tel:  01-975 5250

	-Mike Eisler
	{sun,uunet}!elxsi!mre