barmar@think.com (Barry Margolin) (06/07/90)
I was having trouble with a Symbolics Lisp Machine that was trying to write a file on a SunOS 4.0.3 file server using NFS. The I/O board on the LispM was a bit flaky (I've since replaced it), so it was dropping some incoming packets. In particular, it frequently dropped the reply to its NFS Write RPC packets. It retransmitted the packet, but the Sun never replied to the retransmitted packet at all (I was watching with a network monitor), and eventually the LispM would report a timeout.

This doesn't seem like appropriate behavior for the server. I assume the server maintains a cache of recent writes to help with idempotency. If so, the right thing for it to do when it sees a duplicate write request is to send another reply without actually rewriting the block; it shouldn't ignore the request completely.

Does anyone know if my suspicion about the cause of the problem is correct? Does 4.1 still have this behavior?

--
Barry Margolin, Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar
brent@terra.Eng.Sun.COM (Brent Callaghan) (06/08/90)
In article <37135@think.Think.COM>, barmar@think.com (Barry Margolin) writes:
> Does anyone know if my suspicion about the cause of the problem is correct?
> Does 4.1 still have this behavior?

There is no duplicate request cache for writes in 4.0.3 NFS. There is a cache of sorts that is used to return a useful response to failures caused by duplicate non-idempotent requests, but it is not consulted for writes, and in any case a response is always returned. If you're talking to a SunOS 4.0.3 server, you'll get a response to every request (unless the server is so overloaded that it's dropping requests).

SunOS 4.1 uses a "Chet" cache (see Chet Juszczak's paper in the Winter '89 Usenix proceedings). This cache not only reduces the chance of incorrect results from duplicate non-idempotent requests, but also allows the server to avoid unnecessary work generated by non-idempotent requests.

The Chet cache maintains an "in-progress" bit. Every incoming request is checked against the cache. If a request is a duplicate and its "in-progress" bit is on, the request is ignored, since a response is forthcoming anyway.
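To make the in-progress rule concrete, here is a minimal Python sketch of a duplicate request cache. It is not Sun's implementation: the cache key (client address plus RPC transaction ID), the names `DupRequestCache`, `begin`, `finish` and `handle`, the fixed capacity with oldest-first eviction, and the choice to replay a stored reply for a completed request are all hypothetical simplifications. A retransmission of a request that is still in progress is dropped; a retransmission of a finished request is answered again without redoing the work.

```python
from collections import OrderedDict
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional, Tuple

# Hypothetical cache key: (client address, RPC transaction id).
Key = Tuple[str, int]


class State(Enum):
    IN_PROGRESS = auto()  # request accepted, reply not yet sent
    DONE = auto()         # request finished, reply remembered


@dataclass
class Entry:
    state: State
    reply: Optional[bytes] = None


class DupRequestCache:
    """Toy duplicate-request cache with an in-progress flag (illustration only)."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[Key, Entry]" = OrderedDict()

    def _store(self, key: Key, entry: Entry) -> None:
        self._entries[key] = entry
        self._entries.move_to_end(key)          # newest entries live at the end
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # forget the oldest request

    def begin(self, key: Key) -> Tuple[str, Optional[bytes]]:
        """Classify an incoming request as "new", "drop" or "replay"."""
        entry = self._entries.get(key)
        if entry is None:
            self._store(key, Entry(State.IN_PROGRESS))
            return ("new", None)
        if entry.state is State.IN_PROGRESS:
            # Duplicate of a request still being served: ignore it,
            # because the original will produce a reply anyway.
            return ("drop", None)
        # Duplicate of a finished request: answer again without redoing the work.
        return ("replay", entry.reply)

    def finish(self, key: Key, reply: bytes) -> None:
        """Mark a request as completed and remember its reply."""
        self._store(key, Entry(State.DONE, reply))


def handle(cache: DupRequestCache, key: Key,
           execute: Callable[[], bytes]) -> Optional[bytes]:
    """Return the reply to send, or None if the request is silently dropped."""
    action, reply = cache.begin(key)
    if action == "drop":
        return None
    if action == "replay":
        return reply
    reply = execute()            # e.g. actually write the block
    cache.finish(key, reply)
    return reply
```

Under these assumptions, a client that keeps losing replies still gets an answer to a retry once the original request has finished; only retries that arrive while the original is still running go unanswered.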
You should eventually see a response to the original request, and retries will get a response if the request has been completed at the server, i.e. it is no longer in progress.

In short: the problem is not likely to be due to caching.

--
Made in New Zealand -->  Brent Callaghan @ Sun Microsystems
uucp: sun!bcallaghan
phone: (415) 336 1051