[comp.protocols.nfs] No reply to duplicate NFS Write

barmar@think.com (Barry Margolin) (06/07/90)

I was having trouble with a Symbolics Lisp Machine that was trying to write
a file on a SunOS 4.0.3 file server using NFS.  The I/O board on the LispM
was a bit flaky (I've since replaced it) so it was dropping some incoming
packets.  In particular, it frequently dropped the reply to its NFS Write
RPC packets.  It retransmitted the packet, but the Sun never replied to the
retransmitted packet at all (I was watching with a network monitor), and
eventually the LispM would report a timeout.  This doesn't seem like
appropriate behavior for the server.  I assume the server is maintaining a
cache of recent writes in order to improve idempotency.  If so, the right
thing for it to do when it sees a duplicate write request is to send out
another reply without actually rewriting the block; it shouldn't ignore the
request completely.

Does anyone know if my suspicion about the cause of the problem is correct?
Does 4.1 still have this behavior?
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

brent@terra.Eng.Sun.COM (Brent Callaghan) (06/08/90)

In article <37135@think.Think.COM>, barmar@think.com (Barry Margolin) writes:
> I was having trouble with a Symbolics Lisp Machine that was trying to write
> a file on a SunOS 4.0.3 file server using NFS.  The I/O board on the LispM
> was a bit flaky (I've since replaced it) so it was dropping some incoming
> packets.  In particular, it frequently dropped the reply to its NFS Write
> RPC packets.  It retransmitted the packet, but the Sun never replied to the
> retransmitted packet at all (I was watching with a network monitor), and
> eventually the LispM would report a timeout.  This doesn't seem like
> appropriate behavior for the server.  I assume the server is maintaining a
> cache of recent writes in order to improve idempotency.  If so, the right
> thing for it to do when it sees a duplicate write request is to send out
> another reply without actually rewriting the block; it shouldn't ignore the
> request completely.
> 
> Does anyone know if my suspicion about the cause of the problem is correct?
> Does 4.1 still have this behavior?

There is no dup request cache for writes in 4.0.3 NFS.  There is a cache
of sorts that is used to return a useful response to failures caused
by duplicate non-idempotent requests, but this is not consulted for
writes and in any case a response is always returned.  If you're
talking to a SunOS 4.0.3 server then you'll get a response to every
request (unless the server is so overloaded that it's dropping
requests).

SunOS 4.1 utilizes a "Chet" cache (see Chet Juszczak's paper in the
Winter '89 Usenix proceedings).  This cache not only reduces the
chances of incorrect results from duplicate non-idempotent requests,
but also allows the server to avoid unnecessary work generated by
non-idempotent requests.  The Chet cache maintains an "in-progress" bit.
All incoming requests are checked against the cache.  If a request is
a duplicate and the "in-progress" bit is on then the request is
ignored since a response is forthcoming anyway.  You should eventually
see a response to the original request, and retries will get a response
once the request has been completed at the server, i.e. is no longer in
progress.
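
The in-progress check described above can be sketched roughly as
follows.  This is an illustration only, not the SunOS 4.1 code: the
names (DupRequestCache, check, complete), the (client, xid) key, the
cache size, and the choice to replay a saved reply for a completed
duplicate are all assumptions.  The post says only that a retry gets a
response once the request is no longer in progress, not how that
response is produced.

```python
from enum import Enum


class Action(Enum):
    EXECUTE = "execute"  # first time this request has been seen: do the work
    DROP = "drop"        # duplicate while the original is still in progress
    RESEND = "resend"    # duplicate of a completed request: answer it


class DupRequestCache:
    """Hypothetical duplicate request cache with an in-progress flag."""

    def __init__(self, max_entries=128):
        # max_entries is an arbitrary illustrative size.
        self.max_entries = max_entries
        # (client, xid) -> {"in_progress": bool, "reply": object}
        self.entries = {}

    def check(self, client, xid):
        """Classify an incoming request; returns (Action, saved reply or None)."""
        key = (client, xid)
        entry = self.entries.get(key)
        if entry is None:
            if len(self.entries) >= self.max_entries:
                # Simplification: evict the oldest entry (dicts keep
                # insertion order), even if it is still in progress.
                del self.entries[next(iter(self.entries))]
            self.entries[key] = {"in_progress": True, "reply": None}
            return Action.EXECUTE, None
        if entry["in_progress"]:
            # A response to the original request is forthcoming anyway.
            return Action.DROP, None
        # Assumption: replay the saved reply rather than redo the work.
        return Action.RESEND, entry["reply"]

    def complete(self, client, xid, reply):
        """Clear the in-progress flag once the original has been answered."""
        entry = self.entries.get((client, xid))
        if entry is not None:
            entry["in_progress"] = False
            entry["reply"] = reply
```

With this sketch, a retransmission that arrives while the original
write is still being serviced is dropped silently, and one that arrives
after complete() has been called is answered.  A client that never sees
any reply at all, as in the original report, is not explained by this
logic.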

In short: The problem is not likely to be due to caching.
--

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051