[comp.protocols.nfs] Suggestion for improved NFS monitoring

liam@cs.qmc.ac.uk (William Roberts) (12/05/88)

The NFS protocol requests are largely idempotent, so retries do
not need to be distinguished from first attempts (though unlink
is not idempotent and so needs a cache of recent attempts).
However, for tuning NFS it would be very useful to know how
many retries your servers were processing in order to
experiment with more nfsds, fewer biods or longer timeouts.

All RPC requests are marked with an identifier (xid) so that
incoming replies can be matched with requests. Currently most
NFS clients require that the reply xid exactly match the
request xid, and don't change the xid on retries, but there is
no compelling reason for this to be so. I suggest the following
way of marking retries:

1) All original xids have most & least significant byte = zero
2) All retry xids have most and least significant byte = ones

The reason for using both the most and least significant bytes
rather than just a single bit is that I want to detect these on
the server, and there is no reason why the client should waste
time putting its xids into network order - this means that the
least significant bit (bit 0) might turn up as bit 24 on some
systems.
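
The scheme wasn't spelled out in code in the post; a minimal sketch of
what it might look like, assuming 32-bit xids with the sequence number
carried in the middle 16 bits (that layout is my assumption, not the
poster's):

```python
def mark_original(seq):
    # Original request: most and least significant bytes forced to zero.
    # The middle 16 bits carry the sequence number (an assumption of
    # this sketch; any middle-byte layout would work).
    return (seq & 0xFFFF) << 8

def mark_retry(seq):
    # Retry: most and least significant bytes forced to all ones.
    return 0xFF000000 | ((seq & 0xFFFF) << 8) | 0xFF

def is_retry(xid):
    # Server-side test.  Because *both* end bytes are marked, the test
    # still works if the 32-bit word arrives byte-swapped: swapping
    # 0xFF....FF leaves 0xFF in both end bytes, and swapping
    # 0x00....00 leaves 0x00 in both.
    return (xid >> 24) == 0xFF and (xid & 0xFF) == 0xFF
```

Marking both end bytes is what buys the byte-order independence the
post argues for: a full 32-bit byte swap exchanges the two end bytes
with each other, so the marker survives either ordering.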

Does anyone think this is a good idea? Do you want to be able
to identify retries on the server?

-- 

William Roberts         ARPA: liam@cs.qmc.ac.uk  (gw: cs.ucl.edu)
Queen Mary College      UUCP: liam@qmc-cs.UUCP
LONDON, UK              Tel:  01-975 5250

jim@cs.strath.ac.uk (Jim Reid) (12/06/88)

In article <773@sequent.cs.qmc.ac.uk> liam@cs.qmc.ac.uk (William Roberts) writes:
>... Suggesting a scheme for determining retried NFS request packets.

>1) All original xids have most & least significant byte = zero
>2) All retry xids have most and least significant byte = ones

>Does anyone think this is a good idea? Do you want to be able
>to identify retries on the server?

I don't think this is a bad idea, but I doubt it would help much.

The real problem with NFS is the absence of almost any flow-control at
any stage of the protocol stack - ISO Reference Model or ARPA Reference
Model, call it what you like. The NFS protocol should *somewhere*
provide a means for clients and servers to say "slow down, you're going
too fast for me". Where this should go is a religious issue - my
preference would be for something at the UDP/transport level. Naturally,
any notion of flow control further complicates the 'statelessness' of an
NFS server.

I see that Version 3 of the NFS protocol has now been published. I've
not had a chance to look at it in detail yet. It does appear to go some
way towards dealing with this problem. Here's one of the error codes
(what a glorious name!) defined in the new edition of the protocol:

	NFSERR_JUKEBOX = 30
        	Slow down, buddy. The server has detected a retransmission
        	of a request that is already in progress.

As far as I can see, the spec. doesn't say what a client should do (or
not do) when it gets this error returned. Worse, there doesn't appear to
be any way that a client can ask the server to slow down, though an
overloaded NFS client should not be as much of a problem as an overloaded
server.

To get back to William's question, I don't think his idea will need to
see the light of day. To properly adopt it, the NFS protocol would need
to be redefined and Sun have just done that. The new version of NFS has
something about flow control. [Though it appears at first glance that
clients are free to ignore "slow down" messages from the servers.]

Hopefully the new protocol will go some way to make NFS more usable in a
genuinely heterogeneous environment. Administrators should just have to
fire up the NFS service and let the protocol figure out for itself just
how fast or slow it should go. [If TCP can do it, why not NFS?] It is
just not reasonable for users or system administrators to have to tinker
with things like numbers of biod and nfsd processes or experiment with
the increasingly baroque options that are kludged into NFS mounts.

		Jim

-- 
ARPA:	jim%cs.strath.ac.uk@ucl-cs.arpa, jim@cs.strath.ac.uk
UUCP:	jim@strath-cs.uucp, ...!uunet!mcvax!ukc!strath-cs!jim
JANET:	jim@uk.ac.strath.cs

"JANET domain ordering is swapped around so's there'd be some use for rev(1)!"

barmar@think.COM (Barry Margolin) (12/09/88)

In article <1287@stracs.cs.strath.ac.uk> jim@cs.strath.ac.uk writes:
>Worse, there doesn't appear to
>be any way that a client can ask the server to slow down, though an
>overloaded NFS client should not be as much of a problem as an overloaded
>server.

A client should never be overloaded.  NFS is based on RPC, so a server
never sends unrequested data.  If a client is overloaded it is because
it is sending requests faster than it can handle the responses.  This
might happen because it sends several requests in sequence, without
waiting for each to get a response, in an attempt to increase
throughput.  If it is sending too many requests this way, and the
server is much faster than the client, it may indeed get responses
back faster than it can handle them.  But instead of asking the server to
slow down, the client can simply slow down its requests; instead of
asking for 10 blocks of a file at a time, it could ask for only five.
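
Barry's remedy amounts to the client shrinking its own read-ahead
window.  A toy sketch (the function name and shape are mine, not any
real client's):

```python
def readahead_batches(nblocks, window):
    # Issue block requests in groups of at most `window`, waiting for
    # each group's replies before issuing the next group, so the client
    # never has more than `window` responses in flight at once.
    for start in range(0, nblocks, window):
        yield list(range(start, min(start + window, nblocks)))
```

Dropping `window` from 10 to 5 halves the burst of replies the client
has to absorb, with no cooperation from the server at all.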


Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

cs@kanawha.Sun.COM (Carl Smith) (12/09/88)

> All RPC requests are marked with an identifier (xid) so that
> incoming replies can be matched with requests. Currently most
> NFS clients require that the reply xid exactly match the
> request xid, and don't change the xid on retries, but there is
> no compelling reason for this to be so.

	Although I can imagine situations in which a client might partition
its XID space to encode things like retransmissions, I'd be most surprised if
it weren't true that ALL clients require an RPC reply XID to exactly match the
request XID.
	Also, it's not true that the XID isn't changed on retries.  In ports
derived from NFSSRC, the XID is (wrongly) changed when the RPC level times out
and returns to the NFS caller, which then retries.  We've recently fixed this
in SunOS.  The philosophy behind it is that it's more important to have correct
behavior than to keep good statistics. :-)
	Since most servers keep a cache of recent successful non-idempotent
transactions (including the RPC XIDs associated with those transactions), we'd
like to make it easy for them to detect retransmissions, and they may do that
only by doing bit-for-bit comparisons on RPC XIDs (after all, an RPC server
doesn't know anything of the XID space partitioning its clients may or may
not be using).  Changing the XIDs on retries only makes their job more difficult.

> I suggest the following way of marking retries:
> 1) All original xids have most & least significant byte = zero
> 2) All retry xids have most and least significant byte = ones
>
> The reason for using both the most and least significant bytes
> rather than just a single bit is that I want to detect these on
> the server, and there is no reason why the client should waste
> time putting its xids into network order - this means that the
> least significant bit (bit 0) might turn up as bit 24 on some
> systems.
>
> Does anyone think this is a good idea? Do you want to be able
> to identify retries on the server?

	This still won't allow you to tell how many retransmissions are
occurring, and that may be interesting. Moreover, the simple detection
that an RPC request is a retransmission is useless to the server.
	Let's use your unlink example.  Suppose an NFSPROC_REMOVE operation
is retried, that the server sees the retransmission but not the original
request, and that the file doesn't exist.  To return an error (NFSERR_NOENT)
would be appropriate if the server knew that the file had never existed.
To return no error (NFSERR_OK) would be appropriate if the server knew that
the file had existed and had been removed by the original request.  To know
only that the request is a retransmission doesn't help in the least.
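
What does resolve Carl's NFSPROC_REMOVE ambiguity is the cache of
recent non-idempotent transactions he mentions: the server replays the
reply it saved for that xid instead of re-executing the operation.  A
toy sketch (names and shapes are mine; only the NFS status values are
from the protocol):

```python
NFSERR_OK = 0      # success
NFSERR_NOENT = 2   # no such file or directory

def handle_remove(reply_cache, xid, files, name):
    # On a retransmission, replay the cached reply rather than
    # re-running the non-idempotent remove.
    if xid in reply_cache:
        return reply_cache[xid]
    # First time we've seen this xid: actually perform the remove.
    status = NFSERR_OK if files.pop(name, None) is not None else NFSERR_NOENT
    reply_cache[xid] = status
    return status
```

With the cache, the retransmitted remove of an already-deleted file
returns the original NFSERR_OK instead of a spurious NFSERR_NOENT;
a bare "this is a retry" flag could never tell the two cases apart.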


			Carl

cs@kanawha.Sun.COM (Carl Smith) (12/09/88)

> Here's one of the error codes
> (what a glorious name!) defined in the new edition of the protocol:
> 
> 	NFSERR_JUKEBOX = 30
>         	Slow down, buddy. The server has detected a retransmission
>         	of a request that is already in progress.

	The error is named after a machine made by Epoch, which uses WORMs as
backing store for normal disks, and which they call a jukebox.  The delays one
encounters when having to fetch data from a CD motivated it.

> To get back to William's question, I don't think his idea will need to
> see the light of day. To properly adopt it, the NFS protocol would need
> to be redefined and Sun have just done that.

	Actually, the previous suggestion was about detecting RPC retries.
The connection with NFS is illusory.

> Administrators should just have to
> fire up the NFS service and let the protocol figure out for itself just
> how fast or slow it should go. [If TCP can do it, why not NFS?]

	Right.  And it now does exactly that.  The NFS client code in SunOS 4.1
does dynamic adjustment of retransmission times and sizes of read, readdir, and
write requests.  We've been able to mount NFS file systems over the ARPANET and
do operations without once seeing the dreaded ``NFS server foo not responding''
messages.
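
Carl doesn't describe Sun's algorithm, but the usual model for this
kind of adaptation is TCP's Jacobson/Karels retransmission-timer
estimator; a sketch of that idea (parameter values are the standard
TCP ones, not anything claimed about SunOS):

```python
def update_rto(srtt, rttvar, sample, alpha=0.125, beta=0.25):
    # Jacobson-style smoothed round-trip estimate: track the mean RTT
    # and its mean deviation, and set the retransmission timeout to
    # the mean plus four deviations.
    err = sample - srtt
    srtt = srtt + alpha * err
    rttvar = rttvar + beta * (abs(err) - rttvar)
    return srtt, rttvar, srtt + 4 * rttvar
```

Feeding each measured reply time through this keeps the timeout just
above the real round trip, so a slow path (say, across the ARPANET)
stops triggering spurious retransmissions by itself.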

			Carl

mre@beatnix.UUCP (Mike Eisler) (12/10/88)

In article <773@sequent.cs.qmc.ac.uk> liam@cs.qmc.ac.uk (William Roberts) writes:

...

>However, for tuning NFS it would be very useful to know how
>many retries your servers were processing in order to
>experiment with more nfsds, fewer biods or longer timeouts.

>All RPC requests are marked with an identifier (xid) so that
>incoming replies can be matched with requests. Currently most
>NFS clients require that the reply xid exactly match the
>request xid, and don't change the xid on retries, but there is
>no compelling reason for this to be so. I suggest the following

Well, there is a compelling reason to somehow bind retries with
requests, and matching all or part of the transaction ids (xids),
is one way. You want to do this so that old response packets from
a previous operation don't appear on the client's UDP/IP port,
and produce a bad response to a new request.
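
In other words, the xid match is the client's guard against stale
replies.  A minimal sketch of that check (hypothetical names, not any
particular implementation):

```python
def match_reply(pending_xids, reply_xid):
    # Accept a reply only if it matches an outstanding request; a
    # stale reply from an earlier, timed-out request is dropped.
    # Each xid is consumed once, so a duplicate reply is also dropped.
    if reply_xid in pending_xids:
        pending_xids.discard(reply_xid)
        return True
    return False
```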

>way of marking retries:

>1) All original xids have most & least significant byte = zero
>2) All retry xids have most and least significant byte = ones

>The reason for using both the most and least significant bytes
>rather than just a single bit is that I want to detect these on
>the server, and there is no reason why the client should waste
>time putting its xids into network order - this means that the
>least significant bit (bit 0) might turn up as bit 24 on some
>systems.

You've lost me here. If the client doesn't put the xid into
network order, then how does the server know where the logical
most and least significant bits are? The client and server have
no idea about each other's byte ordering.
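
Mike's objection can be made concrete: if the client emits the xid in
host order and the two hosts disagree on endianness, the server reads
the 32-bit word byte-swapped, so a flag in bit 0 shows up in bit 24.
A small demonstration:

```python
import struct

def as_seen_by_opposite_endian(x):
    # A 32-bit word written in one byte order and read back in the
    # other: the four bytes are reversed.
    return struct.unpack("<I", struct.pack(">I", x))[0]
```

This is exactly why William's scheme marks whole bytes at *both* ends
of the word rather than a single bit: the byte swap exchanges the two
end bytes with each other, so an all-ones or all-zeros pair of end
bytes survives either ordering.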

>Does anyone think this is a good idea? Do you want to be able
>to identify retries on the server?

It would be nice to keep statistics about retries on the server,
especially on a per client basis. (Note that in most NFS
implementations, the client keeps these kinds of statistics, but not on
a per server basis). Your method of using the xid amounts to the server
unilaterally enforcing a format to a client generated cookie. This
violates the current NFS spec, if not the RPC spec. If you would like
to see this in the next NFS protocol spec, you ought to contact the NFS
people at Sun. The email address for sending comments on the next
protocol (version 3), is sun!cs, or cs@sun.com.

>William Roberts         ARPA: liam@cs.qmc.ac.uk  (gw: cs.ucl.edu)
>Queen Mary College      UUCP: liam@qmc-cs.UUCP
>LONDON, UK              Tel:  01-975 5250

	-Mike Eisler
	{sun,uunet}!elxsi!mre