[comp.protocols.nfs] Mountd loops in case of multiple mount requests.

cliff@SSD.CSD.HARRIS.COM (Cliff Van Dyke) (05/25/90)

In article <2143@inews.intel.com> kgarimel@hobbes.intel.com (Krishna Garimella) writes:
>On a file server, the mountd becomes inoperative in certain cases,
>when it is hit with many simultaneous mount requests from NFS clients.
>We have different flavours of clients (Suns, GPXs, 3100s, SISOs, ...).
>This situation arises from a mass reboot of the clients (e.g., a power
>glitch/failure).
>
>Even when the mountd is killed and restarted, it "caters" to one (or two)
>mount requests and then goes into a loop. The clients give timeout
>messages.

I've seen a substantially similar problem (but with ypserv), which I tracked
down to an unfortunate implementation of the UDP version of RPC. (Ours was a
straight port of Sun's version 3.2 reference port.) The problem occurred when
oodles of processes accessed ypserv simultaneously. ypserv would respond to
the requests in FIFO order. Meanwhile, the clients got tired of waiting,
timed out, and requested again. The server was so busy handling requests
that had already timed out that it never had a chance to handle the new
ones.

The problem is significantly alleviated in Sun's 4.0 version of the
reference port: each retry done by the client is backed off by a
factor of 2, reducing the load on the server. Do you know which rev is
being used by your various clients?
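
For illustration, the client side of that backoff looks something like
this (a sketch only; the names, constants, and five-try limit are mine,
not Sun's):

    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>

    /*
     * Send a UDP request and wait for the reply, doubling the
     * timeout before each retry.  Assumes sock is a UDP socket
     * already connect()ed to the server.  Returns the reply
     * length, or -1 once all retries are exhausted.  (A real RPC
     * client would also match the reply XID against the request.)
     */
    int
    call_with_backoff(int sock, char *req, int reqlen,
                      char *reply, int replymax)
    {
        struct timeval tv;
        fd_set fds;
        int wait = 1;           /* initial timeout, in seconds */
        int tries;

        for (tries = 0; tries < 5; tries++) {
            send(sock, req, reqlen, 0);

            FD_ZERO(&fds);
            FD_SET(sock, &fds);
            tv.tv_sec = wait;
            tv.tv_usec = 0;
            if (select(sock + 1, &fds, (fd_set *)0,
                       (fd_set *)0, &tv) > 0)
                return recv(sock, reply, replymax, 0);

            wait *= 2;          /* back off: 1, 2, 4, 8, 16 seconds */
        }
        return -1;
    }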

In general the algorithms used for the UDP version of RPC in the
applications (e.g., ypserv and mountd) and the kernel (e.g., NFS and
lockd) leave much to be desired. I suspect some mechanism which uses
the history of previous performance of a server would prove to be most
beneficial.
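
One such mechanism would be to keep a smoothed estimate of each
server's recent response time and derive the initial timeout from it,
much as TCP now does for retransmission. Something like this sketch
(the names and gains are mine, not anything in the reference port):

    /*
     * Adaptive timeout from observed response times, in the style
     * of TCP's smoothed round-trip estimators.
     */
    struct rtt_state {
        long srtt_ms;           /* smoothed response time (ms) */
        long rttvar_ms;         /* smoothed mean deviation (ms) */
    };

    /* Fold one measured response time into the estimate. */
    void
    rtt_update(struct rtt_state *s, long measured_ms)
    {
        long err = measured_ms - s->srtt_ms;

        s->srtt_ms += err / 8;                      /* gain of 1/8 */
        if (err < 0)
            err = -err;
        s->rttvar_ms += (err - s->rttvar_ms) / 4;   /* gain of 1/4 */
    }

    /* Initial timeout for the next call: estimate + 4 deviations. */
    long
    rtt_timeout_ms(struct rtt_state *s)
    {
        long t = s->srtt_ms + 4 * s->rttvar_ms;

        return t < 1000 ? 1000 : t;     /* never below 1 second */
    }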
--
Cliff Van Dyke                   cliff@ssd.csd.harris.com
Harris Computer Systems          ...!{uunet,novavax}!hcx1!cliff
2101 W. Cypress Creek Rd.
Ft. Lauderdale, FL 33309-1892    Tel: (305) 973-5349

liam@cs.qmw.ac.uk (William Roberts) (05/29/90)

In <4290@hcx1.SSD.CSD.HARRIS.COM> cliff@SSD.CSD.HARRIS.COM (Cliff Van Dyke) writes:

>In general the algorithms used for the UDP version of RPC in the
>applications (e.g., ypserv and mountd) and the kernel (e.g., NFS and
>lockd) leave much to be desired. I suspect some mechanism which uses
>the history of previous performance of a server would prove to be most
>beneficial.

The reason that the kernel stuff survives whereas the
application stuff dies is that the user stuff is implemented
on the assumption that calls are non-idempotent (even if the
application writer knows full well that they are!).

Individual yp lookups would be happily served by an
idempotent-request option: instead of just waiting longer before
each retry until eventually you give the server enough time, you
could also accept the first answer you get, even if it arrives
after the timeout (this is what the kernel does for NFS
requests, for example). By all means make timeouts adaptive to
server load, but why waste good replies?
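
Concretely: the client keeps retransmitting on whatever schedule it
likes, but accepts the first datagram whose transaction ID matches,
however late it arrives. A sketch (the names are mine, and errors are
ignored beyond the obvious):

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /*
     * Wait for a reply to an idempotent request.  Any datagram
     * whose leading 32-bit XID matches ours is accepted --
     * including a late answer to an earlier transmission.  The
     * XID is compared in the form it travels in, so no
     * byte-swapping is needed.
     */
    int
    recv_matching(int sock, unsigned long xid, char *buf, int bufmax)
    {
        unsigned long got;
        int n;

        for (;;) {
            n = recv(sock, buf, bufmax, 0);
            if (n < (int)sizeof(got))
                return -1;      /* error or runt datagram */
            memcpy((char *)&got, buf, sizeof(got));
            if (got == xid)     /* ours, however old */
                return n;
            /* a reply to some other call: ignore it */
        }
    }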

Does anyone know why the standard libraries make stream
connections to the portmapper?
-- 

William Roberts                 ARPA: liam@cs.qmw.ac.uk
Queen Mary & Westfield College  UUCP: liam@qmw-cs.UUCP
Mile End Road                   AppleLink: UK0087
LONDON, E1 4NS, UK              Tel:  071-975 5250 (Fax: 081-980 6533)

cs@Eng.Sun.COM (Carl Smith) (05/31/90)

In article <2285@sequent.cs.qmw.ac.uk>, liam@cs.qmw.ac.uk (William
Roberts) writes:
...
> Does anyone know why the standard libraries make stream
> connections to the portmapper?

	Most of the library functions (pmap_set, pmap_unset, pmap_getport,
pmap_rmtcall, and clnt_broadcast) use UDP.  Only pmap_getmaps uses TCP,
and that's because the size of the reply to a PMAPPROC_DUMP request might
well exceed the implementation's limit on the size of a datagram.
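
	For what it's worth, the TCP connection is made for you inside
the library; a caller just walks the list pmap_getmaps returns,
roughly like this (a sketch; error handling omitted):

    #include <stdio.h>
    #include <string.h>
    #include <rpc/rpc.h>
    #include <rpc/pmap_clnt.h>
    #include <rpc/pmap_prot.h>
    #include <netinet/in.h>

    int
    main()
    {
        struct sockaddr_in addr;
        struct pmaplist *p;

        memset((char *)&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        /* pmap_getmaps does a PMAPPROC_DUMP over TCP underneath */
        for (p = pmap_getmaps(&addr); p != 0; p = p->pml_next)
            printf("%lu %lu %s port %lu\n",
                   p->pml_map.pm_prog, p->pml_map.pm_vers,
                   p->pml_map.pm_prot == IPPROTO_TCP ? "tcp" : "udp",
                   p->pml_map.pm_port);
        return 0;
    }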

			Carl