cliff@SSD.CSD.HARRIS.COM (Cliff Van Dyke) (05/25/90)
In article <2143@inews.intel.com> kgarimel@hobbes.intel.com (Krishna Garimella) writes:
>On a file server, the mountd becomes inoperative in certain cases,
>when it is hit with many simultaneous mount requests from NFS clients.
>We have different flavours of clients (Suns, GPXs, 3100s, SISOs ...).
>This situation arises from a mass reboot of the clients (e.g., power
>glitch/failure).
>
>Even when the mountd is killed and restarted, it "caters" to one (or two)
>mount requests and then goes into a loop.  The clients give timeout
>messages.

I've seen a substantially similar problem (but with ypserv) which I
traced to an unfortunate implementation of the UDP version of RPC.
(This was a straight port of Sun's version 3.2 reference port.)  The
problem occurred when oodles of processes simultaneously accessed
ypserv.  ypserv would respond to the requests in FIFO order.
Meanwhile, the clients got tired of waiting, timed out, and requested
again.  The server was so busy handling requests that had already
timed out that it didn't have a chance to handle the new ones.

The problem is significantly alleviated in Sun's 4.0 version of the
reference port.  Each retry done by the client is backed off by a
factor of 2, reducing the load on the server.  Do you know which rev
is being used by your various clients?

In general, the algorithms used for the UDP version of RPC in the
applications (e.g., ypserv and mountd) and the kernel (e.g., NFS and
lockd) leave much to be desired.  I suspect some mechanism that uses
the history of a server's previous performance would prove most
beneficial.
--
Cliff Van Dyke                   cliff@ssd.csd.harris.com
Harris Computer Systems          ...!{uunet,novavax}!hcx1!cliff
2101 W. Cypress Creek Rd.
Ft. Lauderdale, FL 33309-1892    Tel: (305) 973-5349
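[Editor's note: the 4.0 behaviour described above, where each client retry
is backed off by a factor of 2, can be sketched as follows.  This is a
minimal Python illustration; the function name and the base/cap values are
illustrative, not Sun's actual code or constants.]

```python
def retry_timeouts(base=1.0, cap=30.0, retries=5):
    # Successive retry timeouts under binary exponential backoff.
    # Doubling the wait after every timeout thins out retransmissions,
    # so a server digging through a backlog sees far fewer stale
    # requests than it would under fixed-interval retries.
    timeouts = []
    t = base
    for _ in range(retries):
        timeouts.append(min(t, cap))
        t *= 2
    return timeouts

print(retry_timeouts())          # [1.0, 2.0, 4.0, 8.0, 16.0]
print(retry_timeouts(cap=8.0))   # [1.0, 2.0, 4.0, 8.0, 8.0]
```

Contrast this with the 3.2 behaviour: clients retrying on a fixed interval
keep re-asking at full rate while the server works through the backlog,
which is exactly the livelock described above.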
liam@cs.qmw.ac.uk (William Roberts) (05/29/90)
In <4290@hcx1.SSD.CSD.HARRIS.COM> cliff@SSD.CSD.HARRIS.COM (Cliff Van Dyke) writes:
>In general, the algorithms used for the UDP version of RPC in the
>applications (e.g., ypserv and mountd) and the kernel (e.g., NFS and
>lockd) leave much to be desired.  I suspect some mechanism that uses
>the history of a server's previous performance would prove most
>beneficial.

The reason that the kernel stuff survives whereas the application
stuff dies is that the user stuff is implemented with the assumption
that calls are non-idempotent (even if the application writer knows
full well that they are!).  Individual yp lookups would happily be
served by allowing an idempotent-request option: instead of waiting
longer each time before retrying until eventually you give the server
enough time, you could also accept the first answer you get even if it
arrives after the timeout (this is what the kernel does for NFS
requests, for example).  By all means make timeouts adaptive to server
load, but why waste good replies?

Does anyone know why the standard libraries make stream connections
to the portmapper?
--
William Roberts                  ARPA:  liam@cs.qmw.ac.uk
Queen Mary & Westfield College   UUCP:  liam@qmw-cs.UUCP
Mile End Road                    AppleLink: UK0087
LONDON, E1 4NS, UK               Tel: 071-975 5250 (Fax: 081-980 6533)
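[Editor's note: the "why waste good replies?" point rests on the fact that
Sun RPC retransmissions reuse the transaction ID (XID) of the original
call, so a late reply still matches the outstanding request.  For an
idempotent call the client can simply take the first matching reply it
sees, however late.  The Python sketch below is hypothetical, not the
kernel's actual code.]

```python
def first_matching_reply(xid, replies):
    # Scan incoming replies for one whose XID matches the outstanding
    # call.  Retransmissions reuse the XID, so a reply to an earlier,
    # timed-out transmission is indistinguishable from a prompt one;
    # for an idempotent call it is just as good, so accept it rather
    # than discard it and retry yet again.
    for reply in replies:
        if reply["xid"] == xid:
            return reply["result"]
    return None  # nothing usable yet; keep waiting (or retransmit)

# A stale reply to some other call is skipped; the late reply to
# our call (XID 42) is accepted even though it answers a
# transmission that has already timed out.
replies = [{"xid": 7, "result": "stale"},
           {"xid": 42, "result": "port=2049"}]
print(first_matching_reply(42, replies))  # port=2049
```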
cs@Eng.Sun.COM (Carl Smith) (05/31/90)
In article <2285@sequent.cs.qmw.ac.uk>, liam@cs.qmw.ac.uk (William Roberts) writes:
 ...
> Does anyone know why the standard libraries make stream
> connections to the portmapper?

Most of the library functions (pmap_set, pmap_unset, pmap_getport,
pmap_rmtcall, and clnt_broadcast) use UDP.  Only pmap_getmaps uses
TCP, and that's because the size of the reply to a PMAPPROC_DUMP
request might well exceed the implementation's limit on the size of
a datagram.

			Carl
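[Editor's note: Carl's explanation amounts to a transport choice keyed on
the worst-case reply size.  A sketch of the rule, assuming the 8800-byte
UDP transfer size (UDPMSGSIZE) that Sun's RPC library has traditionally
used; the function name and sizes are illustrative.]

```python
UDP_MSG_SIZE = 8800  # Sun RPC's traditional UDP transfer limit (UDPMSGSIZE)

def portmap_transport(worst_case_reply_bytes):
    # pmap_getport and friends have small, fixed-size replies, so UDP
    # is always safe.  A PMAPPROC_DUMP reply lists every registered
    # program/version/port mapping on the host and can grow without
    # bound, so pmap_getmaps must open a TCP connection instead.
    if worst_case_reply_bytes <= UDP_MSG_SIZE:
        return "udp"
    return "tcp"

print(portmap_transport(16))       # udp: a single port number
print(portmap_transport(200000))   # tcp: a large PMAPPROC_DUMP listing
```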