hurf@batcomputer.tn.cornell.edu (Hurf Sheldon) (10/27/89)
NFS under Ultrix and Risc Ultrix 3.0, 3.1 Problems

We have experienced some very marked slowdowns of NFS traffic on two separate networks.

Case 1: a uVaxIII serving a DS3100 (user space only, not /usr) and being served by it for one directory. Both systems run 3.1, on a fairly loaded network, with a delqa in the uVaxIII. The server is lightly loaded NFS-wise: it runs nfsd 8 and the load average is rarely over 4. It has 8 meg of memory, and usually some of the nfsd's are swapped out.

Case 2: a uVaxII serving 5 diskless vs2000's and 4 underdisked DS3100's (100 meg rz's). All systems run 3.0. The 9 clients and the server are the only hosts; the server has 2 deqna interfaces and acts as a gateway. The server is heavily loaded: it runs nfsd 16 and often has a load average in excess of 10. I have seen it at 20+, and it is always 6 or above. It has 13 meg of memory, and even when the load average is high, swapping is at a minimum. Using top I can see all 16 nfsd's running and not swapped.

Both cases seem related to the partition on the server going over 90% full as reported by df. Most of the served partitions are over 150 meg, and with 10% minfree a reported 90% full would still leave ~30 meg free.

Both cases are evidenced by 'nfs server so&so not responding' messages, sometimes timing out, or coming back after a long (1-2 min) delay with an 'nfs server so&so ok' and then repeating the process. This usually happens during a high-throughput job like compiling in the NFS directories, but not always: sometimes it happens on a cp of a small file when other activity is at a minimum. Getting the disks below 80% full (as reported by df) made the problem go away.

We also occasionally get 'stale file handle' errors, which are cleared up by a umount and mount of the affected partition. This has happened without the respective server going down or being rebooted.

My questions:

1: Diagnostics - what are the available resources for performance analysis and/or tuning of NFS file systems? Besides iostat, what can I use to profile disk performance and usage?
2: Set up schemes - is it better to have one 600 meg disk served as one partition? (I thought not and have it split 150, 150, 300; obviously from a controller efficiency standpoint this is preferable.)

3: As the partition served by the DS3100 has the same problem, I surmise it is an NFS problem, but what, if anything, can be done? Should I drop minfree to 5% (or less) and keep the disks 80% full? It seems the physical free space isn't the problem; the perceived free space is.

4: Manual or other references I may have missed to address the problem?

BTW: both uVaxen have Maxtor esdi disks. The III has an SI/Webster controller, the II a Dilog.

Thanks in advance for any help/suggestions.
Contributions of larger, ever larger disks accepted with trepidation...

hurf
--
Hurf Sheldon                    Network: hurf@ionvax.tn.cornell.edu
Lab of Plasma Studies           Bitnet:  hurf@CRNLION
369 Upson Hall, Cornell University, Ithaca, N.Y. 14853   ph:607 255 7267

I got a job in science; I bought a Porsche; Now, everyone takes me seriously.
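
For reference, the free-space arithmetic in the post (150 meg partition, 10% minfree, 90% full per df) can be checked with a rough sketch like the one below. It assumes BSD-style df accounting, where the percent-full figure is computed against the space left after the minfree reserve is set aside, and it ignores filesystem overhead such as inodes and superblocks. The function name and parameters are hypothetical, chosen only for illustration.

```python
def free_space_mb(partition_mb, minfree_pct, df_pct_full):
    """Return (reserved, available_to_users, physically_free) in MB.

    Assumption: df reports percent full relative to the space that
    remains after the minfree reserve is subtracted (BSD-style).
    Filesystem metadata overhead is ignored.
    """
    reserved = partition_mb * minfree_pct / 100.0   # held back by minfree
    usable = partition_mb - reserved                # what df treats as 100%
    used = usable * df_pct_full / 100.0             # blocks in use
    available = usable - used                       # what ordinary users can still write
    return reserved, available, available + reserved


if __name__ == "__main__":
    reserved, available, physical = free_space_mb(150, 10, 90)
    print("reserved by minfree: %.1f MB" % reserved)
    print("available to users:  %.1f MB" % available)
    print("physically free:     %.1f MB" % physical)
```

Under these assumptions a 150 meg partition at a reported 90% has about 28.5 meg physically free, in line with the ~30 meg estimate, but only about 13.5 meg of that is available to ordinary users.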