[comp.unix.ultrix] NFS problems under Ultrix, Risc Ultrix

hurf@batcomputer.tn.cornell.edu (Hurf Sheldon) (10/27/89)

NFS Problems under Ultrix and Risc Ultrix 3.0, 3.1



	We have experienced some very marked slowdowns of nfs
	traffic on two separate networks. One case is a uVaxIII
	serving a DS3100 user space only (not /usr), and being
	served by it for one directory.
	Both systems run 3.1 (on a fairly loaded network; DELQA in the uVaxIII).
	The server is lightly loaded in nfs terms: it runs 'nfsd 8' and its load
	average is rarely over 4. It has 8 meg of memory, and usually some of
	the nfsd's are swapped out.

    
	The other case is a uVaxII serving 5 diskless VS2000's and
	4 underdisked DS3100's (100 meg RZ's).
	All systems run 3.0 (the 9 clients and the server are the only hosts;
	the server has 2 DEQNA interfaces and acts as a gateway).
	The server is heavily loaded: it runs 'nfsd 16', its load average is
	often in excess of ten (I have seen it at 20+), and it is always 6 or above.
	It has 13 meg of memory; even when the load average is high, swapping is
	at a minimum. Using top I can see all 16 nfsd's running and none
	swapped out.

	Both cases seem related to the partition on the server going
	over 90% full as reported by df. Most of the served partitions
	are over 150 meg, and with 10% minfree a reported 90% full would
	still leave ~30 meg physically free.
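	A quick check of the ~30 meg figure, as a sketch only: it assumes a
	simplified BSD-style df model in which the minfree reserve is withheld
	from 'avail' and the capacity column is used / (used + avail). Real df
	works in kbytes from the superblock counts, so treat the numbers as
	approximate; the function name df_view is made up for illustration.

```python
# Sketch under an ASSUMED simplified BSD-style df model:
# the minfree reserve is withheld from "avail", and
# capacity = used / (used + avail).
def df_view(total_mb, used_mb, minfree_pct):
    """Hypothetical helper: return (avail_mb, capacity_pct, physically_free_mb)."""
    reserve_mb = total_mb * minfree_pct / 100.0      # root-only reserve
    avail_mb = total_mb - reserve_mb - used_mb       # what ordinary users may still write
    capacity_pct = 100.0 * used_mb / (used_mb + avail_mb)
    physically_free_mb = total_mb - used_mb          # includes the reserve
    return avail_mb, capacity_pct, physically_free_mb

# A 150 meg partition with 10% minfree, filled until df says 90%:
total, minfree = 150.0, 10
used = 0.90 * total * (100 - minfree) / 100.0        # 121.5 meg of data
avail, cap, phys = df_view(total, used, minfree)
print(f"used {used:.1f}  avail {avail:.1f}  capacity {cap:.0f}%  free on disk {phys:.1f}")
```

	Under this model df shows 13.5 meg available while 28.5 meg is
	actually free on the disk, which matches the ~30 meg estimate.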

	In both cases the symptom is 'nfs server so&so not responding'
	messages, which either time out or clear after a long (1-2 min)
	delay with 'nfs server so&so ok', and then the cycle repeats.
	This usually happens during a high-throughput job such as compiling
	in the nfs directories, but not always: sometimes it is a cp of a
	small file when other activity is at a minimum. Getting the disks
	below 80% full (as reported by df) made the problem go away.

	We occasionally get 'stale file handle' errors, which are cleared
	up by unmounting and remounting the affected partition. This has
	happened without the respective server going down or being rebooted.

	My questions:
	1: Diagnostics: what resources are available for performance
			analysis and/or tuning of nfs file systems?
			Besides iostat, what can I use to profile disk
			performance and usage?
	
	2: Setup schemes: is it better to have one 600 meg disk served
	   as one partition? (I thought not, and have it split 150, 150, 300;
	   obviously from a controller efficiency standpoint this is preferable.)

	3: As the partition served by the DS3100 has the same problem, I surmise
	   it is an nfs problem, but what, if anything, can be done? Should I
	   lower minfree to 5% (or less) and keep the disks 80% full? It seems
	   the physical free space isn't the problem; the perceived free
	   space is.

	4: Manual pages or other references I may have missed that address the problem?
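	Question 3 in numbers, as a sketch under a simplified BSD-style df
	model (an assumption, not Ultrix's actual df source): df counts
	total * (100 - minfree) / 100 as "100% full". For a 150 meg partition
	holding 121.5 meg, lowering minfree changes the percentage df reports
	but not the space physically free on the disk. reported_capacity is a
	hypothetical helper.

```python
# Sketch under an ASSUMED simplified df model: the space df treats as
# 100% is the partition size minus the minfree reserve.
def reported_capacity(total_mb, used_mb, minfree_pct):
    """Hypothetical helper: percentage full as df would report it."""
    usable_mb = total_mb * (100 - minfree_pct) / 100.0
    return 100.0 * used_mb / usable_mb

total, used = 150.0, 121.5      # a 150 meg partition holding 121.5 meg
for minfree in (10, 5, 0):
    pct = reported_capacity(total, used, minfree)
    print(f"minfree {minfree:2d}%: df reports {pct:.0f}% full")
```

	Only the reported percentage moves (90%, 85%, 81%); the data and the
	physically free space are the same in every line.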

BTW: both uVaxen have Maxtor ESDI disks; the III has a SI/Webster controller,
     the II a Dilog.


Thanks in advance for any help/suggestions
Contributions of larger, ever larger disks accepted with trepidation...

hurf
-- 
     Hurf Sheldon			 Network: hurf@ionvax.tn.cornell.edu
     Lab of Plasma Studies		  Bitnet: hurf@CRNLION
     369 Upson Hall, Cornell University, Ithaca, N.Y. 14853  ph:607 255 7267
  I got a job in science; I bought a Porsche; Now, everyone takes me seriously.