[comp.sys.sun] nfsd going crazy

tr@ctt.bellcore.com (05/23/89)

I have a Sun-3 server running SunOS version 4.0.1 with about 25 boot
clients.  I know this is pushing the limit, but for the time being it has
to do.

My problem is that starting about two or three weeks ago, the load average
went from about 2 to about 9 or 10.  It has stayed there, and I haven't
changed anything.  The culprit has been the copies of nfsd.  They are
racking up huge amounts of cpu time.  I changed them to run 16 copies
rather than 8 and that made it worse.

Anything I should check?  Performance has gotten pretty bad!

Tom Reingold                   |INTERNET:       tr@ctt.bellcore.com
Bell Communications Research   |UUCP:           bellcore!ctt!tr
444 Hoes La room 1E225         |PHONE:          (201) 699-7058 [work],
Piscataway, NJ 08854           |                (201) 287-2345 [home]

hedrick@geneva.rutgers.edu (Charles Hedrick) (06/03/89)

Problem: nfsd running a lot, high load

There may be bugs that could cause this.  Others may comment on that.
But I'd start by looking for something more mundane.  Now and then one
of our clients gets a program that goes into a loop that does disk
accesses.  This will of course bombard its server with requests.

Try
  etherfind -p -proto udp

on the server. This will look at all UDP packets being sent to your host.
I bet you'll see a continuous stream from one particular machine.  Now go
there and do a "ps aux" and see if you don't find something peculiar.
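
If the stream scrolls by too fast to eyeball, the output can be saved to a
file and tallied per source host.  The following is only a rough sketch, not
part of etherfind itself.  It assumes each packet line starts with a numeric
length and carries the source host in the third whitespace-separated column;
check that against what your etherfind actually prints and adjust src_col if
it differs.  The function name top_talkers is made up for this example.

```python
from collections import Counter


def top_talkers(lines, src_col=2, n=5):
    """Return the n most frequent source hosts as (host, packet_count) pairs.

    ASSUMPTION: each packet line begins with a numeric length field and has
    the source host in column src_col (0-based).  Header and blank lines are
    skipped because their first field is not a number.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Ignore blank lines, header lines, and lines too short to hold a source.
        if len(fields) <= src_col or not fields[0].isdigit():
            continue
        counts[fields[src_col]] += 1
    return counts.most_common(n)
```

Called on the lines of a saved capture, it returns the busiest senders first;
a single client far ahead of the rest is the one to go look at.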

rlk@think.com (Robert L. Krawitz) (06/03/89)

One of your clients has something running hard, very likely in an infinite
loop.  We had a server (4/280) that for a month was running very slowly
(no one particularly complained, at least not in earshot of me).  I
finally took a look at the situation, and discovered that a 3/140 had a
csh that had been looping all that time.

Unfortunately, tcpdump doesn't work on 4.0; it was (and still is, on our
two servers still running 3.5) a really useful tool for things like this.
Modulo that, you have to play around with netstat -r, rup, and other
random tools of that nature.
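
With rup, one rough way to narrow things down is to rank the clients by load
average, since a machine with a looping process usually shows a raised load.
The sketch below assumes rup lines of the usual "host up ..., load average:
a, b, c" shape and sorts on the first (one-minute) figure; the name
busiest_hosts is made up for this example, and a high load is only a hint,
not proof that the host is the one hammering the server.

```python
import re

# ASSUMPTION: rup prints lines like
#   "host   up 3 days,  2:10,   load average: 0.10, 0.05, 0.01"
RUP_LINE = re.compile(r"^\s*(\S+)\s+up\b.*load average:\s*([\d.]+)")


def busiest_hosts(lines, n=5):
    """Return up to n (host, one_minute_load) pairs, highest load first."""
    loads = []
    for line in lines:
        match = RUP_LINE.match(line)
        if match:  # lines that do not look like rup output are ignored
            loads.append((match.group(1), float(match.group(2))))
    loads.sort(key=lambda pair: pair[1], reverse=True)
    return loads[:n]
```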

ames >>>>>>>>>  |	Robert Krawitz <rlk@think.com>	245 First St.
bloom-beacon >  |think!rlk				Cambridge, MA  02142
harvard >>>>>>  .	Thinking Machines Corp.		(617)876-1111

zjat02@uunet.uu.net (Jon A. Tankersley) (06/11/89)

Check for textedit processes running without attached windows.  Textedit
can be put in an infinite loop.  This can kill nfsd on the server.
Something about search/replace all with only one occurrence.

-tank-
#include <std/disclaimer.h>		/* nobody knows the trouble I .... */
tank@apctrc.trc.amoco.com    ..!uunet!apctrc!tank