parker@waters.mpr.ca (Ross Parker) (04/20/89)
In article <278@kubix.UUCP> mvw@kubix.UUCP (Maarten van Wijk) writes:
>
>VAX 11/750 with DEUNA controller and Ultrix 3.0.
>
>When I NFS mount a disk of the VAX on a SUN 3/50 and start some compilations
>on that disk, after some time the VAX hangs. The system is completely
>dead and no messages appear on the console.
>After forcing a crashdump and looking at the core I get the impression
>one of the nfs daemons has taken over the CPU. The output from ps -axk
>of the dump looks like this:
> ... ...
>
>In a normal state the nfs daemons consume about the same cpu time.
>
>Has anybody experienced something like this or can give a solution
>to the problem.

Yes! We've been having a problem with NFS that appears to be caused by
PCs on our network (using Sun's PC-NFS). Every once in a while the
nfs daemons on one of our microvaxes (Ultrix 2.2 or 2.3) will just go
bananas and eat up most of the CPU. We run 8 nfs daemons, and for the
space of about 5 minutes (sometimes less), each will chew up about
10 percent of the CPU. This drives the load up to a level where everyone
has to sit and wait for this to die down before they can work again.

We think that this is happening when someone prints a file from a PC
(but certainly not every time someone prints a file). It happens to
all four microvaxes that we have PCs connected to.

If anyone has any ideas, I'd certainly like to hear them!! DEC support
is clueless so far.

Ross Parker     uunet!ubc-cs!mpre!parker   |
Microtel Pacific Research Ltd.             | You can't erase the dream,
Burnaby, B.C.,                             |   you can only wake me up...
Canada, eh?                                |
david@ms.uky.edu (David Herron -- One of the vertebrae) (04/24/89)
In article <1659@eric.mpr.ca> parker@waters.UUCP (Ross Parker) writes:
>In article <278@kubix.UUCP> mvw@kubix.UUCP (Maarten van Wijk) writes:
>Yes! We've been having a problem with NFS that appears to be caused by
>PCs on our network (using Sun's PC-NFS). Every once in a while the
>nfs daemons on one of our microvaxes (Ultrix 2.2 or 2.3) will just go
>bananas and eat up most of the CPU. We run 8 nfs daemons, and for the
>space of about 5 minutes (sometimes less), each will chew up about
>10 percent of the CPU. This drives the load up to a level where everyone
>has to sit and wait for this to die down before they can work again.
...
>If anyone has any ideas, I'd certainly like to hear them!! DEC support
>is clueless so far.

We have the same sort of problem .. about 5 or 6 times a week one
of our uVaxIIen will lock up as you describe. We do not run PC-NFS
but we do have some Sun's (v4 of SunOS) and a Sequent (v3.?? of Dynix)
and all these guys share NFS back and forth. We are at v3 of Ultrix.
DEC support is also clueless.

Something which just occurred to me ... I believe that most of our
uVaxIIen have DEQNA's rather than DELQA's in 'em.
--
<- David Herron; an MMDF guy                              <david@ms.uky.edu>
<- ska: David le cassé           {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<- By all accounts, Cyprus was covered with trees at one time
<- -- Until they discovered Bronze
steved@longs.LANCE.ColoState.Edu (Steve Dempsey) (04/24/89)
> In article <1659@eric.mpr.ca> parker@waters.UUCP (Ross Parker) writes:
> >In article <278@kubix.UUCP> mvw@kubix.UUCP (Maarten van Wijk) writes:
> >Yes! We've been having a problem with NFS that appears to be caused by
> >PCs on our network (using Sun's PC-NFS). Every once in a while the
> >nfs daemons on one of our microvaxes (Ultrix 2.2 or 2.3) will just go
> >bananas and eat up most of the CPU. We run 8 nfs daemons, and for the
> >space of about 5 minutes (sometimes less), each will chew up about
> >10 percent of the CPU. This drives the load up to a level where everyone
> >has to sit and wait for this to die down before they can work again.
> ...
> >If anyone has any ideas, I'd certainly like to hear them!! DEC support
> >is clueless so far.
>
> We have the same sort of problem .. about 5 or 6 times a week one
> of our uVaxIIen will lock up as you describe. We do not run PC-NFS
> but we do have some Sun's (v4 of SunOS) and a Sequent (v3.?? of Dynix)
> and all these guys share NFS back and forth. We are at v3 of Ultrix.
>

All this talk of stuck NFS servers, etc. sounds very familiar. We have
quite the variety of hardware and software: Vax780's, '730, uVaxII,
'3600's, 3200's, SUN3/50's, and many VS2000's; most are running
Ultrix 2.2, and the '780's run 4.3BSD+XINU. Ethernet is DELQA on the
newer machines, along with Proteon P1100's (proNET-10 ring). Every
machine mounts at least one remote file system, and some make 2 or 3
gateway hops to get there.

Machines on the same physical net do just fine. Different gateways seem
to cause different problems: with some we get lots of timeouts that
still return in a reasonable time (a few seconds), with some we get LONG
timeouts, and with some things just hang forever. Usually the client
hangs, but sometimes the server all but locks up as described above.

So what causes all this? Beats me, but our solution is to fix the read
and write sizes at something small enough that each transfer fits in a
single packet. That's options rsize=xxxx,wsize=xxxx in /etc/fstab.
We chose 1024 because both proNET and ethernet tcp/ip packets are a few
hundred bytes larger than 1K. All our NFS problems seem to have
disappeared. Of course this solution was discovered completely by trial
and error (and error and .... :-)

Steve Dempsey,  Center for Computer Assisted Engineering
  Colorado State University, Fort Collins, CO  80523    +1 303 491 0630
INET: steved@longs.LANCE.ColoState.Edu, dempsey@handel.CS.ColoState.Edu
UUCP: boulder!ccncsu!longs.LANCE.ColoState.Edu!steved, ...!ncar!handel!dempsey
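Steve's fix was found by trial and error, and nothing in the thread says why it works. One commonly given explanation, offered here as an assumption rather than a diagnosis, is IP fragmentation: NFS traditionally runs over UDP, so a transfer larger than the link MTU is split into several IP fragments, and losing any one of them (to a busy gateway, say) loses the whole request. The minimal Python sketch below counts the fragments needed for a given rsize/wsize and estimates the odds that a datagram gets through. The 128-byte RPC/NFS overhead, the 8192-byte comparison size, and the 5% per-fragment loss rate are illustrative assumptions, not measurements from any site in this thread.

```python
import math

IP_HEADER = 20   # bytes: IPv4 header with no options
UDP_HEADER = 8   # bytes

# Assumed, illustrative size of the RPC + NFS header carried with each
# read reply or write request; the real figure varies from call to call.
RPC_NFS_OVERHEAD = 128


def ip_fragments(transfer_size: int, mtu: int = 1500,
                 rpc_overhead: int = RPC_NFS_OVERHEAD) -> int:
    """IP fragments needed to carry one NFS read or write of
    `transfer_size` data bytes over UDP on a link with the given MTU."""
    ip_payload = UDP_HEADER + rpc_overhead + transfer_size
    # Every fragment but the last must carry a multiple of 8 payload bytes.
    per_fragment = ((mtu - IP_HEADER) // 8) * 8
    return math.ceil(ip_payload / per_fragment)


def delivery_probability(fragments: int, loss: float) -> float:
    """Chance a datagram survives if each fragment is independently lost
    with probability `loss`; one lost fragment loses the whole datagram,
    and the client must then retransmit the entire request."""
    return (1.0 - loss) ** fragments


# Compare an assumed 8K transfer size with the 1024 bytes Steve chose,
# on Ethernet (1500-byte MTU) with an assumed 5% per-fragment loss rate.
for size in (8192, 1024):
    n = ip_fragments(size)
    print(size, n, round(delivery_probability(n, 0.05), 3))
```

Under these assumptions an 8192-byte transfer needs 6 fragments and survives about 73.5% of the time, while a 1024-byte transfer fits in one fragment and survives 95% of the time. The `mtu` parameter can be changed to model other links; no proNET-10 figure is assumed here.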
grr@cbmvax.UUCP (George Robbins) (04/24/89)
In article <11582@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
> In article <1659@eric.mpr.ca> parker@waters.UUCP (Ross Parker) writes:
> >In article <278@kubix.UUCP> mvw@kubix.UUCP (Maarten van Wijk) writes:
> >Yes! We've been having a problem with NFS that appears to be caused by
> >PCs on our network (using Sun's PC-NFS). Every once in a while the
> >nfs daemons on one of our microvaxes (Ultrix 2.2 or 2.3) will just go
> >bananas and eat up most of the CPU. We run 8 nfs daemons...
> ...
> >If anyone has any ideas, I'd certainly like to hear them!! DEC support
> >is clueless so far.
>
> We have the same sort of problem ..
...
> DEC support is also clueless.

Well, I was about to suggest that you blast the offending daemon with
a quit signal and see if you could get a useful "core" file, which
would at least give some clue as to what the daemon thought it was up
to. Unfortunately, the object is stripped, which makes debugging all
but impossible. Still, you might give it a shot and send DEC the data.
See "man signal" for the signals that will elicit a dump; they probably
don't try to catch all of them.

Is there any hope of getting DEC to put unstripped objects on the
distribution tape as an optional file? They seem to be limiting both
the customers and their own ability to diagnose problems by shipping
only the stripped versions...

We run 4 biod daemons here on a 785 running 2.2, and I haven't noticed
the problem you mention. On the other hand, NFS use here isn't very
intensive, and I might not even notice an occasional "lockup" as long
as rn still works good. 8-)

Oh yeah, no PC-NFS (yet), and we have Sun-2's running 3.x, a Sun-4
running that interim release, and a bunch of Amiga's running
Ameristar's NFS package.
--
George Robbins - now working for,     uucp: {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing    arpa: cbmvax!grr@uunet.uu.net
Commodore, Engineering Department     fone: 215-431-9255 (only by moonlite)
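George's suggestion amounts to sending the daemon SIGQUIT, whose default action is to terminate the process and request a core dump. As the later replies in this thread point out, nfsd and biod spend nearly all their time inside the kernel, so a user-mode core says little. For completeness, here is a minimal Python sketch of the signal step on a POSIX system. `send_quit` is a hypothetical helper name, and the demonstration targets a throwaway child process rather than a real daemon; whether a core file actually appears depends on resource limits and permissions.

```python
import os
import signal
import subprocess
import sys


def send_quit(pid: int) -> None:
    """Send SIGQUIT to `pid`. Unless the process catches or ignores it,
    the default action terminates the process and requests a core dump."""
    os.kill(pid, signal.SIGQUIT)


# Demonstration on a throwaway child that would otherwise sleep for a
# minute; a real target's pid would come from ps(1) instead.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
send_quit(child.pid)
status = child.wait()
# Popen reports "terminated by signal N" as a return code of -N.
print("terminated by SIGQUIT:", status == -signal.SIGQUIT)
```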
abstine@sun.soe.clarkson.edu (Arthur Stine) (04/24/89)
>In article <278@kubix.UUCP> mvw@kubix.UUCP (Maarten van Wijk) writes:
>Yes! We've been having a problem with NFS that appears to be caused by
>PCs on our network (using Sun's PC-NFS). Every once in a while the
>nfs daemons on one of our microvaxes (Ultrix 2.2 or 2.3) will just go
>bananas and eat up most of the CPU. We run 8 nfs daemons...
>If anyone has any ideas, I'd certainly like to hear them!! DEC support
>is clueless so far.
>We have the same sort of problem ..
> ....
> DEC support is also clueless.
>

Well, I'm not sure it's a DEC-specific problem. Our Sun servers exhibit
the same behaviour. Last night one of them went up to a load average
of 51 (!) before I shut it down. DEC is probably using basically the
same NFS code, so I suspect there are just some latent bugs that are
brought out by the presence of things like PC-NFS on the net.

art stine
sr network engineer
clarkson u
jim@maxwell.cs.strath.ac.uk (Jim Reid) (04/25/89)
In article <6677@cbmvax.UUCP> grr@cbmvax.UUCP (George Robbins) writes:
>Well, I was about to suggest that you blast the offending daemon with
>at quit signal and see if you could get a useful "core" file, which
>would at least give some clue as to what the daemon thought is was up
>to. Unfortunatly, the object is stripped, which makes debugging all
>but impossible.

This would not be much help, even if the daemons were unstripped. Both
nfsd and biod are trivial programs - they execute only kernel code apart
from a small amount of initialisation at start up. All nfsd and biod do
is detach from their controlling tty and then invoke a system call that
NEVER returns. This puts the process into kernel mode, where it runs the
kernel's NFS client or server code.

Getting a core dump is not much use unless you know how to get hold of
the kernel stack inside the dump's u. area and then map that with the
kernel's symbol table [with the kernel sources by your side for
reference.... :-)]. You could get the daemon's stack backtrace much more
easily using adb on /vmunix and /dev/kmem. Even then, that might be of
little use if the daemon is making repeated function calls, since all
you get is a snapshot of the daemon's kernel activity.

		Jim

ARPA:	jim%cs.strath.ac.uk@ucl-cs.arpa, jim@cs.strath.ac.uk
UUCP:	jim@strath-cs.uucp, ...!uunet!mcvax!ukc!strath-cs!jim
JANET:	jim@uk.ac.strath.cs

"!rof si ver tahw s'taht oS"
eric@pprg.unm.edu (Eric Engquist [CoE]) (04/26/89)
We were seeing a similar problem. About once a week our VAX would go
crazy with NFS load. The cause turned out to be find runs on the
machines that mounted NFS partitions from the VAX. Each of those
machines would run find starting from, say, /usr. If every machine
NFS-mounts something such as /usr/man, the finds from all the remote
machines end up banging on the Ultrix /usr/man partition.

Hence if you run finds, do them carefully. If you are on a Sun, use the
-prune and -fstype options.

-Eric Engquist
UNM College of Engr.
eric@sybil.unm.edu
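Eric's fix is to keep find from wandering into NFS-mounted trees. On a Sun that is typically written along the lines of `find /usr -fstype nfs -prune -o -print`, which prints everything except what lies on NFS file systems. The Python sketch below shows the same pruning idea under a different, assumed criterion: Python has no portable way to ask for a file system's type, so the hypothetical `walk_local` refuses to cross onto any device other than the one the walk started on. That is closer to POSIX find's -xdev than to -fstype nfs, and it also skips other local file systems mounted below the starting point.

```python
import os


def walk_local(top: str):
    """Yield file paths under `top` like os.walk, but never descend into
    a directory that lives on a different device (file system) than
    `top` itself, e.g. an NFS mount point such as /usr/man."""
    root_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place tells os.walk (top-down) which
        # subdirectories to visit; dropping a mount point prunes its
        # whole subtree, much like find's -prune.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Usage is the same as a plain loop over os.walk, for example `for path in walk_local("/usr"): print(path)`. Because the check happens before descent, an NFS server never sees requests for the pruned subtrees at all, which is the point of Eric's advice.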
guy@auspex.auspex.com (Guy Harris) (04/26/89)
>Well, I was about to suggest that you blast the offending daemon with
>at quit signal and see if you could get a useful "core" file,

Unless DEC has put the NFS server into user mode, you won't get a useful
"core" file; NFS daemons in UNIX systems tend to run in the kernel, and
have no user-mode data or stack segments, and thus you won't get a very
interesting "core" file.
parker@waters.mpr.ca (Ross Parker) (05/03/89)
Oops!!! I was the original poster of the 'nfs daemon blocks system'
discussion, and we lost our news link just as I started seeing some
replies. It's back up now, and I would like to throw myself on the
mercy of the net and ask that people either re-post any helpful replies
or, better still, that some kind soul e-mail me the relevant ones. I'd
be eternally grateful!!!

Thanks!

Ross Parker     uunet!ubc-cs!mpre!parker   |
Microtel Pacific Research Ltd.             | You can't erase the dream,
Burnaby, B.C.,                             |   you can only wake me up...
Canada, eh?                                |