bill@wrangler.WLK.COM (Bill Kennedy) (04/27/91)
This is the second ISC NFS anomaly that has me baffled; the other one was about a runaway portmap process. This one is more of a puzzle, since two similarly configured systems exhibit two different behaviors.

To try to track down some problems between NCR's NFS and Interactive's, I brought up a third system, carpet. NFS seems to start just fine and seems to work OK, but rexd likes to die when it's queried. With all of the appropriate daemons running on carpet (the newer box), I run

    rpcinfo -u carpet 100003 2

to get NFS status on carpet, and I get

    rpcinfo: RPC: Program not registered
    program 100003 version 2 is not available

A few minutes later, carpet's console shows

    Cannot register service: RPC: Timed out
    rexd: service rpc register: error

and a ps shows that the rexd daemon has exited. I can start the daemon again by hand, and it will keep running until someone or something talks to it; then it dies again.

The elder system, ssbn, seems to work just fine. With the same command (but aimed at ssbn), the reply is

    program 100003 version 2 ready and waiting

If I stop and start NFS on carpet, verify that the daemons are all running, and run rpcinfo -p carpet to see what's registered, I get

       program vers proto   port
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper

and nothing else. The same time-out message eventually appears on carpet's console.

I have commented out the pcnfsd, statd, and lockd entries in the NFS startup script, but that's true of both machines, so I don't think it affects anything. The other difference is that carpet has 6MB of memory while ssbn has 12MB. That shouldn't make any difference either, but if I were so darned smart, I wouldn't be asking all these questions...

Does anyone have any idea what might be happening?

Thanks,
-- 
Bill Kennedy
uucp     {att,cs.utexas.edu,pyramid!daver}!ssbn.wlk.com!bill
internet bill@ssbn.WLK.COM or ssbn!bill@attmail.COM
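Comparing what the two machines have registered can be scripted rather than eyeballed. Below is a minimal sketch; rpc_registered is a hypothetical helper name (not part of ISC's NFS), and it assumes rpcinfo -p prints a header line followed by one registration per line with the program number in the first column, as in the carpet output described above.

```shell
# Hypothetical helper, not part of ISC's NFS: succeed (exit 0) if the
# RPC program number given as $1 appears in "rpcinfo -p" output read
# from stdin, fail (exit 1) otherwise.
# Assumes the program number is the first whitespace-separated column;
# the "program vers proto port" header line never matches a number.
rpc_registered() {
    awk -v prog="$1" '
        $1 == prog { found = 1 }   # any data line for this program
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `rpcinfo -p carpet | rpc_registered 100003 || echo "NFS not registered on carpet"` would flag the situation above, where only the portmapper (100000) is listed. Reading from stdin keeps the check independent of how rpcinfo is invoked.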
bill@ssbn.WLK.COM (Bill Kennedy) (04/29/91)
In article <677@wrangler.WLK.COM>, I wrote:
[ problems with rexd exiting prematurely and generally antisocial behavior with NFS running ... ]

Jim Deitch (jdeitch@jadpc.cts.com) encountered the same problem (and reported it to ISC without getting an acknowledgement), and here's the workaround. If SL/IP is configured and up before NFS starts, rexd will behave as described in my original article. To make rexd behave, you must either comment the SL/IP interface out of /etc/netd.cf and bring it up by hand with ifconfig after you start NFS, or ifconfig it down, start NFS, and ifconfig it back up.

I think ISC deserves a twist of the tail for this. It's just a nuisance, easily worked around, but they certainly should have ack'd Jim's report when they got it. Maybe they'll acknowledge the problem if the workaround is posted to the net.

In article <676@wrangler.WLK.COM>, I wrote:
[ runaway portmap processes when receiving an rpcinfo broadcast ... ]

This was confirmed by several people, and ISC says it will be fixed in the next release (do we need a FITNR acronym?). There's a clumsy workaround, but just avoid rpcinfo broadcasts if you can. The workaround is to allocate a small enough number of queues (NQUEUE) that you'll run out of queues before you run out of process slots. When it happens, the portmap processes are easily killed with kill -9, provided you have a process slot left to fork off the kill. If you do get jammed up so that you can't fork a kill, and you know the PID of the parent portmap, you can exec the kill from your login shell (which replaces the shell, so no fork is needed) and then log back in.

One respondent said that after he recompiled portmap, the phenomenon vanished as suddenly and mysteriously as it had appeared. I don't know whether he means the XDR source in section 5.7 of TFM or portmap.c. I'll type in the XDR source, and a kind soul has sent me a portmap.c, so I'll try each and report whether either or both produce the desired result. If so, I guess we don't need FITNR, or at least those of us on the net don't...
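The SL/IP workaround (ifconfig the interface down, start NFS, ifconfig it back up) is easy to script. Here's a minimal sketch, assuming ifconfig accepts the BSD-style down/up keywords. The function name nfs_start_without_slip is hypothetical, and both arguments are placeholders: the SL/IP interface name and the command that starts NFS on your system.

```shell
# Hypothetical helper sketching the "ifconfig it down, start NFS,
# ifconfig it back up" workaround.
#   $1  SL/IP interface name (placeholder; use your own)
#   $2  command that starts NFS (placeholder; left unquoted below so a
#       command with arguments, e.g. "script start", is word-split)
nfs_start_without_slip() {
    slip_if=$1
    nfs_start=$2
    # Take SL/IP down so it isn't up while NFS (and rexd) register.
    ifconfig "$slip_if" down || return 1
    # Start NFS, remembering its exit status without aborting on failure.
    status=0
    $nfs_start || status=$?
    # Bring SL/IP back up even if the NFS start failed.
    ifconfig "$slip_if" up || return 1
    return "$status"
}
```

For example, `nfs_start_without_slip sl0 /path/to/nfs-start`, where sl0 and /path/to/nfs-start stand in for whatever your interface and NFS startup script are actually called. The link is brought back up regardless of how the NFS start went, so a failed start doesn't also leave the serial line down.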
-- 
Bill Kennedy
internet bill@ssbn.WLK.COM or ssbn!bill@attmail.COM
uucp     {att,cs.utexas.edu,pyramid!daver}!ssbn.wlk.com!bill