bill@wrangler.WLK.COM (Bill Kennedy) (04/27/91)
This is the second ISC NFS anomaly that has me baffled; the other
one was about a runaway portmap process. This one is more of a
puzzle since two similarly configured systems exhibit two different
behaviors.
To try and track down some problems between NCR's NFS and Interactive
I brought up a third system, carpet. NFS seems to start just fine
and it seems to work OK but rexd likes to die when it's queried. With
all of the appropriate daemons running on carpet (the newer box) I run

    rpcinfo -u carpet 100003 2

to get NFS status on carpet and I get

    rpcinfo: RPC: Program not registered
    program 100003 version 2 is not available

And then a few minutes later on carpet's console I get

    Cannot register service: RPC: Timed out
    rexd: service rpc register: error

and a ps shows that the rexd daemon has exited. I can start the daemon
again by hand and it will continue to run until someone or something
talks to it and it dies again. The elder system, ssbn, seems to work
just fine; with the same command (but to ssbn) the reply is

    program 100003 version 2 ready and waiting

If I start and stop NFS, verify that the daemons are all running on
carpet, and run

    rpcinfo -p carpet

to see what's registered I get

       program vers proto   port
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper

and nothing else. The same timeout message eventually appears on carpet's
console.
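
For anyone who wants to repeat the check above on several hosts, here is a
minimal sketch that reads `rpcinfo -p` output and lists which RPC programs
are missing. The EXPECTED table is an assumption for illustration (100003
is NFS, as queried above; 100005 is mountd); trim it to whatever daemons
you actually start. The sample text is the carpet listing from this article.

```python
# Sketch: find expected RPC programs that do not appear in `rpcinfo -p` output.
# EXPECTED is an illustrative assumption, not a list taken from ISC's scripts.
EXPECTED = {
    100000: "portmapper",
    100003: "nfs",
    100005: "mountd",
}


def registered_programs(rpcinfo_output):
    """Return the set of program numbers listed in `rpcinfo -p` output."""
    programs = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows begin with a numeric program number; the header row does not.
        if fields and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def missing_programs(rpcinfo_output, expected=EXPECTED):
    """Return {number: name} for expected programs absent from the output."""
    found = registered_programs(rpcinfo_output)
    return {num: name for num, name in expected.items() if num not in found}


# The listing carpet returned in the article above.
CARPET_OUTPUT = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
"""

if __name__ == "__main__":
    for num, name in sorted(missing_programs(CARPET_OUTPUT).items()):
        print(f"program {num} ({name}) is not registered")
```

To use it against a live host, feed the function the text captured from
`rpcinfo -p hostname` instead of the sample string.
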
I have commented out the pcnfsd, statd, and lockd stuff from the NFS
startup script but that's true of both machines so I don't think it
affects anything. The other difference is that carpet has 6MB of memory
while ssbn has 12MB. That shouldn't make any difference either, but if
I was so darned smart, I wouldn't be asking all these questions... Does
anyone have any idea what might be happening? Thanks,
--
Bill Kennedy uucp {att,cs.utexas.edu,pyramid!daver}!ssbn.wlk.com!bill
internet bill@ssbn.WLK.COM or ssbn!bill@attmail.COM

bill@ssbn.WLK.COM (Bill Kennedy) (04/29/91)
In article <677@wrangler.WLK.COM>, I wrote:
[ problems with rexd exiting prematurely and generally antisocial behavior with NFS running ... ]

Jim Deitch (jdeitch@jadpc.cts.com) encountered the same problem (and
reported it without acknowledgement), and here's the workaround. If SL/IP
is configured and up before NFS starts, rexd will behave as described in
my original article. To make rexd behave you must either comment the
SL/IP entry out of /etc/netd.cf and start it by hand with ifconfig after
you start NFS, or you must ifconfig it down, start NFS, and ifconfig it
back up.

I think that ISC deserves a twist of the tail for this. It's just a
nuisance, easily worked around, but they certainly should have ack'd
Jim's report when they got it. Maybe they'll acknowledge the problem if
the workaround is posted to the net.

In article <676@wrangler.WLK.COM>, I wrote:
[ runaway portmap processes when receiving an rpcinfo broadcast ... ]

This was confirmed by several people and ISC says it will be fixed in
the next release (do we need a FITNR acronym?). There's a clumsy
workaround, but just avoid rpcinfo broadcasts if you can. The workaround
is to have a small enough number of queues (NQUEUE) allocated so that
you'll run out of queues before you run out of process slots. The portmap
processes are rather easily killed with kill -9 when it happens, if you
have a process slot left to fork off a kill. If you do get jammed up such
that you can't fork off a kill and you know the parent portmap PID, you
can exec a kill and log back in.

I got one response saying that the phenomenon vanished as suddenly and
mysteriously as it appeared after he recompiled portmap. I don't know if
he's referring to the XDR source in section 5.7 of TFM or portmap.c, but
I'll type in the XDR source, and a kind soul sent me a portmap.c. I'll try
each and report if either/both produce the desired result. If it does I
guess we don't need FITNR, or at least those of us on the net don't...
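
The ordering in the SL/IP workaround above (interface down, NFS up,
interface back up) can be written down as a dry run. This is only a
sketch: the interface name "sl0" and the NFS start command are
hypothetical placeholders, since the article does not name either, and
the code prints the commands rather than running them.

```python
# Sketch: print the command order for the "ifconfig down, start NFS,
# ifconfig up" workaround. Nothing is executed; this is a dry run only.
# "sl0" and the NFS start command below are hypothetical placeholders.


def workaround_steps(slip_interface, nfs_start_command):
    """Return the commands, in order, as argument lists."""
    return [
        ["ifconfig", slip_interface, "down"],  # take SL/IP down first
        list(nfs_start_command),               # then start NFS (and rexd)
        ["ifconfig", slip_interface, "up"],    # finally bring SL/IP back up
    ]


if __name__ == "__main__":
    # Substitute your own interface name and NFS startup script here.
    steps = workaround_steps("sl0", ["sh", "/path/to/your/nfs-startup-script"])
    for step in steps:
        print(" ".join(step))
```

Review the printed commands against your own configuration before typing
them in as root.
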
--
Bill Kennedy internet bill@ssbn.WLK.COM or ssbn!bill@attmail.COM
uucp {att,cs.utexas.edu,pyramid!daver}!ssbn.wlk.com!bill