holtz@zonker.cascade.carleton.ca (Neal Holtz) (10/09/90)
I am sure this question has been asked (and answered) before, but I
can't find it in my archives...

Configuration:
    1 - DN4500, disked
    3 - DN3500s, disked
    4 - DN2500s, diskless - one booted off each disked node
    SR10.2 Aegis & BSD
    1 Master registry, on DN4500, no replications
    DN4500 runs glbd, llbd, rgyd
    each DN3500 runs llbd

Problem:
    With time, the registry server becomes unavailable on the DN3500s
    and users are logged in using the local registries. This seems to
    take the form of a gradual rot, with more of the DN3500s becoming
    'serverless'. Also, of course, '/etc/passwd' is unreadable (doesn't
    exist) on the affected nodes.

    However, the registries are still available to the diskless DN2500s
    booted off the DN3500s, and /etc/passwd is OK there. And the DN3500
    has no trouble seeing the files on, and otherwise communicating
    with, the rgy server node.

Attempted fixes:
    We have rebooted everything, we have manually restarted various
    servers, and we have looked at a few log files in /usr/adm. Nothing
    worked, and there were no clues, either. Debugging via Apollo's hot
    line seems to take a long time, as well.

    The clocks are set to within a few seconds of each other.

Details:

Processes on the Master Registry node (DN4500):

    1 > ps -ax
      PID TTY STAT    TIME COMMAND
        1 ?   S <     0:23 /etc/init
        2 ?   R    1461:10 null
        3 ?   S       0:51 purifier
        4 ?   S       0:29 purifier
        5 ?   S       0:50 unwired_dxm
        6 ?   S       0:00 pinger
        7 ?   S       0:04 netreceive
        8 ?   S       0:39 netpaging
        9 ?   S       0:18 wired_dxm
       10 ?   S       0:38 netrequest
       91 ?   S       5:10 /etc/tcpd
       96 ?   S       1:25 /etc/routed -f -q
       99 ?   S       0:00 /etc/inetd
      102 ?   S       0:00 /etc/ncs/llbd
      104 ?   S       1:49 /etc/ncs/glbd
      107 ?   S <     2:02 /etc/rgyd
      112 ?   S       0:04 /sys/spm/spm
      115 ?   S       0:03 /sys/net/netman
      117 ?   S       0:03 /sys/ns/ns_helper
      120 ?   S       0:08 /sys/alarm/alarm_server -disk 98 -msg -w 0 0 550 100 -
      122 ?   S       0:01 /sys/mbx/mbx_helper
      125 ?   S <     0:08 /etc/Xapollo -K /usr/X11/lib/keyboard/keyboard.config
      127 ?   S <     8:29 dm

Processes on the serverless DN3500:

    Connected to node 19A9D "//thorin"
    login:
    Password:
    Using local registry.
    Can't use network registry:
      - Registry server unavailable (from RGYC / Server)

    1 > ps -ax
      PID TTY STAT    TIME COMMAND
        1 ?   S <     0:28 /etc/init
        2 ?   R     158:32 null
        3 ?   S       0:05 purifier
        4 ?   S       0:00 purifier
        5 ?   S       0:09 unwired_dxm
        6 ?   S       0:00 pinger
        7 ?   S       0:00 netreceive
        8 ?   S       0:23 netpaging
        9 ?   S       0:02 wired_dxm
       10 ?   S       0:13 netrequest
       92 ?   S       0:00 /etc/ncs/llbd
       97 ?   S       0:01 /sys/spm/spm
       99 ?   S       0:03 /sys/net/netman
      101 ?   S       0:02 /sys/alarm/alarm_server -disk 98 -msg -w 0 0 550 100 -v 20 20
      105 ?   S       0:00 /sys/mbx/mbx_helper
      107 ?   S <     0:05 /etc/Xapollo -K /usr/X11/lib/keyboard/keyboard.config -D1 s+r-
      109 ?   S <     4:09 dm

Perhaps I'll dig out my SR8 floppies and re-install :-(

--
Prof. Neal Holtz, Dept. of Civil Eng., Carleton University, Ottawa, Canada
Internet: holtz@civeng.carleton.ca  Tel: (613)788-5797  Fax: (613)788-3951
goldfish@CONCOUR.CS.CONCORDIA.CA (-- Paul Goldsmith) (10/10/90)
Your problem is probably related to the interaction between the
registry daemons and "tcpd". See your "ncs" manual (a thin thing you
probably won't even remember seeing) and look up the part on selecting
TCP versus the internal Apollo carrier protocol.

If "tcpd" is up when the glbd & llbd start, they use tcp as a carrier.
THIS DOESN'T WORK VERY WELL.

The text below should offer some suggestions. Run the same "grep" on
your system and compare the output. I have several extra lines. Check
the manual, complain like hell on the hotline, and make sure you
understand the changes before going at it.

concour,goldfish 215 grep lbd /etc/rc*
/etc/rc:# llbd must fork and the parent exit before other NCS servers may
/etc/rc:# be run. Do not use "/etc/ncs/llbd &"
/etc/rc:# that the llbd will listen to. e.g.,
/etc/rc:# /etc/ncs/llbd -li dds
/etc/rc:# will limit the llbd to listen on the dds protocol family - forcing it to
/etc/rc:if [ -f /etc/ncs/llbd -a -f /etc/daemons/llbd ]; then
/etc/rc: (echo " llbd\c" >/dev/console)
/etc/rc:# /etc/ncs/llbd
/etc/rc: /etc/ncs/llbd -li dds
/etc/rc:# /etc/ncs/llbd is needed to run the glbd.
/etc/rc:# that the glbd will listen to. e.g.,
/etc/rc:# /etc/ncs/glbd -li ip
/etc/rc:# will limit the llbd to listen on the ip protocol family - forcing it to
/etc/rc:if [ -f /etc/ncs/glbd -a -f /etc/daemons/glbd -a $LLBD_ENABLED = true ]; then
/etc/rc: (echo " glbd\c" >/dev/console)
/etc/rc:# /etc/ncs/glbd &
/etc/rc: /etc/ncs/glbd -li dds &
/etc/rc:# determines which nodes these are). rgyd requires that /etc/ncs/llbd
/etc/rc:# /etc/ncs/llbd is needed to run lpd.
concour,goldfish 216

--
Paul Goldsmith (goldfish) (514) 848-3031 <goldfish@concour.cs.concordia.ca>
(Shirley Maclaine told me there would be LIFETIMES like this)
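
The grep comparison suggested above can be narrowed to just the lines that actually start the daemons. What follows is only a rough sketch, not anything shipped with SR10.2: the function name `check_lbd_lines` is made up for illustration, and it assumes the start lines look like the ones quoted in the grep output (an invocation of /etc/ncs/llbd or /etc/ncs/glbd at the start of an uncommented line, optionally followed by `-li dds` or `-li ip`).

```shell
#!/bin/sh
# Hypothetical helper (not part of Domain/OS): list the active, uncommented
# llbd/glbd start lines in an rc file and report whether each one is
# restricted to a single protocol family with the "-li" option quoted
# in the /etc/rc comments.
check_lbd_lines() {
    rc_file="$1"
    # Lines beginning with "#" or "if" do not match, so only real
    # invocations of the two location brokers are kept.
    grep -E '^[[:space:]]*/etc/ncs/(llbd|glbd)' "$rc_file" |
    while read -r line; do
        case "$line" in
            *"-li dds"*) echo "dds-only: $line" ;;
            *"-li ip"*)  echo "ip-only: $line" ;;
            *)           echo "unrestricted: $line" ;;
        esac
    done
}
```

Running `check_lbd_lines /etc/rc` on each disked node and comparing the results would show at a glance which brokers are still free to pick tcp as a carrier (reported as "unrestricted") and which have been pinned to dds as in the listing above.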