holtz@zonker.cascade.carleton.ca (Neal Holtz) (10/09/90)
I am sure this question has been asked (and answered) before, but I can't find
it in my archives...
Configuration:
1 - DN4500, disked
3 - DN3500s, disked
4 - DN2500s, diskless - one booted off each disked node
SR10.2 Aegis & BSD
1 Master registry, on DN4500, no replications
DN4500 runs glbd, llbd, rgyd
each DN3500 runs llbd
Problem:
With time, the registry server becomes unavailable on the DN3500s,
and users are logged in using the local registries. This seems
to take the form of a gradual rot, with more and more of the DN3500s
becoming 'serverless'.
Also, of course, '/etc/passwd' is unreadable (doesn't exist).
However, the registries are still available to the diskless
DN2500s booted off the DN3500s, and /etc/passwd is OK.
And the DN3500 has no trouble seeing the files on and otherwise
communicating with the rgy server node.
Attempted fixes:
We have rebooted everything, manually restarted various
servers, and looked at a few log files in /usr/adm.
Nothing worked, and there were no clues, either. Debugging via Apollo's
hotline seems to take a long time, as well.
The clocks are set to within a few seconds.
Details:
Processes on the Master Registry node (DN4500):
1 > ps -ax
PID TTY STAT TIME COMMAND
1 ? S < 0:23 /etc/init
2 ? R 1461:10 null
3 ? S 0:51 purifier
4 ? S 0:29 purifier
5 ? S 0:50 unwired_dxm
6 ? S 0:00 pinger
7 ? S 0:04 netreceive
8 ? S 0:39 netpaging
9 ? S 0:18 wired_dxm
10 ? S 0:38 netrequest
91 ? S 5:10 /etc/tcpd
96 ? S 1:25 /etc/routed -f -q
99 ? S 0:00 /etc/inetd
102 ? S 0:00 /etc/ncs/llbd
104 ? S 1:49 /etc/ncs/glbd
107 ? S < 2:02 /etc/rgyd
112 ? S 0:04 /sys/spm/spm
115 ? S 0:03 /sys/net/netman
117 ? S 0:03 /sys/ns/ns_helper
120 ? S 0:08 /sys/alarm/alarm_server -disk 98 -msg -w 0 0 550 100 -
122 ? S 0:01 /sys/mbx/mbx_helper
125 ? S < 0:08 /etc/Xapollo -K /usr/X11/lib/keyboard/keyboard.config
127 ? S < 8:29 dm
Processes on the serverless DN3500:
Connected to node 19A9D "//thorin"
login:
Password:
Using local registry. Can't use network registry: - Registry server unavailable (from RGYC / Server)
1 > ps -ax
PID TTY STAT TIME COMMAND
1 ? S < 0:28 /etc/init
2 ? R 158:32 null
3 ? S 0:05 purifier
4 ? S 0:00 purifier
5 ? S 0:09 unwired_dxm
6 ? S 0:00 pinger
7 ? S 0:00 netreceive
8 ? S 0:23 netpaging
9 ? S 0:02 wired_dxm
10 ? S 0:13 netrequest
92 ? S 0:00 /etc/ncs/llbd
97 ? S 0:01 /sys/spm/spm
99 ? S 0:03 /sys/net/netman
101 ? S 0:02 /sys/alarm/alarm_server -disk 98 -msg -w 0 0 550 100 -v 20 20
105 ? S 0:00 /sys/mbx/mbx_helper
107 ? S < 0:05 /etc/Xapollo -K /usr/X11/lib/keyboard/keyboard.config -D1 s+r-
109 ? S < 4:09 dm
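The two listings above differ in which registry-related daemons appear: the DN4500 shows llbd, glbd and rgyd, while the DN3500 shows only llbd, which matches the configuration described earlier. A minimal sketch for making the same check on each node, assuming `ps -ax` output in the format shown and a shell with `grep -F`; the function name `ncs_daemons` is hypothetical:

```shell
#!/bin/sh
# ncs_daemons: hypothetical helper (not an Apollo tool). Reads "ps -ax"
# output on stdin and reports, for each daemon of interest, whether any
# line of the listing mentions its path.
ncs_daemons() {
    listing=$(cat)
    for d in /etc/ncs/llbd /etc/ncs/glbd /etc/rgyd; do
        # -F: treat the path as a fixed string, -q: no output, status only
        if printf '%s\n' "$listing" | grep -qF -- "$d"; then
            echo "$d: running"
        else
            echo "$d: not running"
        fi
    done
}
```

It would be run as `ps -ax | ncs_daemons` on each node. It only confirms that a process is listed, not that the daemon is answering requests.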
Perhaps I'll dig out my SR8 floppies and re-install :-(
--
Prof. Neal Holtz, Dept. of Civil Eng., Carleton University, Ottawa, Canada
Internet: holtz@civeng.carleton.ca Tel: (613)788-5797 Fax: (613)788-3951
goldfish@CONCOUR.CS.CONCORDIA.CA (-- Paul Goldsmith) (10/10/90)
Your problem is probably related to interaction between the registry
daemons and "tcpd".
See your "ncs" manual (a thin thing you probably won't even remember
seeing) and look up the part on selecting TCP versus the internal
Apollo carrier protocol. If "tcpd" is up when the glbd & llbd start,
they use tcp as a carrier. THIS DOESN'T WORK VERY WELL.
The text below should offer some suggestions. Run the same "grep" on
your system and compare the output. I have several extra lines.
Check the manual, complain like hell on the hotline, and make sure you
understand the changes before going at it.
concour,goldfish 215 grep lbd /etc/rc*
/etc/rc:# llbd must fork and the parent exit before other NCS servers may
/etc/rc:# be run. Do not use "/etc/ncs/llbd &"
/etc/rc:# that the llbd will listen to. e.g.,
/etc/rc:# /etc/ncs/llbd -li dds
/etc/rc:# will limit the llbd to listen on the dds protocol family - forcing it to
/etc/rc:if [ -f /etc/ncs/llbd -a -f /etc/daemons/llbd ]; then
/etc/rc: (echo " llbd\c" >/dev/console)
/etc/rc:# /etc/ncs/llbd
/etc/rc: /etc/ncs/llbd -li dds
/etc/rc:# /etc/ncs/llbd is needed to run the glbd.
/etc/rc:# that the glbd will listen to. e.g.,
/etc/rc:# /etc/ncs/glbd -li ip
/etc/rc:# will limit the llbd to listen on the ip protocol family - forcing it to
/etc/rc:if [ -f /etc/ncs/glbd -a -f /etc/daemons/glbd -a $LLBD_ENABLED = true ]; then
/etc/rc: (echo " glbd\c" >/dev/console)
/etc/rc:# /etc/ncs/glbd &
/etc/rc: /etc/ncs/glbd -li dds &
/etc/rc:# determines which nodes these are). rgyd requires that /etc/ncs/llbd
/etc/rc:# /etc/ncs/llbd is needed to run lpd.
concour,goldfish 216
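To make the comparison with the grep output above less error-prone, something like the following could flag llbd/glbd start lines that lack a `-li` restriction. This is only a sketch, not anything taken from the NCS manual: the function name `check_li` is hypothetical, it assumes the start lines look like the ones shown above (an uncommented line whose first word is /etc/ncs/llbd or /etc/ncs/glbd, with spaces around `-li`), and it assumes a grep that accepts `-E` and POSIX character classes, which may not hold on every system.

```shell
#!/bin/sh
# check_li FILE: hypothetical helper. Prints each uncommented llbd/glbd
# start line found in FILE, labelled by whether it carries a
# "-li <family>" restriction.
check_li() {
    # Drop comment lines, then keep lines that invoke llbd or glbd as a
    # command (the path is the first word on the line).
    grep -v '^[[:space:]]*#' "$1" |
    grep -E '^[[:space:]]*/etc/ncs/(llbd|glbd)([[:space:]]|$)' |
    while IFS= read -r line; do
        case "$line" in
            *" -li "*) echo "restricted:   $line" ;;
            *)         echo "unrestricted: $line" ;;
        esac
    done
}
```

Run as `check_li /etc/rc`. Against the rc file quoted above it should report both the llbd and glbd start lines as restricted; an "unrestricted" line marks a daemon that is free to pick tcp as its carrier when tcpd is already up.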
-- Paul Goldsmith
(goldfish) (514) 848-3031 <goldfish@concour.cs.concordia.ca>
(Shirley Maclaine told me there would be LIFETIMES like this)