rn@ap.co.umist.ac.uk (bob nutter) (07/20/90)
Hi! Can anyone shed light/point me in the right direction with this registry question?

I'm running sr10.1 (with no patches) on a network, with 2 glbd's and 2 rgyd's. Occasionally, on the node running the slave registry, the registry becomes unavailable, although it is still accessible from other nodes on the network. Indeed, the slave rgy is still accessible on other nodes. This obviously causes problems with lots of things such as su and lpr, and usually means I have to do an ex and go on the node, or restart rgyd using /etc/server -p. This doesn't make the registry available again for quite some time, and until it reappears, the same errors result. The node clocks are within a minute or so of each other, by the way.

These are the errors I get. On doing a rgy_admin, I get the following error: 'Cannot locate in-service rgy replica, entry not found (ncs/rgy server)', and trying a 'set -h //phi' [the node which has the problem] causes the following: 'error reading replica list can't bind socket (ncs rpc runtime)'.

I've got sr10.2; does installing that cure any problems like this? I know they recommend replacing the sr10.1 rgyd with the sr10.2 one, but I tried that once during another registry disaster (it totally corrupted the registry during a 'cvtrgy -from10to9 ...'; I only managed to recover it by using a backed-up /sys/registry/rgy_data/unix/passwd with import_passwd and then having to reset *everybody's* acls!), and it just kept falling over.

This doesn't happen on the master registry node at all. Has anyone else experienced problems of this kind, and has anyone got any suggestions? Please reply via email, etc... (please, this is driving me mad!!!)

bob "baffled" nutter
---------------------------------------------------------------------------------
bob nutter, computer officer    | I would have let it lie,
UMIST dept of computation       |  -Mild mannered, yes,
po box 88 manchester m60 1qd uk | But I *didn't* let it lie!
tel:+44 61 200 3386             |  -Vic Reeves Big Night Out
email:rn@ap.co.umist.ac.uk      |
krowitz@RICHTER.MIT.EDU (David Krowitz) (07/24/90)
Sounds like your global location broker database is whacked. Try using /etc/ncs/drm_admin to check and merge your master glbd database and any replicas you have (or had previously) been running.

-- David Krowitz
krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)