[comp.sys.apollo] registry server unavailable problem at sr10.1

rn@ap.co.umist.ac.uk (bob nutter) (07/20/90)

Hi!

Can anyone shed light/point me in the right direction with this registry
question?

I'm running sr10.1 (with no patches) on a network, with 2 glbd's and 2
rgyd's. Occasionally, on the node running the slave registry, the
registry becomes unavailable, although it is still accessible from other
nodes on the network. Indeed, the slave rgy is still accessible on other
nodes. This obviously causes problems with lots of things such as su and
lpr, and usually means I have to do an ex and go on the node, or restart
rgyd using /etc/server -p. This doesn't make the registry available
again for quite some time, and until it reappears, the same errors
result. The node clocks are within a minute or so of each other, by the way.

These are the errors I get:

On doing a rgy_admin, I get the following error:
 'Cannot locate in-service rgy replica, entry not found (ncs/rgy server)', 

and trying a 'set -h //phi' [the node which has the problem] causes the
following:
 'error reading replica list can't bind socket (ncs rpc runtime)'.

I've got sr10.2; does installing that cure any similar problems? I know
they recommend replacing the sr10.1 rgyd with the sr10.2 one, but I
tried that once during another registry disaster and it just kept
falling over. (That disaster totally corrupted the registry during a
'cvtrgy -from10to9 ...'; I only managed to recover it by using a
backed-up /sys/registry/rgy_data/unix/passwd with import_passwd and
then having to reset *everybody's* acls!)

This doesn't happen on the master registry node at all. Has anyone else
experienced problems of this kind, and has anyone got any suggestions?
Please reply via email, etc... (please, this is driving me mad!!!)

bob "baffled" nutter

------------------------------------------------------------------------
bob nutter, computer officer    |  I would have let it lie,
UMIST dept of computation       |  -Mild mannered, yes,
po box 88 manchester m60 1qd uk |  But I *didn't* let it lie!
tel:+44 61 200 3386             |      -Vic Reeves Big Night Out
email:rn@ap.co.umist.ac.uk      |

krowitz@RICHTER.MIT.EDU (David Krowitz) (07/24/90)

Sounds like your global location broker database is whacked. Try
using /etc/ncs/drm_admin to check and merge your master glbd 
database and any replicas you have (or had previously) been
running. 
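
For what it's worth, a session would go roughly like the sketch below.
The subcommands are written from memory of the NCS drm_admin tool, so
check its built-in help for the exact syntax at your release. //phi is
the problem node from your post, and the dds:// address form is an
assumption for a Domain network.

```text
/etc/ncs/drm_admin
set -o glb -h dds://phi
lrep -clocks
merge_all
quit
```

lrep -clocks should list each glb replica along with its clock, which
is worth a look since the replicas need to stay in rough agreement on
time. merge_all works outward from the replica you selected with set,
so pick one whose replica list you trust.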


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)