[comp.sys.apollo] Registry unavailable on DN 10000

zeleznik%cs.utah.edu@wasatch.utah.edu (Mike Zeleznik) (08/17/89)

We have a 10000 and a 3500 on an ethernet, running one master and one
slave registry (10.1 on both machines).

** A number of times each day, when trying to log in to the 10K, the
system complains that the network registry is unavailable, and must use the
local one.  If we try a few times, it will eventually succeed.  This happens
whether the 10K is running the master or the slave.

Has anyone seen this? Are the registries locked up when the master/slave
are resyncing?  I can't see why they couldn't be at least readable.

** Also, does anyone (HP/Apollo?) know if there are plans to replicate
the master registry?  In a big net, it seems a bit much to expect the
single master to be available in order to make any updates (or am I missing
something?).

Thanks in advance,
Mike

  Michael Zeleznik              Computer Science Dept.
                                University of Utah
  zeleznik@cs.utah.edu          Salt Lake City, UT  84112
                                (801) 581-5617

majka@moose.cs.ubc.ca (Marc Majka) (08/17/89)

Mike Zeleznik writes:
>We have a 10000 and a 3500 on an ethernet, running one master and one
>slave registry (10.1 on both machines).
>
>** A number of times each day, when trying to log in to the 10K, the
>system complains that the network registry is unavailable, and must use the
>local one.  If we try a few times, it will eventually succeed.  This happens
>whether the 10K is running the master or the slave.

We currently have a DN10000, two 3500s, and a 4000.  The DN10000 has the
master registry.  We have the same kind of trouble you describe, although
things lock up about once a day for us.  I had a slave registry running on
one of the 3500s, and could cause the problem at will by just trying to look
at the slave registry from the 10000.  It seems to completely freeze NCS:
the 3500 loses ALL communications with the outside world.  Going down to a
phase 2 shell (dm "ex" command), and restarting (")go") does not fix the
problem.  Only a complete shutdown and reboot gets things back to normal.
In despair, I killed off the slave registry, and stopped running the location
broker daemons everywhere except on the 10000.  The 3500s *still* freeze up
daily.  I reported my troubles to the hot line, but to no avail.

I suspect that the problem is much deeper than the registries.  I hope
it is a bug in NCS (Network Crash System :-) on the DN10000, as I am about to
install a network of 68 3500s and 3000s, and I will go insane if the same thing
hits me there.  SR10.1.p fixed a *lot* of bugs which were in 10.0.p.  I suspect
that there are still a bunch of them in 10.1.

I understand that 10.2 will have a notion of "domains" which will allow
different master registries on a network to co-exist.  Has anyone seen it
running yet?  I would like to know whether it works, and how.  We live on an
extended ethernet which connects 3 universities and several research
institutions.  Having a single master registry on this network is an
administrative impossibility.

---
Marc Majka
Computer Science System Manager, University of British Columbia