[comp.sys.apollo] rgyd problems again

Markku.Savela@tel.vtt.fi (VTT/TEL) (12/24/90)

  I had had my lone Apollo running without a reboot for well over two
months and decided to reboot it to clean out some hanging FIN_WAIT_2
tcp connections. Thought it would do it good...

  The boot seemed to go well and the system worked. But, ... sometimes,
or for some operations, the system just seemed to freeze for a while
or was awfully sluggish.

  llbd, glbd and rgyd seemed to be running fine, but then I tried
edrgy, and it didn't start (unfortunately didn't save the error
message).  Thought, aha, maybe rgyd is trashed and killed it...
Shouldn't have done that, because when trying to restart it I
get:

> Registry: Fatal Error - Cannot register server replica interface - 0X1C010001 -
> communications failure (network computing system/RPC runtime)
> Registry: Fatal Error - 0X9010006 - IOT instruction fault (UNIX/signal

  Arghh! I'm no apollo expert, so I just rebooted the machine again.
This time "rgyd" didn't start even on boot. It didn't complain when
I started it manually, but it seems non-functional (hmm.. just now
when I tried, it seems to have gotten something ok, could "su" again
normally, but still cannot run "edrgy", I get...

># edrgy
>?(edrgy)  Unable to open the registry. - Registry server unavailable (RGYC/Server)
>#

   Any hints where the trouble is?

regards,
--
Markku Savela (savela@tel.vtt.fi), Technical Research Centre of Finland
Telecommunications Laboratory, Otakaari 7 B, SF-02150 ESPOO, Finland

troy@plod.cbse.unsw.oz.au (Troy Rollo) (12/24/90)

From article <5309@hemuli.tik.vtt.fi>, by Markku.Savela@tel.vtt.fi (VTT/TEL):

Markku.Savela>   Arghh! I'm no apollo expert, so I just rebooted the machine again.
Markku.Savela> This time "rgyd" didn't start even on boot. It didn't complain when
Markku.Savela> I started it manually, but it seems non-functional (hmm.. just now
Markku.Savela> when I tried, it seems to have gotten something ok, could "su" again
Markku.Savela> normally, but still cannot run "edrgy", I get...


Check the global location database, i.e.:

/etc/ncs/lb_admin
lb_admin> domain global
lb_admin> l

This should give a list of registries. If the list is incorrect, or, worse 
still, if you get an unexpected machine holding the known glb, your problem is
one of the now famous location broker problems.

How to fix:

	First, make sure you start the location broker with -li dds (normally
the correct option, meaning limit *to* domain distributed systems).
Then create a new file, /etc/ncs/glb_site.txt, containing a list of machines,
one machine per line, where you can expect to find your glb. Each entry will
probably take the form dds://machine_name

Make it publicly readable (probably not necessary, but it doesn't hurt),
and reboot. This should prevent anybody else from throwing unwanted location
broker stuff at you.
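
For illustration only, the file-creation step might be sketched as a small
sh function like the one below. The function name write_glb_site and the
host names in the usage line are made up for the example; the dds:// entry
form is the "probably" format mentioned above, not something confirmed for
every site.

```shell
# Sketch only: write a glb site list, one dds://host entry per line.
# write_glb_site is a hypothetical helper, not part of NCS.
# Usage: write_glb_site FILE HOST...
write_glb_site() {
    file=$1
    shift
    # Start from an empty file; give up if it cannot be written.
    : > "$file" || return 1
    for host in "$@"; do
        echo "dds://$host" >> "$file"
    done
    # Publicly readable, as suggested above.
    chmod 644 "$file"
}
```

Run as root, something like "write_glb_site /etc/ncs/glb_site.txt node_a node_b"
(node_a and node_b standing in for the machines that really run your glb)
would produce the two-line file, after which you reboot as described.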
___________________________________________________________
troy@mr_plod.cbme.unsw.oz.au	 
Fascist comments deleted for the duration of the Gulf Crisis