[comp.sys.apollo] rgyd taking over the machine

cosc4fp@jetson.uh.edu (08/22/90)

On our master server, which runs the registry along with the glbd and llbd, we
are seeing rgyd suddenly start running at 95% CPU.  This disrupts mail and
printing services, since communication seems to be broken.

Has anyone seen this wild rgyd?  We are running 10.2 ........

krowitz@RICHTER.MIT.EDU (David Krowitz) (08/22/90)

Yup, I've seen this before. The most common cause was a broken, wedged, or outright
dead /etc/tcpd. Try using /etc/ping to check whether TCP/IP to the node is working.
If it is, then use "telnet" or "rlogin" to make certain that you're getting
to the *correct* machine (at one point we had a machine with a screwed-up host
table that kept stealing packets meant for the machine with the rgyd). Both
registry and printing services use NCS for communications, which in turn uses TCP
services.
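
One way to look for the "screwed up host table" case is to scan a copy of the
table for names bound to more than one address. This is only a minimal sketch:
it assumes the usual /etc/hosts layout (address, then names, "#" for comments),
and the function name, host names, and addresses are hypothetical.

```python
from collections import defaultdict


def find_host_conflicts(hosts_text):
    """Return {name: sorted addresses} for names bound to more than one address."""
    addrs_by_name = defaultdict(set)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            addrs_by_name[name.lower()].add(address)
    return {n: sorted(a) for n, a in addrs_by_name.items() if len(a) > 1}


# Hypothetical host table: "rgyhost" is claimed by two different addresses.
SAMPLE = """\
# illustrative entries only
192.0.2.10  rgyhost.example.edu rgyhost
192.0.2.11  other.example.edu   rgyhost   # stale alias
"""

print(find_host_conflicts(SAMPLE))
```

A name that shows up in the result is worth checking by hand with telnet or
rlogin, as described above.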


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

thompson@PAN.SSEC.HONEYWELL.COM (John Thompson) (08/22/90)

> On our master server where we have the registry and the glbd and llbd, we are
> seeing that rgyd start running at 95% out of the blue.  This causes mail, and
> printing services to be disrupted since communication seems to be broken.
> 
> Has anyone seen this wild rgyd?  We are running 10.2 ........

Yes.  I've also seen it with the llbd and glbd.  My guess is that the NCS 
services aren't as robust as they should (could?) be, and that they start
thrashing if/when communications are interrupted/corrupted.  I've noticed
the llbd running amok fairly often after the tcpd dies or gets aborted.

When this occurs, I've just sigp'ed the NCS services and restarted them.
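
Spotting which daemon is running amok could be sketched roughly as below. It
assumes a process listing with PID, %CPU, and COMMAND columns (as a POSIX-style
ps can produce); the listing format on Domain/OS 10.2 may well differ, and the
function name, sample PIDs, and the paths other than /etc/tcpd are illustrative
only. It only reports candidates; stopping and restarting is left to the
operator.

```python
def runaway_daemons(ps_text, names=("rgyd", "llbd", "glbd"), threshold=90.0):
    """Return (pid, name, cpu) for watched daemons at or above the CPU threshold."""
    hits = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid_text, cpu_text, command = fields
        try:
            pid, cpu = int(pid_text), float(cpu_text)
        except ValueError:
            continue  # not a well-formed row
        base = command.split()[0].rsplit("/", 1)[-1]  # "/etc/rgyd" -> "rgyd"
        if base in names and cpu >= threshold:
            hits.append((pid, base, cpu))
    return hits


# Hypothetical listing, not captured from a real node.
SAMPLE_PS = """\
  PID %CPU COMMAND
  101 95.0 /etc/rgyd
  102  0.3 /etc/llbd
  103  1.2 /etc/tcpd
"""

print(runaway_daemons(SAMPLE_PS))
```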

Lately, we haven't seen the registry daemon running amok, so I don't know
what the scoop is.

John Thompson (jt)
Honeywell, SSEC
Plymouth, MN  55441
thompson@pan.ssec.honeywell.com

As ever, my opinions do not necessarily agree with Honeywell's or reality's.
(Honeywell's do not necessarily agree with mine or reality's, either)

wjw@eba.eb.ele.tue.nl (Willem Jan Withagen) (08/23/90)

In article <9008221628.AA10801@pan.ssec.honeywell.com> thompson@PAN.SSEC.HONEYWELL.COM (John Thompson) writes:
>
>
>> On our master server where we have the registry and the glbd and llbd, we are
>> seeing that rgyd start running at 95% out of the blue.  This causes mail, and
>> printing services to be disrupted since communication seems to be broken.
>> 
>> Has anyone seen this wild rgyd?  We are running 10.2 ........

Rumour once had it that the registry services did not work well 
with a certain number of entries in the accounts or groups.

I have been archiving comp.sys.apollo for quite a while, and below is one of
the previous discussions. As far as I can see, this one has not been fixed yet.

Regards,
	Willem Jan Withagen.

-----------------------------
Well, I found out what caused the problems with my corrupted
and missing /etc/{org,group,passwd} files -- many thanks to 
Betsy Minahan @ apollo for diagnosing it.

It seems that if there are exactly 16 entries in the equivalents of
/etc/group or /etc/passwd, some of the three files are unreadable and
others are corrupted.  This is a problem that Apollo knows about and is
working on. 

The workaround is to use edrgy and add another account, or person,
or something, to bring the number over 16.  It seems I also had
to delete `node_data/systmp/{.org,.group,.passwd} so that
they could get recreated properly (thanks to Michael Zeleznik for that
last tip).
------------------------------
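
A quick check for the "exactly 16 entries" condition described above might look
like this. It is a sketch only: it assumes one entry per non-blank,
non-comment line in a passwd- or group-style text file, and the function
names and sample entries are hypothetical.

```python
def count_entries(text):
    """Count non-blank lines that are not '#' comments."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def hits_known_bad_count(text, bad_count=16):
    """True when the file has exactly the entry count reported as troublesome."""
    return count_entries(text) == bad_count


# Sixteen made-up entries, just to exercise the check.
SAMPLE_PASSWD = "\n".join("user%d:x:%d" % (i, i) for i in range(16))

if hits_known_bad_count(SAMPLE_PASSWD):
    print("exactly 16 entries: consider adding one more with edrgy")
```
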
Eindhoven University of Technology   DomainName:  wjw@eb.ele.tue.nl    
Digital Systems Group, Room EH 10.10 BITNET: ELEBWJ@HEITUE5.BITNET
P.O. 513                             Tel: +31-40-473401
5600 MB Eindhoven                    The Netherlands