[comp.sys.apollo] Corrupted /etc/passwd, /etc/org, /etc/group

holtz@cascade.carleton.ca (Neal Holtz) (05/26/90)

Sometime between 3:00 PM yesterday afternoon (Thursday) and this morning,
/etc/passwd disappeared.  If you try to read it from the DM, you get:

   (CV) /etc/passwd - Entry not found from (network computing system / Registry Server)

At the same time, /etc/org was corrupted.  Part of the output of 'cat /etc/org' is:

    wheel::0:
    apollo::1:user
    none::12:root,daemon,none,user,lp,admin,sys_person,uucp,bin,holtz,mcdill,\
        jefferson,guest,nholtz
    sys_org::13:
    uucp/uucico
    none:..:12:12::/:
    sys_person:..:13:12::/:
    user:..:14:12::/:
    admin:..:15:12::/:
    ... [junk removed for brevity]

Note that the last 6 lines (after "sys_org::13:") are junk from a passwd file
(with encrypted passwords removed by me).
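Junk like this can be spotted by field count.  Group and org entries have
four colon-separated fields (name:password:id:members), while passwd
entries have seven.  The Python sketch below is only a heuristic, not an
Apollo tool.  It assumes that layout, treats a trailing backslash as a line
continuation (as in the wrapped 'none' entry above), and the function name
is hypothetical.

```python
def find_non_group_lines(text):
    """Return (line_number, line) pairs that don't look like group/org entries.

    Heuristic: a well-formed entry has exactly 4 colon-separated fields,
    i.e. exactly 3 colons.  Passwd-style junk has 7 fields (6 colons).
    Lines ending in a backslash are joined with the following line.
    """
    suspicious = []
    logical = ""   # current logical line, after joining continuations
    start = None   # physical line number where the logical line began
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if start is None:
            start = number
        if line.endswith("\\"):
            logical += line[:-1]   # drop the backslash, keep accumulating
            continue
        logical += line
        if logical and logical.count(":") != 3:
            suspicious.append((start, logical))
        logical = ""
        start = None
    # A continuation left dangling at end of input is checked too.
    if logical and logical.count(":") != 3:
        suspicious.append((start, logical))
    return suspicious
```

Run against the excerpt above, it flags 'uucp/uucico' and the passwd-style
'none', 'sys_person', 'user' and 'admin' lines, and leaves the real org
entries alone.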

I can log on just fine, and 'edrgy -v' seems to display the right stuff.

Since yesterday aft, I have:

   1. used edrgy to add a new person and account.  At that time, I had some
      trouble because of a read-only replica on another node, so I had
      to use 'edrgy -s //my_node'.  Even though //my_node was the master
      registry, edrgy by default would go after //other_node.
   2. started the cron daemon and ran find/update at 3:00 am
   3. "rcp -r"'ed (from a Sun) the 50MB of the MIT X11R4 sources,
      and typed 'make World' in the root dir (the compile ran for
      7 hrs on a DN2500, by the way)
   4. added "xbiff &" and "xload &" to my .login.
   5. reconnected my external SCSI disk that was offline for
      a few days.

Presumably, it was my screwing around with the registries that caused the
problems, but for the life of me I cannot remember doing anything very
unusual, with the exception of explicitly directing edrgy to a 
particular site.

Does anyone have any idea what is wrong?  It looks like the 'org', 'passwd'
(and possibly 'group') type managers are bad.

Config:  DN2500, BSD4.3, SR10.2

Workaround:  replaced /etc/passwd with a symbolic link to /sys/registry/rgy_data/unix/passwd

Processes running:    


    USER       PID   SZ  RSS TTY     STAT  TIME COMMAND
    root         1 1184  481 ?       S <   0:28 /etc/init
    root         2    0    0 ?       R    14:37 null
    root         3    0    0 ?       S     0:01 purifier
    root         4    0    0 ?       S     0:00 purifier
    root         5    0    0 ?       S     0:02 unwired_dxm
    root         6    0    0 ?       S     0:00 pinger
    root         7    0    0 ?       S     0:00 netreceive
    root         8    0    0 ?       S     0:00 netpaging
    root         9    0    0 ?       S     0:00 wired_dxm
    root        10    0    0 ?       S     0:00 netrequest
    root        90 1856  131 ?       S     0:04 /etc/tcpd
    root        95 1888   39 ?       S     0:00 /etc/routed -f -q
    root        98 1888   60 ?       S     0:00 /etc/inetd
    root       101 2048  151 ?       S     0:00 /etc/ncs/llbd
    root       103 2976  313 ?       S     0:02 /etc/ncs/glbd
    root       106 3488  368 ?       S <   0:06 /etc/rgyd
    root       110  576   26 ?       S     0:00 /etc/cron
    user       115  704   12 ?       S     0:00 /sys/spm/spm
    root       118  672   61 ?       S     0:00 /sys/net/netman
    root       123 3552  695 ?       S <   0:08 /etc/Xapollo -K /usr/X11/lib/keyboard/keyboard.config -D1 s+r-
    user       124  896    0 ?       S     0:00 /sys/mbx/mbx_helper
    holtz      126 2848  591 ?       S <   0:42 dm
    root       148 1632   75 ?       S     0:16 vtserver
    holtz      140  896    1 pad0001 S     0:01 -csh
    holtz      145 1440   48 pad0001 S     0:02 xbiff
    holtz      146 1536  355 pad0001 S     0:03 xload
    holtz      141 1408  146 pad0002 S     0:06 /local/com/clock
    holtz      151  896  104 pad0003 S     0:00 $(shell)
    holtz      181  992  143 pad0003 R     0:00 ps aux
    holtz      147 2496  226 ttyp9   S     0:10 rn -e -h +hFrom +hSubj +hOrgan -M -m -S -/
    holtz      155  832   64 ttyp9   S     0:00 /bin/sh -c Pnews -h //zonker/user/holtz/.rnhead
    holtz      156  864   75 ttyp9   S     0:02 Pnews /usr/local/bin/Pnews -h //zonker/user/holtz/.rnhead
    holtz      180  928  142 ttyp9   S     0:03 /user/holtz/bin/emacs //zonker/user/holtz/.article

pato@apollo.HP.COM (Joe Pato) (05/29/90)

In article <1990May25.185740.5797@ccs.carleton.ca>,
holtz@cascade.carleton.ca (Neal Holtz) writes:
|> Sometime between 15:00 PM yesterday aft (thursday) and this morning, /etc/passwd
|> disappeared.  If you try to read it from the DM, you get:
|> 
|>    (CV) /etc/passwd - Entry not found from (network computing system / Registry Server)
|> 
|> 

We recently discovered this problem with the sr10.2 registry servers.
The problem is an incorrect boundary condition check when producing the
dynamic view of the /etc/{passwd,group,org} files.  It occurs when there
are 16, 528, 1040, ... lines in any of these files.

A simple workaround is to add (or delete) a dummy account/group/org
so that none of the files has one of the line counts that trigger the
error.
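The bad sizes listed above (16, 528, 1040, ...) are 512 apart, which makes
it easy to check a given line count, for example one taken from 'wc -l'.
The Python sketch below ASSUMES the pattern continues as 16 + 512*k; that
is an extrapolation from the three values given, not something confirmed
by Apollo, and the function names are hypothetical.

```python
def hits_registry_bug(line_count):
    """True if line_count is one of the sizes reported to trigger the bug.

    Assumption: the reported sizes 16, 528, 1040, ... continue in steps
    of 512, i.e. 16 + 512*k for k >= 0.
    """
    return line_count >= 16 and (line_count - 16) % 512 == 0


def bad_counts(limit):
    """All assumed bad line counts up to and including limit."""
    return list(range(16, limit + 1, 512))


def entries_to_add(line_count):
    """How many dummy entries to add so the count is no longer a bad size.

    Bad sizes are 512 apart, so one extra entry is always enough.
    """
    return 1 if hits_registry_bug(line_count) else 0
```

Because the bad sizes are so far apart, adding (or deleting) a single
dummy entry is enough to move any file off a bad size.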

                    -- Joe Pato
                       Hewlett Packard Company
                       pato@apollo.hp.com

holtz@cascade.carleton.ca (Neal Holtz) (05/30/90)

Well, I found out what caused the problems with my corrupted
and missing /etc/{org,group,passwd} files -- many thanks to 
Betsy Minahan @ apollo for diagnosing it.

It seems that if there are exactly 16 entries in the equivalents of
/etc/group or /etc/passwd, some of the three files are unreadable and
others are corrupted.  This is a problem that Apollo knows about and is
working on.

The workaround is to use edrgy and add another account, or person,
or something, to bring the number over 16.  It seems I also had
to delete `node_data/systmp/{.org,.group,.passwd} as well, so that
they could be recreated properly (thanks to Michael Zeleznik for that
last tip).