holtz@cascade.carleton.ca (Neal Holtz) (05/26/90)
Sometime between 15:00 yesterday afternoon (Thursday) and this morning, /etc/passwd
disappeared. If you try to read it from the DM, you get:
(CV) /etc/passwd - Entry not found from (network computing system / Registry Server)
At the same time, /etc/org is corrupted. Part of the output of 'cat /etc/org' is:
wheel::0:
apollo::1:user
none::12:root,daemon,none,user,lp,admin,sys_person,uucp,bin,holtz,mcdill,\
jefferson,guest,nholtz
sys_org::13:
uucp/uucico
none:..:12:12::/:
sys_person:..:13:12::/:
user:..:14:12::/:
admin:..:15:12::/:
... [junk removed for brevity]
Note that the last 6 lines (after "sys_org::13:") are junk from a passwd file
(with encrypted passwords removed by me).
I can log on just fine, and 'edrgy -v' seems to display the right stuff.
Since yesterday aft, I have:
1. used edrgy to add a new person and account. At that time, I had some
trouble because of a read-only replica on another node, so I had
to use 'edrgy -s //my_node'. Even though //my_node was the master
registry, edrgy by default would connect to //other_node.
2. started the cron daemon and ran find/update @ 3:00 am
3. "rcp -r"'ed (from a Sun) the 50MB of the MIT X11R4 sources,
and typed 'make World' in the root dir (the compile ran for
7 hrs on a DN2500, by the way)
4. added "xbiff &" and "xload &" to my .login.
5. reconnected my external SCSI disk that was offline for
a few days.
Presumably, it was my screwing around with the registries that caused the
problems, but for the life of me I cannot remember doing anything very
unusual with the exception of explicitly directing edrgy to a
particular site.
Does anyone have any idea what is wrong? It looks as though the 'org', 'passwd'
(and possibly 'group') type managers are bad.
Config: DN2500, BSD4.3, SR10.2
Workaround: replaced /etc/passwd with a symbolic link to /sys/registry/rgy_data/unix/passwd
Processes running:
USER PID SZ RSS TTY STAT TIME COMMAND
root 1 1184 481 ? S < 0:28 /etc/init
root 2 0 0 ? R 14:37 null
root 3 0 0 ? S 0:01 purifier
root 4 0 0 ? S 0:00 purifier
root 5 0 0 ? S 0:02 unwired_dxm
root 6 0 0 ? S 0:00 pinger
root 7 0 0 ? S 0:00 netreceive
root 8 0 0 ? S 0:00 netpaging
root 9 0 0 ? S 0:00 wired_dxm
root 10 0 0 ? S 0:00 netrequest
root 90 1856 131 ? S 0:04 /etc/tcpd
root 95 1888 39 ? S 0:00 /etc/routed -f -q
root 98 1888 60 ? S 0:00 /etc/inetd
root 101 2048 151 ? S 0:00 /etc/ncs/llbd
root 103 2976 313 ? S 0:02 /etc/ncs/glbd
root 106 3488 368 ? S < 0:06 /etc/rgyd
root 110 576 26 ? S 0:00 /etc/cron
user 115 704 12 ? S 0:00 /sys/spm/spm
root 118 672 61 ? S 0:00 /sys/net/netman
root 123 3552 695 ? S < 0:08 /etc/Xapollo -K /usr/X11/lib/keyboard/keyboard.config -D1 s+r-
user 124 896 0 ? S 0:00 /sys/mbx/mbx_helper
holtz 126 2848 591 ? S < 0:42 dm
root 148 1632 75 ? S 0:16 vtserver
holtz 140 896 1 pad0001 S 0:01 -csh
holtz 145 1440 48 pad0001 S 0:02 xbiff
holtz 146 1536 355 pad0001 S 0:03 xload
holtz 141 1408 146 pad0002 S 0:06 /local/com/clock
holtz 151 896 104 pad0003 S 0:00 $(shell)
holtz 181 992 143 pad0003 R 0:00 ps aux
holtz 147 2496 226 ttyp9 S 0:10 rn -e -h +hFrom +hSubj +hOrgan -M -m -S -/
holtz 155 832 64 ttyp9 S 0:00 /bin/sh -c Pnews -h //zonker/user/holtz/.rnhead
holtz 156 864 75 ttyp9 S 0:02 Pnews /usr/local/bin/Pnews -h //zonker/user/holtz/.rnhead
holtz 180 928 142 ttyp9 S 0:03 /user/holtz/bin/emacs //zonker/user/holtz/.article
pato@apollo.HP.COM (Joe Pato) (05/29/90)
In article <1990May25.185740.5797@ccs.carleton.ca>, holtz@cascade.carleton.ca (Neal Holtz) writes:
|> Sometime between 15:00 PM yesterday aft (thursday) and this morning, /etc/passwd
|> disappeared. If you try to read it from the DM, you get:
|>
|> (CV) /etc/passwd - Entry not found from (network computing system / Registry Server)
|>
We recently discovered this problem with the sr10.2 registry servers. The
problem is an incorrect boundary condition check when producing the dynamic
view of the /etc/{passwd,group,org} files. It occurs when there are 16, 528,
1040, ... lines in any of these files.

A simple workaround is to add (or delete) a dummy account/group/org so that
none of the files has one of the line counts that trigger the error.
-- 
Joe Pato
Hewlett Packard Company
pato@apollo.hp.com
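The three failing line counts Joe lists (16, 528, 1040) are each 512 apart, so one reading is that the bad sizes are 16 + 512k. That continuation is an extrapolation from the listed values, not something Apollo has confirmed. The sketch below counts the lines in text copies of the passwd, group, and org views and flags any that land on a suspect count; the script and its function names (`is_bad_count`, `count_lines`) are hypothetical.

```python
import sys

# Line counts reported as triggering the sr10.2 registry bug: 16, 528, 1040, ...
# ASSUMPTION: the sequence continues as 16 + 512*k; only the first three
# values come from the report.
FIRST_BAD = 16
PERIOD = 512


def is_bad_count(n: int) -> bool:
    """Return True if a file with n lines would hit the (assumed) boundary case."""
    return n >= FIRST_BAD and (n - FIRST_BAD) % PERIOD == 0


def count_lines(path: str) -> int:
    """Count the lines in a text copy of passwd, group, or org."""
    with open(path) as f:
        return sum(1 for _ in f)


def main(paths):
    for path in paths:
        n = count_lines(path)
        if is_bad_count(n):
            print(f"{path}: {n} lines -- add or delete a dummy entry")
        else:
            print(f"{path}: {n} lines -- ok")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, for example, `python check_rgy_lines.py passwd group org` against saved copies. If a file is flagged, adding or deleting one dummy entry with edrgy moves it off the bad count, per the workaround above.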
holtz@cascade.carleton.ca (Neal Holtz) (05/30/90)
Well, I found out what caused the problems with my corrupted
and missing /etc/{org,group,passwd} files -- many thanks to
Betsy Minahan @ apollo for diagnosing it.
It seems that if there are exactly 16 entries in the equivalents of
/etc/group or /etc/passwd, some of the three files are unreadable and
others are corrupted. This is a problem that Apollo knows about and is
working on.
The workaround is to use edrgy and add another account, or person,
or something, to bring the number over 16. It seems I also had
to delete `node_data/systmp/{.org,.group,.passwd} as well, so that
they could get recreated properly (thanks to Michael Zeleznik for that
last tip).