holtz@cascade.carleton.ca (Neal Holtz) (05/26/90)
Sometime between 15:00 yesterday afternoon (Thursday) and this morning, /etc/passwd disappeared. If you try to read it from the DM, you get:

    (CV) /etc/passwd - Entry not found from (network computing system / Registry Server)

At the same time, /etc/org is corrupted. Part of the output of 'cat /etc/org' is:

    wheel::0:
    apollo::1:user
    none::12:root,daemon,none,user,lp,admin,sys_person,uucp,bin,holtz,mcdill,\
    jefferson,guest,nholtz
    sys_org::13:
    uucp/uucico
    none:..:12:12::/:
    sys_person:..:13:12::/:
    user:..:14:12::/:
    admin:..:15:12::/:
    ...
    [junk removed for brevity]

Note that the last 6 lines (after "sys_org::13:") are junk from a passwd file (with the encrypted passwords removed by me). I can log on just fine, and 'edrgy -v' seems to display the right stuff.

Since yesterday afternoon, I have:

 1. used edrgy to add a new person and account. At that time, I had some trouble because of a read-only replica on another node, so I had to use 'edrgy -s //my_node'. Even though //my_node was the master registry, edrgy by default would go after //other_node.
 2. started the cron daemon and ran find/update @ 3:00 am.
 3. "rcp -r"'ed (from a Sun) the 50MB of MIT X11R4 sources, and typed 'make World' in the root dir (the compile ran for 7 hrs on a DN2500, by the way).
 4. added "xbiff &" and "xload &" to my .login.
 5. reconnected my external SCSI disk, which had been offline for a few days.

Presumably it was my screwing around with the registries that caused the problems, but for the life of me I cannot remember doing anything very unusual, with the exception of explicitly directing edrgy to a particular site.

Does anyone have any idea what is wrong? It looks like the 'org', 'passwd' (and possibly 'group') typemanagers are bad.

Config: DN2500, BSD4.3, SR10.2

Workaround: replaced /etc/passwd with a symbolic link to /sys/registry/rgy_data/unix/passwd

Processes running:

    USER    PID   SZ  RSS TTY     STAT  TIME COMMAND
    root      1 1184  481 ?       S <   0:28 /etc/init
    root      2    0    0 ?       R    14:37 null
    root      3    0    0 ?       S     0:01 purifier
    root      4    0    0 ?       S     0:00 purifier
    root      5    0    0 ?       S     0:02 unwired_dxm
    root      6    0    0 ?       S     0:00 pinger
    root      7    0    0 ?       S     0:00 netreceive
    root      8    0    0 ?       S     0:00 netpaging
    root      9    0    0 ?       S     0:00 wired_dxm
    root     10    0    0 ?       S     0:00 netrequest
    root     90 1856  131 ?       S     0:04 /etc/tcpd
    root     95 1888   39 ?       S     0:00 /etc/routed -f -q
    root     98 1888   60 ?       S     0:00 /etc/inetd
    root    101 2048  151 ?       S     0:00 /etc/ncs/llbd
    root    103 2976  313 ?       S     0:02 /etc/ncs/glbd
    root    106 3488  368 ?       S <   0:06 /etc/rgyd
    root    110  576   26 ?       S     0:00 /etc/cron
    user    115  704   12 ?       S     0:00 /sys/spm/spm
    root    118  672   61 ?       S     0:00 /sys/net/netman
    root    123 3552  695 ?       S <   0:08 /etc/Xapollo -K /usr/X11/lib/keyboard/keyboard.config -D1 s+r-
    user    124  896    0 ?       S     0:00 /sys/mbx/mbx_helper
    holtz   126 2848  591 ?       S <   0:42 dm
    root    148 1632   75 ?       S     0:16 vtserver
    holtz   140  896    1 pad0001 S     0:01 -csh
    holtz   145 1440   48 pad0001 S     0:02 xbiff
    holtz   146 1536  355 pad0001 S     0:03 xload
    holtz   141 1408  146 pad0002 S     0:06 /local/com/clock
    holtz   151  896  104 pad0003 S     0:00 $(shell)
    holtz   181  992  143 pad0003 R     0:00 ps aux
    holtz   147 2496  226 ttyp9   S     0:10 rn -e -h +hFrom +hSubj +hOrgan -M -m -S -/
    holtz   155  832   64 ttyp9   S     0:00 /bin/sh -c Pnews -h //zonker/user/holtz/.rnhead
    holtz   156  864   75 ttyp9   S     0:02 Pnews /usr/local/bin/Pnews -h //zonker/user/holtz/.rnhead
    holtz   180  928  142 ttyp9   S     0:03 /user/holtz/bin/emacs //zonker/user/holtz/.article
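The diagnosis in the post above, that the lines after "sys_org::13:" are passwd entries spliced into /etc/org, can be checked mechanically by counting colon-separated fields. Below is a minimal Python sketch, assuming /etc/org uses the same four-field layout as /etc/group (name:password:id:members), as the intact entries suggest, and that passwd entries have the usual seven fields. The helper names are hypothetical; this is not an Apollo utility.

```python
# Hypothetical diagnostic sketch, not an Apollo utility.
# ASSUMPTION: /etc/org entries use the 4-field /etc/group layout
# (name:password:id:members); passwd entries use the usual 7 fields
# (name:password:uid:gid:gecos:home:shell).
GROUP_FIELDS = 4
PASSWD_FIELDS = 7


def logical_lines(text):
    """Yield entries, joining lines that end in a backslash continuation."""
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]  # drop the backslash and keep accumulating
            continue
        yield pending + raw
        pending = ""
    if pending:  # text ended in the middle of a continued entry
        yield pending


def suspicious_lines(text):
    """Return (entry, reason) pairs for entries that are not 4-field records."""
    problems = []
    for entry in logical_lines(text):
        if not entry.strip():
            continue  # ignore blank lines
        n = len(entry.split(":"))
        if n == GROUP_FIELDS:
            continue  # looks like a normal org/group entry
        if n == PASSWD_FIELDS:
            reason = "looks like a passwd entry"
        else:
            reason = f"unexpected field count: {n}"
        problems.append((entry, reason))
    return problems
```

Run on the entries quoted above (leaving out the "..." placeholder), it would pass the wheel, apollo, none and sys_org entries and flag "uucp/uucico" plus the seven-field passwd-style lines.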
pato@apollo.HP.COM (Joe Pato) (05/29/90)
In article <1990May25.185740.5797@ccs.carleton.ca>, holtz@cascade.carleton.ca (Neal Holtz) writes:
|> Sometime between 15:00 PM yesterday aft (thursday) and this morning, /etc/passwd
|> disappeared. If you try to read it from the DM, you get:
|>
|> (CV) /etc/passwd - Entry not found from (network computing system / Registry Server)
|>

We recently discovered this problem with the sr10.2 registry servers. The problem is an incorrect boundary-condition check when producing the dynamic view of the /etc/{passwd,group,org} files. It occurs when there are 16, 528, 1040, ... lines in any of these files.

A simple workaround is to add (or delete) a dummy account/group/org so that none of these files ends up with one of the line counts that trigger the error.

-- 
Joe Pato
Hewlett Packard Company
pato@apollo.hp.com
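The failing line counts Joe lists (16, 528, 1040, ...) are spaced 512 apart, so the bad case appears to be any count of the form 16 + 512k. Below is a minimal Python sketch of that check, assuming the pattern continues beyond the three values given; `triggers_bug` and `nearest_safe_count` are hypothetical names, not part of any Apollo tool. Because the dynamic /etc files may be unreadable once the bug hits, it takes an entry count obtained some other way rather than reading the file.

```python
# Hypothetical helpers, not part of any Apollo tool.
# Joe Pato lists the failing line counts as 16, 528, 1040, ...
# ASSUMPTION: the sequence continues in steps of 512, i.e. 16 + 512*k.
OFFSET = 16
PERIOD = 512


def triggers_bug(line_count: int) -> bool:
    """True if a dynamic /etc/{passwd,group,org} view of this length hits the bad case."""
    return line_count >= OFFSET and (line_count - OFFSET) % PERIOD == 0


def nearest_safe_count(line_count: int) -> int:
    """Count to aim for: one extra dummy entry moves a bad count off the pattern.

    Deleting one entry (line_count - 1) would also avoid the pattern.
    """
    return line_count + 1 if triggers_bug(line_count) else line_count
```

With 16 entries, for example, the sketch suggests 17, which matches the fix Neal describes in his follow-up.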
holtz@cascade.carleton.ca (Neal Holtz) (05/30/90)
Well, I found out what caused the problems with my corrupted and missing /etc/{org,group,passwd} files -- many thanks to Betsy Minahan @ apollo for diagnosing it. It seems that if there are exactly 16 entries in the equivalents of /etc/group or /etc/passwd, some of the three files are unreadable and others are corrupted. This is a problem that Apollo knows about and is working on. The workaround is to use edrgy and add another account, or person, or something, to bring the number over 16. It seems I also had to delete `node_data/systmp/{.org,.group,.passwd} so that they could be recreated properly (thanks to Michael Zeleznik for that last tip).