dwex%mtgzfs3@mtgzy.att.com (David E Wexelblat) (04/25/91)
Let me preface this by saying that I am pretty sure that the problem is specific to the 3B1, and not a gtar bug. But one never knows.

[gtar v1.09, compiled basically as System V, with 3B1 shared libraries]

I was testing out the multi-volume and verification features on a medium-sized directory prior to using it to back up my hard disk. Like a bozo, I specified the block device for the floppy disk instead of the raw device. Gtar went on dumping stuff to the floppy (or trying to :->) after the red LED went out on the floppy. Either there was no end-of-file reported by the driver, or gtar ignored it.

What's the problem, you ask? After breaking out of gtar, I tried to cd to the directory where gtar was living and got

    /u/dwex/gnu: not a directory

Uh oh, I think to myself. Time to run fsck. But wait! Before I can su I get:

    panic: dup inode

(or something like that). This is getting better and better, I say to myself. Hit the reset button. Guess what? Init asks me what run-level I want! Yes, you guessed it. /etc/inittab is all gone.

No big deal, I think. Just run fsck from the floppy, and then make a new inittab. Drag out the floppy boot disk. Boot from floppy, and break out onto the floppy filesystem disk. Type "/mnt/etc/fsck" and I get:

    /mnt/etc/fsck: cannot execute

Not good. Then I 'cat' /mnt/etc/fsck. Nice file of nroff text. Oops again.

At this point many people would be in deep trouble. The normal floppy file system has no fsck on it. But fortunately I was smart a while ago, and made my own floppy file system with fsck on it. So I run fsck, and there are about 40 dup inodes. Fortunately fsck tells me what's nuked. Lots of good stuff like /etc/inittab, /etc/getty, /etc/iv.

Well, once my disk was patched, I created a new inittab, using /usr/lib/uucp/uugetty. Reboot, pull stuff off of floppy, and live happily ever after. (Theoretically, according to the documents, uugetty won't work on the console. But it worked long enough to get my system back.)
Now, I was running as myself, not as root, when I ran gtar. SO HOW THE HELL DID MY HARD DISK GET NUKED? I wrote a test program to try reading to end-of-file on both the raw and block floppy devices, and both correctly reported end-of-file and quit. I wasn't about to try a test with writes (once was enough, thank you). Any pointers (besides "don't use gtar") would be useful.

Another horror story for you: A couple of years ago my hard disk grew a bad block smack dab in the middle of /unix. This is not fun. Note that a copy of /unix will not fit on the floppy file system, and I haven't figured out a way to read a cpio archive while maintaining access to the floppy file system. So here's what we did (a good friend worked through this disaster with me -- this is when I discovered that cpio just doesn't bother to write anything when it can't read a disk block :-<):

     1) Go to another 3B1
     2) dd if=/unix of=/tmp/unixa count=200
     3) dd if=/unix of=/tmp/unixb count=200 skip=200
     4) repeat for the rest of /unix
     5) mount /dev/fp021 /mnt (floppy file system disk)
     6) cp /tmp/unixa /mnt
     7) dismount -f
     8) boot my dead 3b1 from floppy
     9) cp /unixa /mnt/unixa (copy from floppy file system to hard disk)
    10) repeat 5-9 for the other parts
    11) mv /mnt/unix /mnt/unix.fubar
    12) cat /mnt/unix? > /mnt/unix
    13) boot off hard disk
    14) make backup
    15) format hard disk
    16) restore foundation set
    17) discover cpio brain-damage
    18) dd each piece of corrupted cpio archive to /tmp
    19) use adb to patch each piece (basically, just fix the length in the header)
    20) dd the files back out to floppy
    21) restore backup
    22) have several beers :->

All of this took about 10 hours. This was about 2 weeks before the first version of afio was posted to the net (at least the first one I ever saw).

So now in / on my hard disk is /unix.bk.Z, and on my floppy file system (in addition to fsck) are afio and uncompress. Fool me once, shame on you. Fool me twice, shame on me.
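[Illustrative sketch, not part of the original post: the split-and-reassemble idea from steps 2-4 and 12 above, tried out on a throwaway file rather than a real /unix. The file name kernel.img, the 450-block size, and the final cmp check are assumptions added for the example; dd's default 512-byte block size is written out explicitly.]

```shell
#!/bin/sh
# Sketch: cut a file into 200-block pieces with dd, rebuild it with cat,
# then compare. "kernel.img" is a hypothetical stand-in for /unix;
# nothing here touches a real device or system file.
set -e
work=$(mktemp -d)
src="$work/kernel.img"

# Make a 450-block test file (512-byte blocks) so the last piece is short.
dd if=/dev/urandom of="$src" bs=512 count=450 2>/dev/null

# Piece a starts at block 0, piece b at block 200, piece c at block 400.
i=0
for suffix in a b c; do
    dd if="$src" of="$work/unix$suffix" bs=512 count=200 \
       skip=$((i * 200)) 2>/dev/null
    i=$((i + 1))
done

# The shell expands unix? in sorted order (unixa unixb unixc),
# so cat puts the pieces back in their original sequence.
cat "$work"/unix? > "$work/rebuilt"

# Check the result before trusting it.
if cmp -s "$src" "$work/rebuilt"; then
    result=identical
else
    result=different
fi
echo "$result"
```

The single-letter suffixes matter here: the unix? glob only restores the right order while the pieces sort alphabetically, so a file needing more than 26 pieces would want a different naming scheme.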
(I'm not sure if this all deserves a :->, a :-<, or a !@#$%)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
David Wexelblat         | dwex@mtgzz.att.com  | I asked her her name.
AT&T Bell Laboratories  | ...!att!mtgzz!dwex  | She said her name was
200 Laurel Ave - 4B-421 |                     | 'Maybe'
Middletown, NJ 07748    | (201) 957-5871      | --Damn Yankees
clewis@ferret.ocunix.on.ca (Chris Lewis) (04/27/91)
In article <1991Apr24.201757.26147@cbnewsj.att.com> dwex%mtgzfs3@mtgzy.att.com (David E Wexelblat) writes:
>[gtar v1.09, compiled basically as System V, with 3B1 shared libraries]

He goes on to describe how some testing with gtar managed to splatter his hard disk.

I've had similar things happen twice, once while running something like:

    find <some dir> -name '*.Z' -exec compress -dc '{}' ';' | grep '^Subject:' | sed ...

(sort of like finding all of the subjects out of a compressed news spool - LOTS of files) and the other time the machine was "dead in the morning". Could have been a compress(v4)/pathalias(v10) run. Or B-news expire(2.11.19). 3.5.1.4 O/S, no hardware mods. No disk errors in the log.

The first time just managed to nuke getty and init. Easy to repair. The second time nuked the /bin and /etc directories, and placed all of the subfiles under /lost+found. Repaired by remaking the directories (from the boot floppy) and comparing checksums with similarly configured 3b1's to figger out which /lost+found file was which.

[NOTE: there is enough room on the boot floppy to put fsck on it. DO IT NOW! I had to wait for a courier...]

I figger that there is a very subtle bug down deep somewhere in the O/S which causes F/S corruption in exceedingly rare high load situations. (It could even be the infamous SV inode bug manifesting itself in a slightly different way). This doesn't seem related to the "pulse dial during high disk load causing panic" problem which I've reported earlier, and which has completely disappeared since I went to tone dial.

Further, at least at my version level, the 3b1 is rather fragile in disk full situations. If your disk goes full, you can lose /etc/inittab, /usr/lib/uucp/L.sys as well as other things (eg: setgetty edits /etc/inittab on uucico outbound startup - if this occurs during disk full - poof!) I suggest you copy some of these files somewhere so that you can recover...
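[Illustrative sketch, not part of the original post: one way the "compare checksums to figger out which /lost+found file was which" step might look. Every path and file here is a throwaway stand-in created under a temporary directory, and the use of cksum and awk is an assumption; the post does not say which checksum tool was used.]

```shell
#!/bin/sh
# Sketch: put names back on anonymous lost+found files by matching their
# checksums against a manifest taken from a known-good reference tree.
# All files below are hypothetical stand-ins, not real system files.
set -e
work=$(mktemp -d)
mkdir -p "$work/ref/etc" "$work/lost+found"

# Pretend reference machine: known-good files under their real names.
printf 'getty contents\n' > "$work/ref/etc/getty"
printf 'init contents\n'  > "$work/ref/etc/init"

# Pretend damaged machine: same contents, names replaced by numbers.
cp "$work/ref/etc/init"  "$work/lost+found/000123"
cp "$work/ref/etc/getty" "$work/lost+found/000456"

# Manifest of "checksum size name" lines from the reference tree.
( cd "$work/ref" && find . -type f -exec cksum '{}' ';' ) > "$work/manifest"

# For each orphan, look up its checksum and size in the manifest.
matches=$(
    for f in "$work"/lost+found/*; do
        set -- $(cksum "$f")          # $1 = checksum, $2 = size in bytes
        awk -v sum="$1" -v size="$2" -v orphan="${f##*/}" \
            '$1 == sum && $2 == size { print orphan " -> " $3 }' \
            "$work/manifest"
    done
)
echo "$matches"
```

An orphan that matches nothing in the manifest simply prints no line, which is itself useful: those are the files that differ from the reference machine and need a closer look by hand.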
--
Chris Lewis, Phone: (613) 832-0541, Domain: clewis@ferret.ocunix.on.ca
UUCP: ...!cunews!latour!ecicrl!clewis; Ferret Mailing List:
ferret-request@eci386; Psroff (not Adobe Transcript) enquiries:
psroff-request@eci386 or Canada 416-832-0541. Psroff 3.0 in c.s.u soon!