[comp.sys.3b1] Extreme unhappiness caused by gtar on AT&T 3B1 v3.51

dwex%mtgzfs3@mtgzy.att.com (David E Wexelblat) (04/25/91)

Let me preface this by saying that I am pretty sure that the problem
is specific to the 3B1, and not a gtar bug.  But one never knows.

[gtar v1.09, compiled basically as System V, with 3B1 shared libraries]

I was testing out the multi-volume and verification features on
a medium sized directory prior to using it to back up my hard disk.
Like a bozo, I specified the block device for the floppy disk
instead of the raw device.  Gtar went on dumping stuff to the floppy
(or trying to :->) after the red LED went out on the floppy.
Either there was no end-of-file reported by the driver, or gtar
ignored it.

What's the problem you ask?  After breaking out of gtar, I tried
to cd to the directory where gtar was living and got 

	/u/dwex/gnu: not a directory

Uh oh, I think to myself.  Time to run fsck.  But wait!  Before
I can su I get:

	panic: dup inode

(or something like that).  This is getting better and better, I
say to myself.  Hit the reset button.  Guess what?  Init asks me
what run-level I want!  Yes, you guessed it.  /etc/inittab is
all gone.  No big deal, I think.  Just run fsck from the floppy,
and then make a new inittab.

Drag out the floppy boot disk.  Boot from floppy, and break out onto 
the floppy filesystem disk.  Type "/mnt/etc/fsck" and I get:

	/mnt/etc/fsck: cannot execute

Not good.  Then I 'cat' /mnt/etc/fsck.  Nice file of nroff text.
Oops again.  At this point many people would be in deep trouble.
The normal floppy file system has no fsck on it.  But fortunately
I was smart a while ago, and made my own floppy file system with
fsck on it.  So I run fsck, and there are about 40 dup inodes.
Fortunately fsck tells me what's nuked.  Lots of good stuff
like /etc/inittab, /etc/getty, /etc/iv.  Well, once my
disk was patched, I created a new inittab, using /usr/lib/uucp/uugetty.
Reboot, pull stuff off of floppy, and live happily ever after.
(Theoretically, according to the documents, uugetty won't work
on the console.  But it worked long enough to get my system back.)

Now, I was running as myself, not as root, when I ran gtar. SO HOW
THE HELL DID MY HARD DISK GET NUKED?  I wrote a test program
to try reading to end-of-file on both the raw and block floppy
devices, and both correctly reported end-of-file and quit.  I 
wasn't about to try a test with writes (once was enough, thank
you).  Any pointers (besides "don't use gtar") would be useful.
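
A read-to-end-of-file test along those lines might look like the sketch below. It is not the original test program, just a minimal shell stand-in: count_to_eof is a made-up name, and the raw device name in the usage note is an assumption. It reads the named file or device until the driver reports end-of-file and prints how many bytes came back; if the driver never reports end-of-file, it never returns.

```shell
# Sketch only: count_to_eof is a hypothetical helper, not a 3B1 utility.
# count_to_eof DEVICE
# Reads DEVICE until end-of-file and prints the number of bytes read.
# Read-only, so it is safe to point at a floppy device.
count_to_eof() {
    # cat stops when read() returns 0 (end-of-file); wc counts the bytes;
    # tr strips the padding some wc versions put around the number.
    cat "$1" | wc -c | tr -d ' '
}
```

Usage would be something like "count_to_eof /dev/fp021" for the block device, and the same against the raw device (assumed here to be /dev/rfp021). Both should print the same count.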

Another horror story for you: 
A couple of years ago my hard disk grew a bad block smack dab in
the middle of /unix.  This is not fun.  Note that a copy of
/unix will not fit on the floppy file system, and I haven't
figured out a way to read a cpio archive while maintaining 
access to the floppy file system.  So here's what we did (a
good friend worked through this disaster with me -- this is when
I discovered that cpio just doesn't bother to write anything
when it can't read a disk block :-<):

	1) Go to another 3B1
	2) dd if=/unix of=/tmp/unixa count=200
	3) dd if=/unix of=/tmp/unixb count=200 skip=200
	4) repeat for the rest of /unix
	5) mount /dev/fp021 /mnt (floppy file system disk)
	6) cp /tmp/unixa /mnt
	7) dismount -f
	8) boot my dead 3b1 from floppy
	9) cp /unixa /mnt/unixa (copy from floppy file system to hard disk)
	10) repeat 5-9 for the other parts
	11) mv /mnt/unix /mnt/unix.fubar
	12) cat /mnt/unix? > /mnt/unix
	13) boot off hard disk
	14) make backup
	15) format hard disk
	16) restore foundation set
	17) discover cpio brain-damage
	18) dd each piece of corrupted cpio archive to /tmp
	19) use adb to patch each piece (basically, just fix the
	    length in the header)
	20) dd the files back out to floppy
	21) restore backup
	22) have several beers :->
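
Steps 2-4 and 12 boil down to cutting /unix into floppy-sized pieces with dd and gluing them back together with cat. A minimal sketch of that idea as two shell functions follows. The names split_pieces and join_pieces are made up for illustration; the 200-block piece size is simply the count used above. It assumes 512-byte blocks and stops after 26 pieces (suffixes a through z).

```shell
# Sketch only: split_pieces and join_pieces are hypothetical helpers.

# split_pieces FILE PREFIX [BLOCKS_PER_PIECE]
# Writes PREFIXa, PREFIXb, ... each holding at most BLOCKS_PER_PIECE
# 512-byte blocks of FILE (default 200, as in steps 2 and 3).
split_pieces() {
    src=$1
    prefix=$2
    blocks=${3:-200}
    skip=0
    for suffix in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        piece=$prefix$suffix
        # skip= moves past the blocks already copied into earlier pieces.
        if dd if="$src" of="$piece" bs=512 count="$blocks" skip="$skip" \
               2>/dev/null && [ -s "$piece" ]; then
            skip=`expr "$skip" + "$blocks"`
        else
            # An empty piece (or a dd error) means we ran off the end.
            rm -f "$piece"
            break
        fi
    done
}

# join_pieces PREFIX DEST
# Concatenates PREFIXa, PREFIXb, ... in suffix order into DEST (step 12).
join_pieces() {
    cat "$1"[a-z] > "$2"
}
```

On the donor machine this would be "split_pieces /unix /tmp/unix"; on the repaired disk, "join_pieces /mnt/unix /mnt/unix" once the pieces have been copied over. Comparing the result against the donor's /unix with sum or cmp before rebooting is cheap insurance.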

All of this took about 10 hours.  This was about 2 weeks before
the first version of afio was posted to the net (at least the
first one I ever saw).  So now in / on my hard disk is /unix.bk.Z,
and on my floppy file system (in addition to fsck) are afio and
uncompress.  Fool me once, shame on you.  Fool me twice, shame on me.
(I'm not sure if this all deserves a :->, a :-<, or a !@#$%)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
David Wexelblat             | dwex@mtgzz.att.com    | I asked her her name.
AT&T Bell Laboratories      | ...!att!mtgzz!dwex    |   She said her name was
200 Laurel Ave - 4B-421     |                       |      'Maybe'
Middletown, NJ  07748       | (201) 957-5871        | --Damn Yankees

clewis@ferret.ocunix.on.ca (Chris Lewis) (04/27/91)

In article <1991Apr24.201757.26147@cbnewsj.att.com> dwex%mtgzfs3@mtgzy.att.com (David E Wexelblat) writes:
>[gtar v1.09, compiled basically as System V, with 3B1 shared libraries]

He goes on to describe how some testing with gtar managed to splatter
his hard disk.

I've had similar things happen twice, once while running something like:

	find <some dir> -name '*.Z' -exec compress -dc '{}' ';' | 
	    grep '^Subject:' | sed ...
	
       (sort of like finding all of the subjects out of a compressed
       news spool - LOTS of files)

and the other time the machine was "dead in the morning".  Could have been
a compress(v4)/pathalias(v10) run.  Or B-news expire(2.11.19).

3.5.1.4 O/S, no hardware mods.  No disk errors in the log.

The first time just managed to nuke getty and init.  Easy to repair.

The second time nuked the /bin and /etc directories, and placed all of the
subfiles under /lost+found.  Repaired by remaking the directories (from the
boot floppy) and comparing checksums with similarly configured 3b1's to
figger out which /lost+found file was which.
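
For anyone facing the same cleanup, the checksum comparison can be sketched roughly as below. This is an illustration, not the exact procedure used: make_reflist and match_orphans are made-up names, and it assumes cksum and an awk that accepts -v (on a stock 3b1 you would more likely use sum and adjust accordingly). File names containing spaces are not handled.

```shell
# Sketch only: make_reflist and match_orphans are hypothetical helpers.

# make_reflist DIR...
# Run on a healthy, similarly configured machine.  Prints one
# "checksum size path" line per regular file under the given directories.
make_reflist() {
    find "$@" -type f -exec cksum '{}' ';'
}

# match_orphans REFLIST LOSTDIR
# Run on the damaged machine.  For each file in LOSTDIR, prints
# "orphan -> reference path" when checksum and size both match an
# entry in REFLIST.  Orphans with no match print nothing.
match_orphans() {
    reflist=$1
    lostdir=$2
    for f in "$lostdir"/*; do
        [ -f "$f" ] || continue
        # Reading from stdin makes cksum print just "checksum size".
        sig=`cksum < "$f"`
        awk -v sig="$sig" -v orphan="$f" \
            '($1 " " $2) == sig { print orphan " -> " $3 }' "$reflist"
    done
}
```

Typical use: "make_reflist /bin /etc > reflist" on the good machine, carry reflist over on a floppy, then "match_orphans reflist /lost+found" on the damaged one.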

[NOTE: there is enough room on the boot floppy to put fsck on it.  DO IT
NOW!  I had to wait for a courier...]

I figger that there is a very subtle bug down deep somewhere in the O/S
which causes F/S corruption in exceedingly rare high load situations.
(It could even be the infamous SV inode bug manifesting itself in a slightly
different way).  This doesn't seem related to the "pulse dial during high
disk load causing panic" problem which I reported earlier; that one has
completely disappeared since I went to tone dial.

Further, at least at my version level, the 3b1 is rather fragile in disk
full situations.  If your disk goes full, you can lose /etc/inittab,
/usr/lib/uucp/L.sys as well as other things (eg: setgetty edits /etc/inittab
on uucico outbound startup - if this occurs during disk full - poof!)
I suggest you copy some of these files somewhere so that you can recover...
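
A rough sketch of the "copy these files somewhere" advice, as a shell function you could run by hand or from cron. The name save_critical and the destination directory in the usage note are made up for illustration. Each copy goes to a temporary name first and is only moved into place if cp succeeded, so a disk-full failure during the backup leaves the previous good copy alone.

```shell
# Sketch only: save_critical is a hypothetical helper.
# save_critical DESTDIR FILE...
# Copies each existing FILE into DESTDIR under a flattened name,
# e.g. /etc/inittab becomes DESTDIR/etc_inittab.
save_critical() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        # Skip anything that is already missing.
        [ -f "$f" ] || continue
        name=`echo "$f" | sed -e 's,^/,,' -e 's,/,_,g'`
        tmpname=$dest/$name.new
        if cp "$f" "$tmpname"; then
            # Only replace the old backup once the new copy is complete.
            mv "$tmpname" "$dest/$name"
        else
            rm -f "$tmpname"
            return 1
        fi
    done
}
```

For example: "save_critical /u/safe /etc/inittab /usr/lib/uucp/L.sys", where /u/safe is just a placeholder. A floppy would be a safer home for the copies than the same hard disk.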
-- 
Chris Lewis, Phone: (613) 832-0541, Domain: clewis@ferret.ocunix.on.ca
UUCP: ...!cunews!latour!ecicrl!clewis; Ferret Mailing List:
ferret-request@eci386; Psroff (not Adobe Transcript) enquiries:
psroff-request@eci386 or Canada 416-832-0541.  Psroff 3.0 in c.s.u soon!