[comp.unix.wizards] patching bad sectors on a uda50 drive

chris@umcp-cs.UUCP (Chris Torek) (11/10/86)
In article <1537@oddjob.UUCP> matt@oddjob.UUCP (Matt Crawford) writes:
>EVRLK ... formerly known as "rabads" ... appears on the rev 24
>diagnostic tape, and most FEs have no experience with it.
>
>Caveat: by default, EVRLK will write a forced error (FER) on the
>replacement block.

If it follows the recommended procedure for host-initiated bad
block replacement (DECese for `fixing the pack'), it writes forced
errors if and only if it is unable to read the original sector.

>... If your goal is to avoid dump/format/restore you should answer
>"No" to the prompt "Enable write with forced errors", avoiding the
>FER mark and allowing you to attempt to clean up the disk with fsck
>after your bad sectors have been forwarded.

There is a problem with this.  If the bad block is in the inode
area of a cylinder group, or if it is in a directory, fsck may
indeed be able to patch things up.  In this case, preventing the
forced error may be a good thing, as fsck may be able to get some
of the original data, and any is better than none.  If, however,
the bad block is in the midst of a regular file, that file will
now have garbage in the middle that (without the forced error) may
be rather hard to find.

A third possibility, of course, is that the block is free.  In this
case, the forced error modifier should be irrelevant, since the
block should be written before it is ever read.  (If the bad sector
is in the tail part of a block that has been fragmented, it might
cause spurious error messages during dumps, if it is marked with
a forced error.  Dump itself will not complain, but the kernel
will.  Also, pack-to-pack copies will fail in the presence of forced
errors.)

So what *should* you do?  Well, ideally, toss out that RA81 and
buy an Eagle.  It is cheaper, faster, and more reliable.  (It is
also---alas!---fifty megabytes smaller; but then, you may be able
to buy two Eagles for the price of one RA81.)  But if you cannot
do that, first find out where the bad block lives.  The driver's
error message tells you:

	ra0g: hard error sn6674: unit 0, lbn 125436: uncorrectable
	ecc data error (code 8, subcode 7)

---your driver will no doubt print something cryptic instead, such as

	uda0: hard error datagram, hdr 1e9f2 event 0350

(Well, the former is *less* cryptic.  At least I need not run a
program to decode it!)

Take the sector number (1e9f2 hex = 125426 decimal) and subtract the
start of the appropriate partition; this should come out approximately
(if not exactly) equal to the sector number in the `ra0g' part.  Feed
this value to icheck -b:

	% icheck -b 6674 /dev/rra0g
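
The subtraction that produces the icheck argument can be sketched in
a few lines of Python.  This is only an illustration: the function
names are made up here, and PART_START is an assumed value inferred
from the ra0g message above (lbn 125436 reported as sn6674), not one
read from a real partition table.  Use the start sector from your own
driver's tables.

```python
# Sketch: turn an absolute logical block number (lbn) into a
# partition-relative sector number suitable for "icheck -b".

def hdr_to_lbn(hdr_hex):
    """Convert the hex sector field of a cryptic datagram to an integer."""
    return int(hdr_hex, 16)

def partition_relative(lbn, part_start):
    """Subtract the partition's starting sector from the absolute lbn."""
    if lbn < part_start:
        raise ValueError("lbn lies before the start of this partition")
    return lbn - part_start

# ASSUMPTION: start of the g partition, inferred from the sample
# message (lbn 125436 shown as sn6674).  Not taken from a real table.
PART_START = 125436 - 6674

if __name__ == "__main__":
    # Partition-relative sector for the lbn in the sample message.
    print(partition_relative(125436, PART_START))
    # Decimal value of a hex sector field such as the one in "hdr 1e9f2".
    print(hdr_to_lbn("1e9f2"))
```

If the result lands a few sectors away from the sn value the driver
printed, check that you picked the right partition before trusting it.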

Icheck will print something like this:

	6672 arg; frag 0 of 8, inode=2054, class=logical data block 837

which tells me that sector 6672 is owned by inode 2054 as a data
block.  Turn the inode into a path name with /etc/ncheck:

	% ncheck -i 2054 /dev/rra0g
	2054 	/israel/sendmail.ps

(which, since I was actually doing this on rhp3d, which is mounted
as /tmp, means the file which owns sector 6672 is /tmp/israel/sendmail.ps).
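
If you have more than a couple of bad sectors to chase, the two
lookups above can be glued together mechanically.  The following
Python sketch assumes the icheck report line has exactly the shape
shown above and that ncheck prints names relative to the root of the
file system; parse_icheck and full_path are names invented for this
example, not part of either tool.

```python
import posixpath
import re

# Pattern for a report line of the form shown above, e.g.
#   6672 arg; frag 0 of 8, inode=2054, class=logical data block 837
ICHECK_LINE = re.compile(
    r"^\s*(?P<blk>\d+) arg; frag (?P<frag>\d+) of (?P<nfrag>\d+), "
    r"inode=(?P<inode>\d+), class=(?P<cls>.+)$"
)

def parse_icheck(line):
    """Pull the block, fragment, inode and class out of one report line."""
    m = ICHECK_LINE.match(line)
    if m is None:
        raise ValueError("not an icheck -b report line: %r" % line)
    return {
        "blk": int(m.group("blk")),
        "frag": int(m.group("frag")),
        "nfrag": int(m.group("nfrag")),
        "inode": int(m.group("inode")),
        "class": m.group("cls").strip(),
    }

def full_path(mount_point, ncheck_path):
    """Prefix the mount point to a name reported relative to the fs root."""
    return posixpath.join(mount_point, ncheck_path.lstrip("/"))

if __name__ == "__main__":
    report = parse_icheck(
        "6672 arg; frag 0 of 8, inode=2054, class=logical data block 837"
    )
    print(report["inode"])
    print(full_path("/tmp", "/israel/sendmail.ps"))
```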

Armed with all this information about the bad block, tell EVRLK
not to use the forced error modifier, then go clean up your file
system by hand afterward.  At worst, you can use /etc/clrblk to
zero out the sector.  Oh: you have to write clrblk first.  For that
matter, I would have to write it too---for the fourth time, at the
very least....
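
For what it is worth, a clrblk need not be much.  Here is a minimal
sketch in Python of what such a tool might do, under stated
assumptions: 512-byte sectors, a sector number relative to the start
of the device or partition you open, and a raw device that accepts a
single-sector write at that offset.  This is not the clrblk mentioned
above, just one plausible reading of "zero out the sector".

```python
import os

SECTOR_SIZE = 512  # ASSUMPTION: 512-byte sectors

def clrblk(device, sector, sector_size=SECTOR_SIZE):
    """Overwrite one sector of `device` with zero bytes."""
    if sector < 0:
        raise ValueError("sector number must be non-negative")
    fd = os.open(device, os.O_WRONLY)
    try:
        # Seek to the byte offset of the sector, then write zeros over it.
        os.lseek(fd, sector * sector_size, os.SEEK_SET)
        written = os.write(fd, bytes(sector_size))
        if written != sector_size:
            raise OSError(
                "short write: %d of %d bytes" % (written, sector_size)
            )
        os.fsync(fd)
    finally:
        os.close(fd)
```

Try it on a scratch file before pointing it at a disk; a wrong sector
number destroys good data just as thoroughly as it clears bad data.
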
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu