[net.bugs.4bsd] UDA50 and bad blocks

probe@mm730.uq.OZ (Cameron Davidson) (08/01/85)

This was prompted by arnold@gatech - message <655@gatech.CSNET>, except that
it has nothing to do with Ultrix.
The experience on our machine (vax730/4.2BSD/RA80) indicates that there may
be as many problems in automatically forwarding newly found bad-blocks as
there would be avoided by it.

As I understand it the UDA-50 controller transparently does bad-block
forwarding PROVIDED the blocks were flagged the last time the disc surfaces
were formatted. The problem occurs when you start to get previously good
sectors reporting hard errors - even though you cannot read the contents of
the sector it might not really be a bad block.

We recently had a problem on an RA-80 with gradually increasing intensity of
soft and then hard errors. The DEC service engineers just said "well, if you
ran VMS..."  We simply hid away each file in which we found a hard error and
wondered what would happen if it got worse. Then of course a hard error
occurred in the inode area of the user filesystem and that was the end of
our complacency. While we were messing around trying to get a list of all
the bad blocks the root filesystem fell over and a bit later so did the swap
area when I was trying to boot the mini-root fs from tape.

Then we found out that DEC diagnostics cannot just read a disc to find errors;
they must first write known data... well, the last backup was fairly recent.
The diagnostics laboured happily over the night and reported about 16 sectors
which did not write/read correctly, most with more than one offence, and most
associated with a specific head. However, of the sectors which UN*X had
previously called unreadable, only one appeared as a hard error and two
were reporting soft (ECC correctable) errors. Reformatting the disc
and adding the bad sector info reported 20 sectors revectored, and then
retesting the disc gave a similar number of fresh bad blocks. The problem
turned out to be the read/write amplifier board - there was nothing wrong
with the head/disc assembly.

Lessons: (well they were new to me)

1. BUG IN DUMP: it reads inodes in 8k chunks - fine... but if one sector
out of the 16 is unreadable you've lost the lot. By that stage it is
probably impossible to recompile dump with a smaller block size.

2. If the software added bad blocks to the hardware revectoring table on its
own account then there would have been a race in our case to see whether we
first filled up the bad-block table with not-really-bad blocks or clobbered
one of the inode blocks. No operating system can survive having its directory
structures corrupted (even, I am told, VMS) and if that happens there is
nothing to do but a dump/reformat/restore. Until that occurs, and if the
errors are in file data areas only, it is a fairly simple matter to allocate
sectors with hard errors to dummy files that can be ignored. 
	The main occasion on which it would be nice to be able to add bad
blocks to the re-vector table would be if they were in the paging area.
If the DEC diagnostics are able to reformat just a given range of cylinders
then this would be enough (I can't remember - but certainly the exerciser
program can check any given fraction of the disc). Failing this we only need
a standalone program to add bad blocks to the table, but I don't suppose
DEC are too keen to give out the necessary info.
	The difficulty inherent in any automatic bad-block table rewriting
lies in judging when the unreliability of a given sector becomes intolerable;
certainly a single instance of failure which is cured by rewriting it should
not be sufficient. This leads to variable criteria depending on the location
within the disc partition. I would suggest that the simplest solution to
implement and to use would be a user program allowing manual entry of a
block into the re-vector table (all volunteers one step forward please).

3. Reliable file systems? We may have umpteen cloned superblocks in 4.2BSD
but for a reliable system we would also need duplicated inodes. Try mounting
a filesystem with unreadable inodes and see what happens.

4. How do you tell which block is giving the hard error? The "sec no"
reported by the error message is actually the STARTING sector number for the
transfer (usually multiple sector). It is the "hdr" that reports the real
disc sector that went bad.

5. DEC diagnostics cannot report which sectors are currently unreadable (e.g.
with too many bit errors for ECC). If anybody wants it I now have a trivial
program which reads the disc and reports head, cylinder, sector etc of
unreadable sectors. (DEC doc. didn't tell me about the half cylinder offset on
every second cylinder)
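
A minimal sketch of what such a scanning program might look like, written
in Python for brevity. It is not the program mentioned above. The
read_sector callable, the geometry parameters and the helper names are
assumptions made for illustration, and the half cylinder offset is
deliberately not modelled.

```python
import os

SECTOR_SIZE = 512  # bytes per sector


def lbn_to_chs(lbn, sectors_per_track, tracks_per_cylinder):
    """Convert a logical block number to (cylinder, head, sector).

    Assumes a plain linear layout. Any per-cylinder offset, such as the
    half cylinder offset mentioned above, is NOT modelled here.
    """
    sectors_per_cylinder = sectors_per_track * tracks_per_cylinder
    cylinder, rest = divmod(lbn, sectors_per_cylinder)
    head, sector = divmod(rest, sectors_per_track)
    return cylinder, head, sector


def scan(read_sector, nsectors, sectors_per_track, tracks_per_cylinder):
    """Read every sector once and list the ones that fail.

    read_sector(lbn) is a hypothetical callable that returns the sector
    contents or raises OSError on an unrecoverable (hard) error.
    Returns a list of (lbn, cylinder, head, sector) tuples.
    """
    bad = []
    for lbn in range(nsectors):
        try:
            read_sector(lbn)
        except OSError:
            chs = lbn_to_chs(lbn, sectors_per_track, tracks_per_cylinder)
            bad.append((lbn,) + chs)
    return bad


def raw_reader(path):
    """Build a read_sector callable over a raw device or an image file.

    The descriptor is left open for the life of the process.
    """
    fd = os.open(path, os.O_RDONLY)

    def read_sector(lbn):
        os.lseek(fd, lbn * SECTOR_SIZE, os.SEEK_SET)
        data = os.read(fd, SECTOR_SIZE)
        if len(data) != SECTOR_SIZE:
            raise OSError("short read at LBN %d" % lbn)
        return data

    return read_sector
```

The geometry is passed in rather than hard-coded because the real figures
for a given drive should come from its documentation, not from this sketch.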

Cameron Davidson

ACSnet or CSNET: probe@mm730.uq.oz
UUCP:		...seismo!munnari!mm730.uq.oz!probe
ARPA:		probe%mm730.uq.oz@seismo

chris@umcp-cs.UUCP (Chris Torek) (08/06/85)

[Please note that all my responses are based only on what I *think* is
true; I have almost no hard data on UDA50s.]

>As I understand it the UDA-50 controller transparently does bad-block
>forwarding PROVIDED the blocks were flagged the last time the disc
>surfaces were formatted. The problem occurs when you start to get
>previously good sectors reporting hard errors - even though you
>cannot read the contents of the sector it might not really be a
>bad block.

Let us agree on some definitions first:

	bad sector: a sector from which data written cannot reliably
		be reread.
	soft error: an error that is correctable, in this case by
		using ecc information.
	hard error: an error that is not correctable (the original
		data cannot be reconstructed).

(Note that all these errors can only be detected by attempting to
read a sector.)

Anyway, one may get a hard error that is not due to a bad sector,
if the data has been lost due to (e.g.) write current failure rather
than media problems.

Now, as to UDA50 bad sector forwarding:  DEC SDI (Standard Disk
Interface?) format specifies that there are some number of RCT
(Replacement and Cacheing Table?) areas on the disk (in the case
of an RA81 there are four).  For each sector, the controller will
look in the RCT tables to see if the sector has been forwarded.
It will never add a sector to these tables itself.

>We recently had a problem on an RA-80 with gradually increasing
>intensity of soft and then hard errors.

How unusual :-).  (I wish I knew the magic words that would transform
RA81s into Fuji Eagles.)

>The DEC service engineers just said well, if you ran VMS...

Sigh.  However, partial good news: DEC is rumored to have a standalone
program called "rabads", which can be used to add sectors to the
RCT tables.  If your field service rep hasn't heard of it, try to
contact someone in Ultrix support.  (I have not actually seen this
program myself, however.)

>[...] Then of course a hard error occurred in the inode area of
>the user filesystem [...].

>Then we found out that DEC diagnostics cannot just read a disc to
>find errors, it must first write known data...well the last backup
>was fairly recent.  The diagnostics laboured happily over the night
>and reported about 16 sectors [...]. Reformatting the disc and adding
>the bad sector info reported 20 sectors revectored, and then
>retesting the disc gave a similar number of fresh bad blocks. The
>problem turned out to be the read/write amplifier board - there
>was nothing wrong with the head/disc assembly.

I recall some ECOs on the r/w board: problems with the write
current levels, I believe.  In any case we still get an inordinate
number of "lost rd/wr ready drive error" errors (code 11, subcode
4, in MSCP lingo).  I wonder.

>Lessons: (well they were new to me)

>1. BUG IN DUMP: it reads inodes in 8k chunks - fine... but if one
>sector out of the 16 is unreadable you've lost the lot. By that
>stage it is probably impossible to recompile dump with a smaller
>block size.

When the driver does bad block forwarding itself this is less of
a problem, since it occasionally recovers.  However, your point is
well taken: dump should retry using 512 byte reads.
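
A minimal sketch of that retry idea, in Python rather than as a patch to
dump itself. The read_at callable and the zero-fill policy for sectors
that stay unreadable are assumptions made for illustration.

```python
SECTOR_SIZE = 512
CHUNK_SECTORS = 16  # 16 x 512 bytes = 8k, the chunk size described above


def read_chunk(read_at, offset, nsectors=CHUNK_SECTORS):
    """Read nsectors starting at byte offset.

    read_at(offset, length) is a hypothetical callable that returns bytes
    or raises OSError if any sector in the range is unreadable.
    Returns (data, bad), where bad lists the indexes (within the chunk)
    of sectors that could not be read and were zero-filled.
    """
    # Fast path: one large read, as dump does today.
    try:
        return read_at(offset, nsectors * SECTOR_SIZE), []
    except OSError:
        pass

    # Slow path: retry one sector at a time so that a single bad sector
    # costs 512 bytes instead of the whole chunk.
    pieces, bad = [], []
    for i in range(nsectors):
        try:
            pieces.append(read_at(offset + i * SECTOR_SIZE, SECTOR_SIZE))
        except OSError:
            pieces.append(bytes(SECTOR_SIZE))  # zero-fill the lost sector
            bad.append(i)
    return b"".join(pieces), bad
```

With one unreadable sector in a 16-sector chunk, the caller gets the other
15 sectors back plus the index of the one that was lost.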

>2. If the software added bad blocks to the hardware revectoring
>table on its own account then there would have been a race in our
>case to see whether we first filled up the bad-block table with
>not-really-bad blocks or clobbered one of the inode blocks. No
>operating system can survive having its directory structures
>corrupted (even, I am told, VMS) and if that happens there is
>nothing to do but a dump/reformat/restore. Until that occurs, and
>if the errors are in file data areas only, it is a fairly simple
>matter to allocate sectors with hard errors to dummy files that
>can be ignored.

The bad block forwarding should only be done after the "bad" sector
has been tested, since the driver sometimes reports bad sectors
when they have only transient hard errors.  Dave Gehrt's driver
does this.  Also, when the block is forwarded the replacement sector
must be initialized with a "forced error" if the original data is
suspect.  This error will vanish when the block is rewritten later.

Unix *can* recover from losing directories; losing inodes is worse
(the files are essentially gone) but not all is lost: if the
remainder of the disk is readable, fsck will usually handle it,
even if you have to copy just the readable portions to a new drive
first.

Replacement is done like this: when you get a bad block report,
1. read the original data, and remember whether it succeeds,
2. copy that data to RCT sector 1 ("spare" sector) (all RCTs),
3. write test pattern, if fail, replace,
4. read test pattern, if fail or doesn't match, replace,
5. ignore the error, copy the spare sector back and return
   (with forced error iff step 1 failed),
6. replace: allocate a replacement sector,
7. write the replacement sector entry in all the RCTs (i.e.,
   mark the RCT entry in use),
8. issue M_OP_REPLACE command to replace the original sector,
9. copy the spare sector back to the original logical
   block (which has now been remapped).

Of course there are all sorts of things that can go wrong, so it's
not quite that simple.
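
The steps above can be sketched as a small simulation. This is only an
illustration of the control flow, not driver code: the disk object and
all of its method names, the FakeDisk class and the test pattern are
assumptions invented for the example, and none of the failure cases
are handled.

```python
SECTOR_SIZE = 512
TEST_PATTERN = b"\xa5" * SECTOR_SIZE  # arbitrary; the real pattern is not given


def handle_bad_block(disk, lbn):
    """Follow steps 1-9 above; return "kept" or "replaced"."""
    # 1. read the original data, remembering whether the read succeeded
    try:
        data, read_ok = disk.read(lbn), True
    except OSError:
        data, read_ok = bytes(SECTOR_SIZE), False
    # 2. save it in the spare RCT sector
    disk.write_spare(data)
    # 3, 4. write a test pattern and read it back
    try:
        disk.write(lbn, TEST_PATTERN)
        sector_ok = disk.read(lbn) == TEST_PATTERN
    except OSError:
        sector_ok = False
    if not sector_ok:
        # 6. allocate a replacement, 7. record it in the RCT,
        # 8. tell the controller to replace the sector
        rbn = disk.allocate_replacement()
        disk.mark_rct(lbn, rbn)
        disk.replace(lbn, rbn)
    # 5 or 9. copy the saved data back, with a forced error iff step 1 failed
    disk.write(lbn, disk.read_spare(), forced_error=not read_ok)
    return "kept" if sector_ok else "replaced"


class FakeDisk:
    """In-memory stand-in used only to exercise the sketch.

    The forced-error flag is merely recorded; reads do not fail on it.
    """

    def __init__(self, nsectors, bad_media=(), unreadable=()):
        self.sectors = {n: bytes(SECTOR_SIZE) for n in range(nsectors)}
        self.bad_media = set(bad_media)    # media defect: never holds data
        self.unreadable = set(unreadable)  # data lost, media may be fine
        self.rct = {}
        self.remapped = set()
        self.forced = set()
        self.spare = None

    def _media_ok(self, lbn):
        return lbn not in self.bad_media or lbn in self.remapped

    def read(self, lbn):
        if lbn in self.unreadable or not self._media_ok(lbn):
            raise OSError("hard error")
        return self.sectors[lbn]

    def write(self, lbn, data, forced_error=False):
        if not self._media_ok(lbn):
            raise OSError("write failed")
        self.sectors[lbn] = data
        self.unreadable.discard(lbn)  # a successful rewrite clears the error
        if forced_error:
            self.forced.add(lbn)
        else:
            self.forced.discard(lbn)

    def write_spare(self, data):
        self.spare = data

    def read_spare(self):
        return self.spare

    def allocate_replacement(self):
        return len(self.rct)

    def mark_rct(self, lbn, rbn):
        self.rct[lbn] = rbn

    def replace(self, lbn, rbn):
        self.remapped.add(lbn)
```

A sector whose data was lost but whose media passes the pattern test is
kept and flagged with a forced error; a sector that fails the test ends
up in the RCT.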

You are indeed in trouble when the RCT fills up.  At that point
your only option is to replace the HDA.  Fortunately an RA81 RCT
holds over 600 sectors.  (RA81s tend to come with 50-200 bad sectors
already mapped!)

>The difficulty inherent in any automatic bad-block table rewriting
>lies in judging when the unreliability of a given sector becomes
>intolerable; certainly a single instance of failure which is cured
>by rewriting it should not be sufficient.

(This was covered above.)

>This leads to variable criteria depending on the location within
>the disc partition. I would suggest that the simplest solution to
>implement and to use would be a user program allowing manual entry
>of a block into the re-vector table (all volunteers one step forward
>please).

I intend (in my copious spare time :-) ) to someday allow an ioctl
in the UDA50 driver that forwards a given sector.  This can then
be done by hand or by some program that tallies /usr/adm/messages
or whatever.  In the meantime rabads (if it exists) is a fairly
wieldy solution (wieldy being the opposite of unwieldy).
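
A minimal sketch of the tallying half of that idea, in Python. It assumes
the LBNs of reported hard errors have already been extracted from the log
(no log format is assumed here), and the threshold of 3 is an arbitrary
illustrative value, not a recommendation.

```python
from collections import Counter


def sectors_to_forward(hard_error_lbns, threshold=3):
    """Return the sorted LBNs reported at least `threshold` times.

    hard_error_lbns is an iterable of logical block numbers, one entry
    per hard error report. The result is a list of candidates for
    testing and possible forwarding, not a list of proven bad sectors.
    """
    counts = Counter(hard_error_lbns)
    return sorted(lbn for lbn, n in counts.items() if n >= threshold)
```

For example, sectors_to_forward([10, 42, 10, 7, 10, 42]) picks out only
LBN 10, so a single transient failure never qualifies on its own.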

>3. reliable file systems? We may have umpteen cloned superblocks
>in 4.2BSD but for a reliable system we would also need duplicated
>inodes. Try mounting a filesystem with unreadable inodes and see
>what happens.

Depends on how you define "reliable".  4.2 can recover from many
kinds of disk trashings, but you're going to lose *something*.
(And you're not supposed to *mount* filesystems with bugs anyway.)

>4. How do you tell which block is giving the hard error [....]

The driver should report the LBN (logical block number) from the
hard error datagram.  (There may be no datagram; in such cases one
must guess.)

>Cameron Davidson
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

chris@umcp-cs.UUCP (Chris Torek) (08/07/85)

Whoops...

>From: chris@umcp-cs.UUCP (Chris Torek)
>Fortunately an RA81 RCT holds over 600 sectors.

This is true, but (I just checked yesterday) is low by more than an
order of magnitude.  An RA81 has an incredible 17528 replacement sectors!
Talk about planning ahead....

Gyre (one of our 750s) has 308 bad blocks, 9 unusable replacement
blocks, and 17211 spares still available on its single RA81.
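
As a quick check of those figures (a worked sum in Python; the variable
names are only for illustration):

```python
bad_blocks = 308
unusable_replacements = 9
spares_available = 17211

# Every replacement sector is either in use, unusable, or still spare.
total = bad_blocks + unusable_replacements + spares_available
used_percent = round(100 * (bad_blocks + unusable_replacements) / total, 1)

print(total, used_percent)  # prints "17528 1.8"
```

The three counts add up to the 17528 replacement sectors quoted above,
with under 2% of them consumed so far.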
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland