[net.unix-wizards] read error in icheck/dcheck

BH@MIT-AI@sri-unix (06/06/82)

From: Brian Harvey <BH@MIT-AI>
Date: 23 May 1982 20:56-EDT
It came to my attention the other day that the message you get on the
console from the kernel when a disk read gets bad data does not tell you
the actual bad block, but rather the first block the program tried to
read.  icheck and dcheck read inode blocks 16 at a time, so the block named
can be as many as 15 blocks before the bad one.  They compound the problem
by typing out their own message (in the routine bread), which also refers to
the first of the 16 blocks rather than the one that is actually bad.  I spent
half an hour trying to fix a perfectly okay block before figuring out what was
going on.

I've modified icheck/dcheck so that if a read fails, they try again one
block at a time, so they can tell you which block was really bad.  (The
bread routine not only types the wrong block number, but also tries to
deal with the problem by returning a block of zeros; unfortunately what
it zeros out is the first 512 bytes of the buffer, not the 512 bytes that
are actually bad.)  I can distribute the code if anyone wants it, but it's
trivial to write it yourself once you know it's needed.
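A minimal sketch of the retry described above, written in modern POSIX C rather than the original V7 code, and assuming 512-byte blocks and a file descriptor already open on the disk device. The names bread_retry and read_block are hypothetical, not taken from icheck or dcheck. It tries the whole multi-block read first and falls back to one block at a time only when that read fails, so the error message names the block that is really bad and only that block's 512 bytes are zeroed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BSIZE 512  /* assumption: 512-byte disk blocks, as in the post */

/*
 * Hypothetical helper: read exactly one block at block number bno.
 * Returns 0 on success, -1 on a seek error, read error, or short read.
 */
static int read_block(int fd, long bno, char *dst)
{
    if (lseek(fd, (off_t)bno * BSIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, dst, BSIZE) == BSIZE ? 0 : -1;
}

/*
 * Hypothetical bread_retry: read nblk blocks starting at bno into buf.
 * First try a single multi-block read; if it fails, retry one block at
 * a time so the message reports the block that actually failed, and
 * zero only that block's slice of buf.  Returns the number of bad blocks.
 */
int bread_retry(int fd, long bno, char *buf, int nblk)
{
    size_t cnt = (size_t)nblk * BSIZE;
    int bad = 0;

    /* Fast path: the whole run reads cleanly in one call. */
    if (lseek(fd, (off_t)bno * BSIZE, SEEK_SET) != (off_t)-1 &&
        read(fd, buf, cnt) == (ssize_t)cnt)
        return 0;

    /* Slow path: isolate the failing block(s). */
    for (int i = 0; i < nblk; i++) {
        char *dst = buf + (size_t)i * BSIZE;
        if (read_block(fd, bno + i, dst) != 0) {
            fprintf(stderr, "read error %ld\n", bno + i);
            memset(dst, 0, BSIZE);  /* zero this block, not the first one */
            bad++;
        }
    }
    return bad;
}
```

icheck or dcheck would call something like bread_retry(fd, bno, buf, 16) for each run of inode blocks; the fast path keeps the normal case at one read per 16 blocks, and the per-block reads cost extra only when something has already gone wrong.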

Now, what I still don't understand is why the block that blew up when read
through /dev/rrp3 produced only an ECC console message, and completely correct
data, when I read it through /dev/rp3.  (This is an RM03 disk using the hp.c
driver, slightly modified to match the RM geometry.)  Does the raw disk do less
error handling?
I thought that at the lowest level it was the same read routine, and the
only difference is that the kernel buffers the cooked disk for you.  Wrong?