BH@MIT-AI@sri-unix (06/06/82)
From: Brian Harvey <BH@MIT-AI>
Date: 23 May 1982 20:56-EDT

It came to my attention the other day that the message you get on the console from the kernel when a disk read gets bad data does not tell you the actual bad block, but rather the first block the program tried to read.  icheck and dcheck read inode blocks 16 at a time.  They compound the problem by themselves (in the routine bread) typing out a message which also refers to the first of the 16 blocks rather than the one which is actually bad.  I spent half an hour trying to fix a perfectly okay block before figuring out what was going on.

I've modified icheck/dcheck so that if a read fails, they try again one block at a time, so they can tell you which block was really bad.  (The bread routine not only types the wrong block number, but also tries to deal with the problem by returning a block of zeros; unfortunately what it zeros out is the first 512 bytes of the buffer, not the 512 bytes which are actually bad.)  I could distribute the code if anyone wants, but it's trivial to write it yourself once you know it's needed.

Now, what I still don't understand is why the block which blew up in /dev/rrp3 simply generated an ECC console message but produced completely correct data when I read /dev/rp3.  (This is an RM03 disk using the hp.c driver slightly modified to match the RM geometry.)  Does the raw disk do less error handling?  I thought that at the lowest level it was the same read routine, and the only difference is that the kernel buffers the cooked disk for you.  Wrong?
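For anyone who wants to write the retry themselves, here is a minimal sketch of the idea in C.  It is not the actual icheck/dcheck code: readblk stands in for the real disk read (it simulates a failure on one hypothetical block so the retry has something to find), and the names, block size, and return convention are assumptions for illustration only.

```c
#include <stdio.h>
#include <string.h>

#define BSIZE 512
#define NBLK  16    /* icheck/dcheck read inode blocks 16 at a time */

/* Hypothetical stand-in for the real disk read.  Block 7 is "bad"
 * here purely so the retry logic below has something to report. */
static long simulated_bad = 7;

int readblk(long blkno, char *buf, int nblk)
{
    for (int i = 0; i < nblk; i++) {
        if (blkno + i == simulated_bad)
            return -1;              /* the whole transfer fails */
        memset(buf + i * BSIZE, 1, BSIZE);  /* pretend data */
    }
    return 0;
}

/* Sketch of the modified bread: on a failed 16-block read, retry one
 * block at a time so the diagnostic names the block that is actually
 * bad, and zero only that block's 512 bytes, not the first 512 bytes
 * of the buffer.  Returns the first bad block number, or -1 if the
 * read succeeded. */
long bread(long blkno, char *buf, int nblk)
{
    long badblk = -1;

    if (readblk(blkno, buf, nblk) == 0)
        return -1;
    for (int i = 0; i < nblk; i++) {
        if (readblk(blkno + i, buf + i * BSIZE, 1) != 0) {
            fprintf(stderr, "read error on block %ld\n", blkno + i);
            memset(buf + i * BSIZE, 0, BSIZE);
            if (badblk == -1)
                badblk = blkno + i;
        }
    }
    return badblk;
}
```

The only design point worth noting is that the single-block retries also pinpoint which 512 bytes of the 8K buffer to zero, which fixes both halves of the original complaint at once.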