chris@umcp-cs.UUCP (Chris Torek) (11/10/86)
In article <1537@oddjob.UUCP> matt@oddjob.UUCP (Matt Crawford) writes:
>EVRLK ... formerly known as "rabads" ... appears on the rev 24
>diagnostic tape, and most FEs have no experience with it.
>
>Caveat: by default, EVRLK will write a forced error (FER) on the
>replacement block.

If it follows the recommended procedure for host-initiated bad block replacement (DECese for `fixing the pack'), it writes a forced error if and only if it is unable to read the original sector.

>... If your goal is to avoid dump/format/restore you should answer
>"No" to the prompt "Enable write with forced errors", avoiding the
>FER mark and allowing you to attempt to clean up the disk with fsck
>after your bad sectors have been forwarded.

There is a problem with this. If the bad block is in the inode area of a cylinder group, or if it is in a directory, fsck may indeed be able to patch things up. In this case, preventing the forced error may be a good thing, as fsck may be able to recover some of the original data, and any is better than none. If, however, the bad block is in the midst of a regular file, that file will now have garbage in the middle that (without the forced error) may be rather hard to find.

A third possibility, of course, is that the block is free. In this case, the forced error modifier should be irrelevant, since the block should be written before it is ever read. (If the bad sector is in the tail part of a block that has been fragmented, and it is marked with a forced error, it might cause spurious error messages during dumps. Dump itself will not complain, but the kernel will. Also, pack-to-pack copies will fail in the presence of forced errors.)

So what *should* you do? Well, ideally, toss out that RA81 and buy an Eagle. It is cheaper, faster, and more reliable. (It is also---alas!---fifty megabytes smaller; but then, you may be able to buy two Eagles for the price of one RA81.)
But if you cannot do that, run /etc/icheck to find out where the bad block lives. My driver reports a bad block like this:

	ra0g: hard error sn6674: unit 0, lbn 125436: uncorrectable ecc data error (code 8, subcode 7)

Your driver will no doubt print something cryptic instead, such as

	uda0: hard error datagram, hdr 1e9f2 event 0350

(Well, the former is *less* cryptic. At least I need not run a program to decode it!) Take the sector number (hex 1e9f2 = 125426 decimal) and subtract the starting sector of the appropriate partition. The result should come out approximately (if not exactly) equal to the sector number in the `ra0g' message. Feed this value to icheck -b:

	% icheck -b 6674 /dev/rra0g

Icheck will print something like this:

	6672 arg; frag 0 of 8, inode=2054, class=logical data block 837

which tells me that sector 6672 is owned by inode 2054 as a data block. Turn the inode into a path name with /etc/ncheck:

	% ncheck -i 2054 /dev/rra0g
	2054 /israel/sendmail.ps

(Since I was actually doing this on rhp3d, which is mounted as /tmp, the file that owns sector 6672 is /tmp/israel/sendmail.ps.)

Armed with all this information about the bad block, tell EVRLK not to use the forced error modifier, then clean up your file system by hand. At worst, you can use /etc/clrblk to zero out the sector. Oh: you have to write clrblk first. For that matter, I would have to write it too---for the fourth time, at the very least....

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris@mimsy.umd.edu
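The article describes two mechanical steps: converting the hex sector number from a datagram such as `hdr 1e9f2` and subtracting the partition's starting sector, and the missing clrblk. A modern POSIX sh rendering of both might look like the sketch below. It is a minimal, hypothetical sketch and not part of any BSD distribution. The names `hex_to_dec`, `part_sector`, and `clrblk` are made up for illustration. The sketch also assumes 512-byte sectors, a `/dev/zero` device, and a `dd` that accepts `conv=notrunc`; none of this comes from the article. Try it on a scratch file before pointing it at a raw disk.

```sh
#!/bin/sh
# Hypothetical helpers (illustration only; not shipped with any BSD).
# Assumes 512-byte sectors, /dev/zero, and a dd supporting conv=notrunc.

# hex_to_dec HEX: convert a hex sector number, such as the one in a
# "uda0: hard error datagram, hdr 1e9f2" message, to decimal.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

# part_sector HEXSECTOR PARTSTART: print the sector number relative to
# the start of the partition.  PARTSTART is the partition's starting
# sector, which you must take from your own partition table.
part_sector() {
    abs=$(hex_to_dec "$1") || return 1
    rel=$((abs - $2))
    if [ "$rel" -lt 0 ]; then
        echo "sector $abs lies before partition start $2" >&2
        return 1
    fi
    echo "$rel"
}

# clrblk DEVICE SECTOR: overwrite one 512-byte sector with zeros,
# leaving the rest of DEVICE (or a test file) untouched.
clrblk() {
    dd if=/dev/zero of="$1" bs=512 seek="$2" count=1 conv=notrunc 2>/dev/null
}
```

Suppose the partition started at sector 118752 (a made-up value). Then `part_sector 1e9f2 118752` prints 6674, the value the article feeds to `icheck -b`. Use the real starting sector from your own disk's partition table.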