jjb@ares.cs.wayne.edu (Jon J. Brewster) (02/22/90)
This concerns a uVAX3600 running Ultrix V3.1. We've recently started having problems with dump -- we get lots of bread messages with block no.s and a final one that says "More than 32 block read errors from 64884".

Uerf shows many bad block replacement events for that disk, all for the same LBN, 177837. However, it also lists "PREVIOUS RBN 0." and "NEW RBN 0.". I assume that RBN means Replacement Block Number. It also says BAD BLOCK REPL CAUSE x0048.

A couple of puzzlements: when I do the icheck/ncheck two-step on the block no.'s listed in the "(This should not happen) bread..." messages, I find that the blocks are located in a very few files (i.e., many blocks pointed at by far fewer inodes). Further, the inode numbers are almost consecutive. Seems somehow reasonable. However, icheck claims that block 64884 is a free block. So, what does that last message indicate?

Second, I would hope that the problems uerf reports would somehow relate to this, but I can't see any relationship between the LBN and any of the bread errors.

Does anyone know the significance of the RBN 0 lines? The drive is an old one, left over from an 11/780 of yesteryear. I wonder if the lines mean that it's incapable of doing bad block replacement? (I've run radisk on the disk, and it seems to have no effect. The -s option always gets caught in a loop saying that LBN 177837 is replaced, and the -r option returns with no message.)

Please e-mail me and I'll summarize if there's any interest.

TIA,
--
Jon J. Brewster
jjb@cs.wayne.edu
...!umich!wsu-cs!jjb
alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (02/22/90)
In article <1091@wsu-cs>, jjb@ares.cs.wayne.edu (Jon J. Brewster) writes:
> This concerns a uVAX3600 running Ultrix V3.1. We've recently started
> having problems with dump -- we get lots of bread messages with block
> no.s and a final one that says "More than 32 block read errors from
> 64884" Uerf shows many bad block replacement events for that disk, all
> for the same LBN, 177837. However, it also lists "PREVIOUS RBN 0." and
> "NEW RBN 0.". I assume that RBN means Replacement Block Number. It
> also says BAD BLOCK REPL CAUSE x0048.

I haven't studied the BBR algorithm closely enough to see what is really going on, but it sounds like something is causing BBR to break. I saw a problem similar to this on a V3.x field test, but it was fixed. As a guess, it could be that the RCT is corrupted and causing BBR to fail in mysterious ways. Generally when BBR doesn't work you'll get a message like:

    Media is write-protected. Back up media and reformat.

>
> Jon J. Brewster
> jjb@cs.wayne.edu
> ...!umich!wsu-cs!jjb

For reference:

    RBN - Replacement Block Number.
    RCT - Replacement Control Table.

LBN numbers from radisk are reported relative to the beginning of the disk, whereas most other ULTRIX utilities report block numbers relative to the file system (partition).

Solution? You'll want to back up as much of the disk as you can. Since dump doesn't work, try tar, but you'll need to avoid the files with the messed-up blocks. Since ncheck and icheck are being useless, try find and cp to see which files fail to read:

    find file-system -print -exec cp {} /dev/null \;

After getting a backup, have Field Service format the disk.
--
Alan Rollow
alan@nabeth.enet.dec.com
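
The find-and-cp pass suggested in the reply can be wrapped in a small shell function that prints only the files that fail to read, which is the list to leave out of the tar run. This is a rough sketch rather than anything from the original posts: the function name list_unreadable is made up for illustration, and it assumes a modern POSIX shell (read -r, printf), which may not match the /bin/sh shipped with Ultrix V3.1. File names containing newlines are not handled.

```shell
# Hypothetical helper (not from the thread): print every regular file under
# the directory given as $1 that cannot be read from start to finish.
# Copying a file to /dev/null forces all of its data blocks to be read,
# so a file sitting on an unreadable block makes cat exit non-zero.
list_unreadable() {
    find "$1" -type f -print | while IFS= read -r f; do
        if ! cat "$f" > /dev/null 2>&1; then
            # Report the path; the read error itself is discarded above.
            printf '%s\n' "$f"
        fi
    done
}
```

Run it against the mount point of the damaged file system and redirect the output to a file on a different, healthy disk; the resulting list names the files to skip when backing up with tar.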