cameron@runx.ips.oz (James Cameron) (04/06/88)
I'm wondering what to do about errors reported by VAX/VMS on my RA60 disk. The rate of errors is increasing. Two weeks ago I would see one or two errors in a week. Now I see about three per day. I am concerned.

According to ANALYZE/ERROR_LOG and VAXsim, the errors are ECC errors, which trigger an attempted bad-block replacement operation. The three types that I have seen are: Uncorrectable ECC error, Four symbol ECC error, and Five symbol ECC error.

The bad-block replacement operation reports 'BLOCK VERIFIED GOOD' on each error. I guess this means that the block that caused the error is still considered OK for use. Can anyone confirm this? (BLOCK VERIFIED BAD would therefore mean that it's confirmed bad and re-vectored.)

As for the cause and correction of the problem - I shall leave that up to DEC Field Service - it's their job.

What I'd like to ask all of you is this: Is an 'UNCORRECTABLE ECC ERROR' an error that causes the operation to fail? (i.e. some user somewhere gets a stack dump). If so, how can I determine:

a) What user (VMS process) encountered the problem? (The error log retains the full process ID - how do I translate this into the process ID used by ACCOUNTING and SHOW SYSTEM?)

b) In what file, and at what virtual block number, did the error occur? The only information in the error log is the logical block number on the disk - how can I translate this LBN into a file-name (or id) and a VBN? I'd rather not try to dump the header of every file I'm worried about; there are rather a lot of them. Has anyone needed to do this before - did they write a program to scan INDEX.SYS?

c) On what surface of the disk (head number?) did the error occur? If many errors are occurring on one head, could this indicate that the head is damaged? (Maybe leave this one to Field Service too, but the point is that the error log doesn't tell me where the logical block is - maybe this is just impossible in DSA.)

If anyone can offer any suggestions, I will listen.
Please reply by mail - if there is sufficient interest I will summarise later.

Some other questions...

Does anyone know the password that Field Service should use to change the VAXsim_Monitor process error rate margins? F.S. down-under doesn't seem to know - I've asked.

Is there any way to determine how many replacement blocks are available on a DSA disk (RA60/RA81...)? Or how many blocks have been re-vectored?

What is the difference between a four and a five symbol ECC error?

Thank you for taking the time to read this.

James Cameron, Kilpatrick Green Pty. Ltd.,
P.O. Box N366, Sydney 2000 Australia
Internet: cameron@runx.ips.oz.au
UUCP: uunet!runx.ips.oz.au!cameron