hackertj@egrunix.UUCP (Thomas J Hacker) (07/19/89)
Hello there!!

I have 4.3 BSD running on a VAX 11/750 with a single ra81 disk drive, and I've been having some problems that I haven't been able to decode, and I'd like some advice/help:

It all started after a power glitch (that seems to have made it through a UPS!) that toggled a fault light on the drive, but after power up, the light went out....spurious, I thought. Then I got this:

    Jun 27 11:27:57 unix vmunix: uda0: soft error, SDI error, unit 0, event 053, hdr 0x0
    Jun 27 11:27:57 unix vmunix: 0x13 0x4 0x0 0x0 0x0 0x81 0x0 0x0 0x0 0x1 0x0 0x0

...I thought I'd wait and see if it repeated:

    Jul  7 13:39:20 unix vmunix: uda0: soft error, SDI error, unit 0, event 053, hdr 0x0
    Jul  7 13:39:20 unix vmunix: 0x13 0x4 0x0 0x0 0x0 0xa 0x0 0xf5 0x0 0x3 0x0 0x0

Now, I know that the SDI error generally means "something's wrong with the drive". So, I thought I would wait a day or two to see if it would repeat, then this came up:

    Jul 11 18:07:30 unix vmunix: mcr0: soft ecc addr 1a72 syn 73

I have this feeling that it's being caused by a bad spot on the disk. But, I can't seem to find any reference ANYWHERE to what device an mcr0: is!!

I'm probably going to have DEC come out and reformat the drive, but before I go through all that, does anyone know any ways to get around this? (like how to convert the ecc addrs to things bad144 can comprehend?)

Please respond by EMAIL, and I'll post responses.

-Thanks!!

Thomas Hacker
Systems Programmer
Oakland University
hacker@unix.secs.oakland.edu
hackertj@vms.secs.oakland.edu
--
Thomas Hacker
Systems Programmer
Oakland University
hackertj@unix.secs.oakland.edu

...Weave a circle round him thrice,
   And close your eyes with holy dread,
   For he on honeydew hath fed,
   And drunk the milk of Paradise.
        --"Kubla Khan" -- ST Coleridge
chris@mimsy.UUCP (Chris Torek) (07/19/89)
In article <115@egrunix.UUCP> hackertj@egrunix.UUCP (Thomas J Hacker) writes:
>... I have 4.3 BSD running on a VAX 11/750 with a single
>ra81 disk drive ....
>... uda0: soft error, SDI error, unit 0, event 053, hdr 0x0
and
>... mcr0: soft ecc addr 1a72 syn 73

These are probably unrelated. (I say `probably' because a memory error can cause many other spurious errors.)

The `uda0, unit 0' error (`053') is an `sdi command timeout drive error'. It means the drive did not do something (an SDI command) within its normal allotted time. The controller and/or drive recovered from the error. The 4.3BSD-tahoe driver decodes the event numbers, and is generally much improved over the 4.3BSD driver.

>But, I can't seem to find any reference ANYWHERE to what device an
>mcr0: is!!

`mcr' is a `memory controller'. `addr 1a72 syn 73' means that the error register contained 001A7273. In conjunction with a table for the memory boards, this will tell which chip on which board is going bad.

>Please respond by EMAIL, and I'll post responses.

Too late :-)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
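
As an illustration of the `addr 1a72 syn 73' reading above, here is a minimal Python sketch that pulls the two hex fields out of an `mcr' log line and joins them into the 32-bit value quoted in the reply (001A7273). It assumes only what the reply states, namely that the register value is the address digits followed by the two syndrome digits; the real bit layout of the 11/750 memory controller register may differ, and the function names are hypothetical. Mapping the value to a board and chip still needs the memory board table, which is not reproduced here.

```python
import re

# Matches lines such as:
#   Jul 11 18:07:30 unix vmunix: mcr0: soft ecc addr 1a72 syn 73
# The pattern is inferred from the single log line in the post.
MCR_LINE = re.compile(
    r"mcr(?P<unit>\d+): soft ecc addr (?P<addr>[0-9a-fA-F]+) syn (?P<syn>[0-9a-fA-F]+)"
)


def parse_mcr_line(line):
    """Return (unit, addr, syn) as integers, or None if the line does not match."""
    m = MCR_LINE.search(line)
    if m is None:
        return None
    return int(m.group("unit")), int(m.group("addr"), 16), int(m.group("syn"), 16)


def mcr_register_value(addr, syn):
    """Join addr and syn the way the reply describes.

    ASSUMPTION: the syndrome occupies the low 8 bits and the address sits
    directly above it. This reproduces 001A7273 for addr 1a72, syn 73, but
    it is not taken from the hardware documentation.
    """
    if not 0 <= syn <= 0xFF:
        raise ValueError("syndrome does not fit in two hex digits")
    return (addr << 8) | syn


def format_register(line):
    """Return the register value as 8 uppercase hex digits, or None."""
    parsed = parse_mcr_line(line)
    if parsed is None:
        return None
    _unit, addr, syn = parsed
    return "%08X" % mcr_register_value(addr, syn)


if __name__ == "__main__":
    print(format_register(
        "Jul 11 18:07:30 unix vmunix: mcr0: soft ecc addr 1a72 syn 73"
    ))
```

Lines that are not `mcr' soft ECC reports (the `uda0' lines, for example) simply return None, so the helper can be run over a whole log file.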