[comp.unix.questions] Disk Problems on an 11/750 running 4.3 BSD

hackertj@egrunix.UUCP (Thomas J Hacker) (07/19/89)

Hello there!! I have 4.3 BSD running on a VAX 11/750 with a single
ra81 disk drive. I've been having some problems that I haven't been
able to decode, and I'd like some advice/help:

It all started after a power glitch (that seems to have made it through
a UPS!) that toggled a fault light on the drive, but after power-up the
light went out... spurious, I thought.  Then I got this:

Jun 27 11:27:57 unix vmunix: uda0: soft error, SDI error, unit 0,
    event 053, hdr 0x0
Jun 27 11:27:57 unix vmunix: 	0x13	0x4	0x0	0x0	0x0
    0x81	0x0	0x0	0x0	0x1	0x0	0x0

...I thought I'd wait and see if it repeated:

Jul  7 13:39:20 unix vmunix: uda0: soft error, SDI error, unit 0,
    event 053, hdr 0x0
Jul  7 13:39:20 unix vmunix: 	0x13	0x4	0x0	0x0	0x0
    0xa	0x0	0xf5	0x0	0x3	0x0	0x0


Now, I know that an SDI error generally means "something's wrong with
the drive," so I thought I would wait a day or two to see if it would
repeat. Then this came up:

Jul 11 18:07:30 unix vmunix: mcr0: soft ecc addr 1a72 syn 73

I have this feeling that it's being caused by a bad spot on the disk.
But, I can't seem to find any reference ANYWHERE to what device an
mcr0: is!!

I'm probably going to have DEC come out and reformat the drive, but before
I go through all that, does anyone know a way to get around this?
(For example, how to convert the ECC addresses to something bad144 can
comprehend?)

Please respond by EMAIL, and I'll post responses.

-Thanks!!
Thomas Hacker
Systems Programmer
Oakland University
hacker@unix.secs.oakland.edu
hackertj@vms.secs.oakland.edu

-- 
Thomas Hacker               ...Weave a circle round him thrice,
Systems Programmer             And close your eyes with holy dread, 
Oakland University	       For he on honeydew hath fed, --"Kubla Khan" 
hackertj@unix.secs.oakland.edu And drunk the milk of Paradise. -- ST Coleridge

chris@mimsy.UUCP (Chris Torek) (07/19/89)

In article <115@egrunix.UUCP> hackertj@egrunix.UUCP (Thomas J Hacker) writes:
>... I have 4.3 BSD running on a VAX 11/750 with a single
>ra81 disk drive ....

>... uda0: soft error, SDI error, unit 0, event 053, hdr 0x0
and
>... mcr0: soft ecc addr 1a72 syn 73

These are probably unrelated.  (I say `probably' because a memory error
can cause many other spurious errors.)

The `uda0, unit 0' error (`053') is an `sdi command timeout drive error'.
It means the drive did not do something (an SDI command) within its
normal allotted time.  The controller and/or drive recovered from the
error.

The 4.3BSD-tahoe driver decodes the event numbers, and is generally
much improved over the 4.3BSD driver.
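For illustration, here is a minimal Python sketch of that kind of
decoding.  It assumes the usual MSCP layout of a status/event code
(major status in the low five bits, subcode in the bits above), and it
assumes the leading 0 in `event 053' means the driver printed the value
in octal.  The names (decode_event, MAJOR_NAMES, SUB_NAMES) are
hypothetical, and the tables hold only the entries needed here; the
`SDI command timeout' label is taken from the explanation above, not
from a DEC manual.

```python
# Sketch: split an MSCP status/event code into major code and subcode.
# Assumption: MSCP convention, low 5 bits = major status, higher bits = subcode.
ST_MASK = 0o37     # low five bits hold the major status code
ST_SBSHIFT = 5     # the subcode sits just above the major code

# Hypothetical, deliberately tiny tables: only what this example needs.
# The (drive error, subcode 1) label comes from the article, not a manual.
MAJOR_NAMES = {0o13: "drive error"}
SUB_NAMES = {(0o13, 1): "SDI command timeout"}


def decode_event(event):
    """Return (major, subcode, description) for an MSCP event code."""
    major = event & ST_MASK
    sub = event >> ST_SBSHIFT
    major_name = MAJOR_NAMES.get(major, "major code %o" % major)
    sub_name = SUB_NAMES.get((major, sub), "subcode %d" % sub)
    return major, sub, "%s %s" % (sub_name, major_name)


# Assumption: "event 053" in the log is octal, so parse it base 8.
print(decode_event(int("053", 8)))
```

Run on 053, this yields major code 013 (drive error) and subcode 1,
which agrees with the reading given above.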

>But, I can't seem to find any reference ANYWHERE to what device an
>mcr0: is!!

`mcr' is a `memory controller'.  `addr 1a72 syn 73' means that the
error register contained 001A7273.  In conjunction with a table for
the memory boards, this will tell which chip on which board is going
bad.
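As a rough illustration of that arithmetic, the Python sketch below
parses the console message and rebuilds the register value.  It
assumes, as the 001A7273 figure implies, that the syndrome occupies the
low 8 bits of the register and the printed address sits directly above
it.  The function and pattern names are hypothetical, and it does not
attempt the board/chip lookup, which needs the memory board table.

```python
import re

# Sketch: rebuild the raw memory error register from the console message.
# Assumption (from the article's 001A7273 figure): syndrome = low 8 bits,
# printed "addr" field = the bits immediately above it.
MSG_RE = re.compile(r"mcr(\d+): soft ecc addr ([0-9a-f]+) syn ([0-9a-f]+)",
                    re.IGNORECASE)


def register_from_message(line):
    """Return (unit, register) parsed from an 'mcrN: soft ecc' line."""
    m = MSG_RE.search(line)
    if m is None:
        raise ValueError("not an mcr soft ECC message: %r" % line)
    unit = int(m.group(1))
    addr = int(m.group(2), 16)
    syn = int(m.group(3), 16)
    if syn > 0xFF:
        raise ValueError("syndrome does not fit in 8 bits: %x" % syn)
    # Shift the address field up past the 8-bit syndrome and combine.
    return unit, (addr << 8) | syn


unit, reg = register_from_message(
    "Jul 11 18:07:30 unix vmunix: mcr0: soft ecc addr 1a72 syn 73")
print("mcr%d register = %08X" % (unit, reg))
```

For the message above this prints `mcr0 register = 001A7273'.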

>Please respond by EMAIL, and I'll post responses.

Too late :-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris