[comp.unix.wizards] Help with uda errors

Hampton@dockmaster.arpa (David R. Hampton) (12/17/88)

I am running BSD 4.2 on a VAX 11/785, using the University of Maryland
uda driver.  The modification time on my driver is 19 December 1985.  I
just started seeing numerous uda errors today on one of our three
drives.  These errors are occurring on an RA60 disk pack that is being
used as one large partition.  Most of the errors are soft errors, but
there is an occasional hard error thrown in.  Could someone please
explain to me what they mean?  Thanks much.

P.S.  DEC was just here yesterday for our monthly maintenance visit.  I
had not seen any errors before then.  Could they have changed something
that would cause this?

--------------------------------------------------

uda0: soft error, disk transfer error, unit 1, grp 0x0, hdr 0x3f038,
      event 0313
100040 80239df8 1 cb4102 96e8 1060000 10005 381b 2040000 103 66f55acb
      3f038 60100a8 0 0 0
uda0: soft error, SDI error, unit 1, event 0353, hdr 0x0
100040 80239df8 1 eb4103 96e8 1060000 10005 381b 2040000 103 66f55acb
      0 c0014b 2051700 2f000401 0
uda0: soft error, disk transfer error, unit 1, grp 0x0, hdr 0x4cce8,
      event 0313
100040 8023aa70 1 cb4102 96e8 1060000 10005 381b 2040000 103 66f55acb
      4cce8 40102fd 0 0 0
uda0: soft error, SDI error, unit 1, event 0353, hdr 0x0
100040 8023aa70 1 eb4103 96e8 1060000 10005 381b 2040000 103 66f55acb
      0 c0041b 504c800 2f0004e2 0
uda0: soft error, SDI error, unit 1, event 0353, hdr 0x0
100040 8023a4f8 1 2b4103 96e8 1060000 10005 381b 2040000 103 66f55acb
      0 c0041b 504c800 2f0004e2 0
uda0: hard error, disk transfer error, unit 1, grp 0x0, hdr 0x4cce8,
      event 0313
100040 8023b138 1 cb0102 96e8 1060000 10005 381b 2040000 103 66f55acb
      4cce8 c0041b 2045500 2f000462 0
ra1c: hard error sn314600 status 313

etc., etc.

David R. Hampton
Hampton @ Dockmaster.Arpa
301/859-4537

chris@mimsy.UUCP (Chris Torek) (12/21/88)

In article <17846@adm.BRL.MIL> Hampton@dockmaster.arpa (David R. Hampton)
writes:
>I am running BSD 4.2 on a VAX 11/785, using the University of Maryland
>uda driver.  The modification time on my driver is 19 December 1985.

You have an ancient edition (but then, you have 4.2BSD...).  It probably
mostly works.  I have nothing better for 4.2BSD, though.

>uda0: soft error, disk transfer error, unit 1, grp 0x0, hdr 0x3f038,
>      event 0313

The important number is the `event'.  0313 (octal) translates to `lost
receiver ready drive error'.  `hdr 0x3f038' here says that the drive was
working on block 258104 (0x3f038 in decimal), which is probably irrelevant
in this case.  The receiver probably refers to the UART receiver for the
serial cable between the controller and the drive.  (This is a WAG.)

>uda0: soft error, SDI error, unit 1, event 0353, hdr 0x0
>uda0: hard error, disk transfer error, unit 1, grp 0x0, hdr 0x4cce8,
>      event 0313

I have never figured out what makes an error an `SDI error', but 0353
is `drive detected error drive error', which is not very informative other
than telling you that the drive's error-checking code thinks something is
wrong.  Here you got a lost-receiver-ready error, so the controller retried,
got a drive-detected error, and gave up.

>P.S.  DEC was just here yesterday for our monthly maintenance visit.  I
>had not seen any errors before then.  Could they have changed something
>that would cause this?

Check the cables.  One is probably loose, or bent.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

eap@bu-cs.BU.EDU (Eric Pearce) (12/28/88)

  We had some similar errors on an RA81 on a VAX 750 running 4.3 BSD:

  Dec 20 17:14:32 bucsb vmunix: uda0: soft error, disk transfer error, 
  unit 2, grp 0x0, hdr 0xd4c27, event 0650
  Dec 20 17:14:32 bucsb vmunix: uda0: soft error, disk transfer error, 
  unit 2, grp 0x0, hdr 0xd4ef1, event 0650
 
  I did a level 0 dump of the entire disk.  DEC ran 'EVRLB' from the
  diag supervisor and then I restored the disk.  So far, no errors.

  Was this the "correct" thing to do?  DEC said the drive was ok as far
  as they could tell, but they would replace the HDA if I got errors 
  after doing the format.

  I would be interested in hearing about tools I could use to map out
  bad blocks or sectors (I have done this without too much trouble on
  the Sun, Encore and Celerity products).

 -e
-- 
-------------------------------------------------------------------------------
 Eric Pearce                                   ARPANET eap@bu-it.bu.edu
 Boston University Information Technology      CSNET   eap%bu-it@bu-cs
 111 Cummington Street                         JNET    jnet%"ep@buenga" 
 Boston MA 02215                               UUCP    !harvard!bu-cs!bu-it!eap 
 617-353-2780 voice  617-353-6260 fax          BITNET  ep@buenga

chris@mimsy.UUCP (Chris Torek) (12/29/88)

In article <26927@bu-cs.BU.EDU> eap@bu-cs.BU.EDU (Eric Pearce) writes:
>Dec 20 17:14:32 bucsb vmunix: uda0: soft error, disk transfer error, 
>unit 2, grp 0x0, hdr 0xd4c27, event 0650

0650 (octal) is a 6-symbol ECC error (therefore correctable, and corrected).
The 4.3BSD-tahoe driver decodes these things, printing something like

	uda0: soft error, disk transfer error: unit 2, lbn 871463:
	6 symbol ecc error (code A, subcode B)

The code and subcode are there in case DEC suddenly define new error codes.

>I did a level 0 dump of the entire disk.  DEC ran 'EVRLB' from the
>diag supervisor and then I restored the disk.  So far, no errors.

I cannot keep DEC's diagnostics straight, but presumably EVRLB forwards
any marginal sectors it finds.

>Was this the "correct" thing to do?

DEC have several `formatters' for forwarding bad sectors.  None of
them are capable of restoring an HDA to a virgin state, but at least
one of them (once called `rabads') can forward a sector by LBN.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris