[comp.sys.dec] 4.2 BSD Installation problem

msitd22@ms3.UUCP (Jim Chappell) (12/16/86)

I am installing virgin 4.2 on several VAX 11/780's.
I have succeeded with 2 of them but have hit a snag with the third one.
I can load the mini and full root, but while tarring /usr onto /dev/ra0h I
get the following errors, usually followed by a hung system.

/* One iteration of */
uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr  0xb1558
/* several iterations of */
uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0 


The sequence can repeat with a higher hdr number.

This VAX has two memory controllers with 4 MB on mcr0 and 2 MB  on mcr1;
one uda50; and 3 ra81 drives.

DEC has been in several times and can't isolate the problem.

I'll appreciate any help the net can offer.

Thanks,


Jim Chappell  ...!seismo!vrdxhq!ms3!jrc 
ISN Corp.  (703) 979-8900
1235A Jeff Davis Hwy, Suite 605A
Arlington, Va 22202


chris@mimsy.UUCP (Chris Torek) (12/17/86)

In article <491@ms3.UUCP>, msitd22@ms3.UUCP (Jim Chappell) writes:
>I am installing virgin 4.2 on several VAX 11/780's.

(Why?  You should be installing 4.3.  Anyway...)

>uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr  0xb1558
>uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0 

The 4.2BSD UDA50 driver prints `hard error' messages for many soft
errors.  If the hard error message is not followed by another error
of the form `ra%d: hard error sn%d', it was a soft error.
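
As a rough illustration of that rule, here is a small C sketch that scans
saved console lines and counts the uda `hard error' reports that are not
confirmed by an `ra%d: hard error sn%d' line.  The function names are made
up for this example, and it assumes the confirming message, if any, is the
very next line; the real console output may interleave other messages.

```c
#include <stdio.h>

/* Returns 1 if the line has the form "ra%d: hard error sn%d". */
int is_ra_hard_error(const char *line)
{
    int unit, sn;
    return sscanf(line, "ra%d: hard error sn%d", &unit, &sn) == 2;
}

/* Returns 1 if the line starts with "uda%d: hard error". */
int is_uda_hard_error(const char *line)
{
    int ctlr, end = 0;
    /* %n is only stored if the literal text after %d also matched. */
    return sscanf(line, "uda%d: hard error%n", &ctlr, &end) == 1 && end > 0;
}

/*
 * Count uda "hard error" lines that are probably soft errors, i.e. not
 * immediately followed by an ra "hard error sn" line (assumption: the
 * confirming line, when present, is the next one).
 */
int count_probably_soft(const char *lines[], int n)
{
    int i, soft = 0;

    for (i = 0; i < n; i++) {
        if (!is_uda_hard_error(lines[i]))
            continue;
        if (i + 1 < n && is_ra_hard_error(lines[i + 1]))
            continue;   /* confirmed hard error */
        soft++;
    }
    return soft;
}
```

Fed the two messages quoted above and nothing else, count_probably_soft()
reports both as probably soft.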

`event 0353' is, I think, a `1 symbol ecc data error'.  `hdr 0xb1558'
indicates that the error was associated with logical block 0xb1558.
None of this is very reliable information.
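
For anyone who wants to pick the numbers apart by hand, a minimal C sketch
follows.  It assumes the usual MSCP layout in which the low 5 bits of the
event word are the major status code and the bits above them are a subcode;
that layout, the macro names and the function names are assumptions for
this example, not something taken from the 4.2BSD driver source.  It also
converts the hdr value to a decimal block number.

```c
#include <stdio.h>

#define EV_CODE_MASK  0x1f  /* assumed: low 5 bits hold the major code */
#define EV_SUB_SHIFT  5     /* assumed: remaining bits hold the subcode */

/* Major status code portion of an event word. */
unsigned ev_code(unsigned event)
{
    return event & EV_CODE_MASK;
}

/* Subcode portion of an event word. */
unsigned ev_subcode(unsigned event)
{
    return event >> EV_SUB_SHIFT;
}

/* Format an event word and hdr (logical block) value into buf. */
int describe(char *buf, size_t len, unsigned event, unsigned long hdr)
{
    return snprintf(buf, len,
        "event 0%o: code 0%o, subcode %u; hdr 0x%lx = block %lu",
        event, ev_code(event), ev_subcode(event), hdr, hdr);
}
```

Calling describe(buf, sizeof buf, 0353, 0xb1558UL) fills buf with the split
fields and the block number in decimal; what a given code and subcode pair
means still has to be looked up in the MSCP documentation.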

The 4.3BSD driver is only slightly better.  A much-improved driver
may soon be available.  (CDC is replacing an HDA here; until that is
done the driver is not going anywhere.)  The new driver has already
pinpointed a number of RA81/UDA50 problems locally, from failing HDAs
to a bad port on a UDA50.  (The last was a guess on my part: `ctlr
detected pulse or parity data error' sounded like a problem in the
drive-to-controller cables or serial interface.  It was.  VMS
diagnostics could not find the problem, but moving the cables proved
it.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu