[comp.bugs.4bsd] 4.2 BSD Installation problem

msitd22@ms3.UUCP (Jim Chappell) (12/16/86)

I am installing virgin 4.2 on several VAX 11/780s.
I have succeeded with two of them but have hit a snag with the third.
I can load the mini root and the full root, but when untarring /usr onto
/dev/ra0h, I get the following errors, usually followed by a hung system.

/* One iteration of */
uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr 0xb1558
/* several iterations of */
uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0


The sequence can repeat with a higher hdr number.

This VAX has two memory controllers with 4 MB on mcr0 and 2 MB on mcr1,
one UDA50, and three RA81 drives.

DEC has been in several times and can't isolate the problem.

I'll appreciate any help the net can offer.

Thanks,


Jim Chappell  ...!seismo!vrdxhq!ms3!jrc 
ISN Corp.  (703) 979-8900
1235A Jeff Davis Hwy, Suite 605A
Arlington, Va 22202


chris@mimsy.UUCP (Chris Torek) (12/17/86)

In article <491@ms3.UUCP>, msitd22@ms3.UUCP (Jim Chappell) writes:
>I am installing virgin 4.2 on several VAX 11/780's.

(Why?  You should be installing 4.3.  Anyway...)

>uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr  0xb1558
>uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0 

The 4.2BSD UDA50 driver prints `hard error' messages for many soft
errors.  If the hard error message is not followed by another error
of the form `ra%d: hard error sn%d', it was a soft error.
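
A minimal sketch of that rule of thumb, for scanning a saved console log. It assumes the message shapes quoted in this thread and that the `ra%d: hard error sn%d` line, when present, comes immediately after the `uda` line; the regular expressions and the `classify` name are illustrative, not part of any BSD tool.

```python
import re

# Assumed message shapes, taken from the messages quoted in this thread;
# the exact 4.2BSD printf formats may differ slightly.
CTLR_HARD = re.compile(r"^uda\d+: hard error")
DRIVE_HARD = re.compile(r"^ra\d+: hard error sn\d+")


def classify(lines):
    """Label each 'udaN: hard error' line as 'hard' or 'soft'.

    Rule of thumb: if the controller message is not followed by an
    'raN: hard error snM' line, it was really a soft error.
    Assumes the drive message, when present, is the very next line.
    """
    labels = []
    for i, line in enumerate(lines):
        if not CTLR_HARD.match(line):
            continue  # only controller "hard error" lines are classified
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        labels.append((line, "hard" if DRIVE_HARD.match(nxt) else "soft"))
    return labels


log = [
    "uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr 0xb1558",
    "uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0",
]
for msg, label in classify(log):
    print(label, msg)
```

Run on the two messages from the original post, both come out as soft errors, since neither is followed by an `raN: hard error snM` line.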

`event 0353' is, I think, a `1 symbol ecc data error'.  `hdr 0xb1558'
indicates that the error was associated with logical block 0xb1558.
None of this is very reliable information.
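
To turn the `hdr` field into a decimal block number, read it as hexadecimal. A small sketch, assuming the field always appears as `hdr 0x...` and that logical blocks are 512 bytes; `logical_block` is a hypothetical helper, and per the caveat above the result is only as trustworthy as what the driver reported.

```python
import re

# Assumed field shape: "hdr" followed by whitespace and a hex number.
HDR = re.compile(r"hdr\s+0x([0-9a-fA-F]+)")
BLOCK_SIZE = 512  # assumed logical block size in bytes


def logical_block(line):
    """Return the logical block number from a 'hdr 0x...' field, or None."""
    m = HDR.search(line)
    return int(m.group(1), 16) if m else None


msg = "uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr  0xb1558"
lbn = logical_block(msg)
print(lbn, lbn * BLOCK_SIZE)
```

For the message above, 0xb1558 is logical block 726360, which at 512 bytes per block is byte offset 371,896,320.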

The 4.3BSD driver is only slightly better.  A much-improved driver
may soon be available.  (CDC is replacing an HDA here; until then
the driver is not going anywhere.)  The new driver has pinpointed a number
of RA81/UDA50 problems locally, from failing HDAs to a bad port on
a UDA50.  (The last was a guess on my part: `ctlr detected pulse
or parity data error' sounded like a problem in the drive-to-controller
cables or serial interface.  It was.  VMS diagnostics could not
find the problem, but moving the cables proved it.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

sechrest@orstcs.UUCP (sechrest) (12/18/86)

We have sometimes had similar problems on our RA81s. I have no
specific understanding of what is going on, but it seems that we
can eliminate one class of disc errors by replacing the spindle
brushes and ensuring good clean contacts on the ground wires.

There was a suggestion on the net a long time ago that it was a static
buildup on the spindle. DEC has said that this should not happen, but
they come in and change the brushes when I ask them.

The normal sequence of failures starts with a couple of errors and
then slowly builds up to many errors. The hdr values in the errors are
generally not the same; i.e., they seem to move around the disc. If the
errors get too bad, the disc will get some serious trash on it, but we
fix the problem before that.

					John Sechrest
					sechrest@orstcs