[net.unix-wizards] RA81 unreliability

pep@down.FUN (01/30/85)

I've had a LOT of trouble with RA81 discs.  I run a facility with eight RA81s
(along with RM80, RA80, RA60, & RL02 discs) distributed among six VAX 11/750s.
Of the eight 81s, I have had to replace four HDAs in the past year, and I may
have another dead one on my hands.

The Symptoms:  usually start out with soft errors, status/event codes of
	053, 0353, and sometimes 0213 and 0350.  Most of the time, these are
	followed by hard errors.  The errors usually become more severe (more
	frequent; proportion of hard errors increases) if the disc is left
	in service.  The problem has occurred under three versions of UNIX.
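
For anyone decoding those codes by hand, here is a minimal sketch.  It
assumes (this is not stated above) that they are MSCP status/event codes
laid out the way 4.2BSD's mscp.h treats them: major status in the low
five bits, subcode in the bits above.  The helper names split_status and
describe are hypothetical, and the status names should be checked against
your own headers.

```python
# Sketch only: assumes MSCP-style status/event codes, with the major
# status in the low five bits and the subcode in the remaining high bits.
ST_MSK = 0o37        # mask for the major status field
SUBCODE_SHIFT = 5    # hypothetical name: bit position where the subcode starts

# Major status values (octal); names assumed from MSCP conventions.
MAJOR = {
    0o00: "success",
    0o01: "invalid command",
    0o02: "command aborted",
    0o03: "unit offline",
    0o04: "unit available",
    0o05: "media format error",
    0o06: "write protected",
    0o07: "compare error",
    0o10: "data error",
    0o11: "host buffer access error",
    0o12: "controller error",
    0o13: "drive error",
}


def split_status(code):
    """Return (major status, subcode) for a status/event code."""
    return code & ST_MSK, code >> SUBCODE_SHIFT


def describe(code):
    """Format a code as octal plus its major status name and subcode."""
    major, sub = split_status(code)
    return "%04o: %s, subcode %d" % (code, MAJOR.get(major, "unknown"), sub)


if __name__ == "__main__":
    # The four codes reported in the post.
    for code in (0o53, 0o353, 0o213, 0o350):
        print(describe(code))
```

Under that assumption, 053, 0353 and 0213 all share major status 013
(drive error) and 0350 has major status 010 (data error).  The sketch
does not try to interpret the subcodes.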

The Cause:  unknown.  I observe that the errors appear on a disc that has
	performed reliably for months and then suddenly sees a lot of write
	activity (e.g., converting to a new version of UNIX and restoring data).
 
The Diagnosis:  DEC diagnostics (EVRLA) sometimes detect a problem (hard
	error), sometimes not.

The Remedy:  attempt to reformat the disc.  This has succeeded (and cured
	the problem) four times.  To date, reformatting has failed four
	times; these HDAs have been replaced.  Reformatting has failed in
	various ways:  usual complaints are failure to format LBN area,
	failure to format DBN or XBN area.  I have also seen a complaint
	that more than 12.5% of a track is bad.

One Field Service Hypothesis:  UNIX trashes DEC's area of the disc when it
	encounters a bad block, clobbering tables needed to reformat.  However,
	only twelve (hard) errors were reported on the latest RA81 before we
	attempted to reformat, and reformatting still failed.

What I Believe:  I've heard that RA81s have been developing bad spots in the
	field.  (This is consistent with the war stories I've been trading
	with friends.)  UNIX doesn't forward the bad blocks, so the most
	attractive cure (?) is to reformat the disc.  Apparently the formatter
	used at the factory is more powerful than the one available in the
	field; if DEC's area of the disc is bad, the field formatter can't
	recover.  The HDA must be replaced; the old HDA is returned to the
	factory for reformatting, or marked usable for a VMS site (VMS does
	dynamic bad block forwarding).  

					Pat Parseghian
					Princeton Univ. EECS Dept.