[comp.sys.hp] Fault tolerance

tgl@ZOG.CS.CMU.EDU (Tom Lane) (04/10/88)

The lesson to be drawn from this little affair is simple:

	Fault-tolerant systems must report the faults they tolerate,
	so that said faults can eventually be fixed.

There has been discussion of this point recently in comp.risks (see Tim Mann
in RISKS 6.53, Jerome Saltzer in 6.54).  I won't try to duplicate their very
well-written articles; go read 'em yourself.
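
To make the lesson above concrete, here is a minimal sketch in Python of a
retry wrapper that masks transient faults but still records every one of them.
It is an illustration only, not anything taken from the RISKS articles or from
HP's software; the names ReportingRetry, make_flaky, and the "fault_report"
logger are hypothetical.

```python
import logging

log = logging.getLogger("fault_report")  # hypothetical logger name


class ReportingRetry:
    """Retry an operation, recording every fault that was masked."""

    def __init__(self, attempts=3):
        self.attempts = attempts
        self.tolerated = []  # faults hidden from the caller, kept for repair

    def call(self, operation):
        for attempt in range(1, self.attempts + 1):
            try:
                return operation()
            except Exception as exc:
                if attempt == self.attempts:
                    raise  # out of retries: the fault is no longer tolerated
                # The caller never sees this fault, so report it here.
                self.tolerated.append((attempt, repr(exc)))
                log.warning("tolerated fault on attempt %d: %r", attempt, exc)


def make_flaky(failures):
    """Return an operation that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise IOError("transient error")
        return "ok"

    return operation
```

A caller gets the usual result from ReportingRetry().call(make_flaky(2)), but
the two masked faults remain in the `tolerated` list and in the log, where
someone can eventually notice them and fix the cause.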

This does make one wonder about Bob Campbell's recent claim of no failures
in the first year of Series 800 shipments.  Maybe they just *think* there've
been no failures...

-- 
				tom lane
Internet: tgl@zog.cs.cmu.edu
UUCP: <your favorite internet/arpanet gateway>!zog.cs.cmu.edu!tgl
BITNET: tgl%zog.cs.cmu.edu@cmuccvma

campbelr@hpsel1.HP.COM (Bob Campbell) (04/11/88)

> That is incredible!  But do I take it the HP Neeley Sales Region
> 825 that I did my benchmark on with the bad floating point is the
> FIRST failure?  . . .                                            
> 
> Roger N. Clark
> bgphp1!rclark

The 800 series is good, but it is not that good.  (Yet :-)  The banner,
I believe, referred to no crashes at a customer site in a year of all
shipments.

The sales office might be a better place to get MTBF stats; I can only speak
for the machines used here for testing.  The systems here are usually
prototypes and not release quality.  I will see what I can come up with in
the way of stats.

Bob Campbell