[comp.os.research] Why Machines Crash

wilkes@hplajw.hpl.hp.com (John Wilkes) (07/03/90)

In article <4667@darkstar.ucsc.edu>, mis@Seiden.com (Mark Seiden) writes:
> marzullo@cs.cornell.edu (Keith Marzullo) writes:
> >In March of 1986 at the IBM Workshop on Fault-Tolerant Distributed Computing
> >Jim Gray talked about a study he did at Tandem about the reasons machines
> >crash (most likely: operator pushed reset). Does anyone have a reference to
> >a published version of this study?

You might also try:

%z InProceedings
%K Gray86a
%A Jim Gray
%T Why do computers stop and what can be done about it?
%C Proc. 5th Symp. on Reliability in Distrib. Software and Database Sys.
%D 1986
%P 3-11
%p IEEE Computer Society Press, catalog number 86CH2260--8
%x An analysis of the failure statistics of a commercially available
%x fault-tolerant system shows that administration and software
%x are the major contributors to failure.

john wilkes
(sorry for the arcane format)