[comp.archives] [comp.os.research] TR available: "Estimating the Reliability of Hosts Using the Internet"

darrell@sequoia.ucsc.edu (Darrell Long) (03/12/91)

Archive-name: internet/research/internet-host-reliability/1991-03-11
Archive: midgard.ucsc.edu:/pub/tr/ucsc-crl-91-06.ps.Z [128.114.14.6]
Original-posting-by: darrell@sequoia.ucsc.edu (Darrell Long)
Original-subject: TR available: "Estimating the Reliability of Hosts Using the Internet"
Reposted-by: emv@ox.com (Edward Vielmetti)


This is a substantial revision (all new experiments) of the earlier "A Study
of the Reliability of Internet Sites".

It's available via anonymous FTP from midgard.ucsc.edu (128.114.14.6) as
pub/tr/ucsc-crl-91-06.ps.Z

DL
-- 

        Estimating the Reliability of Hosts Using the Internet

        D. D. E. Long                         J. L. Carroll, C. J. Park
Computer & Information Sciences                 Mathematical Sciences
   University of California                   San Diego State University
     Santa Cruz, CA 95064                        San Diego, CA 92182

(408) 459-2616                              (619) 594-7242, (619) 594-6171

darrell@cis.ucsc.edu                      carroll@sdsu.edu, cjpark@sdsu.edu



                                ABSTRACT

Modeling the reliability of distributed systems, whether through analysis or
simulation, requires a good understanding of the reliability of the
components.  Careful modeling allows highly fault-tolerant distributed data
bases and similar applications to be constructed at the least cost.

It is often assumed that the times to failure and repair of components are
exponentially distributed.  This hypothesis is testable for failure times,
though the process of gathering and reducing the data to a usable form can be
difficult.  When an appropriate test statistic was applied, some of the
samples were found to have a realistic chance of being drawn from an
exponential distribution, while others could be confidently classed as
non-exponential.
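
The abstract does not say which test statistic was applied.  As a rough
illustration only, the sketch below computes a Kolmogorov-Smirnov style
distance between the empirical distribution of a sample and an exponential
distribution whose mean is estimated from that same sample.  The function name
is hypothetical and the choice of statistic is an assumption, not the report's
method.  Because the mean is estimated from the data, standard
Kolmogorov-Smirnov tables do not apply directly; critical values adjusted for
the estimated parameter would be needed to turn the distance into a test.

```python
import math


def ks_distance_exponential(samples):
    """Largest gap between the empirical CDF and a fitted exponential CDF.

    Assumption: the exponential mean is estimated as the sample mean.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(xs) / n
    if mean <= 0:
        raise ValueError("samples must have a positive mean")
    d = 0.0
    for i, x in enumerate(xs):
        fitted = 1.0 - math.exp(-x / mean)  # exponential CDF at x
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - fitted, fitted - i / n)
    return d
```

A small distance means the sample is compatible with an exponential shape; a
large one (for example, when every observed time is identical) points away
from it.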

For this study, data were collected from a large number of hosts via the
Internet with no special privileges or monitoring facilities.  Over 350,000
hosts were considered, and more than 68,000 of these, judged likely to
respond, were queried.  These hosts were sampled several times over
the course of two months to obtain up-times, and finally to determine average
host availability.  A rich collection of information was gathered in this
fashion, allowing estimates of availability, mean-time-to-failure (MTTF) and
mean-time-to-repair (MTTR) to be derived.  The results reported here
correspond with those seen in practice.
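
For readers unfamiliar with the three quantities, the usual steady-state
relationship is availability = MTTF / (MTTF + MTTR).  The sketch below is a
minimal illustration of that relationship, assuming MTTF and MTTR are taken
as plain averages of observed up-times and repair times; the function names
and the sample figures are hypothetical and are not taken from the report.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of durations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


def availability(mttf, mttr):
    """Steady-state availability from mean time to failure and to repair."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


# Hypothetical observations, in days (illustrative values only).
up_times = [12.0, 20.0, 16.0]
repair_times = [1.0, 2.0, 3.0]

mttf = mean(up_times)        # 16.0 days
mttr = mean(repair_times)    # 2.0 days
host_availability = availability(mttf, mttr)
```

Both inputs must be in the same time unit; the result is the long-run
fraction of time a host is expected to be up.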