[ca.unix] REQUEST FOR DATA

darrell@sdcsvax.UUCP (07/07/87)

Hello.  We're doing research on the behaviour of distributed systems, and
I'd like to solicit your help.  What I need is data on the failure modes of
systems.  An example follows.  All times are specified in HOURS.

CPU	OS	MTTF(S)	MTTF(H)	MTTR(S)	MTTR(H)	PMI	PMD
---	--	-------	-------	-------	-------	---	---
VAX-780	4.3BSD	240+60	8640+0	1+1	4+2	720	4
3B2	V.3	720+120	4320+0	3+1	48+8	-	-

NET	MTTF(I)	MTTF(R)	MTTF(C)	MTTR(I)	MTTR(R)	MTTR(C)
---	-------	-------	-------	-------	-------	-------
Ether	17280+0	2160+24	8640+0	8+168	1+1	4+4

MTTF = mean time to failure
MTTR = mean time to repair
PMI  = preventative maintenance interval
PMD  = preventative maintenance duration
S    = software
H    = hardware
I    = interface
R    = routing
C    = cabling
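
For anyone who wants to read entries like these mechanically, here is a
minimal sketch in Python.  It assumes that fields are tab-separated, that
"a+b" means a constant term a plus a random term whose mean is b, and that
"-" means no data; the function names parse_entry and expected_hours are
made up for this illustration.

```python
from typing import Optional, Tuple

def parse_entry(field: str) -> Optional[Tuple[float, float]]:
    """Parse one table field into (constant, random_mean), both in hours.

    Assumption: 'a+b' is a constant term a plus a random term with mean b.
    A bare number (as in the PMI and PMD columns) is treated as a constant
    with no random term, and '-' is treated as "no data".
    """
    field = field.strip()
    if field == "-":
        return None
    if "+" in field:
        constant, random_mean = field.split("+", 1)
        return float(constant), float(random_mean)
    return float(field), 0.0

def expected_hours(entry: Tuple[float, float]) -> float:
    """Expected value of an entry: the constant plus the random term's mean."""
    return entry[0] + entry[1]

# The VAX-780 row from the example table, split on tabs.
row = "VAX-780\t4.3BSD\t240+60\t8640+0\t1+1\t4+2\t720\t4".split("\t")
mttf_s = parse_entry(row[2])   # software MTTF
mttr_s = parse_entry(row[4])   # software MTTR
print(expected_hours(mttf_s))  # 300.0
print(expected_hours(mttr_s))  # 2.0
```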

Each mean is given as a constant term plus an exponentially distributed
(random) term.  A software failure typically means that the system hangs
and has to be rebooted.  A hardware failure means that it hangs, halts or
whatever, and the man-from-DEC has to be called.  The network failures
are classified in a similar way.
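
To show how numbers in this form might be used, here is a rough sketch in
Python that draws failure and repair times and estimates the fraction of
time a system is up.  It assumes the second number in each pair is the mean
of the exponential term, and that failure and repair simply alternate; the
names sample_time and simulate_availability are invented for this example
and are not part of the study itself.

```python
import random

def sample_time(constant, random_mean, rng):
    """Draw one time in hours: constant + Exponential(mean=random_mean).

    Assumption: the second number of an 'a+b' entry is the mean of the
    exponential term.  A mean of 0 leaves only the constant term.
    """
    if random_mean == 0:
        return float(constant)
    # random.expovariate takes the rate, which is 1 / mean.
    return constant + rng.expovariate(1.0 / random_mean)

def simulate_availability(mttf, mttr, cycles, seed=0):
    """Alternate up and down periods and return the fraction of time up."""
    rng = random.Random(seed)
    up = down = 0.0
    for _ in range(cycles):
        up += sample_time(mttf[0], mttf[1], rng)
        down += sample_time(mttr[0], mttr[1], rng)
    return up / (up + down)

# Software figures from the VAX-780 example row: MTTF 240+60, MTTR 1+1.
estimate = simulate_availability((240, 60), (1, 1), cycles=10000)
print(estimate)
```

With these figures the long-run value should settle near 300/302, that is,
expected up time divided by expected up time plus expected down time.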

Please look around your site and talk to your systems folk, then send me
the information that you discover.  I'll summarize and post it to the net.

Your help is greatly appreciated!

Darrell Long
Department of Computer Science and Engineering, C-014
University of California, San Diego
La Jolla, California  92093

ARPA: Darrell@Beowulf.UCSD.EDU
UUCP: sdcsvax!beowulf!darrell