darrell@sdcsvax.UUCP (07/07/87)
Hello. We're doing research on the behaviour of distributed systems, and I'd like to solicit your help! What I need is data on the failure modes of systems. An example follows.

CPU      OS      MTTF(S)  MTTF(H)  MTTR(S)  MTTR(H)  PMI  PMD
-------  ------  -------  -------  -------  -------  ---  ---
VAX-780  4.3BSD  240+60   8640+0   1+1      4+2      720  4
3B2      V.3     720+120  4320+0   3+1      48+8     -    -

NET    MTTF(I)  MTTF(R)  MTTF(C)  MTTR(I)  MTTR(R)  MTTR(C)
-----  -------  -------  -------  -------  -------  -------
Ether  17280+0  2160+24  8640+0   8+168    1+1      4+4

MTTF = mean time to failure
MTTR = mean time to repair
PMI  = preventative maintenance interval
PMD  = preventative maintenance duration
S = software
H = hardware
I = interface
R = routing
C = cabling

The means are given as a constant term plus an exponentially distributed (random) term. A software failure typically means that the system hangs and has to be rebooted. A hardware failure means it hangs, halts, or whatever, and the man-from-DEC has to be called. Network failures are classified in the same way.

Please look around your site and talk to your systems folk, then send me the information you discover. I'll summarize and post it to the net. Your help is greatly appreciated!

Darrell Long
Department of Computer Science and Engineering, C-014
University of California, San Diego
La Jolla, California 92093
ARPA: Darrell@Beowulf.UCSD.EDU
UUCP: sdcsvax!beowulf!darrell
darrell@sdcsvax.UUCP (07/07/87)
Hello. We're doing research on the behaviour of distributed systems, and I'd like to solicit your help! What I need is data on the failure modes of systems. An example follows. All times are specified in HOURS.

CPU      OS      MTTF(S)  MTTF(H)  MTTR(S)  MTTR(H)  PMI  PMD
-------  ------  -------  -------  -------  -------  ---  ---
VAX-780  4.3BSD  240+60   8640+0   1+1      4+2      720  4
3B2      V.3     720+120  4320+0   3+1      48+8     -    -

NET    MTTF(I)  MTTF(R)  MTTF(C)  MTTR(I)  MTTR(R)  MTTR(C)
-----  -------  -------  -------  -------  -------  -------
Ether  17280+0  2160+24  8640+0   8+168    1+1      4+4

MTTF = mean time to failure
MTTR = mean time to repair
PMI  = preventative maintenance interval
PMD  = preventative maintenance duration
S = software
H = hardware
I = interface
R = routing
C = cabling

The means are given as a constant term plus an exponentially distributed (random) term. A software failure typically means that the system hangs and has to be rebooted. A hardware failure means it hangs, halts, or whatever, and the man-from-DEC has to be called. Network failures are classified in the same way.

Please look around your site and talk to your systems folk, then send me the information you discover. I'll summarize and post it to the net. Your help is greatly appreciated!

Darrell Long
Department of Computer Science and Engineering, C-014
University of California, San Diego
La Jolla, California 92093
ARPA: Darrell@Beowulf.UCSD.EDU
UUCP: sdcsvax!beowulf!darrell
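To make the "constant term plus an exponentially distributed term" notation concrete, here is a minimal Python sketch. It assumes that an entry such as "240+60" means a fixed 240 hours plus an exponential term whose mean is 60 hours, so the overall mean is 300 hours; an entry ending in "+0" is treated as purely constant. The function names (parse_entry, sample_time, simulate_cycles) are hypothetical and only illustrate the model; they are not taken from the original request.

```python
import random

def parse_entry(entry):
    """Split a table entry such as '240+60' into (240.0, 60.0).

    Assumption: the first number is the constant term and the second is
    the mean of the exponential term, both in hours.
    """
    constant, exp_mean = entry.split("+")
    return float(constant), float(exp_mean)

def sample_time(constant, exp_mean, rng):
    """Draw one duration (hours): fixed term plus an exponential term."""
    if exp_mean == 0:
        return float(constant)
    # random.Random.expovariate takes the rate (1 / mean), not the mean
    return constant + rng.expovariate(1.0 / exp_mean)

def simulate_cycles(mttf_entry, mttr_entry, n, seed=0):
    """Simulate n up/down cycles; return (total uptime, total downtime)."""
    rng = random.Random(seed)
    up_c, up_m = parse_entry(mttf_entry)
    down_c, down_m = parse_entry(mttr_entry)
    total_up = total_down = 0.0
    for _ in range(n):
        total_up += sample_time(up_c, up_m, rng)      # time until failure
        total_down += sample_time(down_c, down_m, rng)  # time to repair
    return total_up, total_down

if __name__ == "__main__":
    n = 100_000
    # VAX-780 software figures from the example table
    up, down = simulate_cycles("240+60", "1+1", n)
    print(f"mean time to failure about {up / n:.1f} h")
    print(f"mean time to repair  about {down / n:.2f} h")
```

With these figures the simulated means should land close to 300 hours to failure and 2 hours to repair, which is simply the constant plus the exponential mean for each entry.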