mark@mips.COM (Mark G. Johnson) (09/15/89)
In the previous posting <26811@obiwan.mips.COM> I used an approximation
formula to simplify the probability expressions, without explicitly calling
attention to the approximation. Article <2280011@hpsal2.HP.COM> by
saxena@hpsal2.HP.COM (Nirmal Saxena) pointed out the inexactitude;
unfortunately, his modification was incorrect.
Sticklers-for-mathematical-precision might perhaps be interested in the
exact expressions, without using approximation formulae. They appear
below. Engineering approximations were given in <26811@obiwan.mips.COM>.
I recommend the engineering approach; among other advantages, it provides
expressions that are far easier to invert.
A single part whose Mean Time Between Failures is "m" units of time:
***************************************************************************
* Prob of a failure between time 0 and T is P(fail) = 1 - exp(-T/m) *
* Prob of not-failure is P(not-fail) = exp(-T/m). *
***************************************************************************
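As a quick numerical check of the two boxed expressions, here is a minimal
Python sketch. The function names p_fail and p_not_fail are my own labels,
and the sample values (T = 5 years, m = 100 years) are only illustrative.

```python
import math

def p_fail(T, m):
    """Probability that a part with MTBF m fails between time 0 and T."""
    return 1.0 - math.exp(-T / m)

def p_not_fail(T, m):
    """Probability that the same part survives from time 0 to T."""
    return math.exp(-T / m)

# T and m must be expressed in the same units of time (years here).
print(p_fail(5.0, 100.0))      # roughly 0.0488
print(p_not_fail(5.0, 100.0))  # roughly 0.9512
```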
To compute the probability that one or more units out of a population of
50,000 will fail within 5 years, we simply compute the probability
that zero units will fail, and then realize that P(one or more failures)
is equal to 1.0 - P(no fails).
The probability of 0 failures among 50,000 units is just the
probability that the first one doesn't fail, times the probability the
second one doesn't fail, times ... (i.e. P(no-fail) to the 50,000 power),
provided the units fail independently of one another.
If the MTBF is 100 years and we want to find the prob of 0 failures after
5 years:
P(0 failures in 50,000 units) = [exp(-5/100)] ** 50000 == exp(-2500)
So the probability that there are one or more failures in the 50,000 units
is one minus P(no-fails); that is, [1.0 - exp(-2500)]. (very nearly 1).
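The same arithmetic can be sketched in Python. The variable names are my
own. Note that exp(-2500) is far smaller than the smallest positive
double-precision number, so on ordinary floating-point hardware it
evaluates to exactly 0.0; working with the logarithm keeps the exponent
visible.

```python
import math

N = 50_000   # units in the field
T = 5.0      # years of service
m = 100.0    # MTBF in years

# log of [exp(-T/m)] ** N
log_p_zero = -N * T / m

# exp(-2500) underflows to 0.0 in double precision
p_zero = math.exp(log_p_zero)
p_one_or_more = 1.0 - p_zero

print(log_p_zero)      # -2500.0
print(p_zero)          # 0.0
print(p_one_or_more)   # 1.0
```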
In general we want to know the probability of (fewer than K failures)
over a specified time interval. The original article stipulated that the
Big Boss would fire the engineer if, during the 5-year product lifetime
there were 100 or more failures out of 50,000 installations in the field.
Thus the engineer wanted to have a large probability of (fewer than 100
failures). In the example we solved for the MTBF that gave a probability
of (100 or more failures) equal to 0.33; that is, the probability of
(fewer than 100 failures) was 0.67.
If each of N identical parts has an MTBF equal to "m" units of time,
*****************************************************************************
* *
* P(out of N parts, fewer than K failures from time 0 to time T) = *
* *
* Sum from i=0 to i=(K-1) {C(N,i) * (1 - exp(-T/m))^i * (exp(-T/m))^(N-i)} *
* *
*****************************************************************************
where the binomial coefficient C(N,i) is N! / (i! (N-i)!) and C(N,0)==1
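For anyone who wants to evaluate the boxed sum directly, the following is
a minimal Python sketch, not a definitive implementation. The function
name prob_fewer_than is hypothetical. Each term is computed through its
logarithm (using log-gamma for the binomial coefficient), which is my own
choice to avoid overflow in C(N,i) when N is 50,000.

```python
import math

def prob_fewer_than(K, N, T, m):
    """P(fewer than K failures among N parts from time 0 to T), MTBF m."""
    log_q = -T / m                          # log of exp(-T/m)
    log_p = math.log(-math.expm1(-T / m))   # log of 1 - exp(-T/m)
    total = 0.0
    for i in range(K):                      # i = 0 .. K-1
        # log C(N,i) = log N! - log i! - log (N-i)!
        log_c = (math.lgamma(N + 1) - math.lgamma(i + 1)
                 - math.lgamma(N - i + 1))
        total += math.exp(log_c + i * log_p + (N - i) * log_q)
    return total

# With the MTBF quoted in this post, the result should be close to 0.67.
print(prob_fewer_than(100, 50_000, 5.0, 2619.6))
```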
So, in our example we set the probability equal to 0.67 and solve for m.
{Now you see why the engineering approximation is sometimes useful;
solving for m in the exact expression above is messy}. Utilizing a
numerical solution method, we find that m = 2619.6 years is the required
MTBF to give a 0.67 probability of (fewer than 100 failures over 5 years
among 50,000 parts).
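One way to carry out such a numerical solution is plain bisection on m.
The sketch below is an assumption on my part, not necessarily the method
used to get the figure above; the names prob_fewer_than and solve_mtbf and
the bracketing interval of 1 to 1,000,000 years are hypothetical. It
relies on the probability of (fewer than K failures) increasing as m
increases.

```python
import math

def prob_fewer_than(K, N, T, m):
    """P(fewer than K failures among N parts from time 0 to T), MTBF m."""
    log_q = -T / m                          # log of exp(-T/m)
    log_p = math.log(-math.expm1(-T / m))   # log of 1 - exp(-T/m)
    total = 0.0
    for i in range(K):
        # log C(N,i) via log-gamma, to avoid overflow for large N
        log_c = (math.lgamma(N + 1) - math.lgamma(i + 1)
                 - math.lgamma(N - i + 1))
        total += math.exp(log_c + i * log_p + (N - i) * log_q)
    return total

def solve_mtbf(target, K, N, T, lo=1.0, hi=1.0e6, tol=1.0e-6):
    """Bisect for the MTBF m at which prob_fewer_than(K, N, T, m) == target."""
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if prob_fewer_than(K, N, T, mid) < target:
            lo = mid    # too many failures expected: need a larger MTBF
        else:
            hi = mid
    return 0.5 * (lo + hi)

m = solve_mtbf(0.67, 100, 50_000, 5.0)
print(m)   # for comparison, the figure quoted above is 2619.6 years
```

Bisection is slow but hard to get wrong; any standard root finder could be
substituted.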
Recall that the chip vendors proudly boast "1 century MTBF". So, using
the exact formula we find that this MTBF is 26 times too small; the Big
Boss will fire the design engineer. The engineering solution agreed;
it was a bit more conservative, dictating an MTBF of 75.7 centuries to
achieve fewer than 100 failures among 50,000 parts over 5 years.
--
-- Mark Johnson
MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
(408) 991-0208 mark@mips.com {or ...!decwrl!mips!mark}