[comp.arch] Time between memory failure

davidb@braa.inmos.co.uk (David Boreham) (02/02/90)

In article <578@amara.UUCP> dan@rollmops.UUCP (Dan Kaiser) writes:
>I seem to recall that the mean time between a memory
>error for a megabyte dynamic ram memory is on the order
>of minutes.  Does anyone know actual numbers or have
>a reference?  I am interested in values for both dynamic
>and static memory in (current) commercial technologies
>(i.e. chips I might find in my Mac or PC).
>
>Dan R. Kaiser
>Applied Dynamics International
>dan%amara.uucp@mailgw.cc.umich.edu

Well, real hard numbers for systems are difficult to get.
Firstly, soft errors in DRAMs are very unlikely today, compared
with the older generations of devices. Secondly, since the
incidence of soft errors is so low, one can't simply sit and
wait for a couple and from the time taken derive an MTTSE figure
(Mean Time To Soft Error). 

What the manufacturers do is to perform
an accelerated test, using an alpha source (usually Americium 241)
to deliver a decent dose of radiation to the die. This tests the
die coats and the basic resistance of the RAM cell to bombardment.
It doesn't, however, tell you how the device will behave in real use.

Using some statistical tricks (probably all kludge factors), they 
then come up with an MTTSE number. Currently I have reliability
reports for a number of 1Meg and 4Meg DRAMs. These give figures
varying from 450FIT for a 1Meg down to an amazing 19.48 FIT for 
one 4Meg! (FIT is failures per 10^9 hours of device operation.)
The stacked capacitor cells used on some 4Meg parts seem to help
a great deal. BTW, the report which gives 19FIT is all in Japanese
and I can only read the numbers and maths !
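
To put the FIT figures into the terms of the original question (mean
time between errors for a megabyte of DRAM), here is a rough sketch of
the conversion. It assumes the per-device failure rates simply add and
that the memory is built from eight 1Mbit parts (nine with a parity
bit); the function name and the chip counts are illustrative, not taken
from any of the reliability reports.

```python
HOURS_PER_YEAR = 24 * 365

def mean_hours_between_errors(fit_per_device, n_devices):
    """Mean time between soft errors, in hours, for n_devices identical parts.

    FIT is failures per 1e9 device-hours. Assumes independent devices,
    so the system failure rate is the sum of the per-device rates.
    """
    system_fit = fit_per_device * n_devices
    return 1e9 / system_fit

# Hypothetical 1 Mbyte memory built from 1Mbit DRAMs at the quoted 450 FIT.
for label, fit, chips in [
    ("8 x 1Mbit at 450 FIT", 450.0, 8),
    ("9 x 1Mbit at 450 FIT (with parity bit)", 450.0, 9),
]:
    hours = mean_hours_between_errors(fit, chips)
    print(f"{label}: {hours:,.0f} hours (~{hours / HOURS_PER_YEAR:.1f} years)")
```

On these assumptions a megabyte of 450 FIT parts works out to a soft
error every few hundred thousand hours, i.e. decades rather than minutes.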

The difficulty in determining figures for systems built from these
devices stems not only from the fact that the quoted reliability 
of the DRAMs now gives much less concern than many of the other
components in the system but also from the fact that the reliability
changes with voltage, cycle time and how much you are actually
using the DRAMs.

Personal Opinion
----------------

No personal computer or workstation built nowadays needs parity or EDC.
Worry about soft errors only if you are doing work for the DOD, a bank,
medical systems or are using buckets of devices (we only considered using
EDC on a system which had 4Gbyte of memory).
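
A rough sketch of why "buckets of devices" changes the picture: the
same per-chip FIT figure multiplied over thousands of chips. Which
parts the 4Gbyte system actually used is not stated, so the device
sizes paired with each FIT figure below are assumptions, check bits
are ignored, and the helper names are made up for illustration.

```python
MBIT = 2**20

def devices_needed(memory_bytes, device_bits):
    """Number of DRAM chips needed to hold memory_bytes (data bits only)."""
    total_bits = memory_bytes * 8
    return -(-total_bits // device_bits)  # ceiling division

def mean_hours_between_errors(fit_per_device, n_devices):
    """Hours between soft errors, assuming independent devices (FIT = per 1e9 h)."""
    return 1e9 / (fit_per_device * n_devices)

# (label, memory size in bytes, device size in bits, assumed FIT per device)
configs = [
    ("1 Mbyte of 1Mbit parts at 450 FIT", 2**20, 1 * MBIT, 450.0),
    ("4 Gbyte of 4Mbit parts at 19.48 FIT", 4 * 2**30, 4 * MBIT, 19.48),
    ("4 Gbyte of 1Mbit parts at 450 FIT", 4 * 2**30, 1 * MBIT, 450.0),
]
for label, size, bits, fit in configs:
    n = devices_needed(size, bits)
    hours = mean_hours_between_errors(fit, n)
    print(f"{label}: {n} chips, one soft error every ~{hours:,.0f} hours")
```

With thousands of chips the interval between soft errors drops from
decades to months or even days, which is where EDC starts to look
worthwhile.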

End of Personal Opinion
-----------------------



P.S. Soft error rates for SRAMs are not generally published, but
I wouldn't be surprised if the error rate of a 1Mbit SRAM was about
the same as a 4Mbit DRAM (SRAM cells are not much stronger than DRAM
these days).
100FIT is a common spec for 1Mbit SRAM devices. BTW, this is not
total bull, I work in the same building as some very clever SRAM 
designers.


David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol,  England            |     (us): uunet!inmos.com!davidb
+44 454 616616 ex 547        | Internet: davidb@inmos.com

terry@sunquest.UUCP (Terry Friedrichsen) (02/07/90)

In article <3938@ganymede.inmos.co.uk>, davidb@braa.inmos.co.uk (David Boreham) writes:
> No personal computer or workstation built nowadays needs parity or EDC.
> Worry about soft errors only if you are doing work for the DOD, a bank,
> medical systems or are using buckets of devices

Now wait.  I've seen a couple of instances of memory parity errors on
fairly new PCs.  Are you really trying to say that I'd be better off
not knowing about memory parity errors, and I should just let programs
quietly screw up?  Or am I missing your point in some way?  I know your
article addressed soft chip errors exclusively, but your conclusion seems
a bit strong.

Memory reliability involves more than just the RAM chip itself; there's
many a slip twixt the chip and the CPU.  I'll buy the idea that errors
are so infrequent that ECC is unnecessary overkill, but sorry, I just
GOTTA have that parity check ...
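
For readers weighing parity against ECC, a minimal sketch of what a
per-byte parity check can and cannot do. It models a 9-bit memory word
in software; even parity is chosen arbitrarily here, and the function
names are illustrative rather than any real hardware interface.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the byte has an odd number of 1s."""
    return bin(byte & 0xFF).count("1") & 1

def write_word(byte):
    """Model a 9-bit memory word: the data byte plus its stored parity bit."""
    return (byte & 0xFF, parity_bit(byte))

def read_ok(word):
    """Return True if the stored parity bit still matches the data byte."""
    data, stored = word
    return parity_bit(data) == stored

word = write_word(0x5A)
single_flip = (word[0] ^ 0x04, word[1])  # one data bit corrupted
double_flip = (word[0] ^ 0x06, word[1])  # two data bits corrupted

print(read_ok(word), read_ok(single_flip), read_ok(double_flip))
# prints: True False True
```

Parity flags the single-bit error but cannot say which bit flipped, and
it misses the double-bit error entirely; ECC spends more check bits to
locate and correct the single-bit case.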

Terry R. Friedrichsen
TERRY@SDSC.EDU  (alternate address; I live and work in Tucson)

Disclaimer:  the company doesn't read my messages, so it can't
		possibly know what I'm saying!

shri@ccs1.cs.umass.edu (H.Shrikumar{shri@ncst.in}) (02/11/90)

In article <1911@sunquest.UUCP> terry@sunquest.UUCP (Terry) responds to:

><3938@ganymede.inmos.co.uk>, davidb@braa.inmos.co.uk (David Boreham)
>> No personal computer or workstation built nowadays needs parity or EDC.
>
>Now wait.  I've seen a couple of instances of memory parity errors on
>fairly new PCs.  Are you really trying to say that I'd be better off

   Just to comment that in memory design, the tightest timing path is
the parity checker/generator. And with so much pressure to tighten
the memory timing, this path "might" be underdesigned.

  One of the PC's back home gives a RAM ERROR if the room gets a bit
hotter than usual... I think the DRAMs are fine, 'cos the same DRAMs work
well with other PC's; but rather the parity checker timing is
marginal and acts up if the mercury creeps up a few degrees.

  If only I could disable the parity check when I am playing games on
a warm day :-)

-- shrikumar ( shri@ccs1.cs.umass.edu, shri@ncst.in )