[comp.arch] Computers: The New Generation

hunt@spar.SPAR.SLB.COM (Neil Hunt) (09/10/87)

In article <1184@itm.UUCP> danny@itm.UUCP (Danny) writes:

> [...]
>    How about a computer with say, 300 Meg of RAM.  There is, also,
> [...]
>    Nevertheless, whatever else may be happening, a scoo-bah of memory
>has mucho appeal.  Comments?
>

Better make sure that it has full error correction !

On a Sun 3, I believe that you can install up to 28 Mbytes of memory, at
which point you should expect a parity error to be detected about once a
month with current technology. Thus 300 Megabytes will get a soft error
every three days or so (bit of a pain !). I understand that Sun 4s will
have error *correction* hardware, so that they can correct single-bit (?)
errors and thus go to larger memories without crashing too often.
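A minimal sketch of the scaling arithmetic above, assuming (as the estimate implicitly does) that the soft-error rate is proportional to the amount of memory, and taking "about once a month" as 30 days. The function name and its default parameters are hypothetical, chosen only for illustration.

```python
def days_between_errors(mbytes, ref_mbytes=28.0, ref_days=30.0):
    """Mean days between soft errors for `mbytes` of memory, scaled from a
    reference point of one error per `ref_days` at `ref_mbytes`, assuming
    the error rate grows linearly with the amount of memory."""
    return ref_days * ref_mbytes / mbytes


print(round(days_between_errors(300), 1))  # prints 2.8
```

About 2.8 days for 300 Mbytes, which is where "every three days or so" comes from.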

Does anyone know about soft failure modes of DRAMs ? How likely
is it to find double bit errors ? With denser and denser memory chips,
one might expect that one day soon, background alpha particles will be
able to flip several adjacent bits.

By the way, my dream machine would have much more than 300 M ! Some
people here have swap discs in the 100s of M on their lispms, and
could still use more ! Also, I don't see why you would need a conventional
disc to back up your DRAM. I would trust my error-corrected memory more
than a disc, and just do conventional backups onto an optical WORM disc
now and then.

Neil/.

bobw@wdl1.UUCP (Robert Lee Wilson Jr.) (09/11/87)

Concerning the failure rates for DRAMs: there are data available,
from the manufacturers and others, but most present DRAM
configurations for large memories are arranged so that two-bit (and
larger) errors come from simultaneous failures in two DRAM chips (or
associated logic) rather than from a double-bit failure in a single chip.
Most DRAM chips (again, I mean those used in large memories) are 1 bit
wide by 256K or 1M or 4M or .... bits capacity. If your memory design
is n bits wide (including whatever checking bits are used) it
is typically composed of some multiple of n memory chips, in blocks.
In each block are n chips, each holding 1 bit out of 256K (or 1M,
etc.) locations. Thus for the cosmic ray to have a multi-bit effect
it must simultaneously affect several chips, and, more than that, must
affect those chips in the same bit locations. That certainly is
possible, but it seems less likely than affecting several bits in one
chip, and probability is the central issue when designing codes to
handle different kinds of errors. The same consideration comes up
again when you look at other single-point-of-failure possibilities.
Since a single failure in some auxiliary logic might easily produce
all 1's or all 0's, some ECC schemes are careful to detect those
failures as special cases, even though they are many-bit errors.
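One way to get that all-1's/all-0's special case, sketched here on a toy extended Hamming (SEC-DED) code with only 4 data bits, is to invert some of the check bits whenever a word is written. In this toy code without the inversion, both the all-0 and the all-1 byte are valid codewords, so a stuck bus would read back as clean data; with two check bits inverted, both patterns decode as uncorrectable double errors instead. This is an assumed illustration of the general idea, not the scheme any particular machine uses. CHECK_MASK, store() and decode() are hypothetical names, and real memories protect much wider words.

```python
# Toy SEC-DED (extended Hamming (8,4)) code. Bit i of a stored byte is code
# position i: position 0 is overall parity, positions 1, 2 and 4 are Hamming
# check bits, positions 3, 5, 6 and 7 hold the 4 data bits.
DATA_POSITIONS = (3, 5, 6, 7)

# Hypothetical choice: invert check bits 1 and 2 on every write, so that
# neither an all-0 nor an all-1 byte is a valid stored pattern.
CHECK_MASK = (1 << 1) | (1 << 2)


def encode(data):
    """Return the 8-bit codeword for a 4-bit data value (0..15)."""
    bits = [0] * 8
    for shift, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> shift) & 1
    for p in (1, 2, 4):
        # Check bit p makes the parity of all positions whose index has bit p set even.
        bits[p] = sum(bits[i] for i in range(1, 8) if i & p and i != p) % 2
    bits[0] = sum(bits[1:]) % 2  # overall parity over positions 1..7
    return sum(b << i for i, b in enumerate(bits))


def store(data):
    """Pattern actually written to memory: codeword with masked check bits."""
    return encode(data) ^ CHECK_MASK


def decode(stored):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = stored ^ CHECK_MASK  # undo the inversion before checking
    syndrome = 0
    for i in range(1, 8):
        if (word >> i) & 1:
            syndrome ^= i
    parity = bin(word).count("1") % 2
    if syndrome == 0 and parity == 0:
        status = "ok"
    elif parity == 1:
        # Odd overall parity: a single-bit error at position `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        word ^= 1 << syndrome
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: a double-bit error.
        return None, "uncorrectable"
    data = sum(((word >> pos) & 1) << shift for shift, pos in enumerate(DATA_POSITIONS))
    return data, status


print(decode(0x00), decode(0xFF))  # a stuck-at-0 or stuck-at-1 byte
```

With CHECK_MASK set to 0, decode(0x00) and decode(0xFF) would both come back as "ok" (data 0 and 15), which is exactly the failure the special-case handling is meant to catch. Inverting only one check bit would not be enough: a stuck byte would then look like a correctable single-bit error.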

richard@aiva.ed.ac.uk (Richard Tobin) (09/11/87)

In article <797@spar.SPAR.SLB.COM> hunt@spar.UUCP (Neil Hunt) writes:
>Does anyone know about soft failure modes of DRAMs ? How likely
>is it to find double bit errors ? With denser and denser memory chips,
>one might expect that one day soon, background alpha particles will be
>able to flip several adjacent bits.

I don't know how likely such adjacent bit errors are, but it shouldn't matter
much.  Most memory chips are <some large number> x 1 bit, which means that a
given byte will consist of a bit from each of 8 chips.  So an error of the kind
described will produce correctable 1-bit errors in several adjacent bytes,
rather than an uncorrectable multi-bit error in one byte.
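A small model of the point about N x 1 parts. The layout is a hypothetical simplification: cell k of a chip is assumed to hold bit (k % chip_width) of word address (k // chip_width), so physically adjacent cells belong to consecutive addresses (real DRAMs map cells to addresses in more complicated ways). With 1-bit-wide chips, a multi-cell upset spreads out into single-bit errors in different words; with a wider chip, the same upset can put several errors into one word.

```python
from collections import Counter


def errors_per_word(chip_width, first_cell, n_cells):
    """Bit errors per word address when one particle flips n_cells
    physically adjacent cells inside a single chip.

    Hypothetical layout: cell k of a chip stores bit (k % chip_width)
    of word address (k // chip_width)."""
    counts = Counter()
    for cell in range(first_cell, first_cell + n_cells):
        counts[cell // chip_width] += 1
    return dict(counts)


# 1-bit-wide chips: a 3-cell upset gives one error in each of 3 words,
# each correctable by a single-error-correcting code.
print(errors_per_word(chip_width=1, first_cell=100, n_cells=3))  # {100: 1, 101: 1, 102: 1}
# A hypothetical 4-bit-wide chip: the same upset puts 3 errors in one word.
print(errors_per_word(chip_width=4, first_cell=100, n_cells=3))  # {25: 3}
```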

> Thus 300 Megabytes will get a soft error every
> three days or so (bit of a pain !).

If this is accurate, a given byte has roughly a 1-in-10^9 chance of taking
a single-bit error on a given day, so the chance of it taking 2 errors in
one day (from different alpha particles) is about 1-in-10^18. That is fairly
safe, since to provoke an uncorrectable error the second bit has to be
corrupted before error correction puts the first one right (which suggests
that you should make sure all your physical memory gets read frequently).
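The arithmetic above can be checked with a short sketch. It assumes that 300 Megabytes means 3 x 10^8 bytes, that alpha-particle hits strike bytes independently and uniformly at random, and that a byte is only at risk of an uncorrectable error if a second hit arrives within the window before the first one is read and corrected. The function names are illustrative. A Poisson model gives about lambda^2/2 rather than lambda^2, roughly half of the squared figure but the same order of magnitude.

```python
import math


def per_byte_daily_rate(memory_bytes, days_between_errors):
    """Soft errors per byte per day, assuming hits are spread uniformly over memory."""
    return 1.0 / (memory_bytes * days_between_errors)


def p_uncorrectable(rate_per_day, window_days):
    """Poisson probability that one byte takes two or more hits within
    window_days, i.e. before the first error has been read and corrected."""
    lam = rate_per_day * window_days
    # 1 - e^-lam - lam*e^-lam, written with expm1 so that tiny lam
    # does not lose all its precision to rounding.
    return -math.expm1(-lam) - lam * math.exp(-lam)


rate = per_byte_daily_rate(300e6, 3.0)
print(f"single error per byte per day:   {rate:.2g}")
print(f"two or more per byte in one day: {p_uncorrectable(rate, 1.0):.2g}")
# Reading (scrubbing) all of memory every hour instead of once a day:
print(f"two or more per byte in an hour: {p_uncorrectable(rate, 1 / 24):.2g}")
```

Because the double-hit probability grows with the square of the window, reading all of memory every hour instead of once a day cuts the per-byte risk by a factor of about 24^2 = 576.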
-- 
Richard Tobin,                         JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,             ARPA:  R.Tobin%uk.ac.ed@nss.cs.ucl.ac.uk
Edinburgh University.                  UUCP:  ...!ukc!ed.ac.uk!R.Tobin