[comp.sys.next] parity vs. ECC memory

mrc@Tomobiki-Cho.CAC.Washington.EDU (Mark Crispin) (12/03/90)

In my experience in dealing with semiconductor memory, memory failures
are usually catastrophic.  I rarely see a single bit in a single
address go bad; rather, I see the same bit in an entire sequence of
addresses go bad.

The usual result of such an error is a system crash.  In my opinion,
this is a good thing.  Once the system has been corrupted, the worst
thing it can do is charge on, thinking it is all right to continue
writing to the disk.  A crash merely causes loss of your unsaved
work; continuing in such circumstances may cause the loss of *all*
your work.

Parity checking may help in fault diagnosis, if the system lasts long
enough to blat out an error message pointing at the bad memory.
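
For readers who want to see what a parity check does and does not catch,
here is a minimal sketch in Python.  It assumes one even-parity bit per
8-bit byte; the function names are made up for illustration and do not
describe any particular machine's hardware.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the count of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") & 1

def store(byte):
    """Return the (data, parity) pair as a 9-bit parity memory would hold it."""
    return byte & 0xFF, parity_bit(byte)

def check(data, parity):
    """True if the stored parity bit still matches the data bits."""
    return parity_bit(data) == parity

data, p = store(0b10110010)
one_flip = data ^ 0b00000100    # one bit changed in the byte
two_flips = data ^ 0b00000110   # two bits changed in the same byte

print(check(data, p))       # intact byte passes the check
print(check(one_flip, p))   # single-bit error fails the check
print(check(two_flips, p))  # double-bit error passes unnoticed
```

Parity only says that something in the byte is wrong; it cannot say which
bit, and an even number of flipped bits in one byte goes undetected.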

If survival is important, then ECC memory is the way to go.  However,
ECC memory can lull you into a false sense of security; because it
corrects errors silently, you may have memory going bad without any
warning.
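
To illustrate the point about silent correction, here is a small sketch
using a Hamming(7,4) code, chosen only because it is the simplest
single-error-correcting code; real ECC memory uses wider codes.  The
`log` list is an assumption of this sketch: it stands in for whatever
error reporting a system may or may not provide.

```python
def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 is unused
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # check bit over 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # check bit over 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # check bit over 4,5,6,7
    return code

def decode(code, log):
    """Return the 4 data bits, correcting one flipped bit if present.

    The position of any corrected bit is appended to log.
    """
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    if syndrome:                        # nonzero syndrome names the bad position
        c[syndrome] ^= 1
        log.append(syndrome)
    return c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)

corrected = []
word = encode(0b1011)
word[6] ^= 1                            # simulate one failing cell
value = decode(word, corrected)
print(value == 0b1011, corrected)
```

The caller gets the right value back either way.  Unless someone looks at
the correction log, the failing cell never comes to anyone's attention.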

A memory sweep at bootstrap and refusal to bring any suspect memory
online (plus prompt replacement of bogus memory) is probably a better
protection than anything else.
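
A boot-time sweep of this kind can be sketched as follows.  This is a
simulation only: the `FaultyMemory` class, the page size, and the test
patterns are all assumptions made for illustration, and a real sweep
would run against physical memory from ROM or early boot code.  The
simulated fault is one bit stuck at 0 across a run of addresses, the
failure mode described at the top of this post.

```python
PAGE = 256  # hypothetical page size in bytes for this sketch

class FaultyMemory:
    """Simulated RAM in which some bits are stuck at 0 over an address range."""
    def __init__(self, size, stuck_mask=0, bad_range=range(0)):
        self.cells = bytearray(size)
        self.stuck_mask = stuck_mask
        self.bad_range = bad_range

    def write(self, addr, value):
        if addr in self.bad_range:
            value &= ~self.stuck_mask & 0xFF   # stuck bits never hold a 1
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]

def sweep(mem, size, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern everywhere, read it back, return the failing pages."""
    bad_pages = set()
    for pattern in patterns:
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            if mem.read(addr) != pattern:
                bad_pages.add(addr // PAGE)
    return bad_pages

size = 4 * PAGE
mem = FaultyMemory(size, stuck_mask=0x08, bad_range=range(300, 400))
suspect = sweep(mem, size)
# Refuse to bring suspect pages online; keep only the ones that passed.
online = [p for p in range(size // PAGE) if p not in suspect]
print(sorted(suspect), online)
```

Simple fixed patterns like these catch stuck bits but not every kind of
fault (address-line problems, for instance), so a serious exerciser
would use more patterns than this sketch does.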

 _____   | ____ ___|___   /__ Mark ("Gaijin") Crispin "Gaijin! Gaijin!"
 _|_|_  -|- ||   __|__   /  / R90/6 pilot, DoD #0105  "Gaijin ha doko?"
|_|_|_|  |\-++-  |===|  /  /  Atheist & Proud         "Niichan ha gaijin."
 --|--  /| ||||  |___|    /\  (206) 842-2385/543-5762 "Chigau. Omae ha gaijin."
  /|\    | |/\| _______  /  \ FAX: (206) 543-3909     "Iie, boku ha nihonjin."
 / | \   | |__|  /   \  /    \MRC@CAC.Washington.EDU  "Souka. Yappari gaijin!"
Hee, dakedo UNIX nanka wo tsukatte, umaku ikanaku temo shiranai yo.

eps@toaster.SFSU.EDU (Eric P. Scott) (12/09/90)

I'd really like to see a customer-runnable* extensive memory
exerciser/diagnostic that can be booted from the ROM monitor.

*"customer-runnable" does NOT mean "only available to authorized
NeXT service representatives"

					-=EPS=-
-- 
Parity errors have an 11% "false positive" probability.  :-)