[net.micro] another *parity* argument

kurt@fluke.UUCP (Kurt Guntheroth) (08/08/84)

Here is another thought.  What do you do once parity detects an
error?  The question applies to hospitals and fire stations too.  A parity
error or other non-recoverable memory error trashes whatever you were
doing.  If it happened on an instruction fetch, the state of the machine is
trashed unless you are lucky enough to have a machine like the 32000 that
can do instruction restart (does the 68010 design recover from parity
errors?).  Even if the parity error happened in data, you may already have
changed the status flags or some such.  Anyway, on many processors there
is no way to recover from a parity error, so all parity does for you is
tell you that things have just been trashed.  The typical response to a
parity error is to halt and wait for a reset.

So even for critical applications, parity may not be enough.  Either you
have ECC or you forget it.
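To make the detect-versus-correct distinction concrete, here is a minimal Python sketch (not from the post; all function names are hypothetical).  It assumes even parity over one byte, and a textbook Hamming(7,4) code as a stand-in for "ECC".  Parity can only say that *some* bit flipped, which is why halting is all you can do; the Hamming syndrome names the bad bit position so it can be flipped back.

```python
# Even parity: the stored bit makes the total number of 1s even.
def parity_bit(byte):
    return bin(byte & 0xFF).count("1") % 2

def parity_ok(byte, stored_parity):
    # Detects any odd number of flipped bits, but cannot say which bit.
    return parity_bit(byte) == stored_parity

# Hamming(7,4): bit positions 1..7, check bits at 1, 2, 4, data at 3, 5, 6, 7.
def hamming_encode(nibble):
    code = [0] * 8  # index 0 unused so list indices match positions 1..7
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Check bit p covers every other position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]

def hamming_decode(word):
    """Return (nibble, syndrome); a nonzero syndrome is the flipped position."""
    code = [0] + list(word)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    if syndrome:
        code[syndrome] ^= 1  # correct the single bad bit
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, syndrome

if __name__ == "__main__":
    byte = 0b10110010
    p = parity_bit(byte)
    print(parity_ok(byte ^ 0b00001000, p))  # False: error seen, bit unknown
    word = hamming_encode(0b1011)
    word[4] ^= 1                            # flip position 5
    print(hamming_decode(word))             # (11, 5): located and corrected
```

Note that flipping two bits of the byte makes `parity_ok` return True again, so plain parity does not even reliably detect multi-bit damage.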
-- 
Kurt Guntheroth
John Fluke Mfg. Co., Inc.
{uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,ssc-vax}!fluke!kurt

wall@fortune.UUCP (Jim Wall) (08/14/84)

   Remember that a parity error is telling you that your memory
cannot be trusted, at all! Memory parity errors are all hard: once a
location has failed, it cannot be trusted until it is rewritten. Usually,
if one location is bad, other locations will be as well.

    What most systems do upon detecting a parity error is
jump to a ROM-based error routine that sends an error message
to the appropriate output device and gracefully brings the system
down. It never tries to recover any data or program. I know people
will scream over this, but you can no longer be sure of the memory,
so trying to save portions of it will just cause grief.

Remember that memory failures rarely affect just one address;
most are caused by power surges, ESD, and similar events.

EDC is better, but by no means perfect, and in my book it isn't worth
the cost. Now, fault tolerance is a different story entirely....
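One way to see why EDC is "not at all perfect" is a small Python sketch, assuming a SEC-DED (single-error-correct, double-error-detect) extended Hamming(8,4) code; the post does not say which scheme it means, and the function names here are hypothetical.  A single flipped bit is corrected, two flipped bits are flagged as uncorrectable, but three flipped bits can be silently "corrected" into the wrong data.

```python
def secded_encode(nibble):
    """Extended Hamming(8,4): index 0 is overall parity, 1..7 as in Hamming(7,4)."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over positions 1..7
    return code

def secded_decode(word):
    """Return (status, nibble); nibble is None when flagged uncorrectable."""
    code = list(word)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of the positions of all 1 bits; 0 if valid
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1  # assume one error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        return "uncorrectable", None  # even overall parity, nonzero syndrome
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble

if __name__ == "__main__":
    good = secded_encode(0b1011)  # data value 11
    for flips in [(5,), (3, 5), (3, 5, 6)]:
        word = list(good)
        for pos in flips:
            word[pos] ^= 1
        print(len(flips), secded_decode(word))
    # 1 ('corrected', 11)
    # 2 ('uncorrectable', None)
    # 3 ('corrected', 12)   <- wrong data, reported as fixed
```

The triple-error case is exactly the multi-location damage from surges or ESD described above: the code only guarantees anything for one or two bad bits per word.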

					-Jim Wall
					!amd!fortune!wall