kurt@fluke.UUCP (Kurt Guntheroth) (08/08/84)
Here is another thought: what do you do with parity once you detect an error? This applies to hospitals and fire stations too. A parity error or other non-recoverable memory error trashes whatever you were doing. If it happened on an instruction fetch, the state of the machine is trashed unless you are lucky enough to have a machine like the 32000 that can do instruction restart (does the 68010 design recover from parity errors?). Even if the parity error happened in data, you may already have changed the status flags or some such. Anyway, for many processors there is no way to recover from a parity error, so all parity does for you is tell you that things have just been trashed. Typical recovery from a parity error is to halt and wait for a reset. So even for critical applications, parity may not be enough: either you have ECC or you forget it.
-- Kurt Guntheroth John Fluke Mfg. Co., Inc. {uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,ssc-vax}!fluke!kurt
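To make the parity-versus-ECC distinction concrete, here is a minimal Python sketch (not from the thread; every function name is made up for illustration). It assumes even parity over a single byte, and it uses a Hamming(7,4) code as a small stand-in for "ECC"; real memory ECC works over wider words, but the principle is the same. It only models the stored word, not the CPU-state problem raised above.

```python
# Sketch: a parity bit can only say "something flipped"; a Hamming(7,4)
# code (a simple single-error-correcting ECC) can also say *which* bit.

def parity(word: int) -> int:
    """Even-parity bit: 1 if `word` has an odd number of 1 bits."""
    return bin(word).count("1") % 2


def store_with_parity(byte: int) -> tuple:
    """Return the byte plus the parity bit a parity memory would store."""
    return byte, parity(byte)


def parity_check(byte: int, stored_parity: int) -> bool:
    """True if no error is detected. Says nothing about *where* an error is."""
    return parity(byte) == stored_parity


def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits into positions 1..7 (index 0 unused).
    Check bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7."""
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code


def hamming74_syndrome(code: list) -> int:
    """XOR of the positions of all 1 bits: 0 for a valid codeword,
    otherwise the position of a single flipped bit."""
    s = 0
    for pos in range(1, 8):
        if code[pos]:
            s ^= pos
    return s


def hamming74_correct(code: list) -> list:
    """Flip the bit named by the syndrome (assumes at most one error)."""
    fixed = list(code)
    s = hamming74_syndrome(fixed)
    if s:
        fixed[s] ^= 1
    return fixed


def hamming74_decode(code: list) -> int:
    """Pull the 4 data bits back out of positions 3, 5, 6 and 7."""
    return code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)


if __name__ == "__main__":
    byte, p = store_with_parity(0b1011_0010)
    print(parity_check(byte ^ (1 << 5), p))   # False: error seen, bit unknown
    print(parity_check(byte ^ 0b11, p))       # True: two flips slip through

    code = hamming74_encode(0b1011)
    code[6] ^= 1                              # simulate one bad cell
    print(hamming74_syndrome(code))           # 6: the bad position
    print(hamming74_decode(hamming74_correct(code)) == 0b1011)  # True
```

With parity alone, the only safe response to a mismatch is the one described above: stop. The syndrome is what lets ECC repair the word and keep going.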
wall@fortune.UUCP (Jim Wall) (08/14/84)
Remember that a parity error is telling you that your memory cannot be trusted, at all! Memory parity errors are all hard: once a location has failed, it can never be trusted until it is rewritten. Usually if one location is bad, other locations will be as well. What most systems do upon detecting a parity error is jump to a ROM-based error routine that sends an error message to the appropriate output device and gracefully brings the system down. It never tries to recover any data or program. I know people will scream over this, but you cannot be sure of the memory anymore, so trying to save portions of it will just cause grief. Remember that memory failures rarely affect just one address; most are caused by power surges, ESD, and similar events. EDC is better, but not at all perfect, and in my book it isn't worth the cost. Now fault tolerance is an entirely different story....
-Jim Wall !amd!fortune!wall
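One way error correction falls short of perfect: if two bits in the same word go bad together, a plain single-error-correcting code computes a syndrome that points at a third, innocent bit and "corrects" the word into the wrong value without complaint. The minimal Python sketch below (not from the thread; names are hypothetical) shows that, and shows the common fix of one extra overall-parity bit (usually called SECDED), which detects the double error so the system can halt instead of guessing. Three or more flips in one word can still fool even SECDED.

```python
# Sketch: single-error correction alone can make a double-bit error worse,
# and an extra overall-parity bit (SECDED) at least detects it.

def encode_secded(nibble: int) -> list:
    """Hamming(7,4) in positions 1..7 plus overall parity in position 0."""
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2      # makes the total number of 1s even
    return code


def syndrome(code: list) -> int:
    """XOR of the positions (1..7) holding a 1; 0 for a valid Hamming word."""
    s = 0
    for pos in range(1, 8):
        if code[pos]:
            s ^= pos
    return s


def decode(code: list) -> int:
    """Pull the 4 data bits back out of positions 3, 5, 6 and 7."""
    return code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)


def correct_sec_only(code: list) -> list:
    """Single-error correction that ignores the overall parity bit."""
    fixed = list(code)
    s = syndrome(fixed)
    if s:
        fixed[s] ^= 1
    return fixed


def check_secded(code: list):
    """Return (status, word); status is 'ok', 'corrected' or 'uncorrectable'."""
    s = syndrome(code)
    overall = sum(code) % 2          # 1 means an odd number of bits flipped
    if s == 0 and overall == 0:
        return "ok", list(code)
    if overall == 1:                 # assume exactly one flip; s == 0 means bit 0
        fixed = list(code)
        fixed[s] ^= 1
        return "corrected", fixed
    return "uncorrectable", list(code)  # an even number (>= 2) of flips


if __name__ == "__main__":
    good = encode_secded(0b1011)
    bad = list(good)
    bad[3] ^= 1                      # two cells fail together
    bad[5] ^= 1
    print(decode(correct_sec_only(bad)))   # 12, not 11: silently wrong
    print(check_secded(bad)[0])            # uncorrectable: halt, don't guess
```

Note that even with SECDED, the response to an uncorrectable word is the same as with plain parity: report it and bring the system down.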