[net.micro] Parity checking

jbn@wdl1.UUCP (John B. Nagle) (08/15/84)

     Machines without parity checking must be considered only 
slightly above the toy level.  Intermittent errors are a continual
nagging problem in such machines.  The IBM PC has parity checking;
the PCjr does not.  The TI Professional does not, and suffers from
intermittent problems because of it.  Any machine costing over $1000
should unquestionably have parity checking; below that level, there is
some argument for economy, but personally I would go for parity all the
way down to the appliance control processor level.
     In a general-purpose operating system for a small computer, parity
errors in user space should kill the job involved and display a message,
not crash the machine.  Parity errors in system space should crash the
machine with a message.  More elaborate strategies are possible; this is
a minimum.
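The minimum policy above can be sketched in Python. This is only an illustration under stated assumptions: the memory layout (SYSTEM_TOP), the Job and Kernel types, and the idea that the hardware hands the faulting address to a handler are all hypothetical, not taken from any real machine.

```python
from dataclasses import dataclass, field

# Hypothetical layout: addresses below SYSTEM_TOP belong to the operating
# system; everything at or above it is user space.
SYSTEM_TOP = 0x10000


class SystemHalt(Exception):
    """Raised to stop the whole machine after a system-space parity error."""


@dataclass
class Job:
    name: str
    base: int    # first address owned by this job
    limit: int   # one past the last address owned by this job
    alive: bool = True


@dataclass
class Kernel:
    jobs: list = field(default_factory=list)
    console: list = field(default_factory=list)  # messages shown to the user

    def on_parity_error(self, address: int) -> None:
        """Hypothetical handler, assumed to be called with the faulting address."""
        if address < SYSTEM_TOP:
            # System space is suspect, so nothing the OS does can be trusted: halt.
            self.console.append(f"PARITY ERROR in system space at {address:#x}; halting")
            raise SystemHalt(address)
        for job in self.jobs:
            if job.alive and job.base <= address < job.limit:
                # Only this job's memory is suspect: kill it, keep the machine up.
                job.alive = False
                self.console.append(f"PARITY ERROR at {address:#x}; job {job.name} killed")
                return
        # Assumption: an error in memory no job owns is reported but kills nothing.
        self.console.append(f"PARITY ERROR in unused memory at {address:#x}")


kernel = Kernel(jobs=[Job("editor", 0x20000, 0x30000), Job("print", 0x30000, 0x38000)])
kernel.on_parity_error(0x20010)                  # kills "editor" only
print([(j.name, j.alive) for j in kernel.jobs])  # [('editor', False), ('print', True)]
try:
    kernel.on_parity_error(0x0100)               # system space: whole machine stops
except SystemHalt:
    print(kernel.console[-1])
```

Raising an exception for the system-space case models the "crash with a message" rule: the handler makes no attempt to clean up, because the code that would do the cleanup lives in the memory that just failed.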
     Power supplies should be designed such that if the output voltage
deviates from the rated value, the machine goes down.  A zener in the
right place will accomplish this.  It is better to crash fully than
to have an undetected error.  Again, more elaborate strategies are possible,
such as power fail interrupts, but just plowing on is a bad idea.
     If you build an unreliable machine, it will not sell.  Remember
the Coleco Adam?
     					
				John Nagle

robison@eosp1.UUCP (Tobias D. Robison) (08/17/84)

Parity checking is a flawed feature of limited use unless you and
your system together can decide what an error means, and what to
do about it.  Here's a case in point:

	- You are editing a file, and it's been a while since your
	  last backup, when suddenly the dreaded parity error occurs.
	  Your data in memory is lost, and you must reboot.
	  And yet, the only thing that was wrong was that part of the
	  operating system needed for CRT output was bombed.  You
	  should have been able to store your document.  In fact, a
	  system without parity checking WOULD have let you store your
	  document.

Good parity checking requires the following:

	- The system will try to continue running after a parity
	  error.  It's your choice whether it should:
	    + just continue what you were doing
	    + execute routines in ROM designed to store data on disk
	      before rebooting.
	    + try to reload the erroneous data from disk and continue
	    + etc...

	- You and the system must be able to tell the difference
	  between kinds of parity errors:
	    + in operating system (OSYS) code
	    + in application code
	    + in OSYS data
	    + in application data.

	You want to try different things in these cases, and a good
	system will check ALL memory, tell you what problem(s) you
	have, and prompt with appropriate recoveries.

	In general, you need a system that keeps data and instructions
	separate, so you can distinguish the above cases.
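One way to picture the second requirement, together with the "check ALL memory" step, is the Python sketch below. It assumes a hypothetical memory map that keeps code and data in separate regions (as the last point recommends), a simulated memory of bytes carrying one even-parity bit each, and a recovery menu per region drawn only from the options listed above. Which option is offered for which region is the sketch's own assumption, not a fixed rule.

```python
from enum import Enum


class Region(Enum):
    OS_CODE = "OSYS code"
    APP_CODE = "application code"
    OS_DATA = "OSYS data"
    APP_DATA = "application data"


# Hypothetical memory map: (start, end, region), end exclusive.
# Code and data live in separate regions so an error can be classified.
MEMORY_MAP = [
    (0x0000, 0x4000, Region.OS_CODE),
    (0x4000, 0x6000, Region.OS_DATA),
    (0x6000, 0xA000, Region.APP_CODE),
    (0xA000, 0x10000, Region.APP_DATA),
]

# Assumed recovery menus. Code is presumed to be an unmodified copy of a file
# on disk, so it can be reloaded; data may exist only in memory.
SAVE = "store data on disk via ROM routine, then reboot"
RECOVERIES = {
    Region.OS_CODE: ["reload from disk and continue", SAVE],
    Region.APP_CODE: ["reload from disk and continue", SAVE],
    Region.OS_DATA: [SAVE, "just continue"],
    Region.APP_DATA: ["just continue", SAVE],
}


def classify(address):
    """Map an address to the region that contains it."""
    for start, end, region in MEMORY_MAP:
        if start <= address < end:
            return region
    raise ValueError(f"address {address:#x} is not mapped")


def parity_ok(byte, parity_bit):
    # Even parity: data bits plus parity bit hold an even number of 1s.
    return (bin(byte).count("1") + parity_bit) % 2 == 0


def scan_memory(memory, parity_bits):
    """Check every location, not only the one that trapped; group bad ones by region."""
    report = {}
    for address, (byte, bit) in enumerate(zip(memory, parity_bits)):
        if not parity_ok(byte, bit):
            report.setdefault(classify(address), []).append(address)
    return report


def prompt_lines(report):
    """Describe each damaged region together with the recoveries offered for it."""
    lines = []
    for region, addresses in report.items():
        where = ", ".join(f"{a:#06x}" for a in addresses)
        lines.append(f"{region.value} at {where}: " + " / ".join(RECOVERIES[region]))
    return lines


memory = [0] * 0x10000   # all-zero bytes...
parity = [0] * 0x10000   # ...have even-parity bit 0
parity[0x0100] = 1       # simulate a bad location in OSYS code
parity[0xA005] = 1       # and one in application data
for line in prompt_lines(scan_memory(memory, parity)):
    print(line)
```

Scanning everything before prompting matters here: a single trap tells you only about the first bad location, while the user's choice of recovery depends on every region that is damaged.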

Without a choice of recovery actions, you know that a parity error will
ruin your current work.  Even without a parity error, however, you don't
know that your current work is accurate: there may have been a memory
error that parity checking did not detect, or there may have been a bug.
You still have to do sanity checks to make sure your work is OK.
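Parity can miss errors because a single parity bit records only whether a byte holds an odd or even number of 1 bits, so any fault that flips an even number of bits leaves that unchanged. A small Python illustration, assuming even parity over 8-bit bytes (the thread does not say which convention any particular machine used):

```python
def parity_bit(byte):
    """Even parity: the bit that makes the total count of 1s even."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    # Memory keeps the data byte alongside its parity bit.
    return byte, parity_bit(byte)


def check(byte, stored_parity):
    # True means "looks fine"; False means a parity error is reported.
    return parity_bit(byte) == stored_parity


word, p = store(0b1011_0010)   # four 1s, so p == 0
single = word ^ 0b0000_0100    # one bit flipped
double = word ^ 0b0001_0100    # two bits flipped

print(check(single, p))  # False: detected
print(check(double, p))  # True: silently accepted
```

Every single-bit flip is caught, but the double flip passes the check, which is exactly the undetected memory error that still calls for sanity checks.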

- Toby Robison (not Robinson!)
allegra!eosp1!robison
decvax!ittvax!eosp1!robison

jones@fortune.UUCP (08/22/84)

fortune!jones    Aug 21 14:59:00 1984

I believe that all the actions Toby refers to come under the heading
of error recovery after the parity hit.  It is certainly true that the
way an error is handled makes a big difference in user satisfaction.
However, the fact that an error is handled poorly does not negate the
general importance of detecting the error.

Dan Jones (Remember! In volleyball you can only score when you serve!)

UUCP:	{ihnp4,ucbvax!amd,hpda,sri-unix,harpo}!fortune!jones
DDD:	(415)594-2440
USPS:	Fortune Systems Corp, 101 Twin Dolphin Drive, Redwood City, CA 94065