[net.micro] net.digital: Is parity *really* worth it?

kalagher@MITRE.ARPA (08/02/84)

From:  Dick Kalagher <kalagher@MITRE.ARPA>

A parity check certainly cannot, by itself, ensure that the ambulance will
go to the right street. In fact, the difference in reliability of
the entire system will probably be negligible with or without
parity. Also, let's not fool ourselves into thinking that a parity
check will make the computer system 100% reliable.  There are other
sources of potential error and even the parity will be in error
sometimes. No system is error free, and there is always a cost-performance-
reliability tradeoff that must be made.  If I take your argument to the
extreme, then I need a complete redundant system, probably with an
independent main power source.

dfarber@UDEL-EE.ARPA (08/02/84)

From:      Dave Farber <dfarber@UDEL-EE.ARPA>

It is a bad situation to have an error that corrupts a data base and have
no idea anything went wrong. The main benefit of parity is that you know
you are in trouble, and that is worth a whole lot. If you want to
recover, then ECC is the route (and a heavy wallet).
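The check being discussed here is simple enough to sketch. Below is a minimal, hypothetical illustration in Python of a single even-parity bit guarding a byte; the `store`/`load` names are mine, standing in for what the memory hardware does on every write and read:

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: the XOR of all eight data bits."""
    p = 0
    for i in range(8):
        p ^= (byte >> i) & 1
    return p

def store(byte):
    """Store a byte along with its parity bit, as a 9-bit word."""
    return (parity_bit(byte) << 8) | (byte & 0xFF)

def load(word):
    """Read a 9-bit word back; raise if the parity check fails.

    Any single flipped bit (data or parity) makes the check fail, so the
    system knows it is in trouble -- but it cannot tell which bit flipped,
    which is why detection is all you get without a heavier ECC code.
    """
    byte = word & 0xFF
    if parity_bit(byte) != (word >> 8):
        raise MemoryError("parity error: word corrupted")
    return byte
```

Note that an even number of flipped bits cancels out and slips past the check, which is the limitation the later posts on EDC pick up on.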

witters@fluke.UUCP (John Witters) (08/02/84)

I feel the primary advantage of parity is to let a user know when the memory
has suddenly gone "flaky".  In this case, I think it is better to crash the
system rather than let it run and corrupt data (e.g. disk file directories).
The parity detection prevents a user from operating a system until the problem
is corrected.  I agree that memory chips these days are extremely reliable.
However, this doesn't console the poor user unlucky enough to have a chip go
bad that wipes out all data since the last backup.

Another advantage is that parity will prevent a system from exhibiting bizarre
and unreproducible behavior due to bad memory.  Parity will crash the system
before this behavior occurs, and will immediately indicate where the problem
lies.

The primary disadvantage of parity is soft errors.  These cause complaints
from users that the system normally works O.K., but reports a parity error
roughly once a month (or three months, or six months).  This problem can be
solved with
more hardware.  The approach is to treat the error like a page fault in a
virtual memory system:  branch to the error routine, re-read the memory cell to
find out if it is a hard error, then restart the instruction that caused the
error.  If it was a hard error, then crash the system.  This requires a
processor with instructions that can be halted and re-started, and hardware to
record where memory errors occur, in addition to the parity detection
circuitry.  Of course, all this extra hardware will reduce reliability.  Most
people prefer to deal with a few extra user complaints than add this kind of
hardware.  If the system already uses virtual memory, it may not take much
extra hardware to use this scheme.
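The retry scheme described above can be sketched in a few lines. This is a hypothetical software model, not anything from the thread: `read_cell` stands in for a raw memory access that returns the data plus the result of its parity check, and the re-read distinguishes a transient soft error from a failed chip:

```python
def read_with_retry(read_cell, addr):
    """On a parity error, re-read the cell to classify the failure.

    `read_cell(addr)` is a hypothetical stand-in for the raw memory
    access; it returns (byte, parity_ok).
    """
    byte, ok = read_cell(addr)
    if ok:
        return byte
    # First read failed its parity check: re-read to see if it persists.
    byte, ok = read_cell(addr)
    if ok:
        # Soft error: the cell reads back clean, so the faulting
        # instruction can simply be restarted with this value.
        return byte
    # Hard error: the cell is bad every time.  Crash rather than let
    # the corruption spread into disk files.
    raise SystemError("hard parity error at address %#x" % addr)
```

The hard part, as noted above, is not this logic but the hardware support: the processor must be able to abort mid-instruction and restart cleanly once the good value is in hand.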

The decision to use parity depends on the system.  It doesn't make much sense
for a home computer that has a total of eight memory chips.  It makes much more
sense for a workstation that has a few hundred chips.


						John Witters
						John Fluke Mfg. Co. Inc.
						P.O.B. C9090 M/S 243F
						Everett, Washington  98206

						(206) 356-5274

wall@fortune.UUCP (Jim Wall) (08/03/84)

   As most people have stated already, parity is really determined
by the market you are trying to sell to. Board space, cost, complexity
are all really second to whether you win or lose sales of the final
product. We are of course looking at this from the perspective of a 
designer building a product to be marketed and sold, and not from the
viewpoint of a consumer looking to buy a machine. 

   But that isn't what this article is about, this one is about EDC
error detection and correction. Personally, I'm against it.  With the
simple EDC codes, such as Hamming codes, adding a handful of check bits
per word (five per byte, or seven per 32-bit word, for a code that
corrects single-bit errors and detects double-bit errors), you are
guaranteed to correct all single-bit errors, detect all double-bit
errors, and have a high probability of detecting multiple-bit errors.
Notice that you can only correct single-bit errors. In this era of ESD,
power spikes, and electrical noise, you are most often subjected to
massive memory corruption, and rarely is the end result of a memory hit
a single-bit error. There have been studies done on this, but someone
must have my copy; I'll look for it.
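The code family being criticized here can be sketched concretely. Below is a toy Python model (mine, not from the thread) of a SEC-DED Hamming code over one byte: four Hamming check bits at positions 1, 2, 4, 8 plus an overall parity bit, for 13 bits total. It corrects any single-bit error and flags, but cannot fix, a double-bit error -- exactly the limitation the post points out:

```python
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]  # non-power-of-two slots

def hamming_encode(byte):
    """Encode an 8-bit value as a 13-bit SEC-DED codeword."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (byte >> i) & 1:
            word |= 1 << pos
    # Each check bit covers the positions whose index has that bit set.
    for p in (1, 2, 4, 8):
        parity = 0
        for pos in range(3, 13):
            if pos & p and (word >> pos) & 1:
                parity ^= 1
        if parity:
            word |= 1 << p
    # Overall parity (bit 0) over bits 1..12 enables double-error detection.
    word |= bin(word >> 1).count("1") & 1
    return word

def hamming_decode(word):
    """Return (byte, status): status is 'ok', 'corrected', or 'double'."""
    syndrome = 0
    for p in (1, 2, 4, 8):
        parity = 0
        for pos in range(1, 13):
            if pos & p and (word >> pos) & 1:
                parity ^= 1
        if parity:
            syndrome |= p
    overall = bin(word).count("1") & 1     # includes the overall parity bit
    if syndrome and overall:
        word ^= 1 << syndrome              # syndrome names the bad bit: fix it
        status = "corrected"
    elif syndrome:
        status = "double"                  # two flips: detected, uncorrectable
    else:
        status = "ok"                      # clean (or a flip of bit 0 only)
    byte = 0
    for i, pos in enumerate(DATA_POSITIONS):
        byte |= ((word >> pos) & 1) << i
    return byte, status
```

A "massive memory hit" that flips three or more bits can alias to a valid or singly-corrupted codeword, which is the scenario behind the skepticism above.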

   The other real drawback that I see with EDC is the performance hit:
the chips that perform these marvelous correction algorithms
are not what could be called real fast. Each memory read must have
time allotted in it for an analysis of the data (including the code
bits) and time for any data correction if necessary. This isn't double
the nominal memory cycle time, but it is more than 50% additional.
The only way to circumvent this is with a CPU that can be aborted and
then restarted,...  but that has its own unique brand of problems.


I should know this stuff, my whole life has been one parity error
after another....

					-Jim Wall
					!amd!fortune!wall

phil@amd.UUCP (Phil Ngai) (08/04/84)

> The primary disadvantage of parity is soft errors.

Hm, I consider this an advantage. Hard errors are easy to catch.
The thing I consider most important about parity is that
it catches the soft errors which would otherwise corrupt my
data WITHOUT my knowing it.

If you keep good backups as you should then the important thing
is knowing that your data can be trusted. Parity is a big step
towards this. Even for 8 chips worth.

I would think that most users would be willing to accept the tradeoff
once it is explained to them.
-- 
 I'm going to keep boring until I strike oil.
 Phil Ngai (408) 982-6554
 UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amd!phil
 ARPAnet: amd!phil@decwrl.ARPA

BILLW@SRI-KL.ARPA (08/15/84)

One thing that is happening in memories these days is that the
individual chips are getting much larger.  What this means is
that an individual chip failure becomes catastrophic!  If you
have an IBM PC with 512K of memory using 64K chips, and one
chip goes bad (completely bad, say), you have lost part of
your memory, but if you are technically inclined, you can
swap some chips on the board, throw some switches, and bring
your PC back up with 448K of memory.  If you have a Macintosh
with 512K bytes of memory (and a 16 bit bus, remember!), your
system is now completely dead.  For this reason, EDC may become
quite valuable...

BillW