kalagher@MITRE.ARPA (08/02/84)
From: Dick Kalagher <kalagher@MITRE.ARPA>
A parity check certainly cannot, by itself, ensure that the ambulance will go to the right street. In fact, the difference in reliability of the entire system will probably be negligible with or without parity. Also, let's not fool ourselves into thinking that a parity check will make the computer system 100% reliable. There are other sources of potential error, and even the parity will be in error sometimes. No system is error free, and there is always a cost-performance-reliability tradeoff that must be made. If I take your error (make that argument, not error) to the extreme, then I need a completely redundant system, probably with an independent main power source.
dfarber@UDEL-EE.ARPA (08/02/84)
From: Dave Farber <dfarber@UDEL-EE.ARPA>
It is a bad situation to have an error that corrupts a data base and have no idea anything went wrong. The main benefit of parity is that you know you are in trouble, and that is worth a whole lot. If you want to recover, then ECC is the route (and a heavy wallet).
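To make the "you know you are in trouble" point concrete, here is a minimal Python sketch of a single even-parity bit per byte. The function names are hypothetical and nothing here models any particular machine; it only shows that one flipped bit is detected (not corrected) and that two flipped bits slip through.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2

def store(byte):
    """Return the (data, parity) pair as it would be written to memory."""
    return byte & 0xFF, parity_bit(byte)

def check(data, parity):
    """True if the data read back still agrees with its stored parity bit."""
    return parity_bit(data) == parity

data, p = store(0b10110010)
assert check(data, p)                    # clean read passes
assert not check(data ^ 0b00001000, p)   # one flipped bit is detected
assert check(data ^ 0b00011000, p)       # two flipped bits go unnoticed
```

Detection is all parity buys: the check says the byte is bad, not which bit is bad, so recovery still needs a backup or a correcting code.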
witters@fluke.UUCP (John Witters) (08/02/84)
I feel the primary advantage of parity is to let a user know when the memory has suddenly gone "flakey". In this case, I think it is better to crash the system rather than let it run and corrupt data (e.g. disk file directories). The parity detection prevents a user from operating a system until the problem is corrected. I agree that memory chips these days are extremely reliable. However, this doesn't console the poor user unlucky enough to have a chip go bad that wipes out all data since the last backup.

Another advantage is that parity will prevent a system from exhibiting bizarre and unreproducible behavior due to a bad memory. Parity will crash the system before this behavior occurs, and will immediately indicate where the problem lies.

The primary disadvantage of parity is soft errors. These cause complaints from users that the system normally works O.K., but reports a parity error roughly once a month (or three months or six months). This problem can be solved with more hardware. The approach is to treat the error like a page fault in a virtual memory system: branch to the error routine, re-read the memory cell to find out if it is a hard error, then restart the instruction that caused the error. If it was a hard error, then crash the system. This requires a processor with instructions that can be halted and restarted, and hardware to record where memory errors occur, in addition to the parity detection circuitry. Of course, all this extra hardware will reduce reliability. Most people prefer to deal with a few extra user complaints than add this kind of hardware. If the system already uses virtual memory, it may not take much extra hardware to use this scheme.

The decision to use parity depends on the system. It doesn't make much sense for a home computer that has a total of eight memory chips. It makes much more sense for a workstation that has a few hundred chips.

John Witters
John Fluke Mfg. Co. Inc.
P.O.B. C9090 M/S 243F
Everett, Washington 98206
(206) 356-5274
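The trap-and-retry scheme described above can be sketched as a toy simulation in Python. Everything here is hypothetical (the `Memory` class, the `stuck` and `glitch` tables, the `error_log` list standing in for the error-recording hardware), and a soft error is assumed to be a transient that disturbs a single read; a flipped bit that stays in the cell would look like a hard error to this logic.

```python
class ParityError(Exception):
    """Raised when a parity error repeats on re-read (treated as a hard error)."""

def parity(byte):
    return bin(byte & 0xFF).count("1") % 2

class Memory:
    """Toy memory in which every cell holds a (data, parity) pair."""

    def __init__(self, size):
        self.cells = [(0, 0)] * size
        self.stuck = {}      # addr -> XOR mask applied on every read (hard fault)
        self.glitch = {}     # addr -> XOR mask applied to the next read only (soft)
        self.error_log = []  # addresses at which a parity trap was taken

    def write(self, addr, byte):
        self.cells[addr] = (byte & 0xFF, parity(byte))

    def _raw_read(self, addr):
        data, p = self.cells[addr]
        data ^= self.stuck.get(addr, 0)
        data ^= self.glitch.pop(addr, 0)  # a transient is gone after one read
        return data, p

    def read(self, addr):
        data, p = self._raw_read(addr)
        if parity(data) == p:
            return data
        # Parity trap: record the address, then re-read the cell.
        self.error_log.append(addr)
        data, p = self._raw_read(addr)
        if parity(data) == p:
            return data  # soft error: the access is simply restarted
        raise ParityError("hard parity error at address %d" % addr)

mem = Memory(16)
mem.write(3, 0x5A)
mem.glitch[3] = 0x04            # inject a one-time single-bit disturbance
assert mem.read(3) == 0x5A      # the retry hides the soft error from the user
assert mem.error_log == [3]     # but the event is still recorded

mem.stuck[3] = 0x01             # inject a permanent single-bit fault
try:
    mem.read(3)
except ParityError as err:
    print("crash:", err)
```

The log is what keeps the retry from hiding a slowly failing chip: the user is not interrupted, but the address of every trap is still available for diagnosis.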
wall@fortune.UUCP (Jim Wall) (08/03/84)
As most people have stated already, parity is really determined by the market you are trying to sell to. Board space, cost, and complexity are all really second to whether you win or lose sales of the final product. We are of course looking at this from the perspective of a designer building a product to be marketed and sold, and not from the viewpoint of a consumer looking to buy a machine.

But that isn't what this article is about; this one is about EDC (error detection and correction). Personally, I'm against it. With the simple EDC codes, such as Hamming codes, adding three more bits per byte guarantees correcting all single bit errors and detecting all double bit errors, with a high probability of detecting multiple bit errors. Notice that you can only correct single bit errors. In this era of ESD, power spikes, and electrical noise, you are most often subjected to massive memory corruption, and rarely is the end result of a memory hit a single bit error. There have been studies done on this, but someone must have my copy; I'll look for it.

The other real drawback that I see with EDC is the performance hit: the chips that perform these marvelous correction algorithms are not what could be called real fast. Each memory read must have time allotted in it for an analysis of the data (including the code bits) and time for any data correction if necessary. This isn't double the nominal memory cycle time, but it is more than 50% additional. The only way to circumvent this is with a CPU that can be aborted and then restarted,... but that has its own unique brand of problems.

I should know this stuff, my whole life has been one parity error after another....

-Jim Wall
!amd!fortune!wall
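For readers who want to see the single-correct, double-detect behaviour in action, here is a minimal Python sketch of an extended Hamming code. It is an illustration, not the code any particular EDC chip uses. Note the size: Hamming(7,4) spends three check bits on four data bits, and one more overall parity bit adds double-error detection; the same construction over a full 8-bit byte needs four check bits plus the overall parity bit. The function names are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 is the overall parity bit; indices 1..7 form Hamming(7,4) with
    check bits at positions 1, 2, 4 and data bits at positions 3, 5, 6, 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code

def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'double'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions holding a 1
    overall = sum(code) % 2          # 0 when the overall parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1          # odd number of flips: assume exactly one
        status = "corrected"
    else:
        return None, "double"        # even number of flips, nonzero syndrome
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status

word = encode(0b1011)
assert decode(word) == (0b1011, "ok")

one = list(word); one[6] ^= 1
assert decode(one) == (0b1011, "corrected")

two = list(word); two[3] ^= 1; two[6] ^= 1
assert decode(two) == (None, "double")

three = list(word)
for pos in (3, 5, 6):
    three[pos] ^= 1
assert decode(three) == (0b1100, "corrected")   # wrong data, reported as fixed
```

The last case bears on the objection about multi-bit hits: with three flipped bits this decoder assumes a single error and hands back wrong data marked as corrected.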
phil@amd.UUCP (Phil Ngai) (08/04/84)
> The primary disadvantage of parity is soft errors.
Hm, I consider this an advantage. Hard errors are easy to catch.
The thing I consider most important about parity is that
it catches the soft errors which would otherwise corrupt my
data WITHOUT my knowing it.
If you keep good backups as you should then the important thing
is knowing that your data can be trusted. Parity is a big step
towards this. Even for 8 chips worth.
I would think that most users would be willing to accept the tradeoff
once it is explained to them.
--
I'm going to keep boring until I strike oil.
Phil Ngai (408) 982-6554
UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amd!phil
ARPAnet: amd!phil@decwrl.ARPA
BILLW@SRI-KL.ARPA (08/15/84)
One thing that is happening in memories these days is that the individual chips are getting much larger. What this means is that an individual chip failure becomes catastrophic! If you have an IBM PC with 512K of memory using 64K chips, and one chip goes bad (completely bad, say), you have lost part of your memory, and if you are technically inclined, you can swap some chips on the board, throw some switches, and bring your PC back up with 448K of memory. If you have a Macintosh with 512K bytes of memory (and a 16 bit bus, remember!), your system is now completely dead. For this reason, EDC may become quite valuable...

BillW
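The arithmetic behind the 448K figure can be sketched in a few lines of Python. The assumptions are mine, not the post's: chips are one bit wide, so a bank needs as many chips as the bus has data bits, parity chips are ignored, a dead chip takes its whole bank out of service, and the Macintosh case is taken to use 256K-bit chips.

```python
def surviving_memory_kb(total_kb, chip_kbits, bus_bits):
    """Kilobytes left after one chip dies and its whole bank is switched out."""
    bank_kb = chip_kbits * bus_bits // 8   # capacity of one bank, in kilobytes
    banks = total_kb // bank_kb
    return (banks - 1) * bank_kb

# IBM PC: 512K built from 64K-bit chips on an 8-bit bus, eight banks of 64K.
print(surviving_memory_kb(512, 64, 8))     # 448

# 512K on a 16-bit bus with (assumed) 256K-bit chips: a single bank of 512K.
print(surviving_memory_kb(512, 256, 16))   # 0
```

With the larger chips there is only one bank, so there is nothing left to fall back on.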