aglew@dwarfs.csg.uiuc.edu (Andy Glew) (05/07/90)
ECC on memory has several performance costs:

(1) the time to compute the ECC on a read - can it be overlapped with the processor, or can you handle a "the memory location you read several cycles ago, that you have written to all of the registers, was bad" trap with a significant delay?
(2) the hardware to do the ECC computation - does it add load delays, or could it be used for something else?
(3) partial writes become (non-atomic) read-modify-write operations...

Note that (3) isn't necessary if you already have all of the old cache line data in your cache - well, you do an RMW in cache, and recompute the ECCs, but at least you don't have to do the RMW across the bus. This is equivalent to a write-allocate policy, which is implicit in many of the snoopy policies that obtain a cache line exclusively on the first write. But sometimes first-write exclusivity isn't desirable, and/or you want to be able to do partial writes.

Q: does anyone know of "sometimes" ECC systems? I.e., memory systems where there is a valid bit associated with the ECC, so that a partial write where the ECC cannot be totally computed just clears the valid bit (perhaps downgrading to some simpler form of parity check, which may be embedded within the ECC). The next time that the whole cache line is written or read, a new full-line-width ECC could be computed. A "scrubber" process in the OS could go through all of memory during idle time, updating the full ECC to valid.

The obvious counter-argument is that if you can afford not to have ECC at all times, then you can afford not to have ECC at all. Or, if the frequency of partial writes is low enough that the (un)reliability of not having ECC on those memory locations is acceptable, then the performance cost of doing RMWs at those times is probably acceptable as well.

I'm not proposing or evaluating, just wondering out loud if it has already been done somewhere.

--
Andy Glew, aglew@uiuc.edu
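
The valid-bit idea in the question above can be sketched as a toy software model. This is only an illustration under stated assumptions, not a description of any real memory system: the names `Line`, `line_code`, `byte_parity`, `scrub`, and the 8-byte `LINE_BYTES` are all hypothetical, and a CRC stands in for the full-line ECC (a CRC can only detect errors, not correct them). The point is the bookkeeping: a partial write updates per-byte parity and clears the valid bit instead of doing a read-modify-write, and a later full read or a scrubber pass recomputes the full-line code.

```python
import zlib

LINE_BYTES = 8  # assumed line width, chosen only for illustration


def line_code(data: bytes) -> int:
    # Stand-in for a full-line ECC. A CRC only detects errors; a real
    # ECC would also carry enough information to correct them.
    return zlib.crc32(data)


def byte_parity(b: int) -> int:
    # Even parity over one byte: the "simpler check" used as a fallback.
    return bin(b).count("1") & 1


class Line:
    def __init__(self):
        self.data = bytearray(LINE_BYTES)
        self.parity = [0] * LINE_BYTES      # always kept up to date
        self.code = line_code(bytes(self.data))
        self.code_valid = True              # the valid bit from the post

    def write_full(self, data: bytes):
        assert len(data) == LINE_BYTES
        self.data[:] = data
        self.parity = [byte_parity(b) for b in data]
        self.code = line_code(data)
        self.code_valid = True

    def write_partial(self, offset: int, data: bytes):
        # No read-modify-write: touch only the written bytes and their
        # parity, then mark the full-line code as stale.
        for i, b in enumerate(data):
            self.data[offset + i] = b
            self.parity[offset + i] = byte_parity(b)
        self.code_valid = False

    def read(self) -> bytes:
        data = bytes(self.data)
        if self.code_valid:
            if line_code(data) != self.code:
                raise RuntimeError("full-line check failed")
        else:
            # Downgraded protection: per-byte parity only.
            if any(byte_parity(b) != p for b, p in zip(data, self.parity)):
                raise RuntimeError("parity check failed")
            # The whole line has now been read, so the full code can be
            # regenerated and marked valid again.
            self.code = line_code(data)
            self.code_valid = True
        return data


def scrub(memory) -> int:
    # Idle-time pass: revalidate every line whose full code is stale.
    refreshed = 0
    for line in memory:
        if not line.code_valid:
            line.read()
            refreshed += 1
    return refreshed
```

In this model, the window of reduced protection for a line runs from a partial write until the next full read, full write, or scrubber pass, which is the trade-off the counter-argument in the post is about.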
schow@bcarh185.bnr.ca (Stanley T.H. Chow) (05/08/90)
In article <AGLEW.90May7112551@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:
> Q: does anyone know of "sometimes" ECC systems?

In our central office telephone switches, we have ECC that is sometimes switched off. Specifically, we switch off ECC when we are running in "Lock-step Sync" on a matched processor pair with every cycle matched for error. (The ECC is always written into memory; only the check is suppressed.)

Stanley Chow        BitNet: schow@BNR.CA
BNR                 UUCP: ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831            ..!utgpu!bnr-vpa!bnr-rsc!schow%bcarh185
Me? Represent other people? Don't make them laugh so hard.
rh@craycos.com (Robert Herndon) (06/15/90)
Saying that 'ECC' can't correct multiple bit errors is too strong. Traditional SECDED won't help you, but there are S2ECD2ED codes that will correct any errors in a single two bit block of a machine word, S4ECD4ED codes that will correct any errors in a single four bit block, etc., while recognizing double block errors.

ECC is generally only for people who worry about large words (or perhaps cache lines), say, 64 bits or larger. There is some additional cost over simple SECDED, as SECDED for a 64 bit word requires 8 check bits, S2ECD2ED for a 64 bit word requires 10 or 12 check bits, and S4ECD4ED for a 64 bit word requires 16 check bits.

Block-error correcting codes are probably not that well known or popular; this may have some effect. IBM has published (and patented) some stuff on this, but I don't have the references handy. If you're interested, e-mail me and I can probably dig them up.

Even so, what percentage of machines "out there" really use SECDED codes? What percentage of memory chip production do these products consume?

Robert Herndon          Cray Computer Corp.
rh@craycos.com          1110 Bayfield Dr.
719/540-4240            Colorado Springs, CO
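
The figure of 8 check bits for SECDED on a 64 bit word can be reproduced from the standard Hamming condition: r check bits correct a single error among k data bits when 2^r >= k + r + 1, and one extra overall parity bit adds double-error detection. The short sketch below computes that; the function names are hypothetical. It covers plain SECDED only, and the block-code figures quoted in the post (10 or 12, and 16) are not derived here.

```python
def hamming_check_bits(data_bits: int) -> int:
    # Smallest r with 2**r >= data_bits + r + 1 (single-error correction).
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    # One extra overall parity bit upgrades SEC to SEC-DED.
    return hamming_check_bits(data_bits) + 1


for k in (8, 16, 32, 64):
    c = secded_check_bits(k)
    print(f"{k:3d} data bits: {c} check bits ({100 * c / k:.1f}% overhead)")
```

The relative overhead falls as the word gets wider (5 check bits on 8 data bits versus 8 on 64), which is one reason ECC is often associated with wide words.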
stevew@wyse.wyse.com (Steve Wilson x2580 dept303) (06/16/90)
In article <1990Jun14.220128.7904@craycos.com> rh@craycos.com (Robert Herndon) writes:
>ECC is generally only for people who worry about large words
>(or perhaps cache lines), say, 64 bits or larger.

I haven't found this to be the case. I've seen machines that used ECC for 8 bit words (e.g., an ECC memory card for the IBM PC) and LOTS of commercial machines that use ECC on 32 bit memory organizations. You'll typically see ECC on mini/super-micros built for commercial use.

> stuff deleted....
> Even so, what percentage of machines "out there" really use
>SECDED codes? What percentage of memory chip production do
>these products consume?
>
>Robert Herndon Cray Computer Corp.

Back in the bad old days, say circa 1978-1979 when the 64k memory chips were just coming out, there was a story running around Burroughs that Burroughs alone consumed 10% of the world's memory production at that time. All of their machines at that time had ECC. Maybe that gives a ballpark answer to your question.

Steve Wilson