[comp.sys.next] NeXT Memory - No Error Checking or

callen@inmet (10/31/88)

>/* Written 12:32 am  Oct 30, 1988 by james@bigtex.UUCP in inmet:comp.sys.next 
>For those not aware: the Intel 80x88 family has a design flaw that
>requires external hardware to disable NMI.  Without such hardware it
>is not possible to prevent the system from randomly crashing when NMIs
>are used.

NMI means "Non Maskable Interrupt" and it's considered a feature, not a bug.

Getting back to parity, I'm surprised, too, that NeXT didn't use either
parity or ECC. There are a number of single-chip ECC solutions available,
so I can't believe that the cost would have gone up THAT much. But then,
I'm not a memory designer, either. 

I am under the (possibly wrong) impression that soft memory errors are
often caused by cosmic rays streaking through one of the cells in a DRAM 
and discharging it, and that higher density memories are MORE susceptible
to this type of error. Is there a memory expert listening who can comment?

-- Jerry Callen
   callen@inmet.inmet.com
   ...{uunet,harvard}!inmet!callen

abali@baloo.eng.ohio-state.edu (Bulent Abali) (11/03/88)

In article <207400001@inmet> callen@inmet writes:
>Getting back to parity, I'm surprised, too, that NeXT didn't use either
>parity or ECC. There are a number of single-chip ECC solutions available,
>so I can't believe that the cost would have gone up THAT much. But then,

It is not cost alone that affects the decision whether to use
ECC. Parity and ECC techniques are real performance killers.
Generating the check bits on writes, and generating and comparing
them on reads, takes time. This may cost 1 to 3 wait states per
memory access. Furthermore, byte and halfword writes (word = 32
bits) carry an additional penalty. Because the check bits cover
the full word, the hardware must first read the word containing
the byte or halfword, merge in the new data, regenerate the check
bits, and finally write the whole word back. So these types of
operations may take twice as long.
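To make the byte-write penalty concrete, here is a minimal Python sketch, assuming one even-parity bit per 32-bit word (ECC check bits behave the same way, just with more bits). `CheckedMemory`, `parity32` and the access counter are hypothetical names for illustration only; a real memory controller does this in hardware, not software.

```python
def parity32(word: int) -> int:
    """Even-parity check bit: 1 if the 32-bit word has an odd number of 1 bits."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


class CheckedMemory:
    """Hypothetical word-addressed DRAM model with one parity bit per 32-bit word."""

    def __init__(self, nwords: int) -> None:
        self.data = [0] * nwords
        self.check = [0] * nwords
        self.accesses = 0  # counts DRAM read and write cycles

    def write_word(self, addr: int, word: int) -> None:
        # Full-word store: the check bit depends only on the new data.
        self.data[addr] = word & 0xFFFFFFFF
        self.check[addr] = parity32(word)
        self.accesses += 1

    def read_word(self, addr: int) -> int:
        self.accesses += 1
        word = self.data[addr]
        if parity32(word) != self.check[addr]:
            raise RuntimeError(f"parity error at word {addr}")
        return word

    def write_byte(self, addr: int, lane: int, value: int) -> None:
        # Sub-word store: the check bit covers all 32 bits, so the other three
        # bytes must be read back before the new check bit can be computed.
        old = self.read_word(addr)                                     # 1. read
        shift = 8 * lane                                               # lane 0 = low byte
        merged = (old & ~(0xFF << shift)) | ((value & 0xFF) << shift)  # 2. merge
        self.write_word(addr, merged)                                  # 3. regenerate and write
```

A full-word store costs one DRAM cycle, while `write_byte` costs two (a read and a write), which is where the "twice as long" comes from.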

If NeXT didn't put ECC in its memory, I'd say it is a tradeoff
to get performance, not because of cost. If I were a microprocessor
designer, I would generate the check bits or parity on the chip, 
rather than trying to squeeze in a larger cache.

whh@pbhya.PacBell.COM (Wilson Heydt) (11/03/88)

In article <207400001@inmet>, callen@inmet writes:
> 
> I am under the (possibly wrong) impression that soft memory errors are
> often caused by cosmic rays streaking through one of the cells in a DRAM 
> and discharging it, and that higher density memories are MORE susceptible
> to this type of error. Is there a memory expert listening who can comment?

I'm *not* an expert, but I did read a very good article in SciAm a couple 
of years ago on the subject.

According to the article, the primary cause of soft errors is trace
impurities of heavy nuclei in the plastic package.  (Ceramics,
by-the-bye, are worse in this context.)  The ones to worry about are
alpha emitters, like uranium and thorium.

One can just see a new industry springing up . . . high-purity plastics
for encapsulating memory chips, with guaranteed low levels of alpha emitters.

   --Hal

=========================================================================
  Hal Heydt                             |    "Hafnium plus Holmium is
  Analyst, Pacific*Bell                 |     one-point-five, I think."
  415-645-7708                          |       --Dr. Jane Robinson
  {att,bellcore,sun,ames,pyramid}!pacbell!pbhya!whh

jensen@gt-eedsp.UUCP (P. Allen Jensen) (11/03/88)

In article <956@accelerator>, abali@baloo.eng.ohio-state.edu (Bulent Abali) writes:
> If NeXT didn't put ECC in its memory, I'd say it is a tradeoff
> to get performance, not because of cost. If I were a microprocessor
> designer, I would generate the check bits or parity on the chip, 
> rather than trying to squeeze in a larger cache. 

According to the NeXT salesperson I talked to, the reason was COST not
performance.  I am not an expert on memory hardware, but it doesn't seem
reasonable to me that parity should cause a performance hit.

-- 
P. Allen Jensen
Georgia Tech, School of Electrical Engineering, Atlanta, GA  30332-0250
USENET: ...!{allegra,hplabs,ulysses}!gatech!gt-eedsp!jensen
INTERNET: jensen@gt-eedsp.gatech.edu

abali@baloo.eng.ohio-state.edu (Bulent Abali) (11/04/88)

In article <553@gt-eedsp.UUCP> jensen@gt-eedsp.UUCP (P. Allen Jensen) writes:

>According to the NeXT salesperson I talked to, the reason was COST not
>performance.  

Adding SEC/DED increases DRAM cost by 7/32, less than 25%. 
Adding parity per 32 bits increases DRAM cost by 1/32, about 3%.
Adding parity per byte increases DRAM cost by 1/8, 12.5%.
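The 7/32 figure follows from the usual Hamming bound: r check bits can locate and correct a single error in m data bits when 2^r >= m + r + 1, and one extra overall parity bit adds double-error detection. A short Python sketch of that arithmetic (the function and table names are made up for illustration):

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """SEC/DED: Hamming SEC plus one overall parity bit for double-error detection."""
    return hamming_check_bits(data_bits) + 1


WORD_BITS = 32
SCHEMES = {
    "SEC/DED per 32 bits": secded_check_bits(WORD_BITS),  # 6 + 1 = 7 check bits
    "parity per 32 bits": 1,
    "parity per byte": WORD_BITS // 8,                     # one bit per byte = 4
}

if __name__ == "__main__":
    for name, extra in SCHEMES.items():
        print(f"{name:20s} {extra}/{WORD_BITS} = {extra / WORD_BITS:.1%} more DRAM")
```

The same formula gives 8 check bits for 64 data bits, so SEC/DED overhead drops to 12.5% on a 64-bit-wide memory.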

>I am not an expert on memory hardware, but it doesn't seem
>reasonable to me that parity should cause a performance hit.

I am not an expert on memory hardware either, but I know that
parity and ECC do cause a performance hit.

ejf@well.UUCP (Erik James Freed) (11/07/88)

>Adding SEC/DED increases DRAM cost by 7/32, less than 25%. 
>Adding parity per 32 bits increases DRAM cost by 1/32, about 3%.
>Adding parity per byte increases DRAM cost by 1/8, 12.5%.
I am not sure whether this is a pro-parity or an anti-parity point,
but I would like to add that when you look at an individual feature
it is too easy to say "well, it would only add X% to the cost"; the
real issue is what you do when you are faced with hundreds of these
decisions. A few percent for each feature can easily double the price.
Remember that each of these costs must be multiplied by a factor of
4-5 to allow for a profitable margin. Anyone who has to make these
kinds of decisions quickly realizes why these seemingly small omissions
are so important to holding the line on price. There have been
a lot of unsuccessful products and companies that didn't address this
important issue.
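A quick Python sketch of the arithmetic behind "a few percent for each feature can easily double the price". The feature counts, percentages and the $20 parts figure are hypothetical, chosen only to illustrate; the 4-5x markup is the factor quoted above.

```python
def additive_multiplier(n_features: int, pct_each: float) -> float:
    """Cost growth if each feature adds pct_each percent of the base cost."""
    return 1 + n_features * pct_each / 100


def compound_multiplier(n_features: int, pct_each: float) -> float:
    """Cost growth if each feature adds pct_each percent of the running cost."""
    return (1 + pct_each / 100) ** n_features


def list_price_increase(extra_parts_cost: float, markup: float) -> float:
    """Extra list price once a parts-cost increase is carried through the markup."""
    return extra_parts_cost * markup


if __name__ == "__main__":
    # Hypothetical: 100 decisions at 1% each, or 25 decisions at 3% each.
    print(f"100 x 1%, additive:   x{additive_multiplier(100, 1):.2f}")
    print(f"100 x 1%, compounded: x{compound_multiplier(100, 1):.2f}")
    print(f" 25 x 3%, compounded: x{compound_multiplier(25, 3):.2f}")
    # Hypothetical $20 of extra parts at a 4.5x cost-to-price factor.
    print(f"$20 of parts at 4.5x: ${list_price_increase(20, 4.5):.0f} more at list")
```

Even without compounding, 100 one-percent decisions double the cost; compounding only makes it worse.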

>I am not an expert on memory hardware either, but I know that
>parity and ECC do cause a performance hit.
Actually, you can implement parity so that it does not hurt performance:
the parity check does not hold up the memory cycle, but instead interrupts
the CPU after the cycle if an error is found.
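One way to picture this is a model where the read hands the data to the CPU immediately and the parity result is only latched, then delivered as an interrupt once the cycle is over. The Python sketch below is hypothetical: the class names, the pending-interrupt queue and the point where the CPU polls it are illustrative assumptions, not how any particular machine does it.

```python
from collections import deque


def parity32(word: int) -> int:
    """Even-parity check bit: 1 if the 32-bit word has an odd number of 1 bits."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


class DeferredParityMemory:
    """Reads return data without waiting for the parity check; a mismatch is
    latched and reported afterwards as a pending interrupt."""

    def __init__(self, nwords: int) -> None:
        self.data = [0] * nwords
        self.check = [0] * nwords
        self.pending_irqs = deque()  # addresses of words that failed parity

    def write(self, addr: int, word: int) -> None:
        self.data[addr] = word & 0xFFFFFFFF
        self.check[addr] = parity32(word)

    def read(self, addr: int) -> int:
        word = self.data[addr]
        # The CPU gets the word now; the check result is only recorded.
        if parity32(word) != self.check[addr]:
            self.pending_irqs.append(addr)
        return word


class Cpu:
    def __init__(self, mem: DeferredParityMemory) -> None:
        self.mem = mem
        self.faults = []  # addresses reported by parity interrupts

    def load(self, addr: int) -> int:
        value = self.mem.read(addr)  # memory cycle completes with no extra wait state
        self.poll_interrupts()       # error is taken after the cycle, not during it
        return value

    def poll_interrupts(self) -> None:
        while self.mem.pending_irqs:
            self.faults.append(self.mem.pending_irqs.popleft())
```

The tradeoff is that the CPU may already have used the bad value by the time the interrupt arrives, so this scheme detects the error without preventing its use; that is usually acceptable for parity, which can't correct anything anyway.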