[comp.arch] Memory speed, why so slow?

ssr@stokes.Princeton.EDU (Steve S. Roy) (05/08/91)

I have a question for those out there who are in the know about
dynamic RAM.

Is it my imagination or is it true that while the storage capacity of
dynamic RAM chips has increased by orders of magnitude the speed has
not?  Every workstation (or PC for that matter) depends heavily on a
cache of some kind, and for many applications the limiting speed is
not the cpu but the main memory.

Is it a fundamental VLSI level constraint that keeps the cutting edge
memories at a particular speed?  If you consider a chip that has a
given fraction of the storage of a cutting edge chip, will it have a
predictable fraction of the response time?

Is it that the people making chips (driven by the people buying chips)
feel that the current speeds are 'good enough' and the main limitation
is size, and therefore most of the effort goes toward greater density
rather than greater speed?  Do the designers, or whoever makes the
decisions, figure that 'If they really want something faster then
they'll pay for static RAM?'

Just wondering.  This seems to have a bearing on the recent
discussions on memory bandwidth for workstations, mainframes, and
supers.  I certainly find that the current crop of machines is highly
unbalanced in favor of CPU speed and away from memory (or other I/O)
bandwidth.

Steve Roy
ssr@acm.princeton.edu
Program of Applied and Computational Mathematics
Princeton University
Princeton NJ, 08544

raje@lattice.stanford.edu (Prasad Raje) (05/09/91)

In article <9245@idunno.Princeton.EDU> ssr@stokes.Princeton.EDU (Steve S. Roy)
asks:
   Is it my imagination or is it true that while the storage capacity of
   dynamic RAM chips has increased by orders of magnitude the speed has
   not? 

   Is it a fundamental VLSI level constraint that keeps the cutting edge
   memories at a particular speed?  If you consider a chip that has a
   given fraction of the storage of a cutting edge chip, will it have a
   predictable fraction of the response time?

   Is it that the people making chips (driven by the people buying chips)
   feel that the current speeds are 'good enough' and the main limitation
   is size, and therefore most of the effort goes toward greater density
   rather than greater speed?  Do the designers, or whoever makes the
   decisions, figure that 'If they really want something faster then
   they'll pay for static RAM?'


I am not sure I have direct answers to all your questions, but here is
a perspective on what happens and why in the DRAM business.

Memory density quadruples every 3 years. The majority of this increase
comes from lithography. That is, the cell size (one pass transistor and
one capacitor) gets smaller. There is also some help from the increase in
chip area. This is a double whammy for yield: first because the process is
getting more complex due to the finer and more exotic (see below) geometries,
and second because the chip area is larger (yield is ~ e^(-lambda*Area)).
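The yield relation in parentheses can be put into numbers. A minimal sketch; the defect density and die areas below are made-up assumptions, purely for illustration:

```python
import math

def die_yield(defects_per_cm2, area_cm2):
    """Poisson yield model from above: fraction of dice with no defects."""
    return math.exp(-defects_per_cm2 * area_cm2)

# Illustrative numbers only: at 1 defect/cm^2, doubling the die area
# squares the surviving fraction of dice.
y_half = die_yield(1.0, 0.5)   # smaller die
y_full = die_yield(1.0, 1.0)   # twice the area, much worse yield
```

Since e^(-lambda*2A) = (e^(-lambda*A))^2, larger dice get hit disproportionately hard, which is the second half of the "double whammy".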

The cell size must get smaller, but the cell capacitance cannot.
Why? 
You need a large enough cell capacitance to guard against soft errors due
to radiation. That is, you need to store at least some minimum number of
electrons per cell. These days this number is in the few tens of thousands.
The second reason you need a certain minimum charge is to be able to drive
enough charge onto the bit line. The cell state is sensed by charge sharing
between the dinky cell capacitance (~ 50 fF) and the huge bit line capacitance
( ~ 1 pF). 
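The charge-sharing figures above translate directly into the bit-line swing the sense amplifier must resolve. A back-of-the-envelope sketch; the 5 V supply and half-VDD precharge are my assumptions, not from the post:

```python
CELL_C = 50e-15     # cell capacitance from above, ~50 fF
BITLINE_C = 1e-12   # bit line capacitance from above, ~1 pF
VDD = 5.0           # assumed supply voltage
V_PRE = VDD / 2     # assumed bit-line precharge level

# When the word line opens, the cell and bit line share charge; the bit
# line moves by the cell's contribution weighted by the capacitance ratio:
dv = (VDD - V_PRE) * CELL_C / (CELL_C + BITLINE_C)
# roughly 0.12 V of swing from a full 5 V cell -- the tiny signal the
# sense amp has to work with
```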

Exotic geometries:
The upshot of the above paragraph is that you need a certain minimum capacitor
area while the cell area needs to keep shrinking. The way this has recently
been done is by using the third dimension. The cell does not look planar
anymore. You either have deep "trenches" etched (not "dug") into the silicon
substrate or you have "stacks" raised on top of the cell to provide capacitor
area. This is precisely the kind of processing exotica that makes DRAMs so
expensive to develop. And of course there are ever more complicated device
issues, cell leakage, reliability of the capacitor oxide ...

Speed:
Very roughly
Memory access time =   time spent in the decoders (1)
                   + time spent to drive the word line (2)
                   + time the dinky cell takes to drive the bit line (3)
                   + time taken for column select and sensing (4)
                   + other misc stuff before your bit is on the pins (5)

(1) decreases with each technology generation (faster transistors) but
increases because there is more decoding to be done in the larger DRAM
(2) decreases because of faster transistors but increases because word lines
get longer because of the larger RAM array (remember chip areas are increasing)
(3) DRAM cell designers kill themselves to keep cell capacitance constant.
(see above) Bit line capacitance is increasing because of the larger array.
Overall, there is a slight increase.
(4) same as (1)
(5) constant

The result? Total delay is constant.
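The cancellation can be made concrete with a toy tally of the five components. All nanosecond values here are invented for illustration, not measured figures:

```python
# Hypothetical breakdown of the five components above for one
# generation, in nanoseconds:
gen_1M = {"decode": 15, "word_line": 15, "bit_line": 10,
          "col_sense": 15, "misc": 5}
# Next generation: faster transistors shave (1) and (4), but the larger
# array lengthens the word and bit lines, growing (2) and (3):
gen_4M = {"decode": 13, "word_line": 17, "bit_line": 12,
          "col_sense": 13, "misc": 5}

total_1M = sum(gen_1M.values())  # 60 ns
total_4M = sum(gen_4M.values())  # still 60 ns: gains and losses cancel
```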

Now if you were to use the lithography levels of a 4M DRAM to make a 1M
DRAM, you certainly would have a smaller access time.
This access time would, however, still not match that available from an
SRAM (I hope it is clear why SRAMs are faster than DRAMs).

The market seems to want ever larger DRAMs. Delivering that has been a feat
in itself. Speed has remained constant for the reasons mentioned above.

There are, however, some innovations coming along. Two that I like
personally:

1. Cache DRAMs: DRAMs with an on-chip SRAM cache[1]. This looks to the
outside world like a large memory (say 4 Mbit) with an SRAM-like average
access time (say 10 ns).

2. BiCMOS DRAMs: allows one to further decrease delays (1),(2),(4),(5)
by using bipolar transistors. The down side is the added process complexity.

If the market demands it (i.e. pays for it) I would say we could have a
research prototype of a 16 Mbit 5 ns BiCMOS cache DRAM today. (You may
realize I have my fantasy hat on.)

Prasad

als@bohra.cpg.oz.au (Anthony Shipman) (05/10/91)

In article <RAJE.91May9115209@lattice.stanford.edu>, raje@lattice.stanford.edu (Prasad Raje) writes:
> 
> In article <9245@idunno.Princeton.EDU> ssr@stokes.Princeton.EDU (Steve S. Roy)
> asks:
>    Is it my imagination or is it true that while the storage capacity of
>    dynamic RAM chips has increased by orders of magnitude the speed has
>    not? 
.................
> 
> Speed:
> Very roughly
> Memory access time =   time spent in the decoders (1)
>                    + time spent to drive the word line (2)
>                    + time the dinky cell takes to drive the bit line (3)
>                    + time taken for column select and sensing (4)
>                    + other misc stuff before your bit is on the pins (5)
> 
> (1) decreases with each technology generation (faster transistors) but
> increases because there is more decoding to be done in the larger DRAM
> (2) decreases because of faster transistors but increases because word lines
> get longer because of the larger RAM array (remember chip areas are increasing)
> (3) DRAM cell designers kill themselves to keep cell capacitance constant.
> (see above) Bit line capacitance is increasing because of the larger array.
> Overall, there is a slight increase.
> (4) same as (1)
> (5) constant
> 
> The result? Total delay is constant.
................
> There are however some innovations coming around. Some that I like
> personally
> 
> 1. Cache DRAMs: DRAMs with an on chip SRAM cache[1]. This looks to the
> outside world as a large memory (say 4Mbit) with an SRAM like average
> access time (say 10ns). 
> 
> 2. BiCMOS DRAMs: allows one to further decrease delays (1),(2),(4),(5)
> by using bipolar transistors. The down side is the added process complexity.
> 
> If the market demands it (ie pays for it) I would say we can have a research
> prototype of a 16Mbit 5ns BiCMOS cache DRAM today. (you may realize I have
> my fantasy hat on)
> 
> Prasad


I read a while back that one improvement being considered for large dynamic
RAMS was removing the multiplexing of the row and column addresses, saving
some time.  Is this still on the cards?


-- 
Anthony Shipman                 "You've got to be taught before it's too late,
Computer Power Group             Before you are six or seven or eight,
19 Cato St., East Hawthorn,      To hate all the people your relatives hate,
Melbourne, Australia             You've got to be carefully taught."  R&H

raje@lattice.stanford.edu (Prasad Raje) (05/10/91)

In article <1991May10.035511.29155@bohra.cpg.oz.au> als@bohra.cpg.oz.au (Anthony Shipman) writes:

   I read a while back that one improvement being considered for large dynamic
   RAMS was removing the multiplexing of the row and column addresses, saving
   some time.  Is this still on the cards?


Having separate column addr pins won't buy a whole lot. This is because the
inherent signal flow in a DRAM access has a lag between the time that you 
present the row addr and the time that you need the column addr for the
column select. That is, the row decoding, word line driving, bit line swinging
all take up time before you can do anything useful with the column address.

Now if extra pins were available, what I would use them for is extra
data lines. In general, the memory array is partitioned so that as
many as 1024 bits are available (inside the chip) after a single row
access - sensed and all ready to go. Then along comes the column
decode that selects one of these bits for the output.  Obviously
shipping 1024 bits out would be tough, but it seems like a waste to
ship out just one. 16 or 32 seems like a reasonable number and this
would dramatically increase the bandwidth of the DRAM. (the current
compromise for this is page mode, static column mode, nibble mode
etc.)
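The bandwidth payoff of wider outputs is simple arithmetic. A sketch with an assumed 40 ns column cycle (the exact figure varies from part to part):

```python
COLUMN_CYCLE_NS = 40.0  # assumed column-cycle time

def pin_bandwidth_mbit_s(bits_per_cycle):
    """Bits shipped off chip per second, in Mbit/s."""
    return bits_per_cycle / (COLUMN_CYCLE_NS * 1e-9) / 1e6

bw_by_1 = pin_bandwidth_mbit_s(1)    # a by-1 part: ~25 Mbit/s
bw_by_16 = pin_bandwidth_mbit_s(16)  # 16 data pins: 16x the bandwidth
```

The row access already did the hard work of sensing 1024 bits; every extra data pin is nearly free bandwidth from the array's point of view.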

Prasad

dan@systech.bjorn.COM (Dan Gill) (05/10/91)

In article <9245@idunno.Princeton.EDU>, ssr@stokes.Princeton.EDU (Steve S. Roy) writes:
> I have a question for those out there who are in the know about
> dynamic RAM.
> 
> Is it my imagination or is it true that while the storage capacity of
> dynamic RAM chips has increased by orders of magnitude the speed has
> not?  Every workstation (or PC for that matter) depends heavily on a
> cache of some kind, and for many applications the limiting speed is
> not the cpu but the main memory.

Well, the saga of DRAMs has been going on for some time.  DRAM access times
have actually come down by a large amount.  There was a day when I was using
4Kx1 dynamic memories that ran at 450ns.  I still have some just for
history's sake :-).   You might be glad that you don't have to be using
core memory cards that get you 16K or maybe 64K on a board the size of a
9U VME board.

There has always been a trade-off between SRAM and DRAM.  SRAM is faster,
plain and simple.  You can get sub-10 ns SRAMs today.  They are expensive and
run HOT, but you can get them; the capacity, though, is not too terrific.  It
takes several transistors to make a cell of storage in an SRAM.  Maybe you'll
see 256K or so per part.  It is just not efficient to make mass storage out
of SRAMs.  Well, it was in the 6502 days.

DRAMs, on the other hand, are high-capacity parts with lower pin counts than
much smaller SRAMs.  DRAMs are able to mux the address bus from 20 pins to
10 pins, thus making them smaller, but the fastest DRAMs are around 60 ns.
These are 1Mb and 4Mb parts.  A DRAM is essentially an array of capacitors
(one pass transistor and one capacitor per cell), not a huge array of
transistors.  You can cram a bunch more caps into a phone booth than
transistors.  But DRAMs need to be refreshed; after all, caps don't remember
forever :-(.

DRAM vendors have been able to increase capacity simply because more stuff
can be crammed into a package.  What speed improvement there has been is a
fallout of the same shrinking geometry.

No, 60ns is not fast enough.  Vendors have come up with nibble and page-mode
operations to speed DRAMs up to sub-20ns, but this assumes either sequential
accesses or small ranges of addresses.
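The page-mode speedup amortizes one expensive row access over many cheap column cycles. A sketch with assumed timings of 80 ns for the full access and 20 ns per in-page cycle:

```python
T_RAC_NS = 80.0   # assumed full row+column access time
T_PAGE_NS = 20.0  # assumed column-only cycle within the open row

def effective_access_ns(burst_len):
    """Average per-access time over burst_len sequential in-page reads."""
    return (T_RAC_NS + (burst_len - 1) * T_PAGE_NS) / burst_len
```

A single access still costs the full 80 ns; a burst of 8 averages 27.5 ns, approaching the page-cycle time for long runs -- which is exactly why the speedup only shows up for sequential or nearby addresses.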

The world is just one big trade-off.  Either you can have speed or you can
have capacity.  What workstation guys do then is meet in the middle by
making huge banks of DRAM at a relatively low speed (say 80-100ns), sticking
a fairly large cache beside it at a blazing speed (say 15 ns), and relying
on one to help the other.

Maybe one day the DRAM will run at 20 ns sustained, but then the 100 MHz
SPARC will be out and, well, here we go again 8-).
-- 
-------------------------------------------------------------------------------
"On second thought, let us not go to Camelot.  It is a silly place"
Dan Gill                                          uunet!systech!dan
-------------------------------------------------------------------------------

scott@labtam.labtam.oz (Scott Colwell) (05/13/91)

als@bohra.cpg.oz.au (Anthony Shipman) writes:
>I read a while back that one improvement being considered for large dynamic
>RAMS was removing the multiplexing of the row and column addresses, saving
>some time.  Is this still on the cards?


See the Hitachi HM571000 parts: 35/40/45 ns, 1M by 1 BiCMOS DRAM.

It uses a non-multiplexed address bus.  The down side is that it comes in
a 28 pin package (300mil SOJ) rather than the 18 pin DIP, 20 pin ZIP etc.
that a standard 1M by 1 DRAM comes in.  Going from 10 address lines to
20 lines tends to do this (i.e. board area goes up).


What nobody has mentioned is that increasing the density of memory devices
while holding the access time constant has a significant effect on the access
time for a memory array of a constant size.  Since you are driving fewer drams,
the transmission line effects are reduced and overall access time comes down.
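That loading argument can be sketched numerically. The per-pin capacitance below is an assumed round number, not a measured one:

```python
PIN_LOAD_PF = 5.0  # assumed input capacitance per DRAM pin, in pF

def shared_line_load_pf(array_mbit, chip_mbit):
    """Capacitive load one shared address line sees across a by-1 array."""
    chips = array_mbit // chip_mbit
    return chips * PIN_LOAD_PF

load_with_1M = shared_line_load_pf(16, 1)  # 16 chips loading each line
load_with_4M = shared_line_load_pf(16, 4)  #  4 chips: a quarter the load
```

Same 16 Mbit array, a quarter of the loads: the drivers see a lighter line, so the array as a whole gets faster even though each chip's access time is unchanged.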

The problem is that the amount of memory that people expect keeps increasing
at the same rate as the density increases....

lethin@raisin-scone.ai.mit.edu (Richard A. Lethin) (05/17/91)

In article <RAJE.91May9115209@lattice.stanford.edu> raje@lattice.stanford.edu (Prasad Raje) writes:
>The cell size must get smaller, but the cell capacitance cannot.
>Why? 
>You need a large enough cell capacitance to guard against soft errors due
>to radiation. That is you need to store at some minimum number of electrons
>per cell. These days this number is in the few tens of thousands of electrons.

An interesting note: Don Speck's paper at the '91 Santa Cruz VLSI
conference described a DRAM design that he built for the Mosaic project at
Caltech.  He did a bit of investigation of radiation effects, and found
evidence indicating that this argument might not be correct.

The idea is that the radiation hit creates some number of
electron-hole pairs that could corrupt the state of the capacitor.
However, the charged-pair creation happens below the surface of the
chip, so that by the time they diffuse up, they've also spread.  So
the amount of corrupting charge becomes a function of the area of the
capacitor (since the diffused area is much larger than a capacitor).

As the size of the cell decreases, the amount of charge on the cell
decreases, but so does the amount of charge intercepted.

It was a nice paper.  I'd recommend it.
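The scaling argument relayed above reduces to a ratio in which area cancels. A sketch; both per-area constants are placeholders, not values from Speck's paper:

```python
STORED_PER_AREA = 100.0     # stored charge per unit capacitor area
INTERCEPTED_PER_AREA = 5.0  # diffused hit charge collected per unit area

def soft_error_margin(cell_area):
    """Ratio of stored charge to corrupting charge for a given cell area."""
    stored = STORED_PER_AREA * cell_area
    intercepted = INTERCEPTED_PER_AREA * cell_area
    return stored / intercepted  # area cancels out of the ratio

# Shrinking the cell 4x leaves the margin unchanged:
margin_big = soft_error_margin(1.0)
margin_small = soft_error_margin(0.25)
```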

jallen@csserv2.ic.sunysb.edu (Joseph Allen) (05/18/91)

In article <RAJE.91May9115209@lattice.stanford.edu> raje@lattice.stanford.edu (Prasad Raje) writes:
>In article <9245@idunno.Princeton.EDU> ssr@stokes.Princeton.EDU (Steve S. Roy)
>asks:
>The cell size must get smaller, but the cell capacitance cannot.
>Why? 
>You need a large enough cell capacitance to guard against soft errors due
>to radiation. That is you need to store at some minimum number of electrons
>per cell. These days this number is in the few tens of thousands of electrons.
>The second reason you need a certain minimum charge is to be able to drive
>enough of charge onto the bit line. The cell state is sensed by charge sharing
>between the dinky cell capacitance (~ 50 fF) and the huge bit line capacitance
>( ~ 1 pF). 

Why is this a problem?  Just have more sense amplifiers.  Or equivalently,
break the DRAM into small banks (I believe this is done).

Also, I don't know the exact numbers to calculate it, so which is better:

	Big cells which don't error

	Small cells plus enough extra bits that you can error-correct (during
	refresh perhaps)

Hmm that reminds me:

I once worked for a company which made character generators for those stupid
cable channels.  One of the machines used CCD memory.  CCD memory is very
sensitive to alpha particles I think.  Anyway, it made mistakes so often that
the user had to look through the text every once in a while to fix things.  Of
course, no one ever reads those channels so it doesn't really matter.  

-- 
/*  jallen@ic.sunysb.edu  */     /* Amazing */     /* Joe Allen 129.49.12.74 */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}