[comp.arch] A Fast Memory Architecture

mrj@cluster.cs.su.oz.au (Mark James) (02/09/91)

Memory reference profiles for most applications show a moderately
sized set of very active clumps with little access elsewhere.
One possible DRAM configuration to match this would be to use
page mode DRAMs and to spread adjacent pages systematically
or randomly across the different chips.  Each chip has a
different currently active page allowing quick access to almost
as many pages as there are chips - assuming you have randomised
the pages across the chips properly.  All you would need to
implement this would be a more complex, MMU-like DRAM controller.
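
A toy sketch of the scheme above, in Python for brevity. It models one
open page per bank and counts page-mode hits. The class name, the
round-robin workload and the sizes are all assumptions made for
illustration; nothing here describes a real controller.

```python
import random


class OpenPageController:
    """Toy model: each bank keeps one page open; pages are spread over banks."""

    def __init__(self, num_banks, page_size, randomise=False, seed=0):
        self.num_banks = num_banks
        self.page_size = page_size
        self.open_page = [None] * num_banks  # page currently open in each bank
        self.randomise = randomise
        self.rng = random.Random(seed)
        self.bank_of = {}                    # page -> bank table (randomised case)
        self.hits = 0
        self.misses = 0

    def bank_for(self, page):
        if not self.randomise:
            # Systematic spreading: adjacent pages land in adjacent banks.
            return page % self.num_banks
        # Random spreading: pick a bank the first time a page is seen.
        if page not in self.bank_of:
            self.bank_of[page] = self.rng.randrange(self.num_banks)
        return self.bank_of[page]

    def access(self, addr):
        """Return True on a page-mode hit, False if the bank had to switch pages."""
        page = addr // self.page_size
        bank = self.bank_for(page)
        if self.open_page[bank] == page:
            self.hits += 1
            return True
        self.open_page[bank] = page
        self.misses += 1
        return False


def round_robin_trace(num_pages, page_size, rounds):
    """Hypothetical workload: a few hot pages touched in turn."""
    return [p * page_size + i for i in range(rounds) for p in range(num_pages)]


if __name__ == "__main__":
    trace = round_robin_trace(num_pages=4, page_size=1024, rounds=100)
    for banks in (1, 8):
        ctl = OpenPageController(num_banks=banks, page_size=1024)
        for addr in trace:
            ctl.access(addr)
        print(banks, "bank(s):", ctl.hits, "hits,", ctl.misses, "misses")
```

With four hot pages and eight banks, only the first touch of each page
misses. With a single bank every access switches pages. In the
randomised case two hot pages can land in the same bank, so the hit
count depends on the placement.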

Motorola's DSP96002 uses a separate active page for each of its
three memory spaces, but does anyone know of any implementations
going the whole hog?

I think I read that someone has patented this configuration with
the addition of even quicker access to one location per chip through
chip enable.

Mark

henry@zoo.toronto.edu (Henry Spencer) (02/10/91)

In article <2012@cluster.cs.su.oz.au> mrj@cluster.cs.su.oz.au (Mark James) writes:
>Memory reference profiles for most applications show a moderately
>sized set of very active clumps with little access elsewhere.
>One possible DRAM configuration to match this would be to use
>page mode DRAMs and to spread adjacent pages systematically
>or randomly across the different chips.  Each chip has a
>different currently active page ...

A major problem with this is that DRAMs typically are only one bit wide.
So the minimum unit of memory is a 32-chip bank, not a single chip.
You can buy wider DRAMs, but in general they need extra pins, and this
means a bigger package that eats more board space; there is *very* strong
economic and practical pressure toward 1-bit-wide DRAMs, and this will
continue as long as a substantial number of DRAMs are needed to meet
a system's memory requirements.  (Of late, the software people's memory
needs have shown a regrettable ability to match or exceed the growth in
chip capacity.)
-- 
"Maybe we should tell the truth?"      | Henry Spencer at U of Toronto Zoology
"Surely we aren't that desperate yet." |  henry@zoo.toronto.edu   utzoo!henry

agn@unh.cs.cmu.edu (Andreas Nowatzyk) (02/11/91)

>> mrj@cluster.cs.su.oz.au (Mark James) writes:
>> ... proposal to use page-mode (or static column) mode access to speed up
>> DRAM systems in the presence of access locality. [something like the
>> memory system of the old Sun 4/110]

> henry@zoo.toronto.edu (Henry Spencer) replies:
> ... claims major problem due *very* strong economic incentive for *1 DRAMs

Historically there was a strong motivation for by 1 DRAMs, but things have
changed considerably. First, packaging technology has advanced: with TSOP
packaging, the 2 extra pins for a *4 organization do not increase package
size and are virtually free. In fact, even with the current SOJ packages,
there are several unused pin places near the center. Basically, the die size
of the DRAM chip dictates package size, not pin-count (for DRAMs > 1 Mbit).

Second, by 1 organizations become too cumbersome: consider a 64bit SIMM
with 16 Mbit chips: "Sorry, but 128 Mbytes IS our smallest SIMM" :-)

Third, by 1 organizations waste a lot of memory bandwidth. Internally, large
DRAMs are composed of many independent arrays, but only 1 bit of one array
is accessed per cycle.

Consequently, there are plenty of by 4, by 8 and by 16 (!) DRAM devices in the
pipe, with very little difference in device cost and no difference in package
size.

The clever use of page mode and/or static column mode seems to be the subject of
another Hyatt patent.

There are numerous other proposals to improve the DRAM bandwidth. For
example the integration of a fast SRAM into a DRAM with the ability to copy
large blocks of memory between the SRAM and the DRAM in one cycle.

Other ongoing efforts try to replace the normal TTL interface of DRAMs with
extremely fast signalling schemes to increase the bandwidth between the data
stored in the sense-amplified columns of the DRAM and the outside world.
Again, most of these schemes use 4 or 8 bit interfaces.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (02/12/91)

In article <11878@pt.cs.cmu.edu> agn@unh.cs.cmu.edu (Andreas Nowatzyk) writes:

| Second, by 1 organizations become too cumbersome: consider a 64bit SIMM
| with 16 Mbit chips: "Sorry, but 128 Mbytes IS our smallest SIMM" :-)

  What's your point? 

  If there's a market for something smaller, vendors will sell SIMMs with
1mbit, or 256k, or even 64k if there's a demand. I guess I can't
imagine a machine with a 64 bit path which couldn't use a 128MB part.

  Cost is certainly not going to be a big deal, like any other part the
price will come down until it is related to actual cost. And honestly I
don't see production cost being large for any chip of any complexity, at
least in comparison to the development cost. Look at the prices of
memory five years ago, and today. I would expect $10/MB in 1-2 years,
with 4MB being the usual unit of expansion.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
  "I'll come home in one of two ways, the big parade or in a body bag.
   I prefer the former but I'll take the latter" -Sgt Marco Rodrigez

glew@pdx007.intel.com (Andy Glew) (02/12/91)

>Memory reference profiles for most applications show a moderately
>sized set of very active clumps with little access elsewhere.
>One possible DRAM configuration to match this would be to use
>page mode DRAMs and to spread adjacent pages systematically
>or randomly across the different chips.  Each chip has a
>different currently active page allowing quick access to almost
>as many pages as there are chips - assuming you have randomised
>the pages across the chips properly.  All you would need to
>implement this would be a more complex, MMU-like DRAM controller.

Wouldn't it be nice if you could have several such "active sets" per
chip?  I've known at least one microprocessor architect to scream
about this.


--
Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

davidb@brac.inmos.co.uk (David Boreham) (02/12/91)

In article <1991Feb10.013525.1317@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>A major problem with this is that DRAMs typically are only one bit wide.
>So the minimum unit of memory is a 32-chip bank, not a single chip.
>You can buy wider DRAMs, but in general they need extra pins, and this
>means a bigger package that eats more board space; there is *very* strong
>economic and practical pressure toward 1-bit-wide DRAMs, and this will
>continue as long as a substantial number of DRAMs are needed to meet
>a system's memory requirements.  (Of late, the software people's memory
>needs have shown a regrettable ability to match or exceed the growth in
>chip capacity.)

I don't disagree strongly with anything Henry says but I think his
direction is a bit out of date. The number of chips, and certainly
the board area required, for a computer's memory has decreased over
the past ten years--steadily. The 4Mbyte on our ten-year-old VAX using
64Kbit chips takes 512 chips. The 64Mbyte on the machines we use now takes
128 chips and those chips use about a quarter the board area of the
64K DIPs. The memory is only three times as fast though and this is
probably the real problem :)

There is a definite trend towards wider RAMs. At 64K you
could only get by 1 chips. At 256K you could get by 1
and by 4. At 1M you can now get by 1, by 4, by 8 and by 16.
You can also get modules at up to by 36.

If you have a processor which supports multiple memory
banks and RAS precharge interleaving then four banks
of 32-bit wide, done with by8 chips might well be faster
than one bank of 64-bit wide memory done with by4 chips.

Note that it is by no means obvious that running multiple
banks each with an ``open page'' held active (either using
page-mode or static column) will actually be a performance
win. A recent copy of Electronics carried an item which
claimed that such a feature could speed up a memory system
by 300%! In fact, you need to provide RAS precharge
(which takes about as long as the memory access itself)
whenever the open page turns out to be wrong, and you can't
tell that it is wrong until you need the data. So if the
accesses are hopping about, the precharge times will kill
the access time.
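
A back-of-envelope model of that trade-off, as a Python sketch. The
function names and the timings (80 ns access, 80 ns precharge, 40 ns
page-mode cycle) are assumptions picked only to make the arithmetic
concrete. The baseline also assumes that a bank which closes its page
after every access can hide the precharge between accesses, which will
not always hold.

```python
def open_page_time(hit_rate, t_page, t_access, t_precharge):
    """Average access time when each bank holds a page open.

    Hit:  a fast page-mode cycle (t_page).
    Miss: precharge the wrong row first, then do a full access.
    """
    miss_time = t_precharge + t_access
    return hit_rate * t_page + (1.0 - hit_rate) * miss_time


def break_even_hit_rate(t_page, t_access, t_precharge):
    """Hit rate at which open_page_time() equals a plain t_access.

    Solves h*t_page + (1-h)*(t_precharge + t_access) = t_access for h.
    """
    return t_precharge / (t_precharge + t_access - t_page)


if __name__ == "__main__":
    # Assumed timings in ns, with precharge about equal to access time.
    T_PAGE, T_ACCESS, T_PRECHARGE = 40.0, 80.0, 80.0
    for h in (0.0, 0.5, 0.9, 1.0):
        print(h, open_page_time(h, T_PAGE, T_ACCESS, T_PRECHARGE))
    print("break-even:", break_even_hit_rate(T_PAGE, T_ACCESS, T_PRECHARGE))
```

Under these assumed numbers the open-page scheme only wins once about
two thirds of accesses hit the open page. With no hits at all, every
access costs precharge plus access, i.e. twice the plain access time.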

Also you burn power in all the banks all the time.

David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol,  England            |     (us): uunet!inmos.com!davidb
+44 454 616616 ex 547        | Internet: davidb@inmos.com

davidb@brac.inmos.co.uk (David Boreham) (02/14/91)

In article <3187@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <11878@pt.cs.cmu.edu> agn@unh.cs.cmu.edu (Andreas Nowatzyk) writes:
>
>| Second, by 1 organizations become too cumbersome: consider a 64bit SIMM
>| with 16 Mbit chips: "Sorry, but 128 Mbytes IS our smallest SIMM" :-)
>
>  What's your point? 

Memory factories are only making 16Mbit chips (this is in 1994, say).
Computer wants 64-bit wide memory. You want this on a SIMM and you
try making such a thing with by1 memory devices. 64 times 2Mbyte
is 128Mbyte. User actually wanted 16Mbyte or something less than 128Mbyte.
Can't build such a thing without memory chips wider than 1 bit.
Therefore need wide memory chips. That's the point.
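
The arithmetic, as a small Python sketch. The function names are made
up for illustration, and the code assumes the bus width divides evenly
by the chip width and the module size by the chip capacity.

```python
MBIT = 1 << 20        # bits in one Mbit
MBYTE = 8 * MBIT      # bits in one Mbyte


def min_module_bits(bus_width, chip_bits, chip_width):
    """Smallest module that fills the data path with chips of one width."""
    chips = bus_width // chip_width   # chips needed side by side
    return chips * chip_bits


def chip_width_for(bus_width, chip_bits, module_bits):
    """Chip width needed to build a module of a given size on a given bus."""
    chips = module_bits // chip_bits  # how many chips the module may hold
    return bus_width // chips


if __name__ == "__main__":
    # 64-bit SIMM built from 16 Mbit chips at various widths.
    for width in (1, 4, 8, 16):
        size = min_module_bits(64, 16 * MBIT, width) // MBYTE
        print("by", width, "->", size, "Mbyte minimum")
    # A 16 Mbyte module on the same 64-bit path.
    print("16 Mbyte needs by", chip_width_for(64, 16 * MBIT, 16 * MBYTE))
```

With by 1 parts the smallest 64-bit module is 64 chips of 2 Mbyte each,
i.e. 128 Mbyte. A 16 Mbyte module may only hold eight 16 Mbit chips, so
each chip has to be 8 bits wide.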

As an example---PC VGA graphics boards need 512K of memory organised
by 32-bits. On the original boards this took 16 off 64K by 4 chips.
Now nobody wants to manufacture 256K DRAMs so they make 64K by 16
(1Mbit) chips instead and the VGA boards can use two chips.
Without the by-16 chips you would either have to carry on using
obsolete and expensive 256K devices or waste money by providing
more memory than is required.

>
>  If there's a market for something smaller, vendors will sell SIMMs with
>1mbit, or 256k, or even 64k if there's a demand. I guess I can't
>imagine a machine with a 64 bit path which couldn't use a 128MB part.

!

>
>  Cost is certainly not going to be a big deal, like any other part the
>price will come down until it is related to actual cost. And honestly I
>don't see production cost being large for any chip of any complexity, at
>least in comparison to the development cost. Look at the prices of
>memory five years ago, and today. I would expect $10/MB in 1-2 years,

!! There already if you buy enough of them.

>with 4MB being the usual unit of expansion.

If you want a unit of expansion of 4Mbyte and 64bit datapaths
then you need 32-bit wide chips at 16Mbit.

Sorry for pre-empting the original poster.
David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol,  England            |     (us): uunet!inmos.com!davidb
+44 454 616616 ex 547        | Internet: davidb@inmos.com

lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) (02/15/91)

In article <14372@ganymede.inmos.co.uk> davidb@inmos.co.uk (David Boreham) writes:
>In article <3187@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>>In article <11878@pt.cs.cmu.edu> agn@unh.cs.cmu.edu (Andreas Nowatzyk) writes:
>>
>>| Second, by 1 organizations become too cumbersome: consider a 64bit SIMM
>>  What's your point? 
>Memory factories are only making 16Mbit chips (this is in 1994, say).
:
>As an example---PC VGA graphics boards need 512K of memory organised
>by 32-bits.
:
(discussion omitted)
>Sorry for pre-empting the original poster.

Agreed.  I would like to add, though, that I think some posters are ignoring
history, here.  Much of the above reasoning is along the lines of,
"What the world needs is a cheaper Apple II".  But, what really happens is that
for every price niche, what the world really wants (and buys, historically)
is more performance and functionality.

So, instead of worrying about wasting memory building a cheaper VGA board,
consider what you can do with more memory, given the same number of chips, and,
therefore, the same cost points.  Rather than build a hypothetical board,
consider that SGI now sells an add-in board to bring to the PC market 
affordable high performance graphics.  Nobody really wanted the limitations
of VGA to begin with.  Everyone wants high resolution full color.  They just
couldn't afford it.  With new technologies, people will forget about VGA
in PC's, just as they have forgotten about 8086's.  (Or at least they would
like to forget about the 8086!  :-)

Don't worry that you won't be able to buy memory in less than 64 MB units.
You will be able to afford it!

  Hugh LaMaster, M/S 233-9,  UUCP:                ames!lamaster
  NASA Ames Research Center  Internet:            lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035    With Good Mailer:    lamaster@george.arc.nasa.gov 
  Phone:  415/604-6117