[comp.arch] Multiport Micro Memories

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (05/16/89)

In article <19661@winchester.mips.COM> mash@mips.COM (John Mashey) writes:

>(or anybody else really familiar with supercomputer architecture)
>could describe the memory systems of such things.  There are some
>fairly sensible reasons why the answers on 3A & 3B might be
>opposite....

That reminds me.  We have seen some postings recently to the effect that
connection technology is going to allow chips with some really large numbers
of inputs/outputs.  When is someone going to stop spending so much time on
the "easy" :-)   part of building what used to be a supercomputer in VLSI (CPU
speed), and start spending some time on the hard part - fast multiport
interleaved memory?  I would like to see someone come up with a cheap, fast
multiport memory interconnect.

For example, a Cray Y-MP has 4 ports/CPU, with 8 CPU's,
for a total of 32 CPU ports, 256 banks of memory in 32 MWords, with a 5
Clock period cycle time per bank (6 ns/clock).  The memory interconnect can
accept one memory operation per 6 ns clock (read or write) on each port, for
a total memory bandwidth (excluding bank conflicts) of 340 Gbits/second.
There are 32 1MW memory modules.  Access time is 15 ns, SECDED is implemented
on 64 bit words.

A reasonably inexpensive micro-based system might need 4 CPU ports and 4
memory banks, operating at 10 ns (to handle all those 100 MHz RISC
processors that everyone says are coming).  If we don't do something, we
are going to keep building what someone I know refers to as "The World's
Fastest Toy Computer".

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

loving@lanai.cs.ucla.edu (Mike Loving) (05/16/89)

In article <25395@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>
>When is someone going to stop spending so much time on
>the "easy" :-)   part of building what used to be a supercomputer in VLSI (CPU
>speed), and start spending some time on the hard part - fast multiport
>interleaved memory.  I would like to see someone come up with a cheap fast
>multiport memory interconnect.  
>

There is currently some interesting and (in my opinion) promising work going
on (last I heard) at the University of California at Davis on totally optical
interconnect, which should go a long way towards alleviating the
processor-memory bottleneck.  The basic scheme allows the transfer of the
entire contents
of the memory chip(s) (any size chip ya wanna build) in one fell swoop.  While
there will probably be difficulties with this technology (and all the others
trying anything new), when and if it works out it will greatly change the face
of the memory bandwidth problem.  For more info on this you should probably
contact Norm Matloff (matloff@iris.ucdavis.edu).


-------------------------------------------------------------------------------
Mike Loving          loving@lanai.cs.ucla.edu
                     . . . {hplabs,ucbvax,uunet}!cs.ucla.edu!loving
-------------------------------------------------------------------------------

aglew@mcdurb.Urbana.Gould.COM (05/17/89)

>That reminds me.  We have seen some postings recently to the effect that
>connection technology is going to allow chips with some really large numbers
>of inputs/outputs.  When is someone going to stop spending so much time on
>the "easy" :-)   part of building what used to be a supercomputer in VLSI (CPU
>speed), and start spending some time on the hard part - fast multiport
>interleaved memory.  I would like to see someone come up with a cheap fast
>multiport memory interconnect.  

Now you're talking!!!  Lots of pins give us multiple ports - now how do we
use them?

To begin with: we need to start moving the parallelism of interleaved memory 
systems *within* the memory chips.  We know how to make expensive multichip
memory systems run fast. Trouble is, chip count is very strongly related
to system cost, and everyone wants cheaper systems.

We need memory chips that can keep several memory transactions going within
the chip - that have fast and powerful logic to snarf the data on writes,
and drive the read data out to the bus really quickly when it surfaces.
The various burst modes (page mode, nibble mode) are great, but it's time
that memory handled independent transactions well too.

davidsen@sungod.steinmetz (William Davidsen) (05/18/89)

In article <28200315@mcdurb> aglew@mcdurb.Urbana.Gould.COM writes:

| Now you're talking!!!  Lots of pins give us multiple ports - now how do we
| use them?

  Let's look back a few years... the GE600 mainframes had eight-port
memory controllers. You could connect any combination of CPU's and I/O
controllers (doing DMA) as long as you had at least one of each. The
cache was on the memory controller.

  If we update that to today's micro world, we could call it an
intelligent cache and bus controller or something. No problems of cache
validity between CPU's, the controller does it. Interleave? The
controller could interleave from multiple sources, write different
channels to different banks, etc.

  The problem I see is the bus. The mainframes had fat cables, and
could have an address and data path for each memory port. The i/o
controller(s), something like a DMA controller, had individual cables
going to the disk controller, tape controller, etc. Each device
controller had cables going to each device, which is still done today.

  What resulted was a "data tree," with multiple devices on each
controller, multiple device controllers going to the i/o controller,
multiple i/o controllers and CPU's going to the memory controller.
Everything having to do with the memory subsystem, the data, cache,
interleave, etc, was all controlled by the memory controller,
eliminating problems of cache validity between subsystems.

  To make use of this would require some new packaging, either going to
cables or a very complex bus with multiple data and address
connections. This is more traditional on minis.

  I'm sure there are other ways to use multiport memory, but this is
certainly one which has proven useful.
	bill davidsen		(davidsen@crdos1.crd.GE.COM)
  {uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

rpw3@amdcad.AMD.COM (Rob Warnock) (05/23/89)

In article <13821@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
+---------------
| In article <28200315@mcdurb> aglew@mcdurb.Urbana.Gould.COM writes:
| | Now you're talking!!!  Lots of pins give us multiple ports - now how do we
| | use them?
|   Let's look back a few years... the GE600 mainframes had eight port
| memory controllers. You could connect any combination of CPU's and I/O
| controllers (doing DMA) as long as you had at least one of each...
+---------------

The earliest PDP-10 memories (MA10, MB10, etc.) also had 8 ports. And you
could interleave up to 4-way by flipping switches in the memories. [Better
get the interleaving the same on all ports!] Of course, you only got 16K
36-bit words in each 30-inch-wide 6-foot-tall cabinet. ;-}  (Oh, yeah, the
later MD10 gave you 64K words per box. Big improvement... ;-}  )

But as you noted, the cabling was *monstrous*! Each set of cables was about
the same size/thickness as a set of IBM bus/tag cables. So a loaded memory
had 8 sets in, and of course, 8 sets daisy-chained out...

We could really use multi-port memory chips *today*! For example, in building
100 Mb/s FDDI node controllers (or for the upcoming gigabit rates) I'd really
like to have (at least) triple-ported memory capable of ~800 Mb/s bursts
on all three ports at the same time [the net, the host bus, and the node CPU].
The serial shift register of a video RAM doesn't take up all that much space;
you should be able to add a couple more. As it is, today's VRAMs really don't
cut it if you're trying to "stream" data from the host to/from the net -- the
normal RAM port doesn't have enough bandwidth to use the VRAM as a general FIFO,
and besides, you end up having to do lots of copying on the controller board,
and we all *know* about copying, don't we... ;-}  [VRAMs with just *two* serial
ports plus the normal RAM port would be a big help, but I'd still like three
burst ports...]


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403