[comp.arch] fast DRAMs and caches

mbutts@mentor.com (Mike Butts) (08/25/89)

From article <AGLEW.89Aug21205533@chant.urbana.mcd.mot.com>, by aglew@urbana.mcd.mot.com (Andy-Krazy-Glew):
> Now, then, what about these fast new DRAMs?
> 
> IBM announced a 16ns part, followed by Hitachi announcing a 20ns part.
> I believe that they were reasonably large (1 Mbit -- I just left the
> latest of a series of articles at home).
> 
> Apparently this big jump up in DRAM performance is attained by just doing
> things the sensible, brute-force way --- no more multiplexed lines, plus
> a bit of bipolar on the CMOS memory chip for drivers.
> 
> Anyone have more details?  Anyone care to speculate on what faster DRAMs
> will do for computer architecture?  Has anyone run simulations, either
> hardware (faster DRAMs let me do away with cache), or, probably more important,
> economic (fast DRAMs with lotsa pins will ride the technology curve down 
> 1 yr? 2 yrs? behind regular DRAMs)?

In particular, I haven't seen anything in the press yet about the *cycle* time,
only figures on access time.  DRAMs have always had cycle times much longer
than access times, mainly to write back the bits that were read and recharge
everything.  At least in the recent technology of 100ns access and 200 or
250ns cycle, it's often the cycle time that limits what can be done with the
memory system.
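
To see why the distinction matters, here's a back-of-the-envelope
calculation (the 100ns/200ns numbers are the ones above; the 32-bit
path width is just an assumption for illustration):

/* Sustained throughput is set by cycle time, not access time.
 * Numbers are illustrative assumptions, not specs of the new parts. */
#include <stdio.h>

int main(void)
{
    double access_ns   = 100.0;  /* time until data is valid             */
    double cycle_ns    = 200.0;  /* minimum time between random accesses */
    double width_bytes = 4.0;    /* assumed 32-bit wide memory path      */

    printf("rate if cycled at access time: %.0f Mbyte/s\n",
           width_bytes / access_ns * 1000.0);
    printf("rate at the real cycle time:   %.0f Mbyte/s\n",
           width_bytes / cycle_ns * 1000.0);
    return 0;
}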

Anyone have the figures?

Another point about caches is that many fast RISC systems, such as Apollo's
DN10000 and the forthcoming Fujitsu SPARC-H, use separate instruction and data
caches to accomplish two memory operations in one cycle (so-called "Harvard
architecture").  You can't do that without caches unless you want to keep two
copies of main memory.
-- 
Michael Butts, Research Engineer       KC7IT           503-626-1302
Mentor Graphics Corp., 8500 SW Creekside Place, Beaverton, OR 97005
...!{sequent,tessi,apollo}!mntgfx!mbutts  OR  mbutts@pdx.MENTOR.COM
Opinions are my own, not necessarily those of Mentor Graphics Corp.

tim@cayman.amd.com (Tim Olson) (08/25/89)

In article <1989Aug24.215104.156@mentor.com> mbutts@mentor.com (Mike Butts) writes:
| Another point about caches is that many fast RISC systems, such as Apollo's
| DN10000 and the forthcoming Fujitsu SPARC-H, use separate instruction and data
| caches to accomplish two memory operations in one cycle (so-called "Harvard
| architecture").  You can't do that without caches unless you want to keep two
| copies of main memory.

Check out VRAMs (Video-DRAMs).  They have two ports to the same memory
-- a standard, random-access port, and a serial port that can be used
to read sequential data concurrently with random accesses on the other
port.
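
A toy model of the idea, just to make the two ports concrete (my own
sketch, not any particular vendor's part):

/* The DRAM core is shared; a row-transfer cycle loads one row into the
 * serial register, which is then clocked out independently of the
 * random-access port. */
#define ROWS 512
#define COLS 512

static unsigned char core[ROWS][COLS];    /* the DRAM array             */
static unsigned char serial_reg[COLS];    /* the serial-port register   */
static int serial_pos;

void row_transfer(int row)                /* costs one RAM cycle        */
{
    int c;
    for (c = 0; c < COLS; c++)
        serial_reg[c] = core[row][c];
    serial_pos = 0;
}

unsigned char serial_read(void)           /* no random-port cycle used  */
{
    unsigned char v = serial_reg[serial_pos];
    serial_pos = (serial_pos + 1) % COLS;
    return v;
}

unsigned char random_read(int row, int col)  /* the ordinary DRAM port  */
{
    return core[row][col];
}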




	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

davec@cayman.amd.com (Dave Christie) (08/25/89)

In article <1989Aug24.215104.156@mentor.com> mbutts@mentor.com (Mike Butts) writes:
>
>Another point about caches is that many fast RISC systems, such as Apollo's
>DN10000 and the forthcoming Fujitsu SPARC-H, use separate instruction and data
>caches to accomplish two memory operations in one cycle (so-called "Harvard
>architecture").  You can't do that without caches unless you want to keep two
>copies of main memory.

Not two copies of main memory, just one copy of multi-ported
interleaved main memory.  One data port, one instruction port.
(While you're at it, add an I/O port or two, what the hell.)
You will occasionally get a conflict, depending on the degree
of interleaving.  Of course, this isn't necessarily cheaper than
caches, certainly not as fast, and can only be justified cost-wise
with very large memories.
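
Roughly, the conflict check per cycle looks like this (degree of
interleaving and the addresses are made up, purely for illustration):

#include <stdio.h>

#define NBANKS 8                    /* degree of interleaving (assumed) */

static int bank_of(unsigned long addr)
{
    return (int)((addr >> 2) % NBANKS);   /* word-interleaved           */
}

int main(void)
{
    unsigned long ifetch = 0x1000;  /* instruction address this cycle   */
    unsigned long dref   = 0x2004;  /* data address this cycle          */

    if (bank_of(ifetch) == bank_of(dref))
        printf("conflict: one port waits a cycle\n");
    else
        printf("no conflict: both ports proceed\n");
    return 0;
}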

----------
Dave Christie
My opinions only, of course.

mbutts@mentor.com (Mike Butts) (08/26/89)

From article <26964@amdcad.AMD.COM>, by tim@cayman.amd.com (Tim Olson):
> In article <1989Aug24.215104.156@mentor.com> mbutts@mentor.com (Mike Butts) writes:
> | Another point about caches is that many fast RISC systems, such as Apollo's
> | DN10000 and the forthcoming Fujitsu SPARC-H, use separate instruction and data
> | caches to accomplish two memory operations in one cycle (so-called "Harvard
> | architecture").  You can't do that without caches unless you want to keep two
> | copies of main memory.
> 
> Check out VRAMs (Video-DRAMs).  They have two ports to the same memory
> -- a standard, random-access port, and a serial port that can be used
> to read sequential data concurrently with random accesses on the other
> port.

I take it you are proposing using the VRAM serial port for instruction fetches.
That would work out fine for linear code sequences, but would cost you a full
RAM cycle to do a branch.  Delayed branch architectures could mitigate that
cost somewhat, but it would still cost you.  It's cheaper than a cache, but not
as fast as a good one.  Has anyone tried using VRAMs like that in a real
system?
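
Some rough numbers on what the branch penalty does to average fetch
time; everything here is an assumption, since I don't have figures for
the parts in question:

#include <stdio.h>

int main(void)
{
    double serial_ns    = 40.0;   /* per-instruction serial shift       */
    double ram_cycle_ns = 200.0;  /* row transfer after a taken branch  */
    double p_branch     = 0.15;   /* assumed taken-branch frequency     */

    double avg = (1.0 - p_branch) * serial_ns
               + p_branch * (ram_cycle_ns + serial_ns);

    printf("average fetch time: %.0f ns\n", avg);   /* 70ns here */
    return 0;
}
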
-- 
Michael Butts, Research Engineer       KC7IT           503-626-1302
Mentor Graphics Corp., 8500 SW Creekside Place, Beaverton, OR 97005
...!{sequent,tessi,apollo}!mntgfx!mbutts  OR  mbutts@pdx.MENTOR.COM
Opinions are my own, not necessarily those of Mentor Graphics Corp.

tim@cayman.amd.com (Tim Olson) (08/29/89)

In article <1989Aug25.225511.828@mentor.com> mbutts@mentor.com (Mike Butts) writes:
| From article <26964@amdcad.AMD.COM>, by tim@cayman.amd.com (Tim Olson):
| > Check out VRAMs (Video-DRAMs).  They have two ports to the same memory
| > -- a standard, random-access port, and a serial port that can be used
| > to read sequential data concurrently with random accesses on the other
| > port.
| 
| I take it you are proposing using the VRAM serial port for instruction fetches.
| That would work out fine for linear code sequences, but would cost you a full
| RAM cycle to do a branch.  Delayed branch architectures could mitigate that
| cost somewhat, but it would still cost you.  It's cheaper than a cache, but not
| as fast as a good one.  Has anyone tried using VRAMs like that in a real
| system?

There are many 29K designs that use VRAMs as main memory.  The
performance hit you note for branches is mostly ameliorated by the
29K's branch target cache.
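
In outline, a branch-target cache lookup is something like this (the
sizes are placeholders, not necessarily the 29K's actual organization):

/* On a taken branch, the target address is looked up; on a hit, the
 * first few instructions come from the BTC while the serial port is
 * restarted at the branch target. */
#define BTC_ENTRIES    32
#define BTC_LINE_INSNS  4

struct btc_entry {
    unsigned long tag;                     /* branch target address     */
    unsigned long insn[BTC_LINE_INSNS];    /* first instructions there  */
    int valid;
};

static struct btc_entry btc[BTC_ENTRIES];

struct btc_entry *btc_lookup(unsigned long target)
{
    struct btc_entry *e = &btc[(target >> 4) % BTC_ENTRIES];
    return (e->valid && e->tag == target) ? e : 0;
}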

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

phil@diablo.amd.com (Phil Ngai) (08/30/89)

In article <1989Aug25.225511.828@mentor.com> mbutts@mentor.com (Mike Butts) writes:
|I take it you are proposing using the VRAM serial port for instruction fetches.
|That would work out fine for linear code sequences, but would cost you a full
|RAM cycle to do a branch.  Delayed branch architectures could mitigate that
|cost somewhat, but it would still cost you.  It's cheaper than a cache, but not
|as fast as a good one. 

We started this discussion by talking about fast DRAMs which
approached the speed of rather fast SRAMs. So if this same high speed
DRAM technology were applied to VRAMs, then your "full RAM cycle
branch" would be comparable to a cache also. 

Also, some processors such as the Am29000 have a special Branch Target
Cache (tm) which further mitigates the cost of branches.

On the other hand, it might be interesting to apply VRAM technology
to SRAMs...

| Has anyone tried using VRAMs like that in a real system?

Depends on what you mean by "real". If you mean something with as many
applications as a PC, not that I'm aware of. If you mean special boxes
that can run BSD and System V and DOS bridges etc, yes, I've built
hundreds.  I'm also aware of other companies that have done so; I'm not
at liberty to talk about them, but rumors have already been published in
widely circulated magazines.

--
Phil Ngai, phil@diablo.amd.com		{uunet,decwrl,ucbvax}!amdcad!phil
"Today surgeons are highly respected but they were once just grave robbers."

mbutts@mentor.com (Mike Butts) (08/31/89)

From article <27013@amdcad.AMD.COM>, by phil@diablo.amd.com (Phil Ngai):
> In article <1989Aug25.225511.828@mentor.com> mbutts@mentor.com (Mike Butts) writes:
> |I take it you are proposing using the VRAM serial port for instruction fetches.
> |That would work out fine for linear code sequences, but would cost you a full
> |RAM cycle to do a branch.  Delayed branch architectures could mitigate that
> |cost somewhat, but it would still cost you.  It's cheaper than a cache, but not
> |as fast as a good one. 
> 
> We started this discussion by talking about fast DRAMs which
> approached the speed of rather fast SRAMs. So if this same high speed
> DRAM technology were applied to VRAMs, then your "full RAM cycle
> branch" would be comparable to a cache also. 

Only assuming the time to access the VRAM via the serial port and shift out the
first result is comparable to an SRAM access time.

I haven't received any data on the original question: How do the cycle times
compare to the access times on these new fast RAMs?
-- 
Michael Butts, Research Engineer       KC7IT           503-626-1302
Mentor Graphics Corp., 8500 SW Creekside Place, Beaverton, OR 97005
...!{sequent,tessi,apollo}!mntgfx!mbutts  OR  mbutts@pdx.MENTOR.COM
Opinions are my own, not necessarily those of Mentor Graphics Corp.

acockcroft@pitstop.West.Sun.COM (Adrian Cockcroft) (09/01/89)

> I take it you are proposing using the VRAM serial port for instruction fetches.
> That would work out fine for linear code sequences, but would cost you a full
> RAM cycle to do a branch.  Delayed branch architectures could mitigate that
> cost somewhat, but it would still cost you.  It's cheaper than a cache, but not
> as fast as a good one.  Has anyone tried using VRAMs like that in a real
> system?
> -- 
> Michael Butts, Research Engineer       KC7IT           503-626-1302
> Mentor Graphics Corp., 8500 SW Creekside Place, Beaverton, OR 97005

The Sun 4/110 is pretty close. It has no extra cache memory, just the main
DRAM banks which are built out of NMB2800 256K Static Column DRAMs or
1 M Fast page mode DRAMs with access times of about 80ns.

The effect is to have a cache with one line per bank of RAM where each line
is about 1K long.  I think an 8 Mb 4/110 using 256K RAMs had a total of
2K of effective cache and a 32 Mb 4/110 had a 4K effective cache.  This
is cheap, but it is not all that effective, and Sun hasn't used that
design again.

One side effect is that benchmark times can vary by as much as +/-30% in
pathological cases, and you need to average the results of lots of runs.
The problems occur when the current data page and the current instruction
page are both in the same bank of memory and are sharing one cache line,
so that every load and store causes a cache miss.
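
A toy model of the effect, with made-up bank count, line size and
addresses:

#include <stdio.h>

#define NBANKS   4
#define ROWBYTES 1024                /* ~1K "line" per bank              */

static long open_row[NBANKS];

static int access_mem(unsigned long addr)  /* returns 1 on a fast hit   */
{
    int  bank = (int)((addr / ROWBYTES) % NBANKS);
    long row  = (long)(addr / (ROWBYTES * NBANKS));

    if (open_row[bank] == row)
        return 1;                    /* fast static-column access        */
    open_row[bank] = row;            /* full RAS cycle to open a new row */
    return 0;
}

int main(void)
{
    /* code and data pages that land in the same bank, different rows */
    unsigned long code = 0x10000, data = 0x14000;
    int i, hits = 0;

    for (i = 0; i < 100; i++) {
        hits += access_mem(code + 4 * i);   /* instruction fetch         */
        hits += access_mem(data + 4 * i);   /* interleaved data access   */
    }
    printf("%d fast accesses out of 200\n", hits);  /* 0: it thrashes */
    return 0;
}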

Put a real cache in....

Note that for vector processors VRAMs can be used nicely to build
your vector pipe. This was done on the FPS T-series and (I think) on
the Sun/Trancept TAAC-1.

The most effective DRAM controller I have come across is the Intel 82786
graphics chip: you can get about 40 Mbyte/s through a 16-bit wide interface
by using interleaved fast page mode DRAMs, and you need hardly any glue
logic.  It uses this bandwidth to implement "windows in hardware", which
won't work with VRAMs because you have to fetch data from all over the
place.
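
As a sanity check on that figure (the bus width is from above, the rest
is just arithmetic):

#include <stdio.h>

int main(void)
{
    double bytes_per_s    = 40.0e6; /* quoted 82786 throughput          */
    double bytes_per_xfer = 2.0;    /* 16-bit wide interface            */

    double xfers_per_s = bytes_per_s / bytes_per_xfer;      /* 20M/s    */
    printf("%.0f ns per transfer\n", 1.0e9 / xfers_per_s);  /* ~50 ns   */
    return 0;
}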

Adrian

-- 
Adrian Cockcroft Sun Cambridge UK TSE sun!sunuk!acockcroft
Disclaimer: These are my own opinions

davidb@braa.inmos.co.uk (David Boreham) (09/03/89)

Maybe this will help the discussion about RAM types and speeds:

Known "best" RAMs of various kinds, i.e. devices which are available
commercially but may not yet be in volume production:

SRAM-   about 10ns at 64K, 20ns at 256K. Cycle time = access time.
        Generally no latched address or data.

CMOS DRAM- about 60ns access, 120ns cycle at 1Mbit. 80ns access,
        160ns cycle at 4Mbit. (page mode and static column about 30ns cycle).

CMOS VRAM- Lags about 6--12 months behind normal DRAM in performance.
          Currently 100ns access, 190ns cycle. Serial port about 25MHz.
          Can expect to speed up to DRAM speeds in the next year.

BiCMOS DRAM- 35ns access, 70ns cycle. Very expensive. Non-multiplexed
          address. No static-column. 

As for the devices presented at ICCSD and other conferences (like
the 16ns DRAM mentioned a while back), these devices will not become
generally available for quite some time.  (Usually conference-paper
devices are at least 2 years away from production.)

David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol,  England            |      (us): uunet!inmos-c!davidb
+44 454 616616 ex 543        | Internet : @col.hp.com:davidb@inmos-c