[comp.arch] System clock rate vs. memory chip speed.

128b-4ba@web-4c.berkeley.edu (Iain Richard Tyrone McClatchie) (11/29/89)

Would someone please take a shot at explaining this to me:

A Mac Plus has a 68000 running at 8 MHz. That's a clock time of ~126 ns. It
has 120 ns memory chips. Things seem clear in this case: It takes one clock
cycle for each memory access. OK.

A Mac SE/30 has a 68030 running at 16 MHz. That's a clock time of ~63 ns. When
looking for SIMMs, I found 100 ns, 80 ns, and (I think) 70 ns versions. No one
I have talked to has heard of 60 ns SIMMs.

If the ram takes 70 ns to answer, and the clock is 63 ns, shouldn't it take
two clock cycles to fetch a long word?

If that is the case, why sell faster than 120 ns ram, if 60 ns isn't available?
It would seem that 70ns acts just like 80ns acts just like 100ns.  Does the
68030 effectively run at 8 MHz from RAM, and 16 MHz from its cache? Finally,
why would 100ns RAM be _required_, like, 120 ns SIMMs won't work at all?
Do 60ns 1Mbit DRAMs exist, and how much are they? The 120ns variety are down
around $120/SIMM now, apparently.

If that is not the case, how is memory access asynchronous with the system
clock?

-Dazed and Confused
-Iain McClatchie

daveh@cbmvax.UUCP (Dave Haynie) (11/29/89)

in article <1989Nov28.222102.12113@agate.berkeley.edu>, 128b-4ba@web-4c.berkeley.edu (Iain Richard Tyrone McClatchie) says:

> A Mac Plus has a 68000 running at 8 MHz. That's a clock time of ~126 ns. It
> has 120 ns memory chips. Things seem clear in this case: It takes one clock
> cycle for each memory access. OK.

Not even close.  Clock speed rating on a CPU has about as much to do with the 
actual memory bus speed as row access time (that 120ns) does with the actual
speed of the memory.

In the above case, the 68000 CPU takes 4 clock cycles for its fastest memory
cycle.  Rounding the Mac Plus's clock up to 8MHz (it's actually 7.8-something
MHz), you find that the minimum memory cycle for that Mac is 500ns.  DRAM
cycle time is usually just less than twice tRAC, the row access time that's
the standard rating number you see.  So after you've run your 120ns for row
access, you have probably another 80ns-100ns of RAS precharge time before you
can access another memory cell.  Even if it comes out to 220ns, it should be
apparent that it doesn't take much cleverness to build a no wait state memory
system for a 68000 at 8MHz.
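
To make the arithmetic above concrete, here is a minimal Python sketch.  The
function names are made up for illustration, and the 100ns precharge figure
is simply the top of the 80ns-100ns range quoted above, not a datasheet value.

```python
def clock_period_ns(freq_mhz):
    """Length of one clock period in nanoseconds."""
    return 1000.0 / freq_mhz


def bus_cycle_ns(freq_mhz, clocks_per_cycle):
    """Duration of one CPU bus cycle in nanoseconds."""
    return clocks_per_cycle * clock_period_ns(freq_mhz)


def dram_cycle_ns(t_rac_ns, precharge_ns):
    """Rough DRAM cycle time: row access time plus RAS precharge."""
    return t_rac_ns + precharge_ns


def fits_without_wait_states(freq_mhz, clocks_per_cycle, t_rac_ns, precharge_ns):
    """True if a full DRAM cycle fits inside one CPU bus cycle."""
    return (dram_cycle_ns(t_rac_ns, precharge_ns)
            <= bus_cycle_ns(freq_mhz, clocks_per_cycle))


# 68000 at a nominal 8 MHz, 4 clocks per bus cycle, 120ns DRAM.
print(bus_cycle_ns(8.0, 4))                        # 500.0
print(dram_cycle_ns(120, 100))                     # 220
print(fits_without_wait_states(8.0, 4, 120, 100))  # True
```

This only compares whole cycle times; it says nothing about how much of the
bus cycle the memory actually gets to use.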

> A Mac SE/30 has a 68030 running at 16 MHz. That's a clock time of ~63 ns. When
> looking for SIMMs, I found 100 ns, 80 ns, and (I think) 70 ns versions. No one
> I have talked to has heard of 60 ns SIMMs.

Well, the rating on that part is 16.6667MHz, so the clock period comes out an
even 60ns.  All Motorola parts work out this way.  Apple runs this one at 
twice their 68000 machine speed -- 15-something MHz.  The fastest possible 
68030 cycle is two clocks, for a cycle time of 120ns at 16.666MHz.  But in the
SE/30, IIx, and IIcx, Apple's basically pretending they have a 68020 and 
using the asynchronous cycle, which runs in a minimum of 3 clocks, or 180ns
(the new Mac IIci treats the 68030 with more respect -- I think they run it
a tad faster than the NeXT machine).  

180ns is too fast for 100ns DRAMs without doing something clever, which they
don't.  So they'll definitely add a wait state, for a cycle time of 240ns.
But with 240ns, they can probably use 120ns DRAM without much trouble, which
at least back when the IIx came out, was cheaper than 100ns DRAM.

> If that is the case, why sell faster than 120 ns ram, if 60 ns isn't available?
> It would seem that 70ns acts just like 80ns acts just like 100ns.  

You have the right idea here, as I illustrated above, just the wrong granularity.
If they couldn't spring for 90ns DRAM to meet the 180ns cycle time in that
16MHz machine, they might as well go for 120ns -- 100ns parts would be a waste
of money.
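
The granularity point can be sketched in a few lines of Python.  This assumes
the pessimistic rule of thumb that DRAM cycle time is twice tRAC (the text
above says "just less than twice"), a 60ns clock, and the 3-clock
asynchronous cycle; the function names are hypothetical.

```python
import math

CLOCK_NS = 60        # one clock at 16.667 MHz
ASYNC_CLOCKS = 3     # minimum 68030 asynchronous bus cycle, in clocks


def dram_cycle_estimate_ns(t_rac_ns):
    """Pessimistic rule of thumb (an assumption): cycle time = 2 * tRAC."""
    return 2 * t_rac_ns


def wait_states_needed(t_rac_ns, clock_ns=CLOCK_NS, base_clocks=ASYNC_CLOCKS):
    """Whole clocks to add so the estimated DRAM cycle fits the bus cycle."""
    shortfall = dram_cycle_estimate_ns(t_rac_ns) - base_clocks * clock_ns
    return max(0, math.ceil(shortfall / clock_ns))


for t_rac in (70, 80, 90, 100, 120):
    waits = wait_states_needed(t_rac)
    total = (ASYNC_CLOCKS + waits) * CLOCK_NS
    print(f"{t_rac}ns DRAM: {waits} wait state(s), {total}ns bus cycle")
```

Under these assumptions 100ns and 120ns parts both land on one wait state and
a 240ns cycle, which is why paying extra for the 100ns parts buys nothing.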

> Finally, why would 100ns RAM be _required_, like, 120 ns SIMMs won't work 
> at all?

You should believe whatever they tell you is required.  The match of CPU to
memory gives you an idea of what's possible, knowing the cycle time of CPU
and the cycle time of the DRAM.  It doesn't tell you anything about how they
actually implemented this memory.  Perhaps their refresh logic or some other
activity takes a little time out of the memory cycle, pushing the requirements
over to 100ns parts instead of 120ns, even though on paper there's no 
obvious reason for a 16MHz 68030 to want 100ns DRAM.  You have to know
how the whole system works, not just the CPU to memory interface, to have
a chance of understanding why everything goes together the way it does.

> Do 60ns 1Mbit DRAMs exist, and how much are they? 

They are in 1 Meg x 1 packages now, in rather small quantities, and they
aren't cheap.  But they should be pretty soon.

> If that is not the case, how is memory access asynchronous with the system
> clock?

In many systems today you'll find the CPU clock is a multiple of the bus
speed.  One good reason for this is to make it much more reasonable to 
add wait states.  One wait becomes 33% or 25% of your cycle time, not 100%.
Of course, as clocks get faster, it's going the other way so that you
don't have to work with clocks of outrageous speed.  And so the new chips 
look faster than the old ones, since everyone sees "clock speed" as the
indicator of performance and cost, when they really should look at "bus
speed" as well.

In the old days (long ago, when things were 8 bit) we used to stretch the
CPU clocks to add wait states, at least in 6502 systems.  On the 6502, one
bus clock == one CPU clock, and so wait states were expensive if you
really waited one whole CPU clock.
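
A small Python sketch of the wait-state percentages mentioned above.  The
function name is made up for illustration; the half-clock case is only meant
to suggest what clock stretching buys, not to describe any particular machine.

```python
def wait_state_penalty(base_clocks, wait_clocks=1.0):
    """Fraction by which a bus cycle grows when wait clocks are added."""
    return wait_clocks / base_clocks


cases = (
    ("1 clock per bus cycle (6502 style)", 1),
    ("3 clocks per bus cycle", 3),
    ("4 clocks per bus cycle", 4),
)
for name, clocks in cases:
    print(f"{name}: one wait state adds {wait_state_penalty(clocks):.0%}")

# Stretching the clock by half a period instead of waiting a whole one:
print(f"half-clock stretch on a 1-clock cycle: {wait_state_penalty(1, 0.5):.0%}")
```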

> -Dazed and Confused
> -Iain McClatchie
-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough

alvitar@weasel.austin.ibm.com (Phillip L. Harbison) (11/30/89)

In article <8747@cbmvax.UUCP>, daveh@cbmvax.UUCP (Dave Haynie) writes:
> In the above case, the 68000 CPU takes 4 clock cycles for its fastest memory
> cycle.  Rounding the Mac Plus's clock up to 8MHz (it's actually 7.8-something
> MHz), you find that the minimum memory cycle for that Mac is 500ns.
> ...
> Even if it comes out to 220ns, it should be apparent that it doesn't take
> much cleverness to build a no wait state memory system for a 68000 at 8MHz.

While the 68000 can perform a bus transfer in a minimum of 4 clock cycles,
the memory system does not necessarily have 4 clock cycles to respond.  The
address bus is not valid until well into the second clock cycle, and the
data must be latched at the middle of the fourth cycle.  Therefore, at least
1.5 cycles are wasted, leaving 2.5 cycles (312.5 nsec. at 8MHz).  To further
complicate matters, the memory system must tell the 68000 that data will be
available (by asserting DTACK) no later than the middle of the third cycle
if the data is to be latched in the fourth cycle.  Building a no-wait-state
memory system is not quite as trivial as you seem to imply.  Several of the
early DRAM controller chips (like the 8408) had a difficult time of this.
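
The window arithmetic above, as a minimal Python sketch.  The function name is
hypothetical, and the 1.5 lost clocks are taken straight from the paragraph
rather than from a datasheet.

```python
def memory_window_ns(freq_mhz, total_clocks, lost_clocks):
    """Time actually left to the memory system within one bus cycle."""
    clock_ns = 1000.0 / freq_mhz
    return (total_clocks - lost_clocks) * clock_ns


# 68000 at a nominal 8 MHz: 4-clock bus cycle, about 1.5 clocks unusable.
window = memory_window_ns(8.0, 4, 1.5)
print(window)        # 312.5

# Slack left after the row access time of a 120ns part, before counting
# controller and buffer delays.
print(window - 120)  # 192.5
```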

> The fastest possible 68030 cycle is two clocks, for a cycle time of 120ns
> at 16.666MHz.  But in the SE/30, IIx, and IIcx, Apple's ... using the
> asynchronous cycle, which runs in a minimum of 3 clocks, or 180ns ...

Once again, the entire bus cycle is not available to the memory.  In the
case of synchronous bus cycles, the memory system must assert STERM by the
beginning of the second cycle for data to be latched in the middle of the
second cycle.  Also, the address bus is not valid until around the middle
of the first cycle, so the memory system only has one clock cycle to get
the job done.  I've found this to be difficult for anything other than very
fast SRAM (as used in a cache).

The asynchronous bus cycle timing is similar to the 68000 timing, except
they got rid of the dead first cycle.  The address bus is valid shortly
after the beginning of the first cycle.  DSACKn must be asserted by the
middle of the second cycle if data is to be latched in the middle of the
third cycle.  At best, the memory system has about 2 full cycles to get
the job done, less if it has to check the data (i.e. parity or ECC) before
it asserts DSACK.
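
A rough Python sketch comparing the two 68030 windows described above against
common DRAM speed grades.  It treats tRAC as the only delay unless an
overhead is given; the 20ns overhead used below is a purely hypothetical
figure for controller and buffer delays, and the names are made up.

```python
CLOCK_NS = 60  # one clock at 16.667 MHz

# Approximate clocks available to the memory, per the description above.
WINDOW_CLOCKS = {"synchronous (STERM)": 1, "asynchronous (DSACK)": 2}


def parts_that_fit(window_clocks, t_rac_options_ns, overhead_ns=0,
                   clock_ns=CLOCK_NS):
    """Speed grades whose tRAC plus overhead fits in the available window."""
    window_ns = window_clocks * clock_ns
    return [t for t in t_rac_options_ns if t + overhead_ns <= window_ns]


grades = (70, 80, 100, 120)
for name, clocks in WINDOW_CLOCKS.items():
    on_paper = parts_that_fit(clocks, grades)
    with_overhead = parts_that_fit(clocks, grades, overhead_ns=20)
    print(f"{name}: on paper {on_paper}, with 20ns overhead {with_overhead}")
```

With these numbers no DRAM grade fits the one-clock synchronous window, which
matches the remark that only very fast SRAM manages it.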

Having spent most of the last two months working on 88000 memory timing, I
can sympathize with anyone having difficulty designing no-wait-state or few-
wait-state memory systems.  It is never as easy as they make it sound in
the data books.  :-(

----
Live: Phil Harbison
Mail: alvitar@weasel.austin.ibm.com

"Skin it back!"

henry@utzoo.uucp (Henry Spencer) (11/30/89)

In article <8747@cbmvax.UUCP> daveh@cbmvax.UUCP (Dave Haynie) writes:
>In the old days (long ago, when things were 8 bit) we used to stretch the
>CPU clocks to add wait states, at least in 6502 systems.  On the 6502, one
>bus clock == one CPU clock, and so wait states were expensive if you
>really waited one whole CPU clock.

The old days weren't so long ago.  The Sun 3/180 (also 160 and 75, which
used the same CPU board) did the same thing with the 68020, so they could
run with 1.5 wait states instead of 2.
-- 
That's not a joke, that's      |     Henry Spencer at U of Toronto Zoology
NASA.  -Nick Szabo             | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

daveh@cbmvax.UUCP (Dave Haynie) (12/07/89)

in article <3069@cello.UUCP>, alvitar@weasel.austin.ibm.com (Phillip L. Harbison) says:

> In article <8747@cbmvax.UUCP>, daveh@cbmvax.UUCP (Dave Haynie) writes:
>> In the above case, the 68000 CPU takes 4 clock cycles for its fastest memory
>> cycle.  Rounding the Mac Plus's clock up to 8MHz (it's actually 7.8-something
>> MHz), you find that the minimum memory cycle for that Mac is 500ns.
>> ...
>> Even if it comes out to 220ns, it should be apparent that it doesn't take
>> much cleverness to build a no wait state memory system for a 68000 at 8MHz.

> While the 68000 can perform a bus transfer in a minimum of 4 clock cycles,
> the memory system does not necessarily have 4 clock cycles to respond.  ...
> Building a no-wait-state memory system is not quite as trivial as you seem to 
> imply.  Several of the early DRAM controller chips (like the 8408) had a 
> difficult time of this.

Sure it is.  Don't even bother with a DRAM controller.  Build it with a couple
of PALs and TTL parts.  There are about 10 different expansion memory boards
for the Amiga 2000 that run with 0 wait states; most use hand-made memory
controllers.  Obviously you don't have the entire 4 clocks to access the memory,
but you don't need it either.  Using 150ns parts, it's possible to fit the
memory access by the 68000 and a CAS-before-RAS refresh cycle into one 
68000 memory cycle.

>> The fastest possible 68030 cycle is two clocks, for a cycle time of 120ns
>> at 16.666MHz.  But in the SE/30, IIx, and IIcx, Apple's ... using the
>> asynchronous cycle, which runs in a minimum of 3 clocks, or 180ns ...

> Once again, the entire bus cycle is not available to the memory.  In the
> case of synchronous bus cycles, the memory system must assert STERM by the
> beginning of the second cycle for data to be latched in the middle of the
> second cycle.  

Of course the memory system must assert the acknowledge quickly. So what? 
As long as you know when the memory read is going to be valid, that's no
problem.  You don't have to wait the entire memory access time before
acknowledging, you just have to take care of any nondeterministic stuff
before acknowledging the cycle.  In fact, with many cache designs, you
don't even worry about that -- you assume that it's a cache hit, assert
STERM before you really know, and re-run the cycle if it's a miss.

> Also, the address bus is not valid until around the middle of the first 
> cycle, so the memory system only has one clock cycle to get the job done.
> I've found this to be difficult for anything other than very fast SRAM 
> (as used in a cache).

That's still 60ns for the 16MHz 68030.  Fast static cache these days goes
around 35ns, fast PALs around 7.5ns; you can go faster if you want to
pay for it.  It's certainly no easy trick with SCRAM (static column DRAM),
but you still have around 25ns from address valid to termination to decide,
on an open page (access time of 35ns-50ns, depending on the part), whether
or not you want to run the cycle.  That should be doable in a single PAL.  Using a gate
array or other LSI-type controller will probably add a wait state no matter 
what you do.  Obviously it's not a piece of cake, or they'd just be hiring 
anyone off the street to design 68030 systems.  But designing for 16MHz or 
even 25MHz 68030 systems is hardly a black art, and there are plenty of 
parts out that will support these systems.
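
The timing budget above can be sketched as simple subtraction in Python.
Treating the SRAM access and one PAL delay as a single serial path is an
assumption made for illustration, and the function name is made up; a real
design would also have to count setup times, buffers, and clock skew.

```python
def slack_ns(window_ns, *delays_ns):
    """Time left in the window after subtracting each delay in the path."""
    return window_ns - sum(delays_ns)


WINDOW_NS = 60   # one clock at 16.667 MHz
SRAM_NS = 35     # fast static RAM access time quoted above
PAL_NS = 7.5     # fast PAL propagation delay quoted above

print(slack_ns(WINDOW_NS, SRAM_NS))                  # 25
print(slack_ns(WINDOW_NS, SRAM_NS, PAL_NS))          # 17.5
print(slack_ns(WINDOW_NS, SRAM_NS, PAL_NS, PAL_NS))  # 10.0
```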

> The asynchronous bus cycle timing is similar to the 68000 timing, except
> they got rid of the dead first cycle.  The address bus is valid shortly
> after the beginning of the first cycle.  

Actually, addresses are always valid at the S0->S1 edge.  You had it right 
above; the chip doesn't know if it's a synchronous or asynchronous cycle 
until you terminate the cycle.  It's also good to have a few more clock
edges than what you get basing everything on the 68030 clock; I've done 
it both ways, and wouldn't attempt another system without at least 90
degree clocks around.

> At best, the memory system has about 2 full cycles to get the job done, 
> less if it has to check the data (i.e. parity or ECC) before it asserts 
> DSACK.

You never HAVE to check the parity or data before asserting the DSACKs;
you can always assume it's OK, and retry the cycle if it isn't.  Sure
that's more complicated, but it's also possibly faster.  Obviously you
can't let the cycle complete before the check is finished; same basic
idea as using retry for the cache -- you assume the thing that usually
happens is going to happen, and retry the cycle for the unusual case.
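
Why the acknowledge-early-and-retry approach can pay off, as a Python sketch.
Every number here is hypothetical (a 3-clock optimistic cycle, a 4-clock
cycle that checks first, a 4-clock retry penalty); only the shape of the
tradeoff is the point, and the function names are made up.

```python
def expected_clocks(fast_clocks, retry_penalty_clocks, failure_rate):
    """Average clocks per access when acknowledging before the check is done."""
    return fast_clocks + failure_rate * retry_penalty_clocks


def break_even_failure_rate(fast_clocks, checked_clocks, retry_penalty_clocks):
    """Failure rate at which optimistic and check-first cycles cost the same."""
    return (checked_clocks - fast_clocks) / retry_penalty_clocks


FAST, CHECKED, RETRY = 3, 4, 4  # hypothetical clock counts

for rate in (0.0, 0.001, 0.05, 0.5):
    avg = expected_clocks(FAST, RETRY, rate)
    verdict = "wins" if avg < CHECKED else "loses"
    print(f"failure rate {rate}: optimistic average {avg:.3f} clocks, {verdict}")

print(break_even_failure_rate(FAST, CHECKED, RETRY))  # 0.25
```

As long as the unusual case (parity error, cache miss) is rarer than the
break-even rate, assuming success and retrying on failure is faster on average.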

> Having spent most of the last two months working on 88000 memory timing, I
> can sympathize with anyone having difficulty designing no-wait-state or few-
> wait-state memory systems.  It is never as easy as they make it sound in
> the data books.  :-(

I have yet to play with 88ks unfortunately :-(, so I can't comment on that
timing.  It's certainly never as easy as the data book claims, but more 
than a few folks seem to have figured it all out.

> Live: Phil Harbison
> Mail: alvitar@weasel.austin.ibm.com
> 
> "Skin it back!"
-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough