[comp.arch] "zero wait states"

davidb@brad.inmos.co.uk (David Boreham) (04/06/90)

In article <10758@portia.Stanford.EDU> dhinds@portia.Stanford.EDU (David Hinds) writes:
>states.  However, I think a "x ns" DRAM takes "2x ns" for an access anyway,
>because the address lines are multiplexed and are strobed on two successive
>clock cycles.  So, with nothing special, you need 62 ns memory to get 0

Not quite. You are correct that the cycle time is about 2X the access time;
not many non-designers appreciate that. However, the reason is that the
sense amps have to be precharged after every row access. Multiplexing the
address is not a significant speed hit until you get down to about 50ns
access times.
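
As a rough back-of-the-envelope sketch in C (the precharge figure is an
assumption for a generic 100ns part, not taken from any particular
datasheet):

#include <stdio.h>

int main(void)
{
    int t_rac = 100;  /* row access time in ns -- the number on the label   */
    int t_rp  = 80;   /* RAS precharge time in ns (assumed, varies by part) */

    /* Back-to-back row accesses have to include the precharge, so the */
    /* usable cycle time is roughly access + precharge, i.e. about 2X. */
    printf("access %d ns, cycle %d ns\n", t_rac, t_rac + t_rp);
    return 0;
}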

60ns DRAMs have been available for many years (first from INMOS) and have
very tight setup/hold times on the address inputs.

>incidentally.
>    On the subject of RAM chip specifications, there is something funny
>about quoting RAM speeds in multiples of 10 ns, with processor speeds in
>integral MHz numbers.  For example, I know that 100 ns memory is fast
>enough for 16MHz with 0 wait states when interleaved.  But this actually
>requires like 94 ns memory, by my calculations.  Is there some implicit
>tolerance in the RAM chip speeds, like are they always rounded up to the
>nearest 10 ns?

No. If the spec says 100ns then they can ship you 99.9ns parts.
Of course the spec is for worst-case temperature and voltage margins and
includes a guard band for tester inaccuracy and device aging.
Because of pressure to extract the last ounce of performance,
PCs frequently overrun DRAMs. This is rather bad practice, but
if you take the view that the PC vendor can control the voltage,
and to a certain extent the temperature, then maybe it's OK.
I would be surprised, though, if 100ns DRAMs were being overrun, since
the major manufacturers have speed distributions centered on 80ns and
therefore 80ns parts don't cost more than 100ns parts.

David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol,  England            |     (us): uunet!inmos.com!davidb
+44 454 616616 ex 547        | Internet: davidb@inmos.com

mslater@cup.portal.com (Michael Z Slater) (04/07/90)

>:In article <6543@dell.dell.com>, sauer@dell.dell.com (Charlie Sauer) writes:
>:: 
>:: Which others have you looked at?  I would think they would be older parts
>:: since 386 pipelining is going/has gone away, I believe.
>:
>:    Where did you hear this?  ...
>
>I was misinformed.  My source says there had been a plan to discontinue support
>of pipelined addressing, but the plan was dropped.

When Intel moved to the 1-micron CHMOS-IV process last spring, they broke
some logic in the 386 that, under certain rare conditions, caused the
prefetch queue to be corrupted when running in pipelined bus mode. The
workaround is simply not to use pipelined mode.  This was deemed to be not
a big problem, since most systems were using cache at the higher clock rates,
and generally not using pipelining, and slow clock rate parts were still 
coming off the 1.5-micron process.  As of last July, Intel expected to ship
D1 parts with this bug fixed by the end of '89; I don't know what actually
happened.

Michael Slater, Microprocessor Report   mslater@cup.portal.com

daveh@cbmvax.commodore.com (Dave Haynie) (04/11/90)

In article <10758@portia.Stanford.EDU> dhinds@portia.Stanford.EDU (David Hinds) writes:
>In article <719@optis31.UUCP>, zepf@optis31.UUCP (Tom Zepf) writes:

>> What PC advertisements mean by "zero wait states" is really "zero IBM-XT
>> wait states". If you do a few simple calculations, you can see that
>> 80-100 ns. DRAMS don't stand a chance of producing REAL zero wait state
>> performance on anything like a 286 or 386. In fact, it is not clear to
>> me that the cached memory PCs run with REAL zero wait states either!

I don't know if the basis is really the PC-XT; that sounds silly.  But it does
make lots of sense that these machines aren't running anything close to
0 wait state memory.  First of all, if you had real 0WS memory, you could
throw out any cache on a PC machine -- the only other advantage of a
cache would be in a multiprocessing system.

>So, a 16MHz processor has a cycle time of 62.5 ns, giving 125 ns for a 
>read access with no wait states.  However, I think a "x ns" DRAM takes 
>"2x ns" for an access anyway, because the address lines are multiplexed 
>and are strobed on two successive clock cycles.  

Well, a DRAM has a cycle time roughly twice its row access time (which is
what the "100ns" on a 100ns DRAM really means), but that's not because the
addresses are multiplexed; it's because of another parameter called
"precharge time".  Simply put, you get data out of your 100ns DRAM
100ns after you strobe in the row address (assuming the column timing is
all done correctly).  Before you can strobe in another row address, you
have to wait about 80ns.  That's what the memory requires.  Then reality
sets in, and you have to figure out how to design a system that gets
addresses out and multiplexed fast enough.
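
To put numbers on that, here's a back-of-the-envelope budget in C for the
16MHz case above.  The 30ns of overhead for address decode, multiplexing,
buffers and data setup is an assumed figure, not a measurement of any real
board:

#include <stdio.h>

int main(void)
{
    double clk_ns = 1000.0 / 16.0;   /* 62.5 ns per clock at 16MHz        */
    double bus_ns = 2.0 * clk_ns;    /* two-clock bus cycle = 125 ns      */
    double ovh_ns = 30.0;            /* assumed decode/mux/buffer/setup   */

    printf("bus cycle %.1f ns, row access must be <= %.1f ns\n",
           bus_ns, bus_ns - ovh_ns);
    /* ...and the following cycle still has to hide ~80 ns of precharge, */
    /* which is where interleaving comes in.                             */
    return 0;
}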

>With interleaving, one wait state is hidden by address pipelining.  

Interleaving can hide the precharge time, which probably corresponds to
one wait state or more, depending on the system and the memory.  But
interleaving isn't perfect; it only goes faster if you're always addressing
alternate banks.  In the worst case, it is no faster than non-interleaved
memory.  And it requires twice the number of devices plus extra support
logic.  Marketroids may claim it's zero wait state memory, and it may even
look that way sometimes, but it really isn't.
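
Here's a toy model of two-way interleaving in C: the low word-address bit
picks the bank, so a sequential stream alternates banks and each bank's
precharge overlaps the other bank's access, while back-to-back hits on the
same bank eat the full precharge.  The timing figures are illustrative only:

#include <stdio.h>

#define T_ACCESS    100   /* ns, row access              */
#define T_PRECHARGE  80   /* ns, RAS precharge (assumed) */

int main(void)
{
    int addrs[] = {0, 1, 2, 3, 4, 4, 5};     /* word addresses            */
    int n = sizeof(addrs) / sizeof(addrs[0]);
    int last_bank = -1, total = 0, i;

    for (i = 0; i < n; i++) {
        int bank = addrs[i] & 1;             /* low bit selects the bank  */
        int t = T_ACCESS;
        if (bank == last_bank)               /* same bank twice in a row: */
            t += T_PRECHARGE;                /* precharge can't be hidden */
        total += t;
        last_bank = bank;
        printf("addr %d  bank %d  %3d ns\n", addrs[i], bank, t);
    }
    printf("total %d ns for %d accesses\n", total, n);
    return 0;
}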

Another cool way to achieve faster access from plain old memories is to use
page-mode or static-column parts.  A 100ns or 80ns DRAM generally has a
column access time of 30ns-50ns.  Using static-column parts really makes
life simple, because it eliminates the need to generate a column address
strobe at a critical point.  The additional complexity is of course in the
support logic that detects a page hit (old row address == new row address).
This can go much faster than a simple interleaved memory system, probably
cutting the normal access time by a few wait states.  And it doesn't
require any additional devices.  The downside is that a page miss runs
slower than the basic memory cycle, since the memory controller has to fit
a precharge cycle in during the CPU access rather than during CPU recycle
time.
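
The page-hit check itself is just a comparison, something like this sketch
(one bank only, and the row/column split and the timings are assumptions,
not any particular controller):

#include <stdio.h>

#define ROW(a)       ((a) >> 10)   /* assumed: low 10 bits are the column */
#define T_COLUMN      40           /* ns, column access on a page hit     */
#define T_ROW        100           /* ns, row access on a miss            */
#define T_PRECHARGE   80           /* ns, paid before opening the new row */

static long open_row = -1;         /* row currently held open, -1 = none  */

int access_ns(unsigned long addr)
{
    if ((long) ROW(addr) == open_row)  /* page hit: column access only    */
        return T_COLUMN;
    open_row = (long) ROW(addr);       /* page miss: precharge, then RAS  */
    return T_PRECHARGE + T_ROW;
}

int main(void)
{
    printf("%d ns\n", access_ns(0x0000));  /* miss: opens row 0           */
    printf("%d ns\n", access_ns(0x0004));  /* hit: same row, fast         */
    printf("%d ns\n", access_ns(0x2000));  /* miss: different row         */
    return 0;
}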

>    On the subject of RAM chip specifications, there is something funny
>about quoting RAM speeds in multiples of 10 ns, with processor speeds in
>integral MHz numbers.  For example, I know that 100 ns memory is fast
>enough for 16MHz with 0 wait states when interleaved.  But this actually
>requires like 94 ns memory, by my calculations.  Is there some implicit
>tolerance in the RAM chip speeds, like are they always rounded up to the
>nearest 10 ns?

The memory speeds are rather arbitrary.  The DRAM rating really doesn't
tell you that much anyway.  In fact, you can find parts with the same row
access time but considerably different values for the other vital numbers.
And the match between memory and any particular CPU depends on all kinds
of factors -- CPU speed, CPU memory interface, special CPU access modes
(e.g. burst mode), wait-state granularity, memory subsystem architecture,
and so on.

> -David Hinds
>  dhinds@popserver.stanford.edu


-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough