[comp.arch] 68020 speeds and wait states

gnu@hoptoad.uucp (John Gilmore) (05/09/87)

> > according to the latest rumors the Pinnacle XL020/MPulse 20 runs a 68020
> > at 16MHz, 1 wait state.  Supposedly John Bremsteller has been working on
> > getting the 020 up to 25MHz with only 1 or 2 wait states.

In article <8003@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
> You will forgive us, I trust, for not being too impressed...  The Sun-2,
> now obsolete, ran a 68K at 12MHz with no wait states.  The early Sun-3
> models, starting to look dated, run a 68020 at 16MHz with 1.5 wait states
> (how in @#$%@ do they get half a wait state?...).

Actually, Sun-2's all ran at 10MHz with no wait states, using 64K RAMs
(1MB per packed Multibus board).  Today you could probably run the same 
design at 12MHz using faster 256K or 1MB RAMs.  Indeed, the original SUN
at Stanford was an 8MHz board which was goosed to the 10MHz Sun-1 as the
first Sun Microsystems product.

Early Sun-3's actually run 16.67MHz (60ns clock) at 1.5 wait states.
The half wait state is done by generating the CPU clock with a fast PAL
which stretches the cycle of interest to 90ns.  Running at 1.5 wait states
is actually a conservative design; it could probably have been done at 1 wait
state (indeed, ISI did it that way) but after years of design-on-the-edge,
there was some sentiment for making the Sun-3's easier to manufacture.
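
A minimal sketch of the half-wait-state arithmetic described above. It assumes a no-wait-state 68020 bus cycle takes 3 clocks and that the memory cycle is one whole wait clock long with a single clock stretched from 60ns to 90ns. Only the 60ns and 90ns figures come from the post; the constant and function names are made up for illustration.

```python
CLOCK_NS = 60        # 16.67MHz clock period, from the post
STRETCHED_NS = 90    # the one stretched clock, from the post
BASE_CLOCKS = 3      # assumption: clocks in a no-wait-state 68020 bus cycle
WHOLE_WAITS = 1      # assumption: one full wait clock, plus the stretch

def total_cycle_ns():
    # Every clock in the cycle is 60ns except one, which is stretched to 90ns.
    clocks = BASE_CLOCKS + WHOLE_WAITS
    return (clocks - 1) * CLOCK_NS + STRETCHED_NS

def effective_wait_states():
    # Express the whole cycle in units of the normal clock, minus the base 3.
    return total_cycle_ns() / CLOCK_NS - BASE_CLOCKS

print(total_cycle_ns(), effective_wait_states())
```

Under those assumptions the extra 30ns on one clock is what shows up as the "half" wait state.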

The Sun-3/50 (the $5000 model) runs at 15MHz at 1 wait state, which
comes out to almost the same speed (268ns versus 270ns memory cycles).
But its video is refreshed from main memory, which eats cycles.  If you
shut off the video, it should run within a few percent of a 3/160.

Sun's presentation at the introduction of the Sun-3/200 mentions these
possibilities for running fast 68020's (I added the Mem Cyc column, which
is calculated by adding 3 to the wait states and multiplying by the clock
period in ns):

System            Limit    w/s  Cyc/Inst  Clock      MIPS  Mem Cyc

3/160  no cache   Memory   1.5  8.0       16.67 MHz  2.1   270 ns
       no cache   Memory   2.0  8.6       20 MHz     2.3   250 ns
       no cache   Memory   3.0  9.8       25 MHz     2.6   240 ns
3/260  cache      CPU bus  0.1  6.3       25 MHz     4.0   155 ns

This is Bremsteller's proposed machine:

       no cache   Memory   2.0  ???       25 MHz     ???   200 ns

If he can get his RAMs to consistently run 200ns cycles, he may have
a winner.  It will only win by 20-25% over the 3/160 (not double like
the 3/260) but if he can build it and make it run, why not?  I suspect
he'd have more luck with the 68030 since it does burst fetches into
cache, which can be fed quickly with static column RAMs without building
a whole board full of cache like the 3/260.
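
The Mem Cyc and MIPS columns can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the Mem Cyc figure is (3 + wait states) clock periods and that MIPS is simply the clock in MHz divided by cycles per instruction. The function names are hypothetical. The cached 3/260 row is left out because its 155 ns figure does not follow from this simple formula.

```python
def clock_period_ns(clock_mhz):
    # Period of one CPU clock in nanoseconds.
    return 1000.0 / clock_mhz

def mem_cycle_ns(clock_mhz, wait_states):
    # Assumption: a no-wait-state 68020 bus cycle is 3 clocks long.
    return (3 + wait_states) * clock_period_ns(clock_mhz)

def mips(clock_mhz, cycles_per_instruction):
    # Millions of instructions per second at the given average cycle count.
    return clock_mhz / cycles_per_instruction

# (label, clock in MHz, wait states, cycles per instruction or None if unknown)
MACHINES = [
    ("3/160, no cache", 16.67, 1.5, 8.0),
    ("no cache", 20.0, 2.0, 8.6),
    ("no cache", 25.0, 3.0, 9.8),
    ("Bremsteller (proposed)", 25.0, 2.0, None),
]

for name, mhz, ws, cpi in MACHINES:
    rate = f"{mips(mhz, cpi):.1f}" if cpi is not None else "???"
    print(f"{name:24s} {mem_cycle_ns(mhz, ws):4.0f} ns  {rate} MIPS")
```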
-- 
Copyright 1987 John Gilmore; you may redistribute only if your recipients may.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu	       gnu@ingres.berkeley.edu

henry@utzoo.UUCP (Henry Spencer) (05/10/87)

> Actually, Sun-2's all ran at 10MHz with no wait states...

Wups, my mistake, John is right (as he should be, since he was at Sun when
all this was happening...).  Bit rot in my memory; I did know better.

> Early Sun-3's actually run 16.67MHz (60ns clock) at 1.5 wait states.

It gets so annoying adding the extra digits that I tend to abbreviate 16.67
to 16...

> ... it could probably have been done at 1 wait
> state (indeed, ISI did it that way) but after years of design-on-the-edge,
> there was some sentiment for making the Sun-3's easier to manufacture.

ISI in fact does 1 wait state on reads and 0 on writes, presumably by
buffering writes.  Interesting machine; we almost bought one.  Definitely
faster than the 16.67-MHz Sun-3s by a modest margin, but other factors
weighed in and we went with a Sun.
-- 
"If you want PL/I, you know       Henry Spencer @ U of Toronto Zoology
where to find it." -- DMR         {allegra,ihnp4,decvax,pyramid}!utzoo!henry

jeff@felix.UUCP (Jeff Wallace) (05/12/87)

> ...I suspect
> he'd have more luck with the 68030 since it does burst fetches into
> cache, which can be fed quickly with static column RAMs without building
> a whole board full of cache like the 3/260.

	I believe the 68030 cache fill requires the use of ripple-mode DRAMs
unless one wishes to generate the lower address signals externally during
this process.  While on the subject, what are ripple-mode DRAMs popular
for?  Although the ripple cycle time is typically faster than that for
static column or page mode devices, the fact that the ripple sequence uses
rowA8 and colA8 seems to prevent sequential access, doesn't it?  Doesn't
it seem more logical to use colA1 and colA0?

-- 
					  Jeff Wallace
				{decvax,ucbvax}!trwrb!felix!jeff
				  FileNet Corp. Costa Mesa, CA

phil@amdcad.UUCP (05/13/87)

In article <2728@felix.UUCP> jeff@felix.UUCP (Jeff Wallace) writes:
>	I believe the 68030 cache fill requires the use of ripple-mode DRAMs
>unless one wishes to generate the lower address signals externally during
>this process.

You must be talking about nibble mode. One company (Vitelic) offers a
"ripplemode" part, but it is not what you think. Their ripplemode
means the access time is determined mostly from when a stable column
address is ready rather than from when CAS is activated. Column
address access time is 55 ns, while CAS access time is only 25 ns. The
column address is latched when CAS is activated; this makes address
pipelining easier and page mode faster. Page cycle times of 75 ns
(plus data setup time on the device using the data) are no sweat.
Contrast this with ordinary page mode, where you first have to provide
CAS precharge, set up the column address, and then wait out an access
time measured from CAS activation, for a minimum cycle time of 120 ns.
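
To put the two page cycle times side by side, here is a rough sketch that only counts steady-state page cycles. It ignores the initial RAS access and the data setup time mentioned above, so treat the results as lower bounds; the names are hypothetical.

```python
RIPPLE_PAGE_CYCLE_NS = 75      # ripplemode page cycle, from the post
ORDINARY_PAGE_CYCLE_NS = 120   # ordinary page mode minimum cycle, from the post

def burst_ns(accesses, page_cycle_ns):
    # Time for a run of back-to-back accesses within one open page.
    return accesses * page_cycle_ns

for label, cycle in (("ripplemode", RIPPLE_PAGE_CYCLE_NS),
                     ("ordinary page mode", ORDINARY_PAGE_CYCLE_NS)):
    print(f"{label:20s} 4 accesses: {burst_ns(4, cycle)} ns")
```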

Ripplemode is almost as nice as static column access DRAMs.

> While on the subject, what are ripple-mode DRAMs popular
>for?  Although the ripple cycle time is typically faster than that for
>static column or page mode devices the fact that the ripple sequence uses
>rowA8 and colA8 seems to prevent sequential access, doesn't it?  Doesn't
>it seem more logical to use colA1 and colA0?

You can use the pins as you see fit. The pin police will not arrest
you for reassigning the values. 

To be specific, you are assuming that your system's A0, A1, A2, etc
are mapped: A0 -> colA0, A1 -> colA1, A8 -> colA8, A9 -> rowA0, etc.

Just move the wires around to give: A0 -> colA8, A1 -> rowA8, and
connect the rest of the pins in any convenient way. 
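
The rewiring can be sketched as a small address-mapping function. The assumptions here are a 256K x 1 DRAM with 9 row and 9 column address bits, and one arbitrary choice for "the rest of the pins" (A2..A9 to rowA0..rowA7, A10..A17 to colA0..colA7). The function name is made up for illustration.

```python
def map_address(addr):
    """Map an 18-bit system address to (row, col) pin values.

    A0 -> colA8 and A1 -> rowA8, as suggested above; the remaining
    assignment is one arbitrary choice among many.
    """
    col_a8 = addr & 1
    row_a8 = (addr >> 1) & 1
    row_low = (addr >> 2) & 0xFF    # A2..A9   -> rowA0..rowA7
    col_low = (addr >> 10) & 0xFF   # A10..A17 -> colA0..colA7
    row = (row_a8 << 8) | row_low
    col = (col_a8 << 8) | col_low
    return row, col

# Four consecutive system addresses differ only in rowA8 and colA8,
# the two bits the nibble sequence steps through.
for addr in range(4):
    row, col = map_address(addr)
    print(addr, f"rowA8={row >> 8} colA8={col >> 8}",
          f"row_low={row & 0xFF} col_low={col & 0xFF}")
```

With this wiring, each aligned group of four sequential system addresses lands in a single nibble group inside the chip, and every system address still maps to a distinct cell.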

If you want to understand why the nibble mode works the way it does,
read the chip's operation description. 

TI has a data sheet on an extended nibble mode DRAM, TMX4C1029, which
gives 1024 bits in nibble mode. This looks like it could be VERY
useful, particularly for AMD's 29000 which has a burst mode protocol
on its instruction and data bus. 

-- 
Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or amdcad!phil@decwrl.dec.com

wolfgang@haddock.UUCP (Wolfgang Rupprecht) (05/21/87)

>> ... the fact that the ripple sequence uses
>>rowA8 and colA8 seems to prevent sequential access, doesn't it?  Doesn't
>>it seem more logical to use colA1 and colA0?
>You can use the pins as you see fit. The pin police will not arrest
>you for reassigning the values. 
>Just move the wires around to give: A0 -> colA8, A1 -> rowA8, and
>connect the rest of the pins in any convenient way. 

The pin police will have a good laugh though, if they see you
refreshing your drams by strobing 2**n SEQUENTIAL addresses
when some of them *don't* go to row-addresses! 

The cheapest way (hardware-wise) to refresh drams is to have a strip
of nops (ie 256 bytes of 'em) that the cpu executes once per refresh
time. This is usually just as fast as some hardware counter doing the
same thing. (Just make sure that your I-cache is turned off!).
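
The objection can be illustrated by counting how many distinct refresh rows a run of sequential addresses actually touches. This sketch assumes a DRAM that needs all 256 combinations of rowA0..rowA7 refreshed (rowA8 is taken not to matter for refresh), and that each access advances the DRAM address by one. Both wirings and all names are hypothetical.

```python
def row_low_bits_first(addr):
    # Refresh-friendly wiring: A0..A7 -> rowA0..rowA7.
    return addr & 0xFF

def row_nibble_remap(addr):
    # Remapped wiring: A0 -> colA8, A1 -> rowA8, A2..A9 -> rowA0..rowA7.
    return (((addr >> 1) & 1) << 8) | ((addr >> 2) & 0xFF)

def rows_refreshed(row_of, count):
    # Distinct rowA0..rowA7 values hit by `count` sequential addresses.
    return {row_of(addr) & 0xFF for addr in range(count)}

for label, row_of in (("low bits to row pins", row_low_bits_first),
                      ("nibble-mode remap", row_nibble_remap)):
    print(f"{label:22s} 256 accesses refresh "
          f"{len(rows_refreshed(row_of, 256))} of 256 rows")
```

Under these assumptions the remapped wiring needs a run of 1024 sequential addresses, not 256, before every row has been touched.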
-- 
Wolfgang Rupprecht 			haddock.ISC.COM!wolfgang

phil@amdcad.AMD.COM (Phil Ngai) (05/21/87)

In article <494@haddock.UUCP> wolfgang@haddock.ISC.COM.UUCP (Wolfgang Rupprecht) writes:
>The pin police will have a good laugh though, if they see you
>refreshing your drams by strobing 2**n SEQUENTIAL addresses
>when some of them *don't* go to row-addresses! 

Doesn't everyone know about CAS-before-RAS refresh? DRAMs these days
have refresh counters built into them, so you don't have to supply a
refresh address. DRAMs these days are wondrously complicated and
getting more so continually. 

What you do in the privacy of your own chip address-wise is no
business of the pin police. 

>The cheapest way (hardware-wise) to refresh drams is to have a strip
>of nops (ie 256 bytes of 'em) that the cpu executes once per refresh
>time. This is usually just as fast as some hardware counter doing the
>same thing.

Yes, but when your software crashes, the contents of core evaporate. I
always believe in hardware-controlled refresh. It's not that
expensive: a refresh counter and a few extra states in the RAM
controller. 

-- 
Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or amdcad!phil@decwrl.dec.com