[comp.sys.atari.st] 68030 Chip Clarifications

daveh@cbmvax.commodore.com (Dave Haynie) (06/12/90)
In article <524@cvbnetPrime.COM> jshekhel@feds19.UUCP (Jerry Shekhel ) writes:
>In article <10373@batcomputer.tn.cornell.edu> (Daniel S. Riley) writes:

>>That wasn't mindless ravings.  Seriously, that 1/3 number confuses me--it
>>should be lots higher.  Going from a 16 bit to 32 bit machine and doubling
>>the clock speed (8 MHz to 16 MHz) should give you at least a factor of 4,
>>and then some because the '30 takes fewer clocks to do some things.

>Not really, because at 16 MHz, an uncached system is no longer zero-wait,
>unless you use 60ns DRAMs, which I doubt these machines will use.  Am I
>wrong about this?  I'd like to know for sure.

Well, you're right about wait-states, but that doesn't matter.  In rough
terms, you get quite a few "factor of two" optimizations.  First there's
bus width.  The 32 bit vs. 16 bit bus does help considerably, mainly because
of the way the 68030 fetches and the way its instruction pipeline works.  Much
plain 68000 code is 32 bit code/data, and even that which isn't is going to
get a boost from the pipeline.  Next comes the clock speed, which is
typically 2x-3x that of a 68000's clock.  Then you get the memory cycle,
which is twice as fast at the same clock speed, without wait states: the
'030 fetches 32 bits in 2 clocks, the 68000 fetches 16 bits in 4 clocks.
Next come the caches.  A cache hit really takes 2 clocks, but it also has
the effect of freeing up the bus unit.  So if the CPU needs an instruction
and a datum, and one is in the cache, fetching one from the bus and one
from the cache is a big win.  In fact, in the case of cache hits for both
instruction and data, the 68030's Harvard architecture permits both to be
used at the same time -- it's basically 64 bits internally.  If you multiply
together all these possible "factor of 2" improvements, you conclude that a 68030
at 16MHz might run roughly 16 times faster than a 68000 at 8MHz.  That's
really a peak value, but considering the optimizations done on many of the
standard instructions, the peak performance might be even higher.  Now,
in reality, you're not always running simultaneous cache hits, you may not
have a full pipeline, and you may not be running with 0 wait state memory.
But it's quite reasonable to see a factor of 4-8 speedup in common, ordinary
CPU bound applications.
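
As a rough sketch of that arithmetic (not a benchmark), the peak figures can
be worked out from the clocks-per-fetch numbers quoted above.  The function
name `bus_bandwidth`, the variable names, and the model of one extra clock
per wait state are assumptions made for illustration; the cache factor of 2
is just the simultaneous-hit case described above.

```python
def bus_bandwidth(clock_hz, bus_bytes, clocks_per_cycle, wait_states=0):
    """Peak bytes per second for back-to-back bus cycles.

    Assumption: each wait state simply adds one clock to the bus cycle.
    """
    return clock_hz * bus_bytes / (clocks_per_cycle + wait_states)


# 68000 at 8 MHz: 16 bits (2 bytes) every 4 clocks.
m68000 = bus_bandwidth(8_000_000, 2, 4)

# 68030 at 16 MHz: 32 bits (4 bytes) every 2 clocks, no wait states.
m68030 = bus_bandwidth(16_000_000, 4, 2)

# Bus width (2x) * clock (2x) * memory cycle (2x).
bus_ratio = m68030 / m68000

# Simultaneous instruction and data cache hits: one more factor of 2.
cache_overlap = 2
peak_ratio = bus_ratio * cache_overlap

# Same 68030 with one wait state on every bus cycle (hypothetical case).
m68030_1ws = bus_bandwidth(16_000_000, 4, 2, wait_states=1)
ratio_1ws = m68030_1ws / m68000

print(f"68000 @ 8 MHz : {m68000 / 1e6:.1f} MB/s")
print(f"68030 @ 16 MHz: {m68030 / 1e6:.1f} MB/s")
print(f"bus-only ratio: {bus_ratio:.1f}")
print(f"peak ratio with cache overlap: {peak_ratio:.1f}")
print(f"bus-only ratio with 1 wait state: {ratio_1ws:.2f}")
```

With one wait state the bus-only ratio in this toy model drops from 8 to
about 5.3, which sits inside the 4-8 range mentioned above.  Real code will
vary with cache hit rate and instruction mix.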

>-- JJS


-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"I have been given the freedom to do as I see fit" -REM