[net.micro.68k] 68020 Performance Revisited after reading IEEE Micro

falcone@erlang.DEC (Joe Falcone, HLO2-3/N03, dtn 225-6059) (10/28/84)


A Commentary on the 68020 Performance Figures in IEEE Micro

I've just finished a thorough reading of "The Motorola 68020" in the August
issue of IEEE Micro. It is a good article, and I recommend it to those
seeking more information about the 68020 microprocessor.

However, there are elements in the article which may mislead the casual
observer. In particular, the performance figures in the last few pages
provide a glimpse of what the processor chip can do, but give virtually
no understanding of how the 68020 will perform in a typical virtual-memory
networked workstation.

As for my interest: many figures have been tossed about comparing 68K-family
chips to the VAX-11/780, but few give an accurate picture of what is going
on, largely because no benchmark job mix can be considered truly
representative of all installations. I'm going to take a number of excerpts
from the 68020 article and give a personal rebuttal to each, with some rough
comparisons. The usual disclaimers apply with respect to Digital and
Motorola. I'm grinding my own axe.

    "Measurements were taken for an 8-MHz MC68010, a 16-MHz MC68020 with the
    [instruction] cache disabled, and a 16-MHz MC68020 in which all instruction
    accesses hit the cache. For each of these configurations, the performance
    --as measured in millions of instructions per second (MIPS) and as a 
    percentage of bus utilization--was obtained. All measurements were based on
    no-wait-state memory accesses and, for the MC68020, on the use of a 32-bit
    data bus. The MIPS value was measured in MC68000 MIPS and was not
    standardized to any other machine. The measurements on the MC68020 were for
    the processor executing MC68000 code with no instruction overlap; thus, the
    MC68020 using the instruction and addressing mode enhancements was not
    measured, making the results conservative." (pp. 116-117)

This paragraph sets the stage for my commentary. It is important to note
the conditions under which the 68020 was tested: no-wait-state memory, a
32-bit bus, non-standardized MIPS, and pure MC68000 instruction streams.

    "This and other improvements allow the interface to support a 180-ns bus
    cycle time at 16.67 MHz, with an address-valid-to-data-valid specification
    of 120 ns." (p. 107)

The 16.67 MHz clock, together with a 3-clock bus cycle, forces a 68020
system without an MMU to use raw memory with a 120 ns access time if it is
to run without wait states. The only economical way to satisfy this
requirement is to augment the instruction cache with a data cache.
Depending on the size and structure of the cache and the nature of the
applications, the resulting system will suffer a performance degradation
of about 15% to 30% relative to a completely no-wait-state system.
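To make the timing arithmetic concrete, here is a rough Python sketch that
derives the clock period from the quoted 180 ns, 3-clock bus cycle and
estimates how many wait states a given memory access time would need. It
assumes, as a simplification of my own and not something stated in the
article, that each wait state stretches the 120 ns address-to-data window
by one full clock; the function names are hypothetical.

```python
import math

# Timing quoted from the article (p. 107): a 3-clock, 180 ns bus cycle with
# a 120 ns address-valid-to-data-valid window at zero wait states.
BUS_CYCLE_NS = 180
CLOCKS_PER_BUS_CYCLE = 3
ACCESS_WINDOW_NS = 120

# Clock period implied by the quote; 1000 / 16.67 MHz is also about 60 ns.
CLOCK_PERIOD_NS = BUS_CYCLE_NS // CLOCKS_PER_BUS_CYCLE


def wait_states(access_time_ns: int) -> int:
    """Estimate wait states needed for memory with this access time.

    Simplifying assumption (not from the article): each wait state adds
    one full clock period to the access window.
    """
    shortfall = access_time_ns - ACCESS_WINDOW_NS
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / CLOCK_PERIOD_NS)


def bus_cycle_ns(access_time_ns: int) -> int:
    """Bus cycle length after the estimated wait states are inserted."""
    return BUS_CYCLE_NS + wait_states(access_time_ns) * CLOCK_PERIOD_NS


if __name__ == "__main__":
    print(f"clock period from 16.67 MHz: {1000 / 16.67:.1f} ns")
    for t in (100, 200, 300, 400, 500):
        print(f"{t} ns memory: {wait_states(t)} wait states, "
              f"{bus_cycle_ns(t)} ns bus cycle")
```

Under these assumptions, 100 ns memory needs no wait states, while 200 ns
memory needs two, stretching the bus cycle to about 300 ns. The access
times are chosen to match the memory speeds in the article's figures, but
those figures may define memory speed differently.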

    "Though the basic clock frequency is roughly doubled, performance is not
    correspondingly doubled, since there are dependencies on external factors
    such as memory access times... It is possible to see a compression of
    performance between clock speeds as the number of wait states increase."
    (p. 110)

In other words, it is hard to say what the performance will be until the
entire system is built. For example, adding an MMU to a 68020 system will
probably add at least one wait state. The effect of additional wait states
can be dramatic, as indicated by this table derived from Figures 16 and 18
(pp. 116-117).

                        MC68000 MIPS
-----------------------------------------------------------
        MEMORY    100ns    200ns    300ns    400ns    500ns
  CPU  ----------------------------------------------------
8MHz  68010         0.6      0.6      0.6      0.5      0.4
16MHz 68020*        2.1      1.5      1.2      1.0      0.8
16MHz 68020**       2.7      2.3      2.0      1.7      1.5
      -----------------------------------------------------
68020 DECREASE       0%   15-30%   26-43%   38-53%   45-62%
-----------------------------------------------------------
*  I-cache disabled
** 100% I-cache hit ratio
Each DECREASE range spans the two 68020 rows, measured against each
row's own 100ns value.
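The DECREASE row can be recomputed from the MIPS values in the table. A
short Python sketch, with the numbers transcribed from the table and each
row's loss measured against its own 100ns value:

```python
# MC68000 MIPS transcribed from the table (Figures 16 and 18 of the article).
MEMORY_NS = [100, 200, 300, 400, 500]
CACHE_OFF = [2.1, 1.5, 1.2, 1.0, 0.8]   # 16 MHz 68020, I-cache disabled
CACHE_HIT = [2.7, 2.3, 2.0, 1.7, 1.5]   # 16 MHz 68020, 100% I-cache hits


def percent_decrease(mips):
    """Percent drop of each entry relative to the first (100 ns) entry."""
    base = mips[0]
    return [(base - m) / base * 100 for m in mips]


if __name__ == "__main__":
    off = percent_decrease(CACHE_OFF)
    hit = percent_decrease(CACHE_HIT)
    for ns, h, o in zip(MEMORY_NS, hit, off):
        # The cache-hit configuration always loses the smaller fraction.
        print(f"{ns} ns: {h:.0f}-{o:.0f}%")
```

This gives 15-29%, 26-43%, 37-52%, and 44-62% for the 200ns through 500ns
columns, within about a point of the ranges shown.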

The performance degradation is significant even with memory as fast as
200 ns. Additional wait states from the MMU and from cache misses could
cause performance losses as high as 40%. Although the authors provided a
figure showing this effect, all of their benchmarks were based on
no-wait-state memory architectures.

In the area of bus utilization, the 68020 also has problems. The following
table, derived from Figures 17 and 19 (pp. 116 and 118), illustrates how
slower memory architectures drive bus utilization to essentially
intolerable levels.

                       BUS UTILIZATION
-----------------------------------------------------------
        MEMORY    100ns    200ns    300ns    400ns    500ns
  CPU  ----------------------------------------------------
8MHz  68010         87%      87%      87%      89%    90.5%
16MHz 68020*        82%      87%      89%      91%      93%
16MHz 68020**       65%      71%      73%      77%      80%
      -----------------------------------------------------
68020 INCREASE       0%     6-9%    8-12%   11-18%   13-23%
-----------------------------------------------------------
*  I-cache disabled
** 100% I-cache hit ratio
Each INCREASE range spans the two 68020 rows and is a relative change
from each row's own 100ns value, not a difference in percentage points.
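Likewise, the INCREASE row is a relative change rather than a difference in
percentage points. A Python sketch using the utilization figures
transcribed from the table:

```python
# Bus utilization (percent) transcribed from the table (Figures 17 and 19).
MEMORY_NS = [100, 200, 300, 400, 500]
CACHE_OFF = [82, 87, 89, 91, 93]   # 16 MHz 68020, I-cache disabled
CACHE_HIT = [65, 71, 73, 77, 80]   # 16 MHz 68020, 100% I-cache hits


def relative_increase(util):
    """Increase of each entry relative to the 100 ns entry, in percent.

    This is a relative change (87/82 - 1), not a difference in
    percentage points (87 - 82).
    """
    base = util[0]
    return [(u / base - 1) * 100 for u in util]


if __name__ == "__main__":
    off = relative_increase(CACHE_OFF)
    hit = relative_increase(CACHE_HIT)
    for ns, o, h in zip(MEMORY_NS, off, hit):
        print(f"{ns} ns: {o:.1f}-{h:.1f}%")
```

The results (6.1-9.2%, 8.5-12.3%, 11.0-18.5%, and 13.4-23.1%) agree with
the INCREASE row to within a point.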

Many high-performance systems require that a substantial amount of bus
bandwidth be available to other bus masters, such as network and disk DMA
controllers. While the 68020 with a fully effective instruction cache does
reduce bus utilization from the nearly 90% of the 68010 to around 70%, the
remaining 30% may not be enough for some applications and is not
sufficient for a multiprocessor architecture. Again, the solution is a data
cache to reduce bus accesses; in Motorola's best case, the instruction
cache reduces bus utilization by at most 17 percentage points (from 82% to
65%).
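To put the remaining bandwidth in absolute terms, the Python sketch below
converts CPU bus utilization into the bandwidth left for DMA controllers
and other bus masters. It rests on simplifying assumptions that are mine,
not the article's: every cycle is a full 32-bit transfer at the quoted
180 ns zero-wait-state cycle time, and other masters can use every cycle
the CPU leaves idle. The utilization figures are from the 100ns column,
where no wait states apply.

```python
# Assumptions (not from the article): every bus cycle is a full 32-bit
# transfer taking the zero-wait-state 180 ns quoted on p. 107, and other
# bus masters can use every cycle the CPU leaves idle.
BUS_CYCLE_NS = 180
BYTES_PER_CYCLE = 4  # 32-bit data bus


def peak_bandwidth_mb_s() -> float:
    """Peak bus bandwidth in megabytes (10**6 bytes) per second."""
    return BYTES_PER_CYCLE / (BUS_CYCLE_NS * 1e-9) / 1e6


def headroom_mb_s(cpu_utilization: float) -> float:
    """Bandwidth left over for DMA controllers and other bus masters."""
    return peak_bandwidth_mb_s() * (1.0 - cpu_utilization)


if __name__ == "__main__":
    print(f"peak: {peak_bandwidth_mb_s():.1f} MB/s")
    # 100ns column: 65% with 100% I-cache hits, 82% with the I-cache disabled.
    for util in (0.65, 0.82):
        print(f"{util:.0%} CPU utilization leaves "
              f"{headroom_mb_s(util):.1f} MB/s")
```

Under these assumptions the peak is about 22 MB/s, leaving roughly
7.8 MB/s at 65% utilization and 4 MB/s at 82%, to be shared among every
DMA device and any other processor on the bus.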

The results are in fact doubly deceptive. First, the 2+ MIPS figures
bandied about are for an MC68020 with no-wait-state memory and 100%
instruction cache hits, which is unrealistic. Second, the results are given
in 68K MIPS and cannot be compared directly with other machines without
scaling. The new instructions and addressing modes will probably help to
compensate for cache-miss and memory-architecture penalties, but again you
need a real system to determine this.

The real truth for the 68020 lies somewhere between 1.5 and 2.7 68K MIPS.
The important step now is to convert these figures into MIPS for your
favorite machine, let's say the VAX-11/780. After plowing through a number
of Baskett's and Patterson's figures, I estimate that, given comparable
compilers, the VAX-11/780 is about 3 to 5 times faster than an 8MHz 68K.
Using these factors to scale the bottom line of Table 7, with adjustments
for memory architectures, we get the following:

                              "VAX MIPS"
--------------------------------------------------
    68K MEMORY       100ns       200ns       300ns
  CPU  -------------------------------------------
8MHz  68010      0.14-0.23   0.14-0.23   0.14-0.23
16MHz 68020*      0.42-0.7     0.3-0.5    0.24-0.4
16MHz 68020**     0.56-0.9   0.46-0.76    0.4-0.66
--------------------------------------------------
VAX-11/780          -->       0.7-1.0       <--
VAX-11/785          -->       1.0-1.5       <--
--------------------------------------------------
*  I-cache disabled
** 100% I-cache hit ratio
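The scaling in this table is a simple division of 68K MIPS by the 3-to-5
speed factor. A Python sketch of that conversion for the 68020 rows, with
the 68K MIPS values transcribed from the MC68000 MIPS table; the names are
hypothetical:

```python
# 68K MIPS for the 16 MHz 68020 at 100, 200 and 300 ns memory,
# transcribed from the MC68000 MIPS table.
MC68020_68K_MIPS = {
    "68020 (I-cache disabled)": [2.1, 1.5, 1.2],
    "68020 (100% I-cache hits)": [2.7, 2.3, 2.0],
}

# Estimated speed of a VAX-11/780 relative to an 8 MHz 68K.
VAX_RATIO_LOW = 3.0
VAX_RATIO_HIGH = 5.0


def vax_mips_range(mips_68k: float) -> tuple[float, float]:
    """Return (low, high) 'VAX MIPS' by dividing 68K MIPS by each ratio."""
    return mips_68k / VAX_RATIO_HIGH, mips_68k / VAX_RATIO_LOW


if __name__ == "__main__":
    for cpu, row in MC68020_68K_MIPS.items():
        cells = ["%.2f-%.2f" % vax_mips_range(m) for m in row]
        print(cpu, cells)
```

This reproduces the I-cache-disabled row; for the 100% hit row it gives
0.54-0.90, 0.46-0.77, and 0.40-0.67, within a few hundredths of the values
shown.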

Even in the best case, with no wait states and 100% cache hits, the 68020
can only just match the performance of the very old and now obsolete
VAX-11/780. In the average case, the 68020 hovers around 0.5 VAX MIPS,
making it a powerful machine, but definitely less so than the VAX-11/780.
Note how increased memory access time degrades 68020 performance.

It will be interesting to compare the 68020 embedded in real systems to
the coming family of high-performance VLSI-based VAX and microVAX systems.
As I noted in a previous message, such comparisons will amount to a
"fair fight" among systems of comparable architecture and cost.

Joe Falcone
Eastern Research Laboratory		decwrl!
Digital Equipment Corporation		decvax!deccra!jrf
Hudson, Massachusetts			tardis!