falcone@erlang.DEC (Joe Falcone, HLO2-3/N03, dtn 225-6059) (10/28/84)
CC: A Commentary on the 68020 Performance Figures in IEEE Micro

I've just finished a thorough reading of "The Motorola 68020" in the August issue of IEEE Micro. It is a good article, and I recommend it to those seeking more information about the 68020 microprocessor. However, there are elements in the article which may mislead the casual observer. In particular, the performance figures in the last few pages provide a glimpse of what the processor chip can do, but give virtually no understanding of how the 68020 will perform in a typical virtual-memory networked workstation.

As for my interest, there have been many figures tossed about comparing 68K family chips to the VAX-11/780, but few give an accurate picture of what is going on, largely because no benchmark job mix can be considered truly representative for all installations.

I'm going to take a number of excerpts from the 68020 article and give a personal rebuttal to each, with some rough comparisons. The usual disclaimers apply with respect to Digital and Motorola. I'm grinding my own axe.

"Measurements were taken for an 8-MHz MC68010, a 16-MHz MC68020 with the [instruction] cache disabled, and a 16-MHz MC68020 in which all instruction accesses hit the cache. For each of these configurations, the performance --as measured in millions of instructions per second (MIPS) and as a percentage of bus utilization--was obtained. All measurements were based on no-wait-state memory accesses and, for the MC68020, on the use of a 32-bit data bus. The MIPS value was measured in MC68000 MIPS and was not standardized to any other machine. The measurements on the MC68020 were for the processor executing MC68000 code with no instruction overlap; thus, the MC68020 using the instruction and addressing mode enhancements was not measured, making the results conservative." (pp. 116-117)

This paragraph sets the stage for my commentary.
It is important to note the conditions under which the 68020 was tested: no-wait-state memory, a 32-bit bus, non-standardized MIPS, and pure MC68000 instruction streams.

"This and other improvements allow the interface to support a 180-ns bus cycle time at 16.67 MHz, with an address-valid-to-data-valid specification of 120 ns." (p. 107)

The 16.67 MHz clock together with a 3-clock bus cycle forces the 68020 into a raw memory architecture (w/o MMU) with a 120 ns cycle. The only possible way to satisfy this requirement economically is to augment the instruction buffer with a data cache. Depending on the size and structure of the cache and the nature of the applications, the resulting system will have a performance degradation of about 15% to 30% relative to a completely no-wait-state system.

"Though the basic clock frequency is roughly doubled, performance is not correspondingly doubled, since there are dependencies on external factors such as memory access times... It is possible to see a compression of performance between clock speeds as the number of wait states increase." (p. 110)

In other words, it is hard to say what the performance will be until the entire system is built. For example, the addition of an MMU to a 68020 system will probably add at least one wait state. The effect of additional wait-states can be dramatic, as indicated by this table derived from Figures 16 and 18 (pp. 116-117).

                        MC68000 MIPS
-------------------------------------------------------------
         MEMORY    100ns    200ns    300ns    400ns    500ns
CPU             ---------------------------------------------
8MHz  68010         0.6      0.6      0.6      0.5      0.4
16MHz 68020*        2.1      1.5      1.2      1.0      0.8
16MHz 68020**       2.7      2.3      2.0      1.7      1.5
                ---------------------------------------------
68020 DECREASE       0%    15-30%   26-43%   38-53%   45-62%
-------------------------------------------------------------
*  I-cache disabled
** 100% I-cache hit ratio

The performance degradation is significant even for memory as fast as 200 ns.
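
As a rough check on the DECREASE row, the sketch below recomputes the percentage loss relative to the 100 ns column using only the rounded MIPS entries in the table above. The names (MIPS, percent_decrease, DECREASE) are illustrative and not from the article. Because the inputs are rounded, the results land within a point or two of the DECREASE row rather than matching it exactly.

```python
# 68K MIPS values copied from the table above (rounded as printed there).
MEMORY_NS = [100, 200, 300, 400, 500]
MIPS = {
    "68020, I-cache disabled": [2.1, 1.5, 1.2, 1.0, 0.8],
    "68020, 100% I-cache hits": [2.7, 2.3, 2.0, 1.7, 1.5],
}


def percent_decrease(series):
    """Percent loss at each memory speed relative to the first (100 ns) entry."""
    base = series[0]
    return [round(100.0 * (base - value) / base, 1) for value in series]


# Hypothetical helper table: one list of percentage losses per configuration.
DECREASE = {name: percent_decrease(values) for name, values in MIPS.items()}

if __name__ == "__main__":
    for name, losses in DECREASE.items():
        cells = "  ".join(
            f"{ns}ns: {loss:4.1f}%" for ns, loss in zip(MEMORY_NS, losses)
        )
        print(f"{name:26s} {cells}")
```

At 200 ns this gives roughly 15% (all cache hits) to 29% (cache disabled), in line with the 15-30% entry.
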
The introduction of additional wait-states from the MMU and cache misses could result in performance hits as high as 40%. Although the authors provided a figure showing this, all of the benchmarks were based on no-wait-state memory architectures.

In the area of bus utilization, the 68020 also has problems. The following table, derived from Figures 17 and 19 (pp. 116 and 118), illustrates how slower memory architectures increase bus utilization to essentially intolerable levels.

                       BUS UTILIZATION
-------------------------------------------------------------
         MEMORY    100ns    200ns    300ns    400ns    500ns
CPU             ---------------------------------------------
8MHz  68010         87%      87%      87%      89%     90.5%
16MHz 68020*        82%      87%      89%      91%      93%
16MHz 68020**       65%      71%      73%      77%      80%
                ---------------------------------------------
68020 INCREASE       0%     6-9%    8-12%   11-18%   13-23%
-------------------------------------------------------------
*  I-cache disabled
** 100% I-cache hit ratio

Many high-performance systems require that a substantial amount of bus bandwidth be available to other bus masters, such as network and disk DMA controllers. While the 68020 does reduce bus utilization from the near 90% of the 68010 to around 70%, the remaining 30% may not be enough for some applications and is not sufficient for a multi-processor architecture. Again, the solution is a data cache to reduce bus accesses, just as the instruction buffer reduces bus utilization by as much as 17 points (82% to 65% with 100 ns memory) in Motorola's best case.

The results in fact are doubly deceptive. First, the 2+ MIPS figures bandied about are for an MC68020 with no-wait-state memory and 100% instruction cache hits. This is unrealistic. Secondly, the results are given in 68K MIPS, and cannot be directly compared to other machines without scaling. The use of the new instructions and addressing modes will probably help to compensate for cache miss and memory architecture penalties, but again you need a real system to determine this.
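
Returning to the bus utilization table: the sketch below assumes the INCREASE row is the growth in utilization relative to the 100 ns column (not a difference in percentage points), and also shows how much of the bus is left over for other bus masters. The names (BUS_UTIL, relative_increase, headroom) are illustrative; the numbers are the table entries as printed.

```python
# Bus utilization (%) copied from the table above.
MEMORY_NS = [100, 200, 300, 400, 500]
BUS_UTIL = {
    "68020, I-cache disabled": [82, 87, 89, 91, 93],
    "68020, 100% I-cache hits": [65, 71, 73, 77, 80],
}


def relative_increase(series):
    """Percent growth in utilization relative to the first (100 ns) entry."""
    base = series[0]
    return [round(100.0 * (value - base) / base, 1) for value in series]


def headroom(series):
    """Percent of bus bandwidth left for DMA controllers and other masters."""
    return [100 - value for value in series]


if __name__ == "__main__":
    for name, util in BUS_UTIL.items():
        print(name)
        for ns, inc, free in zip(MEMORY_NS, relative_increase(util), headroom(util)):
            print(f"  {ns}ns: +{inc:4.1f}% utilization, {free}% of bus free")
```

Under that assumption the 200 ns column works out to about 6% and 9%, matching the 6-9% entry, and even the all-hits case leaves only 20% of the bus free with 500 ns memory.
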
The real truth for the 68020 lies somewhere in between 1.5 and 2.7 68K MIPS.

Now the important step is to convert these figures into MIPS for your favorite machine, let's say the VAX-11/780. After plowing through a number of Baskett's and Patterson's figures, I find that, given comparable compilers, the VAX-11/780 is about 3 to 5 times faster than an 8MHz 68K. Using these factors to scale the bottom line of Table 7, we get the following, with adjustments for memory architectures:

                      "VAX MIPS"
---------------------------------------------------------
68K      MEMORY     100ns        200ns        300ns
CPU             -----------------------------------------
8MHz  68010       0.14-0.23    0.14-0.23    0.14-0.23
16MHz 68020*      0.42-0.7     0.3-0.5      0.24-0.4
16MHz 68020**     0.56-0.9     0.46-0.76    0.4-0.66
---------------------------------------------------------
VAX-11/780  -->   0.7-1.0  <--
VAX-11/785  -->   1.0-1.5  <--
---------------------------------------------------------
*  I-cache disabled
** 100% I-cache hit ratio

Even in the best case, with no wait states and 100% cache hits, the 68020 can only just match the performance of the very old and now obsolete VAX-11/780. In the average cases, the 68020 hovers around 0.5 VAX MIPS, making it a powerful machine, but definitely less so than the VAX-11/780. Note how the increased memory access time adversely affects the 68020 performance.

It will be interesting to compare the 68020 embedded in real systems to the coming family of high-performance VLSI-based VAX and microVAX systems. As I noted in a previous message, such comparisons will amount to a "fair fight" among systems of comparable architecture and cost.

Joe Falcone
Eastern Research Laboratory          decwrl!
Digital Equipment Corporation        decvax!deccra!jrf
Hudson, Massachusetts                tardis!