curry@nsc.UUCP (03/23/87)
The following is a letter written by one of our architects to Electronic Design. I thought it would make interesting reading on the net and would be good for starting some discussion.

MC68030 COMPARISON WITH MC68020

I read with interest the "Assessing MC68030 and MC68882 Performance" letter from Roy Druian, Product Marketing Manager of Motorola, as well as Dave Bursky's "32-Bit Microprocessors 1987 Technology Forecast" in the Jan. 8, 1987 issue of Electronic Design. Mr. Druian's letter tried to demonstrate that Motorola's 68030 at 20 MHz has twice the performance of the 68020 at 16.67 MHz. In Mr. Bursky's survey, Motorola's 68030 is presented as having 3 times the performance of the 68020.

Using Motorola's own performance data for the 68020 and the data available for the 68030 (see the references below), I calculated the 68030 performance improvement relative to the 68020. These calculations, presented in detail below, show that the performance of the 68030 is only 18% better than that of the 68020 at the same frequency (20 MHz), even if we believe the high hit ratios Motorola claims for its small (256 byte) internal caches. Using the data presented by Motorola in [2], the calculated 68030 performance is 3.3 MIPS at 20 MHz, while the 68020 performance is 2.3 MIPS at 16.67 MHz and 2.8 MIPS at 20 MHz.

The other 68030 problems that Motorola's papers do not stress are:

-impossible timing for its synchronous bus protocol (very hard and expensive to run with zero wait-states at 20 MHz)
-virtual (logical) caches (see [1] for the problems of virtual caches)
-lack of hardware cache invalidation for the internal caches, which makes the 68030 very hard to use in multiprocessing systems or where external devices (e.g. DMA) may change memory values
-long context switching time, because it has to save temporary data inside the CPU (a problem inherited from the 68020)
-small TLB (Address Translation Cache) for the 68030 MMU (22 entries only)

Actually, the 68030 has no real MMU: TLB misses are handled by the 68030 Execution Unit, not by the MMU itself in a way that is transparent to the execution pipe. The high price of a TLB miss (because it is handled in microcode, serially with instruction execution), combined with the relatively low TLB hit ratio (it should be 90 - 95% for the 22 entry TLB, not 98% as Motorola claims), makes the use of the on-chip 68030 MMU expensive in terms of performance.

68030 vs. 68020 Comparison

The improvements of the 68030 over the 68020 consist of a better Instruction Cache hit ratio due to the larger line size (16 bytes for the 68030 vs. 4 bytes for the 68020), the addition of a small Data Cache, and the integration of the MMU on chip. As for the "internal Harvard architecture", the 68020 also allows instruction fetches to overlap with data operand accesses [3]. The 68030 left the Execution Unit unchanged, except for the addition of some instructions to support the on-chip TLB [4]. As a consequence, the only change in the performance model of the 68030 compared with the 68020 is the reduced performance penalty for addressing instructions and data in memory.
Using Motorola's own data (Table 6 in [2]), the average number of operand memory accesses per instruction for the 68020 is:

-0.384 reads per instruction
-0.242 writes per instruction

The Data Cache with its 48% hit ratio and the 2 clock bus cycle will improve the 68030 execution time by:

    0.384 * 0.48 * 2 + 0.384 * 0.52 * 1 = 0.567 clocks

where:

- 0.384 represents the average number of operand reads per instruction,
- 0.48 represents the Data Cache hit ratio,
- 2 represents the difference, in clocks, between the 68020 going to external memory (no wait states) and the 68030 finding data in the internal cache,
- 0.52 (1 - 0.48) represents the Data Cache miss ratio,
- 1 represents the difference, in clocks, between the 3 clock bus cycle of the 68020 and the 2 clock bus cycle of the 68030.

The writes have roughly the same influence on performance for the 68020 and the 68030, as the Data Cache is a write-through cache and writes are buffered by the BIU.

The other 68030 improvement is the higher Instruction Cache hit ratio due to the 16 byte line size (the effect of the burst is already taken into account in the hit ratios of the Instruction Cache and Data Cache). According to Table 4 in [2], the number of clocks-per-instruction for the 68020 drops from 7.159 with a 64% hit ratio instruction cache to 6.373 with a 100% hit ratio instruction cache (an improvement of 0.786 clocks-per-instruction). The estimated improvement given to the 68030 by its 82% hit ratio instruction cache and its 2 clock bus protocol (no wait states) is 0.5 clocks-per-instruction.

The overall performance improvement due to the architectural improvements of the 68030 relative to the 68020 is then:

    0.567 + 0.5 = 1.067 clocks/instruction

According to Motorola's own calculations in [2], the average performance of the 68020 with the Instruction Cache ON for the workload in [2] is 7.159 clocks/instruction (Table 4 in [2]) when no wait states are present. This translates into 2.3 MIPS at 16.67 MHz and 2.8 MIPS at 20 MHz.
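The arithmetic above can be reproduced with a short script. This is only a sketch of the letter's model, not Motorola's or the author's own tooling: the constant and function names are made up for illustration, and the numbers are the ones quoted above from Tables 4 and 6 of [2].

```python
# Sketch of the clocks-per-instruction model described in the letter.
# All names here are illustrative; the figures are those quoted from [2].

READS_PER_INSTR = 0.384     # operand reads per instruction (Table 6 in [2])
DCACHE_HIT_RATIO = 0.48     # Data Cache hit ratio used in the letter
HIT_SAVING_CLOCKS = 2       # clocks saved on a Data Cache hit
MISS_SAVING_CLOCKS = 1      # clocks saved on a miss (3 clock vs. 2 clock bus cycle)
ICACHE_SAVING = 0.5         # letter's estimate for the instruction side
CPI_68020 = 7.159           # 68020 clocks/instruction (Table 4 in [2])


def data_read_saving(reads=READS_PER_INSTR, hit=DCACHE_HIT_RATIO):
    """Clocks saved per instruction on operand reads."""
    on_hit = reads * hit * HIT_SAVING_CLOCKS
    on_miss = reads * (1.0 - hit) * MISS_SAVING_CLOCKS
    return on_hit + on_miss


def mips(clock_mhz, clocks_per_instr):
    """Million instructions per second for a given clock and CPI."""
    return clock_mhz / clocks_per_instr


if __name__ == "__main__":
    saving = data_read_saving()
    print(f"data read saving: {saving:.3f} clocks/instruction")
    print(f"total saving:     {saving + ICACHE_SAVING:.3f} clocks/instruction")
    print(f"68020 at 16.67 MHz: {mips(16.67, CPI_68020):.1f} MIPS")
    print(f"68020 at 20 MHz:    {mips(20.0, CPI_68020):.1f} MIPS")
```

Evaluated exactly, the data read term comes to 0.56832, which rounds to 0.568; the 0.567 in the letter looks like a truncation, and the difference does not change the MIPS figures at one decimal place.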
The (7.159 - 1.067) = 6.092 clocks/instruction for the 68030 translates into 3.3 MIPS at 20 MHz. The relative improvement in performance of the 68030 versus the 68020 at the same frequency is then:

    (3.3 - 2.8) * 100 / 2.8 = 18%

If the 16.67 MHz 68020 is compared against the 20 MHz 68030, the performance improvement is still only:

    (3.3 - 2.3) * 100 / 2.3 = 43%

68030 Synchronous Bus Timing

Even the 18% architectural improvement of the 68030 over the 68020 is questionable because of the way the 68030 2-cycle synchronous bus protocol is designed.

For a Read bus cycle, the 68030 issues the address at the beginning of the first clock cycle and expects the system to return the ready signal (named STERM) at the end of the same cycle. Data is sampled by the 68030 in the middle of the second cycle. According to the 68030 spec [6], the read bus timing at 20 MHz is:

- Address-to-STERM time = 25 ns
- Address-to-Data time = 40 ns

For a Write bus cycle, the 68030 issues the address at the beginning of the first clock cycle and the data at the beginning of the second clock cycle, and expects the system to return ready (STERM) at the end of the first cycle. According to the 68030 spec, the write bus timing at 20 MHz is:

- Address-to-STERM time = 25 ns
- Write Data Valid time = 25 ns

It is hard to avoid wait-states at 20 MHz with such timing, even when using the fastest and most expensive static RAMs. No wonder Motorola is unwilling to commit to pushing the 68030 beyond 20 MHz; with the 68030 bus protocol, wait states are unavoidable.

Sorin Iacobovici
Computer and System Architecture
National Semiconductor Corp.
Santa Clara, Ca.

REFERENCES
----------
1. A.J. Smith, "Cache Memories", ACM Computing Surveys, vol. 14, no. 3, September 1982, pp. 473-530
2. D. MacGregor, J. Rubinstein, "A Performance Analysis of MC68020-based Systems", IEEE Micro, Dec. 1985, pp. 50-70
3. D. MacGregor, D. Mothersole and B. Moyer, "The Motorola MC68020", IEEE Micro, vol. 4, no. 4, Aug. 1984, pp. 101-118
4. J.T. Reinhart, "Extra Functions and Higher Speed Push Microprocessor to Top", Electronic Products, Oct. 1, 1986, pp. 35-39
5. D. MacGregor, "Diverse Applications Put Spotlight on 68020's Improvements", Electronic Design, Feb. 7, 1985, pp. 155-164
6. Motorola Inc., "MC68030 - Second Generation 32-Bit Enhanced Microprocessor", Technical Data, Motorola Inc., 1986, pp. 1-27