falcone@erlang.DEC (Joe Falcone, HLO2-3/N03, dtn 225-6059) (09/07/84)
CC: When is 16MHz not 16MHz?

It is difficult to discuss performance of the 68020 chip "out of context", i.e., without information about the rest of the system components. Because of its potentially very great speed (16.67MHz), the 68020 places significant demands on the memory and I/O portions of a system. One can assume that the 68020 instruction buffer, and perhaps a cache designed into the system, can reduce the demands on the memory system; however, this reduction is highly dependent on the workload and the design of the cache and memory. No two system designs incorporating the 68020 chip are likely to be the same.

The memory demands have worried me ever since a processor called the HP 9000 claimed to be able to run at 18MHz with a relatively memory-intensive stack architecture and no cache. The HP 9000 got around the problems by decoding and presenting addresses extremely early in instruction execution to a heavily pipelined memory controller which, sure enough, could deliver a word every 110ns (every other processor cycle). The memories were specially developed 128k nmos rams which ran fast and hot. Although the scheme worked, it was a technological "house of cards" since every piece was critically dependent on the performance of other components. Although it was claimed that one could run the HP 9000 processor chips at speeds over 30MHz in the lab, the operation of these chips at that speed would have necessitated a faster bus, memory controller, and rams. So even though the cpu chip remains unchanged, the rest of the system has a fit.

Now the problem with the 68020 is that it simply does not present addresses soon enough to the memory. Therefore, to run with no wait states, one must be able to provide the requested data within the few cycles allotted. In many current 68000 systems, the cycles allotted are not sufficient to avoid wait states because of delays from the memory management unit or slow rams.
Therein lies a dilemma, for all of us want memory management of some form, and the cheaper, denser rams tend to be a little slower until technology catches up. My SR-50 tells me that a 16.67MHz clock gives a 60ns processor cycle. Assuming 4 cycles per access, that gives you 240ns for the round trip (address out, data in). A 200 nanosecond multi-megabyte main memory and management unit would be prohibitively expensive to integrate into a moderately priced 68020 workstation (although it is possible to do it at a definite big price). Hence the crying need for a cache to assist data accesses and perhaps supplement the instruction buffer.

Now, unless someone designs a cache which gives 100% data and instruction hit rates, that 16.67MHz clock will be degraded by misses, and unless the memory subsystems are especially fast, there would be multi-cycle waits. Just off the top of my head, using a 90% hit rate cache, one might be looking at performance degradation anywhere between 10 and 25% due to miss penalties (it all depends on how big your cache is, how fast your main memory is, and how quickly your cache cycles the main memory to get what you need). So the 16.67MHz clock has fallen as low as 12MHz. Sometimes I think the FTC should get involved with this stuff.

As a final note, the following is my own opinion as an educated individual. Isn't it kind of ridiculous to compare a cpu chip set to a very large computer system with caches and high speed I/O buses? I'm sure there are a lot of cpus out there that can beat a VAX-11/780 one-on-one running some benchmark. On the other hand, how many of them can handle the cpu, virtual memory, and I/O demands of 20 to 40 users? The fact is that there is a lot of stuff in the 780 (special instructions, cache, SBI, massbus, unibus, etc.) just for handling lots of users for long periods of time.
Unfortunately, this stuff does tend to get in the way of tests of raw, single-user, processor performance (either directly or indirectly because of compromises in the design process). The 780 is not trying to pretend to be a single-user workstation.

MORAL: If you want to compare the 68020 (or any microprocessor) to a VAX, wait for the figures on the forthcoming VAX chip sets (which were discussed at some of the chip conferences). At the system level, the microVAX line of small systems is designed and packaged more for one-on-one personal use. The microVAX and VAX chip sets offer some very interesting comparison opportunities for "fair fights" between Digital and the competition.

In the meantime, as an exercise, one might want to examine the performance of the grand old pdp-11/70 vs. the J-11 (11/73) chip set, and both of them relative to other 16-bit processors. With its memory management and floating point support, the 11/73 performs very well as a few user Qbus system. But the 11/70 is clearly still the choice for large systems because of its cache, special memory architecture, and unibus/massbus I/O. Although the machines have similar performance, you just would not want to put 20 users on an 11/73 - it doesn't have all the extras to take care of all those people.

So the next time you want to compare microprocessors, go pick on someone your own size. Then you will have a more valid comparison.

Joe Falcone
Eastern Research Laboratory          decvax!
Digital Equipment Corporation        decwrl!deccra!jrf
Hudson, Mass                         tardis!
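
As a rough check on the arithmetic in the post above (a 60ns cycle at 16.67MHz, a 240ns window at 4 cycles per access, and a 10 to 25% degradation from cache misses), here is a minimal Python sketch. The function names are hypothetical, and treating the degradation as a straight percentage cut in the effective clock is an assumption; the post does not spell out a model.

```python
def cycle_time_ns(clock_mhz):
    """Length of one processor cycle in nanoseconds for a clock in MHz."""
    return 1000.0 / clock_mhz


def access_window_ns(clock_mhz, cycles_per_access=4):
    """Time available for a memory round trip (address out, data in).

    The default of 4 cycles per access is the figure assumed in the post.
    """
    return cycles_per_access * cycle_time_ns(clock_mhz)


def effective_clock_mhz(clock_mhz, degradation):
    """Clock rate after a fractional performance loss (assumed linear)."""
    return clock_mhz * (1.0 - degradation)


if __name__ == "__main__":
    clock = 16.67  # MHz, the 68020 clock rate quoted in the post
    print("cycle time    : %.1f ns" % cycle_time_ns(clock))
    print("access window : %.1f ns" % access_window_ns(clock))
    for loss in (0.10, 0.25):  # the 10-25% range quoted in the post
        print("%2.0f%% loss -> %.2f MHz"
              % (loss * 100, effective_clock_mhz(clock, loss)))
    # HP 9000 figure: one word every other cycle at 18MHz
    print("HP 9000 word  : %.1f ns" % (2 * cycle_time_ns(18.0)))
```

Under that reading, a 25% penalty leaves about 12.5MHz, in line with the roughly 12MHz floor quoted in the post, and two cycles at 18MHz come to about 111ns, matching the 110ns word rate given for the HP 9000.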
henry@utzoo.UUCP (Henry Spencer) (09/15/84)
Joe, I agree with most of what you say -- clock rates are not a realistic measure until you look at memory access times, and i/o bandwidth is quite important when evaluating multi-user machines -- but I've got a couple of comments.

> MORAL: If you want to compare the 68020 (or any microprocessor) to
> a VAX, wait for the figures on the forthcoming VAX chip sets ...

The question is, will we ever see those VAX chip sets? As chip sets, not as overpriced packaged cpus. Existence is no guarantee of commercial availability, and conference papers are a poor substitute for chips in hand.

> ... At the system level, the
> microVAX line of small systems is designed and packaged more for
> one-on-one personal use. ...

So is the VAX 730, in my opinion. We've had to forcibly dissuade some local groups who've been mesmerized by Dec marketing into thinking that a 730 could run a dozen users doing statistics, just because of that magic word "VAX". My lousy little 11/44 will run rings around a 730 on anything which doesn't have serious address-space problems. And the 44 is a quarter the size and half the price. When address space is not an issue, the 44 will give a 750 a good run for its money, and the size and price comparisons *there* don't bear mentioning.

For that matter, I note that the Murray Hill research folks now (I'm told) budget for a 750 for every 4 users.

When is Dec going to announce some reasonably-priced big-address-space processors? I'd sure like one.

--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
doug@oakhill.UUCP (Doug MacGregor) (11/01/84)
I understand the motivation behind the note authored by Joe Falcone concerning the comparison of the 68020 and a VAX. In principle I agree with his comments. I understand that my figures representing the performance of the 68020 can be taken out of context by those with a marketing or sales bent to unfairly compare a processor with a system. In all of the work that I have done, I have been very deliberate in avoiding comparisons with any system. Because comparisons of this sort are so simplistic and erroneous, I never felt that they were an issue to anyone who knew how computers worked.

However, I strongly disagree with some of the interpretations that Mr. Falcone has drawn as well as some of the conclusions made.

First, the interpretations of the 68020 performance are a bit narrow. The note implies that there are no applications except systems equivalent to a VAX. These non-system applications which include real-time robotics, graphics, and communication applications are a substantial portion of the 68020 market. These applications are concerned with the performance of the processor alone, not a system.

Second, it is implied that general purpose computer systems are not capable of running without wait states. This is a dangerous underestimation of the capability of the various system implementors using the 68020. The notion of an off-chip cache is mentioned but then discarded. I do not believe that it is possible to dispose of the various system solutions available to the system designers, many of which are already being used on 68000 systems.

Third, reviewing the performance figures cited in the note there are several very significant discrepancies. Assuming that we use the figure of 3-5x performance of a VAX-11/780 to an 8MHz 68000 then I don't understand how the table below was generated.
>                        "VAX MIPS"
>
> CPU              100ns        200ns        300ns
> ----------------------------------------------------------
> 8MHz 68000       0.14-0.23    0.14-0.23    0.14-0.23
> 16MHz 68020*     0.42-0.70    0.30-0.50    0.24-0.40
> 16MHz 68020**    0.56-0.90    0.46-0.76    0.40-0.66
> ----------------------------------------------------------
> VAX-11/780       0.7-1.0
> VAX-11/785       1.0-1.5
> ----------------------------------------------------------
> *  I-cache disabled
> ** 100% I-cache hit ratio

If we start with the presumption that 3-5x is valid then I assume we use 4x for the 11/780 to 68000 comparison. This would give us the following performance figures:

CPU              100ns        200ns        300ns
----------------------------------------------------------
8MHz 68000       0.18-0.25    0.18-0.25    0.18-0.25
----------------------------------------------------------
VAX-11/780       0.7-1.0
VAX-11/785       1.0-1.5
----------------------------------------------------------

Using the 68000-68020 relationship shown in the performance tables in IEEE MICRO, in terms of 68000 performance:

CPU              100ns        200ns        300ns
----------------------------------------------------------
8MHz 68000       0.6 (1x)     0.6 (1x)     0.6 (1x)
16MHz 68020*     2.1 (3.5x)   1.5 (2.5x)   1.3 (2.2x)
16MHz 68020**    2.7 (4.5x)   2.3 (3.7x)   2.0 (3.3x)
----------------------------------------------------------

When the two tables are combined, it shows a respectable cost/performance ratio. If a microprocessor based product performs at or above the level of a VAX, that seems significant.
CPU              100ns        200ns        300ns
----------------------------------------------------------
8MHz 68000       0.18-0.25    0.18-0.25    0.18-0.25
16MHz 68020*     0.61-0.88    0.44-0.63    0.39-0.55
16MHz 68020**    0.79-1.13    0.65-0.93    0.58-0.83
----------------------------------------------------------

-------------------------
VAX-11/780       0.7-1.0
VAX-11/785       1.0-1.5
-------------------------

Fourth, the 68020 was designed to operate at 16MHz worst case. Just as there are 10 and 12.5MHz 68000's and 68010's available now, it is not unrealistic to anticipate higher frequency 68020's in the years to come.

In conclusion, we designed a chip which is a processor. We did not design a system. When we describe the performance of our chip it is only appropriate to describe it in terms of a processor. There are too many variables that can make a comprehensible evaluation impossible (e.g. compiler technology, system configuration, system access times, etc.) that don't have anything to do directly with the performance "capability" of the 68020. For this reason, I feel that this method of performance evaluation is not only appropriate, but essential.

P.S. I would be interested in seeing a performance evaluation of the VLSI-based VAX and microVAX processors as well as the systems. Are there any public descriptions of these?

Doug MacGregor
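
The combined table in Doug MacGregor's post can be re-derived from the two inputs he gives: the VAX-11/780 range of 0.7-1.0 divided by 4 for the 8MHz 68000, then scaled by the relative factors (3.5x, 2.5x, and so on) from his 68000-relative table. The Python sketch below is one way to do that. The names are hypothetical, and rounding half up to two decimal places is an assumption about how the posted figures were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# Inputs taken from the post: VAX-11/780 "VAX MIPS" range, and the 4x
# ratio chosen for the 11/780 to 8MHz 68000 comparison.
VAX_780_MIPS = (Decimal("0.7"), Decimal("1.0"))
VAX_TO_68000_RATIO = Decimal("4")

COLUMNS = ("100ns", "200ns", "300ns")

# Multipliers relative to the 8MHz 68000, as given in parentheses
# in the post's 68000-relative table.
RELATIVE_TO_68000 = {
    "8MHz 68000": ("1", "1", "1"),
    "16MHz 68020*": ("3.5", "2.5", "2.2"),
    "16MHz 68020**": ("4.5", "3.7", "3.3"),
}


def vax_mips_range(multiplier):
    """Scale the 780 range down by 4x, then up by the 68020 multiplier."""
    factor = Decimal(multiplier) / VAX_TO_68000_RATIO
    cents = Decimal("0.01")
    # Assumption: figures are rounded half up to two decimal places.
    return tuple(
        (bound * factor).quantize(cents, rounding=ROUND_HALF_UP)
        for bound in VAX_780_MIPS
    )


def combined_table():
    """Return {cpu: ["low-high", ...]} with one entry per column."""
    return {
        cpu: ["{}-{}".format(*vax_mips_range(m)) for m in multipliers]
        for cpu, multipliers in RELATIVE_TO_68000.items()
    }


if __name__ == "__main__":
    print("%-16s%-13s%-13s%s" % (("CPU",) + COLUMNS))
    for cpu, cells in combined_table().items():
        print("%-16s%-13s%-13s%s" % ((cpu,) + tuple(cells)))
```

Decimal arithmetic is used rather than floats so that exact ties such as 1.125 round the same way a hand calculation would.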