tim@nucleus.amd.com (Tim Olson) (12/02/89)
Recently, Steve McGeady of Intel posted an inflammatory reply to my article discussing the benchmarking dispute between AMD and Intel. I have tried to reach him by email to get this cleared up, but I have not heard back from him (I don't know whether he has not seen it or has chosen not to reply). I don't like to use this forum for this (things are rapidly approaching zero technical content), but Steve made some accusations here that I feel warrant a public response.

Hit 'n' now if you don't want to follow this further...

First, some history: At their i960CA introduction, and later at the Microprocessor Forum sponsored by Microprocessor Report, Intel showed the results of a benchmark exercise in which they compared the performance of boards based on a 68030, an Am29000, an i960KA, and an i960CA. Their results showed the Am29000 board running only as fast as the 68030 board, and slower than both the KA and CA boards. These results were so skewed that the trade press contacted AMD, requesting our comments and the numbers we felt reflected actual Am29000 performance. Even though most of the benchmarks that Intel used were based on standard, readily available sources (although the results were reported slightly differently than normal), I requested the benchmark sources directly from Intel to ensure that the numbers I obtained were as accurate as possible.

In article <5277@omepd.UUCP>, mcg@mipon2.intel.com (Steven McGeady) writes:
| In article <28107@amdcad.AMD.COM>, tim@electron.amd.com (Tim Olson) writes:
| > Intel compared its i960CA board running this benchmark suite with a
| > 68030 (20MHz), an i960KA (20MHz), and an Am29000 (16MHz) board.
| > However, the board they used to benchmark the Am29000 was not designed
| > for performance; rather, it was designed to test the functionality of
| > ADAPT (Advanced Development and Prototyping Tool) hardware debuggers.
|
| This is an interesting piece of history re-invention. Step Engineering,
| the current manufacturer of the STEB board, received the design of the
| board from AMD (the board has an AMD copyright on it).

The assertion that this is "history re-invention" is totally false. AMD designed both the ADAPT and the STEB, which was *definitely* designed to allow customers to test the functionality of their ADAPTs. Both of these designs were subsequently licensed to Step Engineering.

| Apparently, the
| board was designed this way because it is impossible to build a 29K
| system using normal DRAMs and achieve better performance.

Another false assertion. YARC Systems builds an Am29000-based board that has only "normal DRAM" memory. The Am29000 was designed from the start to allow high performance without requiring fast SRAM.

| We attempted
| to put faster RAMs in the STEB board, and to increase the clock speed to
| 20MHz, and neither worked.

I can't comment on this, other than to say that we have run STEBs at 20MHz. However, here Steve admits that they didn't actually upgrade the board. In their benchmark report, Intel states that the STEB "... was improved by replacing the 120ns memories the board was shipped with by 35ns memories...", implying that they were getting better performance with it than with the standard, slow 120ns memory. The very next sentence of the report contradicts this: "Board runs 2 wait states at 16MHz from the SRAM." Two wait states are standard for the *120ns memory* supplied with the STEB.

| We chose the STEB board not because it was
| slow (even we didn't expect it to be so slow) but because it is the only
| available board with a prototyping area on which we could add an SBX
| connector to interface the graphics cards on which we displayed the
| benchmark results.

Was the overriding concern here a catchy press demo, or realistic performance results? In any case, the YARC card *does* have an expansion connector on it (we use it to drive a laser-printer engine).
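The STEB wait-state argument above comes down to simple arithmetic: at 16MHz one clock is 62.5ns, so an access that does not fit in a single cycle costs extra wait-state cycles. The sketch below is a simplified model only, not the STEB's actual timing; the `wait_states` function and the 20ns `overhead_ns` allowance for address decode and buffering are hypothetical, chosen for illustration, and real Am29000 board timing depends on the specific memory design.

```python
import math


def wait_states(access_ns: float, clock_mhz: float, overhead_ns: float = 0.0) -> int:
    """Extra clock cycles needed beyond a single-cycle memory access.

    Simplified model: the device access time plus a hypothetical
    decode/buffer overhead is rounded up to whole clock cycles, and
    every cycle beyond the first counts as a wait state.
    """
    cycle_ns = 1000.0 / clock_mhz  # 16 MHz -> 62.5 ns per clock
    cycles = math.ceil((access_ns + overhead_ns) / cycle_ns)
    return max(cycles - 1, 0)


if __name__ == "__main__":
    # Hypothetical 20 ns overhead; not a measured STEB figure.
    for access in (120, 35):
        print(f"{access}ns parts at 16MHz: {wait_states(access, 16, 20)} wait states")
```

Under these assumptions, 120ns parts need two wait states at 16MHz and 35ns parts need none, which is consistent with the point that a board still running two wait states after the memory swap was not taking advantage of the faster parts.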
| > To provide a more fair comparison, I requested the benchmark sources
| > from Intel, to run on a 30MHz Am29000 board (manufactured by YARC
| > Systems). This board uses 2-way interleaved, 100ns DRAM memory for
| > instructions and 35ns SRAM for data.
|
| This board contains separate Instruction and Data memory (using the
| 29k's Harvard bus), each of which is interleaved (according to published
| data I've been able to find on the board).

Yes, the memory design is a direct result of the Am29000's external Harvard architecture. *Embedded controllers* typically run one fixed program, either from ROM or downloaded into writeable memory at initialization time. This is the way most of our customers have designed their systems.

| The 30MHz 29k's are apparently
| hand-sorted - we know of no volume shipments of these parts.

Well, we know of no volume shipments of 33MHz i960CA's, either ;-) We *have* shipped 30MHz parts, and will be announcing 33MHz.

| This board is in no way comparable in cost, parts-count, interface
| complexity, or usability to the 960CA board that was used.

I stand by my claim that the YARC SRAM card and the 960CA evaluation board are comparable. 35ns SRAM is *much* less expensive than the 15ns SRAM used on the i960CA board, and fully half the cost of these boards is simply memory.

| We supplied Mr. Olson with the sources to these benchmarks, as an effort
| to bring an end to the warring that has been going on over benchmarking.

Don't forget that Intel started all of this by presenting performance numbers for the Am29000 that were quite out of line with anything else anyone has claimed. We have no problem with Intel's quoted numbers for the KA and CA -- we assume that they can be verified.

| In exchange for freely supplying these, Mr. Olson agreed that we would
| be given the resulting source code back, along with a copy of the compiler
| that produced it, prior to publication of the results. Mr. Olson has
| chosen to ignore those commitments and publish numbers without noting
| what compiler was used, and without providing us (or anyone else - we also
| supplied the benchmarks to Michael Slater of Microprocessor Report)
| with the ability to check their validity.

Another false statement. I never spoke to Steve McGeady about these benchmarks; I spoke to Lew Paceley and Tony Baker (of the i960 marketing group). My agreement with Tony was that, in exchange for the benchmark sources, we would give Intel the results of the benchmarks. Intel now has them. *Nothing* was ever said about returning source code, compilers, or prior publication. Someone at Intel made that up. However, we certainly have nothing to hide, and have given Intel the sources, the .s files, the Makefile, etc. that we used to generate the results. Michael Slater already has our results and the documentation for them; he is welcome to the rest of the files if he requests them.

As for the compiler version, I mentioned in my posting that the benchmarks were compiled with the current release version of the MetaWare High C compiler (2.0). I didn't want to bore the net with legalistic details, but all of the information is documented in a paper that I would be happy to give to anyone who requests it.

| It should be noted that the 960CA benchmarks were compiled with the
| current GNU GCC compiler, which does *no* instruction scheduling, and thus
| fails to take advantage of the multiple-instruction issue capability of
| the 960CA. We have been working on an instruction-scheduling compiler,
| but it is not available for release at this time.

Neither is the GNU compiler Intel used in the benchmarks. Even though the GNU compiler and tools are listed in the "Solutions960" catalog as available "now", several phone calls to Intel sales representatives confirmed that it was still unavailable.
| The lesson that this has served to teach me, who argued with our marketing
| department that we should release these benchmarks to AMD under the noted
| restrictions, is that we were foolish to trust AMD's word regarding feedback
| of the results from the benchmarks. Thus, I place no trust in these
| numbers presented as representing any kind of objective reality.
| Furthermore, I have learned my lesson with regard to cooperating.

I really don't understand your position here. All we did was ask for the source code of the standard benchmarks so that we could run them for the people in the trade press who were asking about them. Would you feel the same way if we had simply run them from the standard sources that are readily available (Stanford Integer suite, Dhrystone 1.1)? We could have done that, but I wanted to ensure that we were as "apples-to-apples" as possible.

| The benchmark wars will now most certainly be taken out of the hands of
| technologists and be placed back in the hands of marketing departments.

I won't even attempt to comment on this remark.

| I will reiterate here my advice to customers attempting to determine the
| relative speed of the two processors: run your own benchmarks on a board
| with a memory system relevant to the design you plan to build.

I agree 100%.

| The Yarc
| board's memory design is an example of the most-expensive memory system
| design that one can attach to the 29k - it bears no resemblance to what
| can be expected with a combined I&D DRAM memory system, which is where
| the only true comparison lies.

Huh? The YARC board's memory design is similar to many of our embedded control designs. And why is a combined I&D DRAM memory system the only "true comparison"? The i960CA may be limited to that, but that simply means there are more memory design options for the Am29000.

| In short, don't believe AMD's benchmark
| numbers, and don't believe ours.

I believe that all benchmark results should be taken with a grain of salt, but I also believe that if a vendor were to come out with wildly incorrect numbers, or numbers that could not be verified, they would be discovered quickly.

| Don't believe simulators, because AMD's
| is well known for overstating performance.

The Am29000 simulator is the exact same one we use in-house to do performance analysis of potential processor modifications. It is an RTL simulation of the processor that has been checked cycle-for-cycle against the real chip in a number of different memory models. The only thing I can think of that it doesn't really simulate is DRAM refresh time, and that is because its effect is so statistical. If it is overstating performance, I want to know why. Do you have any references?

Benchmarking embedded processors is harder than benchmarking UNIX systems -- there are many more variables to contend with. After going through this highly frustrating exercise, I hope something good can come of it. A larger collection of "standard" benchmarks would be a good starting point.

-- 
Tim Olson
Advanced Micro Devices
(tim@amd.com)