dfh@scirtp.UUCP (David F. Hinnant) (09/06/85)
Most of you probably remember the discussion a couple of months ago in net.micro.68k and net.arch concerning Intel's advertising campaign comparing the 80286 to the 68010 and 68020. I caught the tail end of this discussion, and did not see the ads or the 'report' Intel used as the basis for the ads until a month or so after the ads first came out - at which point the discussion was dying out on USENET. I know this is rehashing a dead issue, but I think it's important.

Enclosed below is a copy of a letter I have sent to BYTE (the publisher of the majority of the benchmarks Intel used), some of the magazines that published the Intel ad, and PC Week (which had a recent article on the "speedy-breed" 80286). I sent Intel an earlier version of this letter that covered all the issues outlined below. Intel did contact me several times concerning my complaints, but they have not addressed them to my satisfaction. Thus my posting here, and the letters I have sent to the editors of selected magazines.

Comments welcome.

David Hinnant
SCI Systems

======================================================================

19-Aug-85

Dear Editor:

There has been a lot of discussion lately (particularly on the UNIX 'Usenet' news network) concerning Intel's recent advertising campaign comparing the Intel 80286 to the Motorola 68010 and 68020. Intel has published a document entitled "iAPX 286 High Performance Benchmark Report" (hereafter referred to as 'the report') to support their claim that the 80286 offers superior performance over the Motorola 68010 and 68020 chips. Both their advertising and the report use the August 1984 BYTE benchmarks, which appeared in the article I wrote, "Benchmarking UNIX Systems", as the basis for comparing the Intel and Motorola chips.

After studying the Intel report, I believe there are several problems with Intel's approach to benchmarking that should be addressed.
While the problems presented below may not prove to invalidate Intel's claim, they do raise doubts as to the objectivity and impartiality of Intel's benchmarking strategy. As author of the majority of the benchmarks Intel has used to make their claim, I feel compelled to bring these problems to the public's attention.

On July 22nd I hand-delivered to the local Intel office a list of problems with their benchmarking strategy and the reasons I believe they cannot legitimately draw the conclusion they did. As of today, I have not received a satisfactory response to most of these issues, which are outlined below.

1) The listing for the pipes.c benchmark as published in their report is incorrect. If this listing is identical to the source code used to evaluate the 80286 based systems mentioned in their report, then the program will terminate prematurely, resulting in invalid timings. This listing is as it was presented in the August 1984 BYTE. However, an error was made on my part when furnishing the listing to BYTE, and a line was inadvertently deleted. I notified BYTE of the omission, and BYTE published a correction in the January 1985 issue (page 14). Intel should have used the corrected benchmark. Intel has responded favorably to this error, and has re-benchmarked their systems. I have been told that they will publish a correction.

2) Intel admits that the benchmark data used for the Masscomp and SUN Microsystems machines is the data as was presented in the August 1984 BYTE issue. The BYTE article was originally slated to appear in the February 1984 issue. Due to production delays it did not appear until August. Although I have no precise record, the benchmark data I furnished BYTE is probably as old as, if not older than, December 1983. This means that Intel is comparing 68010 benchmark results that are over a year old to current 80286 benchmarks! Intel apparently did not make an effort to benchmark current 68010 machines other than the AT&T 7300.
More recent, but still dated, benchmark data I have shows that the SUN is much faster than reported in at least two benchmarks. Intel should have clearly noted that the SUN and Masscomp benchmark data were old, and should have benchmarked current production machines, as they did with the Intel based microcomputers.

3) The 80286 based microcomputers benchmarked all ran Xenix 3.0. The Motorola based microcomputers ran different operating systems: System III, System V, and Berkeley 4.1 BSD. The BYTE UNIX benchmarks, as stated in the August article (page 133), are UNIX operating system benchmarks. They are not microprocessor benchmarks and should not have been used as such. The consistently superior results obtained on the microcomputers running Xenix as compared to the microcomputers running other versions of UNIX indicate that performance differences may be due more to differences in operating system software than to microprocessor design. For example, Xenix 3.0 uses an internal buffer size of 512 bytes. 4.2 BSD uses a 1024 byte buffer size. The pipes.c benchmark as published in BYTE does not take differing buffer sizes into account, and assumes a 512 byte buffer size. Read and write operations thus appear to be less efficient on the SUN as compared to other machines.

In short, by not taking system differences into account, Intel did not employ the scientific method. Thus there are too many unknowns for a conclusion to be reached. Intel should have benchmarked a Motorola based microcomputer running Xenix or an Intel based microcomputer running something other than Xenix if they wanted to reach conclusions about CPU performance under similar circumstances and operating systems.

On a related issue, Intel's versions of the other benchmarks used in the report are flawed; some critically. Their 'C' translation of the Whetstone benchmark as published has several errors:

1) It is performing one loop more than necessary in module three.
This is actually a detriment to Intel's results.

2) The Whetstone uses a single dimension array of four elements. These elements are correctly referenced using the subscripts 0, 1, 2 and 3. Intel's benchmark uses the subscripts 1, 2, 3, and 4.

Intel's version of the Fibonacci recursion benchmark has a more substantial flaw. Because of an extra semicolon, the benchmark makes one iteration instead of the ten iterations implied in the listing.

In all likelihood, the errors in the Whetstone benchmark did not significantly affect the results on the machines benchmarked in the report. However, because of these flaws the results from this industry standard benchmark cannot be compared to data from other versions of the Whetstone. The same may be true for the errors in the popular Fibonacci benchmark. Both these instances raise doubts as to Intel's knowledge of the C language, which it has specifically selected for comparing microprocessors.

Intel has adhered to two of the unwritten rules of benchmarking. They used benchmarks developed outside Intel, and they contracted an outside company to run the benchmarks on their machines. What they did not do is have the results interpreted by an objective, independent party. Intel did contact me prior to publication of the report, but only for permission to reprint the listings (which they trimmed the comments out of), and not in an advisory capacity. I gave them reprint permission. I expected that the benchmarks would be used carefully and according to the guidelines of my article. Clearly Intel could have avoided the problems mentioned above if they had had an outside independent party evaluate their benchmarking methodology and their interpretation of results.

At first, I was upset that Intel did not reference me as author of the BYTE benchmarks. Upon reflection, I am glad they did not.

David Hinnant
SCI Systems, Inc.

======================================================================

--
David Hinnant
SCI Systems, Inc.
{decvax, akgua}!mcnc!rti-sel!scirtp!dfh
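
For readers without the report at hand, the extra-semicolon flaw described in the letter can be sketched in C. This is only an illustration, not Intel's listing: the names NTIMES, NUMBER, sink, runs_with_stray_semicolon and runs_as_intended are made up for this sketch, and the Fibonacci argument of 20 is an arbitrary small value chosen so the code runs quickly. Only the "one iteration instead of ten" behaviour comes from the letter.

```c
#define NTIMES 10  /* intended repetitions ("ten iterations" in the letter) */
#define NUMBER 20  /* hypothetical Fibonacci argument, small so it runs fast */

static unsigned long sink;  /* accumulates results so the calls are not discarded */

/* Plain recursive Fibonacci, the kind of routine such a benchmark times. */
static unsigned long fib(unsigned long n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Flawed form: the semicolon after the for(...) header is the whole loop
 * body, so the braced block below runs once, after the loop has finished. */
int runs_with_stray_semicolon(void)
{
    int i, runs = 0;

    for (i = 1; i <= NTIMES; i++);   /* <-- stray semicolon */
    {
        sink += fib(NUMBER);
        runs++;
    }
    return runs;
}

/* Intended form: the braced block is the loop body and runs NTIMES times. */
int runs_as_intended(void)
{
    int i, runs = 0;

    for (i = 1; i <= NTIMES; i++) {
        sink += fib(NUMBER);
        runs++;
    }
    return runs;
}
```

Calling runs_with_stray_semicolon() returns 1 and runs_as_intended() returns 10, so a timing taken around the flawed form covers one tenth of the work the listing appears to do. Some modern compilers warn about the empty loop body, which is a warning and not an error.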
davet@oakhill.UUCP (Dave Trissel) (09/13/85)
In article <405@scirtp.UUCP> dfh@scirtp.UUCP (David F. Hinnant) writes:
>
> used in the report are flawed; some critically. Their
> 'C' translation of the Whetstone benchmark as published
> has several errors:
>

Actually, there is a bias thrown in which is far larger than any errors mentioned here. The Whetstone is supposed to have an outer loop running from 1 to 10 to cause the generation of 1 million whetstones. However, if you examine Intel's code the outer loop only runs through two times. Since they give the time for the result and not the value in Whetstones, this factor-of-five error is easy to miss; normally a run time of one second means a value of 1,000 KWhets. Intel's time would relate to 625 KWhets, which I knew was impossible. But it wasn't until several weeks later that I finally spotted the loop count change and realized that the value should really have been around 125.

On the same subject, we have just completed an extensive analysis of the Intel benchmark report which goes into detail on the many irregularities found. The conclusions reached when up-to-date systems and proper procedures are used are quite a contrast to those reached by Intel.

For those of you following the MIPS debate there is a section of interest. Intel tries to show that by looking only at instruction clock times the 286 is just as fast as a '020. That is about as believable as their claim, based on their UNIX benchmark set, that (and I quote) "The 6 MHz 286/310 outperforms all of the machines based on a 68010 as well as the VAX machines" (Pg 9.) Note this claim includes the VAX 780. Their conclusion puts the IBM PC/AT at 98 percent the performance of the 780. They further claim that a 12 MHz 286 is 2.4 times faster than a 780.

Everyone expects marketing hype from vendors (Motorola included, of course) but this is just downright silly. Our new benchmark report should be in the local Motorola sales offices in a week or so.
Try to get the Intel benchmark booklet from Intel so you can see these things for yourself.

--
Dave Trissel
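
The factor-of-five arithmetic in the post above can be written out as a small C sketch. The function names and parameters here are invented for illustration. The figures themselves (1,000 KWhets for a one second run, 10 expected outer passes, 2 actual passes, an apparent 625 KWhets) are the ones quoted in the post, and the sketch assumes each outer pass does the same amount of work.

```c
/* Conventional reading of a Whetstone run time: a full run of 1 million
 * whetstones that takes one second rates 1,000 KWhets. */
double apparent_kwhets(double seconds)
{
    return 1000.0 / seconds;
}

/* Rescale a rating that assumed all expected outer passes had run when
 * only passes_run of them actually did. */
double corrected_kwhets(double apparent, int passes_run, int passes_expected)
{
    return apparent * passes_run / passes_expected;
}
```

With the numbers from the post, corrected_kwhets(625.0, 2, 10) gives 125, which matches the "around 125" figure: only a fifth of the nominal work was done in the measured time.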