dfh@scirtp.UUCP (David F. Hinnant) (09/06/85)
Most of you probably remember the discussion a couple of months ago in
net.micro.68k and net.arch concerning Intel's advertising campaign
comparing the 80286 to the 68010 and 68020. I caught the tail end of
this discussion, and did not see the ads or the 'report' Intel used as
the basis for the ads until a month or so after the ads first came out -
at which point the discussion was dying out on USENET. I know this is
rehashing a dead issue, but I think it's important.
Enclosed below is a copy of a letter I have sent to BYTE (the publisher
of the majority of the benchmarks Intel used), some of the magazines that
published the Intel ad, and PC Week (which had a recent article on the
"speedy-breed" 80286).
I sent Intel an earlier version of this letter that outlined all the
issues outlined below. Intel did contact me several times concerning my
complaints, but they have not addressed them to my satisfaction. Thus,
my posting here, and the letters I have sent to the editors of selected
magazines.
Comments welcome.
David Hinnant
SCI Systems
======================================================================
19-Aug-85
Dear Editor:
There has been a lot of discussion lately (particularly
on the UNIX 'Usenet' news network) concerning Intel's
recent advertising campaign comparing the Intel 80286 to
the Motorola 68010 and 68020. Intel has published a
document entitled "iAPX 286 High Performance Benchmark
Report" (hereafter referred to as 'the report') to
support their claim that the 80286 offers superior
performance over the Motorola 68010 and 68020 chips.
Both their advertising and the report use the August 1984
BYTE benchmarks which appeared in the article I wrote,
"Benchmarking UNIX Systems" as the basis for comparing
the Intel and Motorola chips.
After studying the Intel report, I believe there are
several problems with Intel's approach to benchmarking
that should be addressed. While the problems presented
below may not prove to invalidate Intel's claim, they do
raise doubts as to the objectivity and impartiality of
Intel's benchmarking strategy. As author of the majority
of the benchmarks Intel has used to make their claim, I
feel compelled to bring these problems to the public's
attention.
On July 22nd I hand delivered to the local Intel office a
list of problems with their benchmarking strategy and why
I believe they cannot legitimately make the conclusion
they did. As of today, I have not received a
satisfactory response to most of these issues, as they
are outlined below.
1) The listing for the pipes.c benchmark as published
in their report is incorrect. If this listing is
identical to the source code used to evaluate the 80286
based systems mentioned in their report, then the program
will terminate prematurely resulting in invalid timings.
This listing is as it was presented in the August 1984
BYTE. However an error was made on my part when
furnishing the listing to BYTE, and a line was
inadvertently deleted. I notified BYTE of the omission,
and BYTE published a correction in the January 1985 issue
(page 14). Intel should have used the corrected
benchmark. Intel has responded favorably to this error
and has re-benchmarked their systems. I have been told
that they will publish a correction.
2) Intel admits that the benchmark data used for the
Masscomp and SUN Microsystems machines is the data
presented in the August 1984 BYTE issue. The BYTE
article was originally slated to appear in the February
1984 issue. Due to production delays it did not appear
until August. Although I have no precise record, the
benchmark data I furnished BYTE is probably as old as, if
not older than, December 1983. This means that Intel is
comparing benchmark results from 68010 machines over a
year old to current 80286 benchmarks! Intel apparently
did not make an effort to benchmark current 68010
machines other than the AT&T 7300. More recent, but
still dated benchmark data I have shows that the SUN is
much faster than reported in at least two benchmarks.
Intel should have noted the benchmark dates of the SUN
and Masscomp machines clearly as being old and
benchmarked current production machines, as they did with
the Intel based microcomputers.
3) The 80286 based microcomputers benchmarked all ran
Xenix 3.0. The Motorola based microcomputers ran
different operating systems: System III, System V, and
Berkeley 4.1 BSD. The BYTE UNIX benchmarks, as stated in
the August article (page 133), are UNIX operating system
benchmarks. They are not microprocessor benchmarks and
should not have been used as such. The consistently
superior results obtained on the microcomputers running
Xenix as compared to the microcomputers running other
versions of UNIX indicate that performance differences
may be due more to differences in operating system
software rather than microprocessor design. For example,
Xenix 3.0 uses an internal buffer size of 512 bytes. 4.2
BSD uses a 1024 byte buffer size. The pipes.c benchmark
as published in BYTE does not take differing buffer sizes
into account, and assumes a 512 byte buffer size. Read
and write operations thus appear to be less efficient on
the SUN as compared to other machines. In short, by not
taking system differences into account, Intel did not
employ the scientific method. Thus there are too many
unknowns for a conclusion to be reached. Intel should
have benchmarked a Motorola based microcomputer running
Xenix or an Intel based microcomputer running something
other than Xenix if they wanted to reach conclusions
about CPU performance under similar circumstances and
operating systems.
On a related issue, Intel's versions of the other benchmarks
used in the report are flawed, some critically. Their
'C' translation of the Whetstone benchmark as published
has several errors:
1) It performs one loop more than necessary in
module three. This is actually a detriment to Intel's
results.
2) The Whetstone uses a single dimension array of
four elements. These elements are correctly referenced
using the subscripts 0, 1, 2 and 3. Intel's benchmark
uses the subscripts 1, 2, 3, and 4.
Intel's version of the Fibonacci recursion benchmark has
a more substantial flaw. Because of an extra semicolon,
the benchmark performs one iteration instead of the ten
implied by the listing.
In all likelihood, the errors in the Whetstone benchmark
did not significantly affect the results on the machines
benchmarked in the report. However, because of these
flaws the results from this industry-standard benchmark
cannot be compared to data from other versions of the
Whetstone.
The same may be true for the errors in the popular
Fibonacci benchmark. Both these instances raise doubts
as to Intel's knowledge of the C language, which it has
specifically selected for comparing microprocessors.
Intel has adhered to two of the unwritten rules of
benchmarking. They used benchmarks developed outside
Intel, and they contracted an outside company to run the
benchmarks on their machines. What they did not do is
have the results interpreted by an objective, independent
party.
Intel did contact me prior to publication of the report,
but only for permission to reprint the listings (which
they trimmed the comments out of), and not in an advisory
capacity. I gave them reprint permission. I expected
that the benchmarks would be used carefully and according
to the guidelines of my article. Clearly Intel could have
avoided the problems mentioned above if they had an
outside independent party evaluate their benchmarking
methodology and their interpretation of results. At
first, I was upset that Intel did not reference me as
author of the BYTE benchmarks. Upon reflection, I am
glad they did not.
David Hinnant
SCI Systems, Inc.
======================================================================
--
David Hinnant
SCI Systems, Inc.
{decvax, akgua}!mcnc!rti-sel!scirtp!dfh

davet@oakhill.UUCP (Dave Trissel) (09/13/85)
In article <405@scirtp.UUCP> dfh@scirtp.UUCP (David F. Hinnant) writes:
>
> used in the report are flawed; some critically. Their
> 'C' translation of the Whetstone benchmark as published
> has several errors:
>
Actually, there is a bias thrown in which is far larger than any of the
errors mentioned here. The Whetstone is supposed to have an outer loop
running from 1 to 10 to cause the generation of 1 million Whetstone
instructions. However, if you examine Intel's code, the outer loop only
runs through two times. Since they give the time for the result and not
the value in Whetstones, this makes it easy to miss the factor-of-five
error, as normally a run time of one second means a value of 1,000
KWhets. Intel's time would correspond to 625 KWhets, which I knew was
impossible. But it wasn't until several weeks later that I finally
spotted the loop count change and realized that the value should really
have been around 125.

On the same subject, we have just completed an extensive analysis of the
Intel benchmark report which goes into detail on the many irregularities
found. The conclusions reached when up-to-date systems and proper
procedures are used are quite a contrast to those reached by Intel.

For those of you following the MIPS debate, there is a section of
interest. Intel tries to show that by looking only at instruction clock
times the 286 is just as fast as a '020. About as believable as their
claim, based on their UNIX benchmark set, that (and I quote) "The 6 MHz
286/310 outperforms all of the machines based on a 68010 as well as the
VAX machines" (pg. 9). Note this claim includes the VAX 780. Their
conclusion puts the IBM PC/AT at 98 percent of the performance of the
780. They further claim that a 12 MHz 286 is 2.4 times faster than a
780. Everyone expects marketing hype from vendors (Motorola included, of
course), but this is just downright silly.

Our new benchmark report should be in the local Motorola sales offices
in a week or so.
Try to get the Intel benchmark booklet from Intel so you can see these
things for yourself.
--
Dave Trissel