[net.micro.68k] 80286 v.s. 68010 -- the debate continues?

dfh@scirtp.UUCP (David F. Hinnant) (09/06/85)

  Most of you probably remember the discussion a couple of months ago in
net.micro.68k and net.arch concerning Intel's advertising campaign
comparing the 80286 to the 68010 and 68020.  I caught the tail end of
this discussion, and did not see the ads or the 'report' Intel used as
the basis for the ads until a month or so after the ads first came out -
at which point the discussion was dying out on USENET.  I know this is
rehashing a dead issue, but I think it's important. 

  Enclosed below is a copy of a letter I have sent to BYTE (the publisher
of the majority of the benchmarks Intel used), some of the magazines that
published the Intel ad, and PC Week (which had a recent article on the
"speedy-breed" 80286).

  I sent Intel an earlier version of this letter that outlined all the
issues outlined below.  Intel did contact me several times concerning my
complaints, but they have not addressed them to my satisfaction.  Thus,
my posting here, and the letters I have sent to the editors of selected
magazines.

  Comments welcome.
		
						David Hinnant
						SCI Systems

======================================================================

        19-Aug-85


        Dear Editor:

        There has been a lot of discussion  lately  (particularly
        on  the  UNIX  'Usenet'  news network) concerning Intel's
        recent advertising campaign comparing the Intel 80286  to
        the  Motorola  68010  and  68020.   Intel has published a
        document entitled "iAPX 286  High  Performance  Benchmark
        Report"  (hereafter  referred  to  as  'the  report')  to
        support  their  claim  that  the  80286  offers  superior
        performance  over  the  Motorola  68010  and 68020 chips.
        Both their advertising and the report use the August 1984
        BYTE  benchmarks  which  appeared in the article I wrote,
        "Benchmarking UNIX Systems" as the  basis  for  comparing
        the Intel and Motorola chips.

        After studying the Intel  report,  I  believe  there  are
        several  problems  with  Intel's approach to benchmarking
        that should be addressed.  While the  problems  presented
        below  may not prove to invalidate Intel's claim, they do
        raise doubts as to the objectivity  and  impartiality  of
        Intel's  benchmarking strategy. As author of the majority
        of the benchmarks Intel has used to make their  claim,  I
        feel  compelled  to  bring these problems to the public's
        attention.

        On July 22nd I hand delivered to the local Intel office a
        list of problems with their benchmarking strategy and why
        I believe they cannot legitimately  make  the  conclusion
        they   did.    As   of  today,  I  have  not  received  a
        satisfactory response to most of these  issues,  as  they
        are outlined below.

          1)  The listing for the pipes.c benchmark as  published
        in  their  report  is  incorrect.   If  this  listing  is
        identical to the source code used to evaluate  the  80286
        based systems mentioned in their report, then the program
        will terminate prematurely resulting in invalid  timings.
        This  listing  is  as it was presented in the August 1984
        BYTE.  However  an  error  was  made  on  my  part   when
        furnishing   the   listing   to  BYTE,  and  a  line  was
        inadvertently deleted.  I notified BYTE of the  omission,
        and BYTE published a correction in the January 1985 issue
        (page  14).   Intel  should  have  used   the   corrected
        benchmark.  Intel has responded favorably to this  error,
        and has re-benchmarked their systems.  I have  been  told
        that they will publish a correction.

          2)  Intel admits that the benchmark data used  for  the
        Masscomp and SUN Microsystems machines is the data as was
        presented in  the  August  1984  BYTE  issue.   The  BYTE
        article  was  originally slated to appear in the February
        1984 issue.  Due to production delays it did  not  appear
        until  August.   Although  I  have no precise record, the
        benchmark data I furnished BYTE is probably as old as, if
        not  older  than, December 1983. This means that Intel is
        comparing benchmark results from 68010  machines  over  a
        year  old  to current 80286 benchmarks!  Intel apparently
        did  not  make  an  effort  to  benchmark  current  68010
        machines  other  than  the  AT&T 7300.  More recent,  but
        still dated, benchmark data I have shows that the SUN  is
        much  faster  than  reported  in at least two benchmarks.
        Intel should have noted the benchmark dates  of  the  SUN
        and   Masscomp   machines   clearly   as  being  old  and
        benchmarked current production machines, as they did with
        the Intel based microcomputers.

          3)  The 80286 based microcomputers benchmarked all  ran
        Xenix   3.0.   The   Motorola  based  microcomputers  ran
        different operating systems: System III,  System  V,  and
        Berkeley  4.1 BSD. The BYTE UNIX benchmarks, as stated in
        the August article (page 133), are UNIX operating  system
        benchmarks.   They  are not microprocessor benchmarks and
        should not have  been  used  as  such.  The  consistently
        superior  results  obtained on the microcomputers running
        Xenix as compared to  the  microcomputers  running  other
        versions  of  UNIX  indicate that performance differences
        may be  due  more  to  differences  in  operating  system
        software rather than microprocessor design.  For example,
        Xenix 3.0 uses an internal buffer size of 512 bytes.  4.2
        BSD  uses a 1024 byte buffer size.  The pipes.c benchmark
        as published in BYTE does not take differing buffer sizes
        into  account,  and assumes a 512 byte buffer size.  Read
        and write operations thus appear to be less efficient  on
        the  SUN  as compared to other machines. In short, by not
        taking system differences into  account,  Intel  did  not
        employ  the  scientific  method.  Thus there are too many
        unknowns for a conclusion to be  reached.   Intel  should
        have  benchmarked  a Motorola based microcomputer running
        Xenix or an Intel based microcomputer  running  something
        other  than  Xenix  if  they  wanted to reach conclusions
        about CPU performance  under  similar  circumstances  and
        operating systems.

        On a related issue,  Intel's  versions  of  the  other
        benchmarks used in the report are flawed;  some  critically.
        Their  'C'  translation  of  the  Whetstone benchmark as
        published has several errors:

            1)  It is performing one loop more than necessary  in
          module  three.  This is actually a detriment to Intel's
          results.

            2)  The Whetstone uses a single  dimension  array  of
          four elements.  These elements are correctly referenced
          using the subscripts 0, 1, 2 and 3.  Intel's  benchmark
          uses the subscripts 1, 2, 3, and 4.

        Intel's version of the Fibonacci recursion benchmark  has
        a  more substantial flaw.  Because of an extra semicolon,
        the benchmark makes one  iteration  instead  of  the  ten
        iterations as is implied in the listing.

        In all likelihood, the errors in the Whetstone  benchmark
        did  not significantly affect the results on the machines
        benchmarked in the report.   However,  because  of  these
        flaws  the  results from this industry standard benchmark
        can not be compared to data from other  versions  of  the
        Whetstone.

        The same may be  true  for  the  errors  in  the  popular
        Fibonacci  benchmark.   Both these instances raise doubts
        as to Intel's knowledge of the C language, which  it  has
        specifically selected for comparing microprocessors.

        Intel has adhered  to  two  of  the  unwritten  rules  of
        benchmarking.  They  used  benchmarks  developed  outside
        Intel, and they contracted an outside company to run  the
        benchmarks  on  their  machines.  What they did not do is
        have the results interpreted by an objective, independent
        party.

        Intel did contact me prior to publication of the  report,
        but  only  for  permission to reprint the listings (which
        they trimmed the comments out of), and not in an advisory
        capacity.   I  gave  them reprint permission.  I expected
        that the benchmarks would be used carefully and according
        to the guidelines of my article. Clearly Intel could have
        avoided the problems  mentioned  above  if  they  had  an
        outside  independent  party  evaluate  their benchmarking
        methodology and  their  interpretation  of  results.   At
        first,  I  was  upset  that Intel did not reference me as
        author of the BYTE benchmarks.   Upon  reflection,  I  am
        glad they did not.


        David Hinnant
        SCI Systems, Inc.

======================================================================

-- 
				David Hinnant
				SCI Systems, Inc.
				{decvax, akgua}!mcnc!rti-sel!scirtp!dfh

davet@oakhill.UUCP (Dave Trissel) (09/13/85)

In article <405@scirtp.UUCP> dfh@scirtp.UUCP (David F. Hinnant) writes:

>
>        used in the report are flawed;  some  critically.   Their
>        'C'  translation  of the Whetstone benchmark as published
>        has several errors:
>

Actually, there is a bias thrown in which is far larger than any of the
errors mentioned here.  The Whetstone is supposed to have an outer loop
running from 1 to 10 to generate 1 million Whetstone instructions.
However, if you examine Intel's code, the outer loop only runs through
two times.  Since they give the run time rather than the value in
Whetstones, the factor-of-five error is easy to miss, as normally a run
time of one second means a rating of 1,000 KWhets.

Intel's time would relate to 625 KWhets, which I knew was impossible.
But it wasn't until several weeks later that I finally spotted the
changed loop count and realized that the value should really have been
around 125.

On the same subject, we have just completed an extensive analysis of the
Intel benchmark report which goes into detail on the many irregularities
found.  The conclusions reached when up-to-date systems and proper
procedures are used stand in sharp contrast to those reached by Intel.

For those of you following the MIPS debate there is a section of interest.
Intel tries to show that by looking only at instruction clock times the
286 is just as fast as a '020.  About as believable as their claim based
on their UNIX benchmark set that (and I quote) "The 6 MHz 286/310 outperforms
all of the machines based on a 68010 as well as the VAX machines" (pg. 9).
Note this claim includes the VAX 780.

Their conclusion puts the IBM PC/AT at 98 percent of the performance of
the 780.  They further claim that a 12 MHz 286 is 2.4 times faster than
a 780.
Everyone expects marketing hype from vendors (Motorola included, of course)
but this is just down-right silly.

Our new benchmark report should be in the local Motorola sales offices in
a week or so.  Try to get the Intel benchmark booklet from Intel so you can
see these things for yourself.

  --  Dave Trissel