[comp.arch] Reply to Steve McGeady

tim@nucleus.amd.com (Tim Olson) (12/02/89)

Recently, Steve McGeady of Intel posted an inflammatory reply to my
article discussing the benchmarking dispute between AMD and Intel.  I
have tried to reach him by email to get this cleared up, but I have
not heard back from him (I don't know whether he has not seen it or
has chosen not to reply).

I don't like to use this forum for this (things are rapidly starting
to approach zero technical content), but Steve made some accusations
here that I feel warrant a public response.  Hit 'n' now if you don't
want to follow this further...

First, some history:

At their i960CA introduction, and later at the Microprocessor Forum
sponsored by Microprocessor Report, Intel showed the results of a
benchmark exercise where they compared the performance of boards
based upon a 68030, an Am29000, an i960KA, and an i960CA.  Their
results showed the Am29000 running only as fast as the 68030 board,
and slower than either the KA or the CA processor.

These results were so skewed that the trade press contacted AMD
requesting our comments and numbers we felt reflected actual Am29000
performance.  Even though most of the benchmarks that Intel used were
based on standard, readily-available sources (although the results
were reported slightly differently than normal), I requested the
benchmark sources directly from Intel to ensure that the numbers I
obtained were as accurate as possible.

In article <5277@omepd.UUCP>, mcg@mipon2.intel.com (Steven McGeady) writes:
| In article <28107@amdcad.AMD.COM>, tim@electron.amd.com (Tim Olson) writes:
| > Intel compared its i960CA board running this benchmark suite with a
| > 68030 (20MHz), an i960KA (20MHz), and an Am29000 (16MHz) board.
| > However, the board they used to benchmark the Am29000 was not designed
| > for performance; rather, it was designed to test the functionality of
| > ADAPT (Advanced Development and Prototyping Tool) hardware debuggers.
| 
| This is an interesting piece of history re-invention.  Step Engineering,
| the current manufacturer of the STEB board,  received the design of the
| board from AMD (the board has an AMD copyright on it).

The assertion that this is "history re-invention" is totally false.
AMD designed both the ADAPT and the STEB, which was *definitely*
designed to allow customers to test the functionality of their ADAPTs.
Both of these designs were subsequently licenced to Step Engineering.

| Apparently, the
| board was designed this way because it is impossible to build a 29K
| system using normal DRAMs and achieve better performance.

Another false assertion.  YARC Systems builds an Am29000-based
board that has only "normal DRAM" memory.  The Am29000 was designed
from the start to allow high performance without requiring fast SRAM.

| We attempted
| to put faster RAMs in the STEB board, and to increase the clock speed to
| 20MHz, and neither worked.

I can't comment on this, other than to say that we have run STEBs at
20MHz. However, here Steve admits that they didn't actually upgrade
the board.  In their benchmark report, Intel states that the STEB "..
was improved by replacing the 120ns memories the board was shipped
with by 35ns memories...", implying that they were getting better
performance with it than with the standard slow 120ns memory.  This
was shown to be false when in the next sentence they state: "Board
runs 2 wait states at 16MHz from the SRAM."  2 wait-states are
standard for the *120ns memory* supplied with the STEB.
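To make the wait-state argument concrete, here is a rough sketch of the
usual back-of-the-envelope calculation.  It assumes a 1-cycle minimum
access and a fixed 60ns of address/decode/buffer overhead; both numbers
are hypothetical and are not taken from the STEB design.  At 16MHz a
cycle is 62.5ns, so under these assumptions the 120ns parts need 2 wait
states while 35ns parts should need only 1.

```python
import math

def wait_states(clock_mhz, access_ns, overhead_ns, base_cycles=1):
    """Wait states needed for a memory access (simplified model).

    overhead_ns is a HYPOTHETICAL lump sum for address setup, decode
    and buffer delays; real boards must be worked out from datasheets.
    """
    cycle_ns = 1000.0 / clock_mhz          # 16MHz -> 62.5ns per cycle
    total_cycles = math.ceil((access_ns + overhead_ns) / cycle_ns)
    return max(0, total_cycles - base_cycles)

print(wait_states(16, 120, 60))  # stock 120ns parts
print(wait_states(16, 35, 60))   # 35ns parts
```

Whatever the exact overhead, faster parts should lower the wait-state
count; a board still running 2 wait states behaves like the stock
120ns configuration.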

| We chose the STEB board not because it was
| slow (even we didn't expect it to be so slow) but because it is the only
| available board with a prototyping area on which we could add an SBX
| connector to interface the graphics cards on which we displayed the
| benchmark results.

Was the overriding concern here for a catchy press demo, or for
realistic performance results?  In any case, the YARC card *does* have
an expansion connector on it (we use it to drive a laser-printer
engine).

| > To provide a more fair comparison, I requested the benchmark sources
| > from Intel, to run on a 30MHz Am29000 board (manufactured by YARC
| > Systems).  This board uses 2-way interleaved, 100ns DRAM memory for
| > instructions and 35ns SRAM for data.
| 
| This board contains separate Instruction and Data memory (using the
| 29k's Harvard bus), each of which is interleaved (according to published
| data I've been able to find on the board).

Yes, the memory design is a direct result of the Am29000's external Harvard
Architecture.  *Embedded Controllers* typically run one fixed program,
either from ROM or downloaded into writeable memory at initialization
time.  This is the way most of our customers have designed their
systems.
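A crude model of why 2-way interleaving helps instruction fetch from
slow DRAM: sequential fetches alternate between banks, so one bank's
access overlaps the other's.  The sketch assumes each bank is busy for
a hypothetical 2 cycles per access and that at most one fetch can start
per cycle; it ignores page mode, refresh, and branches.

```python
def fetch_cycles(n_fetches, banks, busy_cycles):
    """Cycle at which the last of n sequential word fetches completes.

    Fetch i goes to bank i % banks; a bank cannot start a new access
    until busy_cycles after its previous one started.  (Simplified,
    hypothetical timing model.)
    """
    bank_free = [0] * banks   # cycle at which each bank becomes idle
    next_start = 0            # at most one fetch may start per cycle
    done = 0
    for i in range(n_fetches):
        b = i % banks
        start = max(next_start, bank_free[b])
        bank_free[b] = start + busy_cycles
        done = bank_free[b]
        next_start = start + 1
    return done

print(fetch_cycles(8, 1, 2))  # one bank: 16 cycles
print(fetch_cycles(8, 2, 2))  # two interleaved banks: 9 cycles
```

With two banks the sequential stream approaches one fetch per cycle even
though each individual DRAM access takes two.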

| The 30MHz 29k's are apparently
| hand-sorted - we know of no volume shipments of these parts.

Well, we know of no volume shipments of 33MHz i960CA's, either ;-)
We *have* shipped 30MHz parts, and will be announcing 33MHz parts.

| This board is in no way comparable in cost, parts-count, interface
| complexity, or usability to the 960CA board that was used.

I stand by my claim that the YARC SRAM card and the 960CA evaluation
board are comparable.  35ns SRAM is *much* less expensive than the
15ns SRAM used on the i960CA board, and fully half the cost of these
boards is simply memory.

| We supplied Mr. Olson with the sources to these benchmarks, as an effort
| to bring an end to the warring that has been going on over benchmarking.

Don't forget that Intel started all of this by presenting performance
numbers for the Am29000 that were quite out of line with anything else
anyone has claimed. We have no problem with Intel's quoted numbers
for the KA and CA -- we assume that they can be verified.

| In exchange for freely supplying these, Mr. Olson agreed that we would
| be given the resulting source code back, along with a copy of the compiler
| that produced it, prior to publication of the results.  Mr. Olson has
| chosen to ignore those commitments and publish numbers without noting
| what compiler was used, and without providing us (or anyone else - we also
| supplied the benchmarks to Michael Slater of Microprocessor Report)
| with the ability to check their validity.

Another statement that is false.  I never spoke to Steve McGeady about
these benchmarks; I spoke to Lew Paceley and Tony Baker (of the i960
marketing group).  My agreement with Tony was that, in exchange for the
benchmark sources, we would give Intel the results of the benchmarks.  Intel
now has them.  *Nothing* was ever said about returning source code,
compilers, or prior publication.  Someone at Intel made that up.

However, we certainly have nothing to hide, and have given Intel the
sources, the .s files, the Makefile, etc. that we used to generate the
results.  Michael Slater already has our results and the documentation
for them; he is welcome to the rest of the files if he so requests.

As far as the compiler version, I mentioned in my posting that it was
compiled with the current release version of the MetaWare High C
compiler (2.0).  I didn't want to bore the net with legalistic
details, but all of the information is documented in a paper that I
would be happy to give to anyone requesting it.

| It should be noted that the 960CA benchmarks were compiled with the
| current GNU GCC compiler, which does *no* instruction scheduling, and thus
| fails to take advantage of the multiple-instruction issue capability of
| the 960CA.  We have been working on an instruction-scheduling compiler,
| but it is not available for release at this time.

Neither is the GNU compiler Intel used in the benchmarks.  Even though the
GNU compiler & tools are listed in the "Solutions960" catalog as
available "now", several phone calls to Intel sales representatives
confirmed that it was still unavailable.

| The lesson that this has served to teach me, who argued with our marketing
| department that we should release these benchmarks to AMD under the noted
| restrictions, is that we were foolish to trust AMD's word regarding feedback
| of the results from the benchmarks.  Thus, I place no trust in these
| numbers presented as representing any kind of objective reality.
| Furthermore, I have learned my lesson with regard to cooperating.

I really don't understand your position here.  All we did was ask for
the source code of the standard benchmarks so that we could run them
for the people in the trade press who were asking about them.  Would
you feel the same way if we had simply run them from the standard
sources that are readily available (Stanford Integer suite, Dhrystone
1.1)?  We could have done that, but I wanted to ensure that we were as
"apples-to-apples" as possible.

| The benchmark wars will now most certainly be taken out of the hands of
| technologists and be placed back in the hands of marketing departments.

I won't even attempt to comment on this remark.

| I will reiterate here my advice to customers attempting to determine the
| relative speed of the two processors:  run your own benchmarks on a board
| with a memory system relevant to the design you plan to build.

I agree 100%.

| The Yarc
| board's memory design is an example of the most-expensive memory system
| design that one can attach to the 29k - it bears no resemblance to what
| can be expected with a combined I&D DRAM memory system, which is where
| the only true comparison lies.

Huh?  The YARC board's memory design is similar to many of our
embedded control designs.  And why is a combined I&D DRAM memory
system the only "true comparison"?  The i960CA may be limited to that,
but that simply means there are more memory-design options for the
Am29000.

| In short, don't believe AMD's benchmark
| numbers, and don't believe ours.

I believe that all benchmark results should be taken with a grain of
salt, but I also believe that if a vendor were to come out with wildly
incorrect numbers, or numbers that could not be verified, they
would be discovered quickly.

| Don't believe simulators, because AMD's
| is well known for overstating performance.

The Am29000 simulator is the exact same one we use in-house to do
performance analysis of potential processor modifications.  It is an
RTL-level simulation of the processor that has been checked out
cycle-for-cycle with the real chip in a number of different memory
models.  The only thing I can think of that it doesn't really simulate
is DRAM refresh time, and that is because its effect is so statistical.
If it is overstating performance I want to know why.  Do you have any
references?


Benchmarking embedded processors is harder than benchmarking UNIX
systems -- there are many more variables to contend with. After going
through this highly frustrating exercise, I hope something good can
come of it.  A larger collection of "standard" benchmarks would be a
good starting point.


	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)