[net.arch] Benchmarks in August IEEE Micro

paf@oblio.UUCP (Paul Fronberg) (09/20/86)

In the August issue of IEEE Micro there is a very interesting article
concerning benchmarking 32-bit microprocessors. The following table is
abstracted from page 57. Numbers are time in seconds. (N=no cache enabled;
C=cache enabled). This table reflects the results for dynamic memory.

	 MHz	  E	  F	 H	  I	  K
80286	(10)	 4.89	13.63	6.59	11.20	19.39
80386	(16)	 3.57	 5.16	3.63	 6.86	 6.20
68000	(8)	13.73	14.61	8.79	12.08	16.59
68020 N	(16)	 8.02	 5.55	3.84	 5.65	 4.78
68020 C (16)	 3.84	 2.47	2.14	 2.75	 3.02
32032	(10)	12.52	13.07	6.21	 8.57	13.07
32100 N (18)	16.81	 8.84	5.05	 8.57	 9.17
32100 C (18)	 6.75	 4.29	2.74	 3.63	 4.45

The benchmarks are the EDN 16-bit benchmarks modified for 32 bits. They were
coded in assembly language for each processor.

The following EDN programs were used.

Test E is a character-string search routine.

Test F is a bit test, set, and reset routine.

Test H is a linked-list insertion routine.

Test I is a quicksort routine.

Test K is a bit-matrix transposition routine.
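
For readers who haven't seen the EDN suite, here is a rough C sketch of the
kind of operation Test K exercises (a naive bit-matrix transposition). It is
only an illustration of the idea; it is not the actual EDN benchmark code, and
the real test certainly differs in size and detail.

/* Rough illustrative sketch only -- NOT the actual EDN benchmark code.  */
/* Transposes a 32 x 32 bit matrix; bit j of row[i] is element (i, j).   */
#include <stdio.h>
#include <string.h>

#define N 32

static void bit_transpose(const unsigned long src[N], unsigned long dst[N])
{
    int i, j;

    memset(dst, 0, N * sizeof dst[0]);
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (src[i] & (1UL << j))        /* is element (i, j) set?  */
                dst[j] |= 1UL << i;         /* then set element (j, i) */
}

int main(void)
{
    unsigned long a[N], b[N];
    int i;

    for (i = 0; i < N; i++)                 /* arbitrary test pattern  */
        a[i] = (i * 2654435761UL) & 0xffffffffUL;
    bit_transpose(a, b);
    printf("row 0 of the transpose: %08lx\n", b[0]);
    return 0;
}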

wiemann@hplabsb.UUCP (Alan Wiemann) (09/21/86)

Benchmark comparisons give valid results only if the same program is presented
to each machine.  The compiler is considered part of the "machine" and its
performance contributes to the overall performance of the machine.  This study
did not present the same program to each machine.  Instead "[they] had the
same person modify or write all the tests so [they] could be sure that the same
algorithms would be used for all the processors" (page 56 of the IEEE article).
Thus the benchmark results reflect not only the individual processors' ability
to execute instructions but also the cleverness of this programmer in using
each microprocessor's instruction set and architecture.  The results reported
should not be considered true measures of the relative performance of these
microprocessors.
   Unfortunately benchmark comparisons often suffer from commercial hype and
unscientific methods.  Let the buyer beware!

                                   Alan Wiemann
                                   Aitchpeelabs
                                   Palowaltocalifornya
                                   hplabs!wiemann

crowl@rochester.ARPA (Lawrence Crowl) (09/21/86)

The table below is a reorganization of the quoted table that follows it: each
time has been divided by the fastest time recorded for that benchmark, so 1.00
marks the fastest processor on each test, and the bottom row is the arithmetic
mean of each column.

				Relative Performance

processor	80286	80386	68000	68020	68020	32032	32100	32100
cache (MHz)	(10)	(16)	(8)	N (16)	C (16)	(10)	N (18)	C (18)

string search	1.37	1.00	3.85	2.25	1.08	3.51	4.71	1.89
bit manipulate	5.52	2.09	5.91	2.25	1.00	5.29	3.58	1.74
linked list	3.08	1.70	4.11	1.79	1.00	2.90	2.36	1.28
quicksort	4.07	2.49	4.39	2.05	1.00	3.12	3.12	1.32
matrix trans	6.42	2.05	5.49	1.58	1.00	4.33	3.04	1.47

average		4.09	1.87	4.75	1.98	1.02	3.83	3.36	1.54
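
For anyone who wants to recompute the ratios, here is a small C sketch of the
normalization (my own illustration, not code from the IEEE article): each time
is divided by the fastest time recorded for that benchmark, and the ratios in
each column are then averaged.

/* Sketch of the normalization above: divide each time by the fastest    */
/* time recorded for that benchmark, then average the ratios per column. */
#include <stdio.h>

#define NPROC 8
#define NTEST 5

static const char *name[NPROC] = {
    "80286", "80386", "68000", "68020 N", "68020 C",
    "32032", "32100 N", "32100 C"
};

/* Times in seconds from the quoted table; columns are tests E F H I K. */
static const double t[NPROC][NTEST] = {
    { 4.89, 13.63, 6.59, 11.20, 19.39 },
    { 3.57,  5.16, 3.63,  6.86,  6.20 },
    {13.73, 14.61, 8.79, 12.08, 16.59 },
    { 8.02,  5.55, 3.84,  5.65,  4.78 },
    { 3.84,  2.47, 2.14,  2.75,  3.02 },
    {12.52, 13.07, 6.21,  8.57, 13.07 },
    {16.81,  8.84, 5.05,  8.57,  9.17 },
    { 6.75,  4.29, 2.74,  3.63,  4.45 }
};

int main(void)
{
    double best[NTEST];
    int i, j;

    for (j = 0; j < NTEST; j++) {           /* fastest time per benchmark */
        best[j] = t[0][j];
        for (i = 1; i < NPROC; i++)
            if (t[i][j] < best[j])
                best[j] = t[i][j];
    }
    for (i = 0; i < NPROC; i++) {           /* ratios and their mean      */
        double sum = 0.0;
        printf("%-8s", name[i]);
        for (j = 0; j < NTEST; j++) {
            double rel = t[i][j] / best[j];
            sum += rel;
            printf(" %5.2f", rel);
        }
        printf("   avg %5.2f\n", sum / NTEST);
    }
    return 0;
}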

In article <322@oblio.UUCP> paf@oblio.UUCP (Paul Fronberg) writes:
)In the August issue of IEEE Micro there is a very interesting article
)concerning benchmarking 32-bit microprocessors. The following table is
)abstracted from page 57. Numbers are time in seconds. (N=no cache enabled;
)C=cache enabled). This table reflects the results for dynamic memory.
)
)	 MHz	  E	  F	 H	  I	  K
)80286	(10)	 4.89	13.63	6.59	11.20	19.39
)80386	(16)	 3.57	 5.16	3.63	 6.86	 6.20
)68000	(8)	13.73	14.61	8.79	12.08	16.59
)68020 N	(16)	 8.02	 5.55	3.84	 5.65	 4.78
)68020 C (16)	 3.84	 2.47	2.14	 2.75	 3.02
)32032	(10)	12.52	13.07	6.21	 8.57	13.07
)32100 N (18)	16.81	 8.84	5.05	 8.57	 9.17
)32100 C (18)	 6.75	 4.29	2.74	 3.63	 4.45
)
)Benchmarks are EDN 16 benchmarks modified for 32 bits. The benchmarks were
)coded in assembly code for each processor.
)
)The following EDN programs were used.
)
)Test E is a character-string search routine.
)
)Test F is a bit test, set, and reset routine.
)
)Test H is a linked-list insertion routine.
)
)Test I is a quicksort routine.
)
)Test K is a bit-matrix transposition routine.


-- 
  Lawrence Crowl		716-275-5766	University of Rochester
			crowl@rochester.arpa	Computer Science Department
 ...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York,  14627

aglew@ccvaxa.UUCP (09/26/86)

..> `Averaging' benchmarks using the arithmetic or geometric mean

Whatever you do, you shouldn't combine benchmarks without weight factors
proportional to the importance, to you, of the jobs each benchmark
characterizes. You can apply any weights to any kind of metric you want,
but not all combinations are meaningful. For example, if throughput is
what you want to optimize, and your weights are the percentage of your
job mix characterized by each benchmark, then a linearly weighted
arithmetic mean is the average to use.

That, of course, assumes that there are no interactions between jobs.
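
For concreteness, here is a minimal C sketch of that weighted arithmetic mean;
the times and weights in it are made up purely for illustration.

/* Minimal sketch of a weighted arithmetic mean of benchmark times.      */
/* The weights are meant to be the fraction of your own job mix that     */
/* each benchmark characterizes; all numbers below are made up purely    */
/* for illustration.                                                     */
#include <stdio.h>

static double weighted_mean(const double time[], const double weight[], int n)
{
    double total = 0.0, wsum = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        total += weight[i] * time[i];
        wsum  += weight[i];
    }
    return wsum > 0.0 ? total / wsum : 0.0; /* normalize in case the weights
                                               do not sum to exactly 1.0   */
}

int main(void)
{
    /* Hypothetical times for tests E, F, H, I, K and the share of a     */
    /* hypothetical job mix that each test is taken to represent.        */
    static const double time[5]   = { 3.8, 2.5, 2.1, 2.8, 3.0 };
    static const double weight[5] = { 0.40, 0.10, 0.25, 0.20, 0.05 };

    printf("weighted mean time: %.2f seconds\n",
           weighted_mean(time, weight, 5));
    return 0;
}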

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms

kendalla@blast.gwd.tek.com (Kendall Auel) (10/01/86)

In article <3600003@hplabsb.UUCP> wiemann@hplabsb.UUCP (Alan Wiemann) writes:
>                                                   Instead "[they] had the
>same person modify or write all the tests so [they] could be sure that the same
>algorithms would be used for all the processors" (page 56 of the IEEE article).
>Thus the benchmark results reflect not only the individual processors' ability
>to execute instructions but also the cleverness of this programmer in using
>each microprocessor's instruction set and architecture.  The results reported
>should not be considered true measures of the relative performance of these
>microprocessors.

The Computer Family Architecture (CFA) project of the Army and Navy in the
70's attempted to measure the performance of various computers. This is
considered, I believe, to be one of the ``classic'' benchmark efforts.

    "c	Programmers were not permitted to make _algorithmic_ improvements
	or modifications, but rather were required to translate the PDL
	descriptions into assembly language. Programmers were free to
	optimize their test programs to the extent possible with highly
	optimizing compilers. This ``hand translation'' of strictly defined
	algorithms was expected to reduce variations due to programmer skill."

	"Computer Structures: Principles and Examples" pg.58
	_Siewiorek,_Bell,_and_Newell_, McGraw-Hill 1982

If you are measuring the performance of a processor, then it is not
necessarily desirable to compile a standard program. If instead you
are measuring compiler/processor combined performance, then you
should certainly use the same source for all measurements. It is
much easier to rewrite a poor compiler than to redesign a poor architecture.

Kendall Auel
Tektronix, Inc.
(I don't claim or disclaim anything)

campbell@sauron.UUCP (Mark Campbell) (10/02/86)

In article <3600003@hplabsb.UUCP> wiemann@hplabsb.UUCP (Alan Wiemann) writes:
>Benchmark comparisons give valid results only if the same program is presented
>to each machine.  The compiler is considered part of the "machine" and its
>performance contributes to the overall performance of the machine.
> [...]

I don't believe that this is a valid point.  The article benchmarks
*processors*, not systems.  While the compiler is part of a system, it is
not part of a processor.  You are correct that the skill of the assembler
programmer is quite important -- however, using a compiler would only have
raised an issue concerning the skill of the compiler writer.

What I found ludicrous in the article was their conclusion that (to paraphrase)
'an internal cache was quite helpful'.  The way they set up the hardware, it
should have been pretty damned obvious that an internal cache would be helpful.
The I80386 results did not make use of two-cycle external memory accesses.
With the delays induced for external memory accesses, those machines with
internal caches were clearly superior.  I guess that helps a lot if you are
going to build a 16/20MHz M68020/I80386 system with ~150ns DRAM and no external
cache (a la Compaq).  But it sure doesn't mean much to most of the Unix boxes
out there.
-- 

Mark Campbell    Phone: (803)-791-6697     E-Mail: !ncsu!ncrcae!sauron!campbell

crowl@rochester.ARPA (Lawrence Crowl) (10/06/86)

In article <3600003@hplabsb.UUCP> wiemann@hplabsb.UUCP (Alan Wiemann) writes:
>Benchmark comparisons give valid results only if the same program is presented
>to each machine.  The compiler is considered part of the "machine" and its
>performance contributes to the overall performance of the machine.  This study
>did not present the same program to each machine.  Instead "[they] had the
>same person modify or write all the tests so [they] could be sure that the same
>algorithms would be used for all the processors" (page 56 of the IEEE article).
>Thus the benchmark results reflect not only the individual processors' ability
>to execute instructions but also the cleverness of this programmer in using
>each microprocessor's instruction set and architecture.  The results reported
>should not be considered true measures of the relative performance of these
>microprocessors.

How else do you compare assembly language performance between two machines with
different architectures?  Often critical sections of code will be coded in
assembler to increase speed.  The capability of the architecture to support
fast hand-coded assembler can have a significant effect on the performance of
the program.  So we need to do assembly language benchmarks.  I submit it is a
valid comparison to code the same algorithm into assembly on each machine.  
However, this coding must be done by an individual with equivalent experience
on each machine, spending the same amount of time programming.  That is, the
programmer is not allowed to bias the results by spending unfair amounts of
time optimizing his favorite processor.  The bottom line is that we must trust
the benchmarks, correlate them with other benchmarks, or do them ourselves.

By the same token, including the compiler is often an invalid comparison because
the compiler can have a significant effect on the resulting performance.  Suppose
I take a student-built, unoptimizing compiler for machine A and a highly tuned
optimizing compiler for machine B.  Now, if the two machines are anywhere close
in performance, machine B will win.  Here again, the bottom line is that we
must trust the benchmarks, correlate them with other benchmarks, or do them
ourselves.

Of course, we could have a competitive benchmark between interested parties.
Any takers?
-- 
  Lawrence Crowl		716-275-5766	University of Rochester
			crowl@rochester.arpa	Computer Science Department
 ...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York,  14627

franka@mmintl.UUCP (Frank Adams) (10/08/86)

In article <21344@rochester.ARPA> crowl@rochtest.UUCP (Lawrence Crowl) writes:
>By the same token, including the [compiler] is often an invalid comparison
>because the compiler can have a significant effect on the resulting
>performance.  Suppose I take a student built, unoptimizing compiler for
>machine A and a highly tuned optimizing compiler for machine B.  Now, if
>the two machines are anywhere close in performance, machine B will win.
>Here again, the bottom line is that we must trust the benchmarks, correlate
>them with other benchmarks, or do them ourselves.

The issue here is who is using the benchmarks, and for what.  If, as for
most of us, writing one's own compiler is not an option, then all that
matters is how our programs will perform using the compilers available
for each machine.  It
doesn't much matter if machine A is really faster than machine B, if the
only compilers available for machine A generate code which is so much worse
than that available on machine B that the programs actually run slower.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108