paf@oblio.UUCP (Paul Fronberg) (09/20/86)
In the August issue of IEEE Micro there is a very interesting article
concerning benchmarking 32-bit microprocessors. The following table is
abstracted from page 57. Numbers are time in seconds. (N=no cache enabled;
C=cache enabled). This table reflects the results for dynamic memory.

             MHz     E      F      H      I      K
  80286     (10)    4.89  13.63   6.59  11.20  19.39
  80386     (16)    3.57   5.16   3.63   6.86   6.20
  68000      (8)   13.73  14.61   8.79  12.08  16.59
  68020 N   (16)    8.02   5.55   3.84   5.65   4.78
  68020 C   (16)    3.84   2.47   2.14   2.75   3.02
  32032     (10)   12.52  13.07   6.21   8.57  13.07
  32100 N   (18)   16.81   8.84   5.05   8.57   9.17
  32100 C   (18)    6.75   4.29   2.74   3.63   4.45

Benchmarks are EDN 16 benchmarks modified for 32 bits. The benchmarks were
coded in assembly code for each processor.

The following EDN programs were used.

Test E is a character-string search routine.
Test F is a bit test, set, and reset routine.
Test H is a linked-list insertion routine.
Test I is a quicksort routine.
Test K is a bit-matrix transposition routine.
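For readers who have not seen the EDN suite, the kernels are small routines of
the flavor sketched below. This is not the EDN code itself (the article's
versions were hand-coded in assembly for each processor); it is only a rough C
illustration of the kind of work a Test H style linked-list insertion does,
with made-up record and function names.

  #include <stdio.h>
  #include <stdlib.h>

  /* Illustrative only: insert records into a singly linked list kept in
     descending key order, then walk the list.  Names are invented. */
  struct node {
      long key;
      struct node *next;
  };

  static void insert(struct node **head, struct node *new_node)
  {
      struct node **p = head;
      while (*p != NULL && (*p)->key > new_node->key)
          p = &(*p)->next;          /* find the insertion point */
      new_node->next = *p;
      *p = new_node;
  }

  int main(void)
  {
      struct node *head = NULL;
      long keys[] = { 42, 7, 99, 13 };

      for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++) {
          struct node *n = malloc(sizeof *n);
          n->key = keys[i];
          insert(&head, n);
      }
      for (struct node *n = head; n != NULL; n = n->next)
          printf("%ld\n", n->key);  /* prints 99 42 13 7 */
      return 0;
  }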
wiemann@hplabsb.UUCP (Alan Wiemann) (09/21/86)
Benchmark comparisons give valid results only if the same program is presented
to each machine. The compiler is considered part of the "machine" and its
performance contributes to the overall performance of the machine. This study
did not present the same program to each machine. Instead "[they] had the same
person modify or write all the tests so [they] could be sure that the same
algorithms would be used for all the processors" (page 56 of the IEEE article).
Thus the benchmark results reflect not only the individual processors' ability
to execute instructions but also the cleverness of this programmer in using
each microprocessor's instruction set and architecture. The results reported
should not be considered true measures of the relative performance of these
microprocessors.

Unfortunately benchmark comparisons often suffer from commercial hype and
unscientific methods. Let the buyer beware!

Alan Wiemann
Aitchpeelabs
Palowaltocalifornya
hplabs!wiemann
crowl@rochester.ARPA (Lawrence Crowl) (09/21/86)
The table below is a reorganization of the quoted table that follows: each
entry is that processor's time divided by the fastest time for that benchmark,
so lower is better and 1.00 marks the fastest processor on that benchmark.
Relative Performance
processor 80286 80386 68000 68020 68020 32032 32100 32100
cache (MHz) (10) (16) (8) N (16) C (16) (10) N (18) C (18)
string search 1.37 1.00 3.85 2.25 1.08 3.51 4.71 1.89
bit manipulate 5.52 2.09 5.91 2.25 1.00 5.29 3.58 1.74
linked list 3.08 1.70 4.11 1.79 1.00 2.90 2.36 1.28
quicksort 4.07 2.49 4.39 2.05 1.00 3.12 3.12 1.32
matrix trans 6.42 2.05 5.49 1.58 1.00 4.33 3.04 1.47
average 4.09 1.87 4.75 1.98 1.02 3.83 3.36 1.54
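The ratios above can be reproduced from the quoted times below by dividing
each time by the smallest time in its column. A rough C sketch of that
normalization (the array is just the quoted data; the layout is my own):

  #include <stdio.h>

  #define NPROC  8
  #define NBENCH 5

  /* Raw times in seconds, one row per processor, one column per benchmark
     (E, F, H, I, K), copied from the quoted table below. */
  static const double times[NPROC][NBENCH] = {
      {  4.89, 13.63, 6.59, 11.20, 19.39 },   /* 80286   (10) */
      {  3.57,  5.16, 3.63,  6.86,  6.20 },   /* 80386   (16) */
      { 13.73, 14.61, 8.79, 12.08, 16.59 },   /* 68000    (8) */
      {  8.02,  5.55, 3.84,  5.65,  4.78 },   /* 68020 N (16) */
      {  3.84,  2.47, 2.14,  2.75,  3.02 },   /* 68020 C (16) */
      { 12.52, 13.07, 6.21,  8.57, 13.07 },   /* 32032   (10) */
      { 16.81,  8.84, 5.05,  8.57,  9.17 },   /* 32100 N (18) */
      {  6.75,  4.29, 2.74,  3.63,  4.45 },   /* 32100 C (18) */
  };

  int main(void)
  {
      for (int b = 0; b < NBENCH; b++) {
          double best = times[0][b];
          for (int p = 1; p < NPROC; p++)      /* fastest time for benchmark b */
              if (times[p][b] < best)
                  best = times[p][b];
          for (int p = 0; p < NPROC; p++)
              printf("%6.2f", times[p][b] / best);  /* relative to fastest */
          printf("\n");
      }
      return 0;
  }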
In article <322@oblio.UUCP> paf@oblio.UUCP (Paul Fronberg) writes:
)In the August issue of IEEE Micro there is a very interesting article
)concerning benchmarking 32-bit microprocessors. The following table is
)abstracted from page 57. Numbers are time in seconds. (N=no cache enabled;
)C=cache enabled). This table reflects the results for dynamic memory.
)
) MHz E F H I K
)80286 (10) 4.89 13.63 6.59 11.20 19.39
)80386 (16) 3.57 5.16 3.63 6.86 6.20
)68000 (8) 13.73 14.61 8.79 12.08 16.59
)68020 N (16) 8.02 5.55 3.84 5.65 4.78
)68020 C (16) 3.84 2.47 2.14 2.75 3.02
)32032 (10) 12.52 13.07 6.21 8.57 13.07
)32100 N (18) 16.81 8.84 5.05 8.57 9.17
)32100 C (18) 6.75 4.29 2.74 3.63 4.45
)
)Benchmarks are EDN 16 benchmarks modified for 32 bits. The benchmarks were
)coded in assembly code for each processor.
)
)The following EDN programs were used.
)
)Test E is a character-string search routine.
)
)Test F is a bit test, set, and reset routine.
)
)Test H is a linked-list insertion routine.
)
)Test I is a quicksort routine.
)
)Test K is a bit-matrix transposition routine.
--
Lawrence Crowl 716-275-5766 University of Rochester
crowl@rochester.arpa Computer Science Department
...!{allegra,decvax,seismo}!rochester!crowl Rochester, New York, 14627
aglew@ccvaxa.UUCP (09/26/86)
..> `Averaging' benchmarks using the arithmetic or geometric mean

Whatever you do, you shouldn't combine benchmarks without weight factors
proportional to the importance to you of the jobs characterized by each
benchmark. You can use any weights with any type of mean you want, but not all
combinations are meaningful. For example, if throughput is what you want to
optimize, and your weights are the percentage of your job mix characterized by
each benchmark, then a linearly weighted arithmetic mean is the average to use.
That, of course, assumes that there are no interactions between jobs.

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms
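As a small illustration of the point (not code from the thread), here is a C
sketch that combines the 80286 row of the relative-performance table posted
earlier, once as a plain arithmetic mean and once as a job-mix-weighted mean.
The weights are invented purely for illustration.

  #include <stdio.h>

  #define NBENCH 5

  int main(void)
  {
      /* Relative times for one processor (the 80286 row of the earlier
         table): string search, bit manipulate, linked list, quicksort,
         matrix transpose. */
      double rel[NBENCH]    = { 1.37, 5.52, 3.08, 4.07, 6.42 };

      /* Hypothetical job mix: the fraction of the workload each benchmark
         is taken to represent.  Made-up weights; they must sum to 1. */
      double weight[NBENCH] = { 0.40, 0.10, 0.20, 0.25, 0.05 };

      double plain = 0.0, weighted = 0.0;
      for (int i = 0; i < NBENCH; i++) {
          plain    += rel[i] / NBENCH;      /* unweighted arithmetic mean */
          weighted += weight[i] * rel[i];   /* job-mix weighted mean      */
      }

      printf("unweighted mean: %.2f\n", plain);    /* 4.09, as in the table */
      printf("weighted mean:   %.2f\n", weighted); /* depends on the mix    */
      return 0;
  }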
kendalla@blast.gwd.tek.com (Kendall Auel) (10/01/86)
In article <3600003@hplabsb.UUCP> wiemann@hplabsb.UUCP (Alan Wiemann) writes:
> Instead "[they] had the
>same person modify or write all the tests so [they] could be sure that the same
>algorithms would be used for all the processors" (page 56 of the IEEE article).
>Thus the benchmark results reflect not only the individual processors' ability
>to execute instructions but also the cleverness of this programmer in using
>each microprocessor's instruction set and architecture. The results reported
>should not be considered true measures of the relative performance of these
>microprocessors.

The Computer Family Architecture (CFA) project of the Army and Navy in the 70's
attempted to measure the performance of various computers. This is considered,
I believe, to be one of the ``classic'' benchmark efforts.

    "Programmers were not permitted to make _algorithmic_ improvements or
    modifications, but rather were required to translate the PDL descriptions
    into assembly language. Programmers were free to optimize their test
    programs to the extent possible with highly optimizing compilers. This
    ``hand translation'' of strictly defined algorithms was expected to reduce
    variations due to programmer skill."

    "Computer Structures: Principles and Examples", Siewiorek, Bell, and
    Newell, McGraw-Hill, 1982, p. 58.

If you are measuring the performance of a processor, then it is not necessarily
desirable to compile a standard program. If instead you are measuring combined
compiler/processor performance, then you should certainly use the same source
for all measurements. It is much easier to rewrite a poor compiler than to
redesign a poor architecture.

Kendall Auel
Tektronix, Inc.
(I don't claim or disclaim anything)
campbell@sauron.UUCP (Mark Campbell) (10/02/86)
In article <3600003@hplabsb.UUCP> wiemann@hplabsb.UUCP (Alan Wiemann) writes:
>Benchmark comparisons give valid results only if the same program is presented
>to each machine. The compiler is considered part of the "machine" and its
>performance contributes to the overall performance of the machine.
> [...]

I don't believe that this is a valid point. The article benchmarks
*processors*, not systems. While the compiler is part of a system, it is not
part of a processor. You are correct that the skill of the assembler programmer
is quite important -- however, using a compiler would only have raised an issue
concerning the skill of the compiler writer.

What I found ludicrous in the article was their finding that (to paraphrase)
'an internal cache was quite helpful'. The way they set up the hardware, it
should have been pretty damned obvious that an internal cache would be helpful.
The I80386 results were not made using two-cycle external memory accesses. With
the delays induced for external memory accesses, those machines with internal
caches were clearly superior. I guess that helps a lot if you are going to
build a 16/20MHz M68020/I80386 system with ~150ns DRAM and no external cache
(a la Compaq). But it sure doesn't mean much to most of the Unix boxes out
there.
--
Mark Campbell    Phone: (803)-791-6697    E-Mail: !ncsu!ncrcae!sauron!campbell
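The cache argument is just arithmetic over memory access costs. A
back-of-the-envelope C sketch follows; every number in it is invented for
illustration and none comes from the IEEE Micro article.

  #include <stdio.h>

  /* Effective access cost in clock cycles, given an on-chip cache hit rate.
     hit_rate, cache_cycles and external_cycles are illustrative values only. */
  static double effective_cycles(double hit_rate, double cache_cycles,
                                 double external_cycles)
  {
      return hit_rate * cache_cycles + (1.0 - hit_rate) * external_cycles;
  }

  int main(void)
  {
      double external = 6.0;  /* assume a slow DRAM access costs 6 cycles */
      double cache    = 2.0;  /* assume an internal cache hit costs 2 cycles */

      printf("no cache:            %.2f cycles/access\n",
             effective_cycles(0.0, cache, external));
      printf("cache, 90%% hit rate: %.2f cycles/access\n",
             effective_cycles(0.9, cache, external));
      return 0;
  }

With slow external memory the processor with the internal cache wins almost by
construction, which is the point being made above.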
crowl@rochester.ARPA (Lawrence Crowl) (10/06/86)
In article <3600003@hplabsb.UUCP> wiemann@hplabsb.UUCP (Alan Wiemann) writes:
>Benchmark comparisons give valid results only if the same program is presented
>to each machine. The compiler is considered part of the "machine" and its
>performance contributes to the overall performance of the machine. This study
>did not present the same program to each machine. Instead "[they] had the
>same person modify or write all the tests so [they] could be sure that the same
>algorithms would be used for all the processors" (page 56 of the IEEE article).
>Thus the benchmark results reflect not only the individual processors' ability
>to execute instructions but also the cleverness of this programmer in using
>each microprocessor's instruction set and architecture. The results reported
>should not be considered true measures of the relative performance of these
>microprocessors.

How else do you compare assembly language performance between two machines with
different architectures? Often critical sections of code will be coded in
assembler to increase speed. The capability of the architecture to support fast
hand-coded assembler can have a significant effect on the performance of the
program. So we need to do assembly language benchmarks. I submit it is a valid
comparison to code the same algorithm into assembly on each machine. However,
this coding must be done by an individual with equivalent experience on each
machine, spending the same amount of time programming. That is, the programmer
is not allowed to bias the results by spending unfair amounts of time
optimizing his favorite processor. The bottom line is that we must trust the
benchmarks, correlate them with other benchmarks, or do them ourselves.

By the same token, including the compiler is often an invalid comparison
because the compiler can have a significant effect on the resulting
performance. Suppose I take a student-built, unoptimizing compiler for machine
A and a highly tuned optimizing compiler for machine B. Now, if the two
machines are anywhere close in performance, machine B will win. Here again, the
bottom line is that we must trust the benchmarks, correlate them with other
benchmarks, or do them ourselves.

Of course, we could have a competitive benchmark between interested parties.
Any takers?
--
Lawrence Crowl		716-275-5766	University of Rochester
crowl@rochester.arpa			Computer Science Department
...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York, 14627
franka@mmintl.UUCP (Frank Adams) (10/08/86)
In article <21344@rochester.ARPA> crowl@rochtest.UUCP (Lawrence Crowl) writes:
>By the same token, including the [compiler] is often an invalid comparison
>because the compiler can have a significant effect on the resulting
>performance. Suppose I take a student built, unoptimizing compiler for
>machine A and a highly tuned optimizing compiler for machine B. Now, if
>the two machines are anywhere close in performance, machine B will win.
>Here again, the bottom line is that we must trust the benchmarks, correlate
>them with other benchmarks, or do them ourselves.

The issue here is who is using the benchmarks, and for what? If, as for most of
us, writing one's own compiler is not an option, then all that matters is how
our program will perform using the compilers available for it. It doesn't much
matter if machine A is really faster than machine B, if the only compilers
available for machine A generate code which is so much worse than that
available on machine B that the programs actually run slower.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108