david@ms.uky.edu (David Herron -- Resident E-mail Hack) (10/06/87)
what is needed is a way of weighting the instructions on various
processors against some absolute scale. then you could do something
like:

    sum-for-i-from-0-to-n (weight-of-instruction-i * #-of-times-executed)
    ---------------------------------------------------------------------
                              number-of-seconds

and have a really useful measure. (Wish this terminal would do
capital-sigma's)

My vague memory of calculus reminds me of "moment arms"

Of course the "weight-of-instruction-i" term is non-trivial.
--
<---- David Herron, Local E-Mail Hack, david@ms.uky.edu, david@ms.uky.csnet
<---- {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- I thought that time was this neat invention that kept everything
<---- from happening at once. Why doesn't this work in practice?
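A minimal sketch of the measure proposed above: the sum of (weight of instruction i * number of times executed), divided by the run time in seconds. The instruction names, weights, and counts are hypothetical placeholders on an arbitrary scale; as the post says, choosing real weights is the non-trivial part.

```python
def weighted_rate(weights, counts, seconds):
    """Return sum(weight[i] * count[i]) / seconds.

    weights: instruction name -> weight on some absolute scale (assumed)
    counts:  instruction name -> number of times executed
    seconds: wall-clock run time of the measured program
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    total = sum(weights[op] * n for op, n in counts.items())
    return total / seconds


# Hypothetical figures, for illustration only.
EXAMPLE_WEIGHTS = {"load": 2.0, "add": 1.0, "branch": 1.5}
EXAMPLE_COUNTS = {"load": 1000, "add": 3000, "branch": 500}

if __name__ == "__main__":
    print(weighted_rate(EXAMPLE_WEIGHTS, EXAMPLE_COUNTS, 0.5))
```

With every weight set to 1.0 this reduces to a plain instructions-per-second count, which is the figure the weighting is meant to improve on.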
steven@cwi.nl (Steven Pemberton) (10/09/87)
In article <7422@e.ms.uky.edu> david@ms.uky.edu (David Herron) writes:
> what is needed is a way of weighting the instructions on various
> processors against some absolute scale. then you could [...] have a
> really useful measure.

Well, surely this is the purpose of the Dhrystone benchmark. Of
course, the quality of the compiler distorts the figure, but at least
you get a reasonable figure with which to compare machines. For
instance, an 8 MHz 68000 is around 900 dhrystones, a VAX 780 is around
1500, a 20 MHz 414 transputer around 3300, a 25 MHz 68020 around 6000,
and so on.

Steven Pemberton, CWI, Amsterdam; steven@cwi.nl
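One way to read the figures above is as ratios against a single baseline machine. The sketch below uses only the four numbers quoted in the post; picking the VAX 780 as the baseline and the helper name `relative_to` are assumptions made for illustration.

```python
# Dhrystone figures as quoted in the post above.
DHRYSTONES = {
    "8 MHz 68000": 900,
    "VAX 780": 1500,
    "20 MHz 414 transputer": 3300,
    "25 MHz 68020": 6000,
}


def relative_to(baseline, scores):
    """Return each score divided by the baseline machine's score."""
    base = scores[baseline]
    return {name: score / base for name, score in scores.items()}


if __name__ == "__main__":
    for name, ratio in relative_to("VAX 780", DHRYSTONES).items():
        print(f"{name}: {ratio:.2f}x VAX 780")
```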
jdow@gryphon.CTS.COM (Joanne Dow) (10/12/87)
In article <90@piring.cwi.nl> steven@cwi.nl (Steven Pemberton) writes:
>In article <7422@e.ms.uky.edu> david@ms.uky.edu (David Herron) writes:
>> what is needed is a way of weighting the instructions on various
>> processors against some absolute scale. then you could [...] have a
>> really useful measure.
>
>Well, surely this is the purpose of the Dhrystone benchmark. Of
>course, the quality of the compiler distorts the figure, but at least
>you get a reasonable figure with which to compare machines. For
>instance, an 8 MHz 68000 is around 900 dhrystones, a VAX 780 is around
>1500, a 20 MHz 414 transputer around 3300, a 25 MHz 68020 around 6000,
>and so on.
>

Trevor Marshall tested one of his 68020/68881 plugins for the PC at
35 MHz. That baby turned out over 7000 dhrystones. Eat hot bytes 80386!

>Steven Pemberton, CWI, Amsterdam; steven@cwi.nl
--
<@_@>
BIX:jdow
INTERNET:jdow@gryphon.CTS.COM
UUCP:{akgua, hplabs!hp-sdd, sdcsvax, ihnp4, nosc}!crash!gryphon!jdow

Remember - A bird in the hand often leaves a sticky deposit. Perhaps
it was better you left it in the bush with the other one.
mph@rover.UUCP (Mark Huth) (10/15/87)
In article <90@piring.cwi.nl> steven@cwi.nl (Steven Pemberton) writes:
>
>Well, surely this is the purpose of the Dhrystone benchmark. Of
>course, the quality of the compiler distorts the figure, but at least

It seems to me that if one programs in C, then the C compiler is part
of the environment. The fact that the compiler distorts the raw
machine power to some extent is true, but unless you are an assembly
guru (not just think that you are, since the CISC machines are quite
complex and have timings that are no longer obvious due to caches and
pipelines) you cannot generate code to fully utilize a given
architecture's power. Therefore, the high-level language benchmarks
are very useful.

We are able to improve the Dhrystone ratings of our systems by as much
as 33% by improving the compiler. This is real good news, as all
programs get some considerable performance gain by recompilation as
better compilers become available.

A couple of comments about RISC - Usually RISC is indicative of a
design philosophy which uses little or no microcode. Most instructions
are 2 or 3 address register to register instructions, with memory
accesses limited to a few simple addressing modes of a load or store
instruction. Because the instructions are simple, they can all be
organized to use a pipeline of the same length. Often pipeline
interlocks are left for the compiler to worry about. As a result, once
the pipe is full, RISC will complete one instruction per clock.

Normally, the instruction after a branch is executed whether the
branch is taken or not, leading to a significant performance
improvement (by keeping the pipe full) provided the compiler can find
a useful instruction to execute regardless of whether the branch is
taken or not. This appears to be possible about 90% of the time. The
other 10% is a nop - which is no loss, as the pipe would have
otherwise been disrupted anyway.
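The delayed-branch argument can be put into a rough cycle count. This sketch assumes one instruction completes per clock once the pipe is full (as stated above), that a branch without a delay slot costs a one-cycle bubble (an assumption, since the post gives no figure), and that the compiler fills 90% of delay slots with useful work. The function names are illustrative.

```python
def cycles_without_delay_slot(useful_instructions, branches, bubble=1):
    """Every branch disrupts the pipe for `bubble` cycles (assumed 1)."""
    return useful_instructions + branches * bubble


def cycles_with_delay_slot(useful_instructions, branches, fill_rate=0.9):
    """Only unfilled delay slots cost a cycle, each one as a nop."""
    nops = branches * (1 - fill_rate)
    return useful_instructions + nops


if __name__ == "__main__":
    # Hypothetical program: 1000 useful instructions, 200 of them branches.
    print(cycles_without_delay_slot(1000, 200))
    print(cycles_with_delay_slot(1000, 200))
```

Under these assumptions a nop in the delay slot costs the same as the bubble it replaces, so the delay slot never loses and wins whenever the slot is filled.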
It is argued that the simpler instruction sets allow the compiler a
better shot at optimization during code generation than trying to find
exactly the right CISC instruction for a particular purpose. In
essence, the compiler works at a level similar to the microcode level
of a CISC architecture. Complex addressing modes are generated by
multiple simple instructions. For example, a CISC compiler generates

    MOVE.L  ([ptr],offset),D0

to load a value given by the C statements

    register int D0;
    struct TMP *ptr;
    D0 = ptr -> offset;

while the RISC machine might need to do

    LOAD  #ptr,R20        Load immediate (use value from instruction stream)
    LOAD  (R20),R24       Load indirect (use address in R20)
    LOAD  #offset,R21
    ADD   R24,R21,R22     Add R24 and R21, leaving value in R22
    LOAD  (R22),R0        Now get actual value (previous stuff was address)

Of course, due to the (normally) large register set of the RISC
machine, the constants and variables may already be in the registers,
considerably reducing the number of instructions needed. The compiler
is supposed to make this choice.

Of course, since RISC often requires more instructions to accomplish
its task, it is common to find RISC machines belonging to the Harvard
class (separate instruction and data memory streams).

Mark Huth
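To check that a five-instruction load/add sequence of this kind computes the same thing as the single memory-indirect move, here is a toy interpreter. The opcode names `LOADI`, `LOAD`, and `ADD`, the addresses, and the memory contents are all hypothetical; the add is written to use R24, the register holding the pointer value fetched from memory.

```python
def run(program, memory):
    """Execute a list of toy RISC instructions; return the register file."""
    regs = {}
    for op, *args in program:
        if op == "LOADI":            # LOAD #value,Rd  (immediate)
            value, rd = args
            regs[rd] = value
        elif op == "LOAD":           # LOAD (Rs),Rd    (register indirect)
            rs, rd = args
            regs[rd] = memory[regs[rs]]
        elif op == "ADD":            # ADD Ra,Rb,Rd
            ra, rb, rd = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


PTR_ADDR = 100                       # hypothetical address of the variable ptr
OFFSET = 8                           # hypothetical field offset within the struct
MEMORY = {100: 2000, 2008: 42}       # ptr holds 2000; the field at 2008 holds 42

PROGRAM = [
    ("LOADI", PTR_ADDR, "R20"),      # address of ptr
    ("LOAD", "R20", "R24"),          # value of ptr
    ("LOADI", OFFSET, "R21"),        # field offset
    ("ADD", "R24", "R21", "R22"),    # address of the field
    ("LOAD", "R22", "R0"),           # the field itself
]

if __name__ == "__main__":
    print(run(PROGRAM, MEMORY)["R0"])
```

If the pointer and offset were already in registers, only the last two instructions would be needed, which is the saving the large register set is meant to provide.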
steven@cwi.nl (Steven Pemberton) (10/20/87)
In article <557@rover.UUCP> mph@rover.UUCP (Mark Huth) writes:
> It seems to me that if one programs in C, then the C compiler is part
> of the environment. The fact that the compiler distorts the raw
> machine power to some extent is true, but unless you are an assembly
> guru [...] you cannot generate code to fully utilize a given
> architecture's power. Therefore, the high-level language benchmarks
> are very useful.
>
> We are able to improve the Dhrystone ratings of our systems by as much
> as 33% by improving the compiler. This is real good news, as all
> programs get some considerable performance gain by recompilation as
> better compilers become available.

Well, this is exactly what I meant. The problem with a MIPS rating is
that you know little about what sort of instructions are involved. At
least a Dhrystone rating gives you an objective number, but it is
still not a completely good indication of pure machine power because
the compiler distorts the figure (although it does at least give a
lower bound). Just look at the figures for different C compilers on
different makes of 8 MHz 68000: they run from 330 to 1370!

The fact that you could tune your compiler to give 33% better
performance only goes to show that the Dhrystone figure is only
loosely related to the machine performance.

Steven Pemberton, CWI, Amsterdam; steven@cwi.nl
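The size of the compiler effect is easier to see as ratios. This sketch uses only figures quoted in the thread (330 and 1370 for the 8 MHz 68000 under different compilers, 900 and 1500 for the 68000 and VAX 780, and the 33% compiler gain); the helper names are illustrative.

```python
def spread(low, high):
    """Ratio between the highest and lowest score."""
    return high / low


def after_compiler_gain(score, gain=0.33):
    """Score after a compiler improvement of `gain` (33% in the thread)."""
    return score * (1 + gain)


compiler_spread = spread(330, 1370)   # same 8 MHz 68000, different compilers
machine_gap = spread(900, 1500)       # 68000 versus VAX 780, as quoted earlier

if __name__ == "__main__":
    print(f"compiler spread on one CPU: {compiler_spread:.2f}x")
    print(f"68000 to VAX 780 gap:       {machine_gap:.2f}x")
    print(f"900 dhrystones after a 33% compiler gain: {after_compiler_gain(900):.0f}")
```

On these numbers the spread between compilers on one processor is larger than the quoted gap between two quite different machines.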