david@ms.uky.edu (David Herron -- Resident E-mail Hack) (10/06/87)
What is needed is a way of weighting the instructions on various processors against some absolute scale. Then you could do something like:

$$\frac{\sum_{i=0}^{n} w_i \times c_i}{t}$$

where $w_i$ is the weight of instruction $i$, $c_i$ is the number of times instruction $i$ was executed, and $t$ is the number of seconds, and have a really useful measure. (Wish this terminal would do capital-sigma's.) My vague memory of calculus reminds me of "moment arms". Of course, the weight-of-instruction-$i$ term is non-trivial.
-- 
<---- David Herron, Local E-Mail Hack, david@ms.uky.edu, david@ms.uky.csnet
<---- {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- I thought that time was this neat invention that kept everything
<---- from happening at once. Why doesn't this work in practice?
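The proposed measure is just a weighted sum divided by elapsed time. A minimal C sketch follows; the function name `weighted_rate` and any weights you feed it are hypothetical, and, as the post says, choosing the weights is the hard part that this code does not address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the proposed measure:
 *   (sum over i of weight[i] * count[i]) / seconds
 * weight[i] is an assumed "absolute" cost of instruction i and
 * count[i] is how many times instruction i was executed.
 * n is the number of instruction types (entries 0 .. n-1). */
double weighted_rate(const double *weight, const long *count,
                     size_t n, double seconds)
{
    double total = 0.0;

    assert(seconds > 0.0);
    for (size_t i = 0; i < n; i++)
        total += weight[i] * (double)count[i];
    return total / seconds;
}
```

For example, weights {1.0, 2.5} with counts {100, 40} over 2 seconds give (100 + 100) / 2 = 100 weighted instructions per second.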
steven@cwi.nl (Steven Pemberton) (10/09/87)
In article <7422@e.ms.uky.edu> david@ms.uky.edu (David Herron) writes:
> what is needed is a way of weighting the instructions on various
> processors against some absolute scale. then you could [...] have a
> really useful measure.

Well, surely this is the purpose of the Dhrystone benchmark. Of course, the quality of the compiler distorts the figure, but at least you get a reasonable figure with which to compare machines. For instance, an 8 MHz 68000 is around 900 dhrystones, a VAX 780 is around 1500, a 20 MHz 414 transputer around 3300, a 25 MHz 68020 around 6000, and so on.

Steven Pemberton, CWI, Amsterdam; steven@cwi.nl
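Figures like these are easiest to compare as ratios against a reference machine. A small sketch, taking the approximate VAX 780 figure of 1500 dhrystones from the post as the baseline; the constant and function names are illustrative, not a standard definition.

```c
#include <assert.h>

/* Approximate VAX 780 figure quoted in the post; illustrative only. */
#define VAX780_DHRYSTONES 1500.0

/* How many times the VAX 780 figure a machine scores on Dhrystone. */
double relative_to_vax780(double dhrystones)
{
    assert(dhrystones >= 0.0);
    return dhrystones / VAX780_DHRYSTONES;
}
```

With the quoted numbers, the 25 MHz 68020 comes out at 4.0 times the VAX 780, the transputer at about 2.2, and the 8 MHz 68000 at about 0.6.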
jdow@gryphon.CTS.COM (Joanne Dow) (10/12/87)
In article <90@piring.cwi.nl> steven@cwi.nl (Steven Pemberton) writes:
>In article <7422@e.ms.uky.edu> david@ms.uky.edu (David Herron) writes:
>> what is needed is a way of weighting the instructions on various
>> processors against some absolute scale. then you could [...] have a
>> really useful measure.
>
>Well, surely this is the purpose of the Dhrystone benchmark. Of
>course, the quality of the compiler distorts the figure, but at least
>you get a reasonable figure with which to compare machines. For
>instance, a 8MHz 68000 is around 900 dhrystones, a VAX 780 is around
>1500, a 20MHz 414 transputer around 3300, a 25Mhz 68020 around 6000,
>and so on.
>

Trevor Marshall tested one of his 68020/68881 plugins for the PC at 35MHz. That baby turned out over 7000 dhrystones. Eat hot bytes 80386!

>Steven Pemberton, CWI, Amsterdam; steven@cwi.nl
-- 
<@_@> BIX:jdow
INTERNET:jdow@gryphon.CTS.COM
UUCP:{akgua, hplabs!hp-sdd, sdcsvax, ihnp4, nosc}!crash!gryphon!jdow
Remember - A bird in the hand often leaves a sticky deposit.
Perhaps it was better you left it in the bush with the other one.
ralf@B.GP.CS.CMU.EDU (Ralf Brown) (10/13/87)
In article <1866@gryphon.CTS.COM> jdow@gryphon.CTS.COM (Joanne Dow) writes:
>Trevor Marshall tested one of his 68020/68881 plugins for the PC at 35MHz.
>That baby turned out over 7000 dhrystones. Eat hot bytes 80386!
>
>>Steven Pemberton, CWI, Amsterdam; steven@cwi.nl
>-- 
> BIX:jdow
> INTERNET:jdow@gryphon.CTS.COM
> UUCP:{akgua, hplabs!hp-sdd, sdcsvax, ihnp4, nosc}!crash!gryphon!jdow

Hmm... a 16 MHz 80386 turns out ~5500 dhrystones using 32-bit instructions[*], so, scaling with the clock, it should churn out ~7000 dhrystones at 20 MHz and over 12,000 dhrystones at 35 MHz. What was that about "eat hot bytes"?

And I've heard that Intel intends to keep upping the speed rating of 386's until they reach 32 MHz, which should allow 40 MHz operation with selected chips. [I wouldn't mind having a 40 MHz 386 machine. Norton SI of 42, anyone?]

I hope this doesn't degenerate into another MCIBTYC war....

[*] In fact, there are two entries in the March 1987 Dhrystone database for 16 MHz 386's getting ~7000 dhrystones with UNIX SVr3 and the Green Hills C 386 compiler.
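The 20 MHz and 35 MHz estimates assume the score scales linearly with clock rate. A minimal sketch of that arithmetic; the function name is hypothetical, and the linear assumption ignores memory wait states, caches, and compiler differences.

```c
#include <assert.h>

/* Estimate a Dhrystone rating at a new clock speed, assuming the
 * score scales linearly with clock frequency. This ignores memory
 * wait states, caches and compiler differences. */
double scale_by_clock(double dhrystones, double from_mhz, double to_mhz)
{
    assert(from_mhz > 0.0 && to_mhz > 0.0);
    return dhrystones * (to_mhz / from_mhz);
}
```

Starting from ~5500 dhrystones at 16 MHz, this gives 6875 at 20 MHz (the "~7000") and 12031.25 at 35 MHz (the "over 12,000").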
ralf@B.GP.CS.CMU.EDU (Ralf Brown) (10/13/87)
[the .sig line-counter hits! Your .signature is not included. --More--]

It seems that the facilities staff just installed a new version of the news poster with the 4-line .sig restriction. So here it is:

-=-=-=-=-=-=-=-= {harvard,uunet,ucbvax}!b.gp.cs.cmu.edu!ralf =-=-=-=-=-=-=-=-
ARPAnet: RALF@B.GP.CS.CMU.EDU       BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA
AT&Tnet: (412) 268-3053 (school)    FIDOnet: Ralf Brown at 129/31
DISCLAIMER? Who ever said I claimed anything?
"I do not fear computers. I fear the lack of them..." -- Isaac Asimov
chris@mimsy.UUCP (Chris Torek) (10/14/87)
In article <156@PT.CS.CMU.EDU> ralf@B.GP.CS.CMU.EDU (Ralf Brown) writes:
>... ~7000 dhrystones with ... the Green Hills C 386 compiler.

Be careful with these numbers. A *really good* optimising compiler will reduce Dhrystone to two `time' system calls, one bit of arithmetic, and a printf: approximately infinity dhrystones. The Green Hills compilers are not *that* good (yet?), but they are good.

<insert story about FORTRAN optimiser that reduced a benchmark to `print 3.1415926...'>
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris
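The danger described here comes from dead-code elimination and related optimizations, which the C "as-if" rule permits. A hypothetical illustration (these kernels are not taken from Dhrystone) of loops a compiler may delete or fold, and of one common way to force the work to happen:

```c
#include <assert.h>

/* The result is discarded, so an optimizing compiler may legally delete
 * the whole loop (dead-code elimination): zero work, "infinite" speed. */
void discarded_kernel(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i;
    (void)sum;
}

/* The result is used, but a compiler may still replace the loop with
 * the closed form n*(n-1)/2: "one bit of arithmetic". */
long foldable_kernel(long n)
{
    long sum = 0;
    assert(n >= 0);
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Every read and write of a volatile object is a side effect the
 * compiler must perform as written, so this loop cannot be collapsed
 * (at the cost of timing memory traffic the real code may not have). */
volatile long benchmark_sum;

long volatile_kernel(long n)
{
    assert(n >= 0);
    benchmark_sum = 0;
    for (long i = 0; i < n; i++)
        benchmark_sum += i;
    return benchmark_sum;
}
```

All three compute the same value when they run at all; what differs is how much of the loop a compiler is obliged to keep, which is exactly what a benchmark timing depends on.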
mph@rover.UUCP (Mark Huth) (10/15/87)
In article <90@piring.cwi.nl> steven@cwi.nl (Steven Pemberton) writes:
>
>Well, surely this is the purpose of the Dhrystone benchmark. Of
>course, the quality of the compiler distorts the figure, but at least

It seems to me that if one programs in C, then the C compiler is part of the environment. It is true that the compiler distorts the raw machine power to some extent, but unless you are an assembly guru (not just think that you are, since CISC machines are quite complex and have timings that are no longer obvious due to caches and pipelines) you cannot generate code that fully utilizes a given architecture's power. Therefore, high-level language benchmarks are very useful.

We are able to improve the Dhrystone ratings of our systems by as much as 33% by improving the compiler. This is really good news, as all programs get some performance gain simply by being recompiled as better compilers become available.

A couple of comments about RISC:

Usually RISC indicates a design philosophy that uses little or no microcode. Most instructions are 2- or 3-address register-to-register instructions, with memory accesses limited to a few simple addressing modes of a load or store instruction. Because the instructions are simple, they can all be organized to use a pipeline of the same length. Often pipeline interlocks are left for the compiler to worry about. As a result, once the pipe is full, a RISC machine will complete one instruction per clock.

Normally, the instruction after a branch (the delay slot) is executed whether or not the branch is taken. This leads to a significant performance improvement (by keeping the pipe full), provided the compiler can find a useful instruction to execute regardless of whether the branch is taken. This appears to be possible about 90% of the time. The other 10% is a nop, which is no loss, as the pipe would otherwise have been disrupted anyway.
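The delay-slot argument can be put into numbers. A minimal sketch, assuming an idealized single-issue pipeline that completes one useful instruction per clock and spends exactly one extra clock on each unfilled slot (the nop); the branch frequency in the example below is hypothetical, while the 90% fill rate is the figure from the post.

```c
#include <assert.h>

/* Clocks per useful instruction on an idealized pipeline with one
 * branch delay slot.
 *   branch_fraction: branches per useful instruction (assumed input)
 *   fill_rate: fraction of delay slots filled with useful work
 * Each useful instruction costs one clock; each unfilled slot holds
 * a nop that costs one extra clock. */
double delay_slot_cpi(double branch_fraction, double fill_rate)
{
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    assert(fill_rate >= 0.0 && fill_rate <= 1.0);
    return 1.0 + branch_fraction * (1.0 - fill_rate);
}
```

If, say, one instruction in five were a branch, a 90% fill rate would give about 1.02 clocks per useful instruction; with every slot filled the machine stays at the ideal 1.0.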
It is argued that a simpler instruction set gives the compiler a better shot at optimization during code generation than trying to find exactly the right CISC instruction for a particular purpose. In essence, the compiler works at a level similar to the microcode level of a CISC architecture: complex addressing modes are generated by multiple simple instructions. For example, on a 68020 the compiler generates

    MOVE.L ([ptr],offset),D0

to load a value given by the C statements

    register int D0;
    struct TMP *ptr;
    D0 = ptr -> offset;

while the RISC machine might need to do

    LOAD  #ptr,R20       Load immediate (use value from instruction stream)
    LOAD  (R20),R24      Load indirect (use address in R20): R24 = ptr
    LOAD  #offset,R21
    ADD   R24,R21,R22    Add R24 and R21, leaving the field address in R22
    LOAD  (R22),R0       Now get actual value (previous stuff was address)

Of course, due to the (normally) large register set of the RISC machine, the constants and variables may already be in registers, considerably reducing the number of instructions needed. The compiler is supposed to make this choice.

Also, since RISC often requires more instructions to accomplish its task, it is common to find RISC machines belonging to the Harvard class (separate instruction and data memory streams).

Mark Huth
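The five RISC steps can be mirrored in C to make the address arithmetic explicit. In this sketch the `struct TMP` layout, its `pad` member, and `load_field` are hypothetical; only the member name `offset` comes from the post, and the standard `offsetof` macro stands in for the compiler's knowledge of the field offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure layout; only "offset" comes from the post. */
struct TMP {
    int pad;
    int offset;
};

/* Performs the RISC sequence one step at a time.
 * ptr_addr plays the role of the immediate #ptr (LOAD #ptr,R20):
 * the address of the variable that holds the structure pointer. */
int load_field(struct TMP **ptr_addr)
{
    const char *r24;   /* LOAD (R20),R24: fetch ptr from memory */
    size_t r21;        /* LOAD #offset,R21: byte offset of the field */
    const char *r22;   /* ADD R24,R21,R22: address of the field */

    assert(ptr_addr != NULL && *ptr_addr != NULL);
    r24 = (const char *)*ptr_addr;
    r21 = offsetof(struct TMP, offset);
    r22 = r24 + r21;
    return *(const int *)r22;   /* LOAD (R22),R0: the actual value */
}
```

The 68020 memory-indirect mode folds the fetch of `ptr`, the add, and the final load into one instruction; here each step is a separate statement, which is the level a RISC compiler works at.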
steven@cwi.nl (Steven Pemberton) (10/20/87)
In article <557@rover.UUCP> mph@rover.UUCP (Mark Huth) writes:
> It seems to me that if one programs in C, then the C compiler is part
> of the environment. The fact that the compiler distorts the raw
> machine power to some extent is true, but unless you are an assembly
> guru [...] you cannot generate code to fully utilize a giver
> archetecture'r power. Therefore, the high-level language benchmarks
> are very useful.
>
> We are able to improve the Dhrystone ratings of our systems by as much
> as 33% by improving the compiler. This is real good news, as all
> programs get some considerable performance gain by recompilation as
> better compilers become available.

Well, this is exactly what I meant. The problem with a MIPS rating is that you know little about what sort of instructions are involved. At least a Dhrystone rating gives you an objective number, but it is still not a clean measure of pure machine power, because the compiler distorts the figure (although it does at least give a lower bound). Just look at the figures for different C compilers on different makes of 8 MHz 68000: they run from 330 to 1370!

The fact that you could tune your compiler to give 33% better performance only goes to show that the Dhrystone figure is only loosely related to machine performance.

Steven Pemberton, CWI, Amsterdam; steven@cwi.nl
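One rough way to see how much of a Dhrystone figure belongs to the compiler is the ratio of the best to the worst score on comparable hardware. A small sketch; the function name is hypothetical, and the only data points used below are the two extremes quoted in the post.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of the best to the worst Dhrystone score measured on
 * comparable hardware with different compilers: a rough gauge of
 * how much of the figure reflects the compiler, not the machine. */
double compiler_spread(const double *scores, size_t n)
{
    double lo, hi;

    assert(n > 0);
    lo = hi = scores[0];
    for (size_t i = 1; i < n; i++) {
        if (scores[i] < lo) lo = scores[i];
        if (scores[i] > hi) hi = scores[i];
    }
    assert(lo > 0.0);
    return hi / lo;
}
```

For the 8 MHz 68000 range of 330 to 1370, the spread is about 4.15, so the same processor class scores over four times higher with the best compiler than with the worst.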