[comp.sys.misc] Weighty instructions

david@ms.uky.edu (David Herron -- Resident E-mail Hack) (10/06/87)

what is needed is a way of weighting the instructions on various
processors against some absolute scale.  then you could do something
like:

	sum-for-i-from-0-to-n (weight-of-instruction-i * #-of-times-executed)
	---------------------------------------------------------------------
	number-of-seconds

and have a really useful measure.

(Wish this terminal would do capital-sigma's)


My vague memory of calculus reminds me of "moment arms"


Of course the "weight-of-instruction-i" term is non-trivial.
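
A minimal sketch of the measure above in C.  The weights are the
non-trivial part and are simply assumed as inputs here; the names
instr_stat and weighted_rate are hypothetical, and the table holds n
entries indexed 0 to n-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instruction record.  Choosing the weights is the
 * hard part; this sketch just takes them as given. */
struct instr_stat {
    double weight;          /* assumed weight on some absolute scale */
    unsigned long count;    /* number of times the instruction executed */
};

/* Sum over i of (weight_i * count_i), divided by elapsed seconds. */
double weighted_rate(const struct instr_stat *stats, size_t n, double seconds)
{
    double sum = 0.0;
    size_t i;

    assert(seconds > 0.0);
    for (i = 0; i < n; i++)
        sum += stats[i].weight * (double)stats[i].count;
    return sum / seconds;
}
```

With two instruction kinds, weights 1.0 and 2.0 executed 100 and 50
times in 4 seconds, this gives (100 + 100) / 4 = 50 weighted units per
second.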
-- 
<---- David Herron,  Local E-Mail Hack,  david@ms.uky.edu, david@ms.uky.csnet
<----                    {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- I thought that time was this neat invention that kept everything
<---- from happening at once.  Why doesn't this work in practice?

steven@cwi.nl (Steven Pemberton) (10/09/87)

In article <7422@e.ms.uky.edu> david@ms.uky.edu (David Herron) writes:
> what is needed is a way of weighting the instructions on various
> processors against some absolute scale.  then you could [...] have a
> really useful measure. 

Well, surely this is the purpose of the Dhrystone benchmark. Of
course, the quality of the compiler distorts the figure, but at least
you get a reasonable figure with which to compare machines. For
instance, an 8MHz 68000 is around 900 dhrystones, a VAX 780 is around
1500, a 20MHz 414 transputer around 3300, a 25MHz 68020 around 6000,
and so on.

Steven Pemberton, CWI, Amsterdam; steven@cwi.nl

jdow@gryphon.CTS.COM (Joanne Dow) (10/12/87)

In article <90@piring.cwi.nl> steven@cwi.nl (Steven Pemberton) writes:
>In article <7422@e.ms.uky.edu> david@ms.uky.edu (David Herron) writes:
>> what is needed is a way of weighting the instructions on various
>> processors against some absolute scale.  then you could [...] have a
>> really useful measure. 
>
>Well, surely this is the purpose of the Dhrystone benchmark. Of
>course, the quality of the compiler distorts the figure, but at least
>you get a reasonable figure with which to compare machines. For
>instance, an 8MHz 68000 is around 900 dhrystones, a VAX 780 is around
>1500, a 20MHz 414 transputer around 3300, a 25MHz 68020 around 6000,
>and so on.
>
Trevor Marshall tested one of his 68020/68881 plugins for the PC at 35MHz.
That baby turned out over 7000 dhrystones. Eat hot bytes 80386!

>Steven Pemberton, CWI, Amsterdam; steven@cwi.nl


-- 
<@_@>
	BIX:jdow
	INTERNET:jdow@gryphon.CTS.COM
	UUCP:{akgua, hplabs!hp-sdd, sdcsvax, ihnp4, nosc}!crash!gryphon!jdow

Remember - A bird in the hand often leaves a sticky deposit. Perhaps it was
better you left it in the bush with the other one.

ralf@B.GP.CS.CMU.EDU (Ralf Brown) (10/13/87)

In article <1866@gryphon.CTS.COM> jdow@gryphon.CTS.COM (Joanne Dow) writes:
>Trevor Marshall tested one of his 68020/68881 plugins for the PC at 35MHz.
>That baby turned out over 7000 dhrystones. Eat hot bytes 80386!
>
>>Steven Pemberton, CWI, Amsterdam; steven@cwi.nl
>-- 
>	BIX:jdow
>	INTERNET:jdow@gryphon.CTS.COM
>	UUCP:{akgua, hplabs!hp-sdd, sdcsvax, ihnp4, nosc}!crash!gryphon!jdow

Hmm... a 16 MHz 80386 turns out ~5500 dhrystones using 32-bit instructions*,
so it will churn out ~7000 dhrystones at 20 MHz, and over 12,000 dhrystones 
at 35 MHz.  What was that about "eat hot bytes"?  And I've heard that Intel 
intends to keep upping the speed rating of 386's until they reach 32 MHz--
which should allow 40 MHz operation with selected chips.  [I wouldn't mind
having a 40 MHz 386 machine.  Norton SI of 42, anyone?]

I hope this doesn't degenerate into another MCIBTYC war....

[*] in fact, there are two entries in the March 1987 Dhrystone database for
16 MHz 386's getting ~7000 dhrystones with UNIX SVr3 and the Green Hills C 386
compiler.
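
The scaling above assumes that Dhrystone throughput is strictly
proportional to clock rate.  A small C sketch of that arithmetic
follows; scale_dhrystones is a hypothetical helper, and the
proportionality is an assumption that real machines often miss, since
memory wait states do not shrink with the CPU clock.

```c
#include <assert.h>

/* Scale a measured Dhrystone figure to another clock rate, ASSUMING
 * throughput is exactly proportional to clock speed. */
double scale_dhrystones(double measured, double measured_mhz, double target_mhz)
{
    assert(measured_mhz > 0.0);
    return measured * target_mhz / measured_mhz;
}
```

Starting from 5500 at 16 MHz, this gives 6875 at 20 MHz and 12031.25
at 35 MHz, which is where the "~7000" and "over 12,000" figures come
from.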

ralf@B.GP.CS.CMU.EDU (Ralf Brown) (10/13/87)

[the .sig line-counter hits!  Your .signature is not included.  --More--]

It seems that the facilities staff just installed a new version of the
news poster with the 4 line .sig restriction.  So here it is:

-=-=-=-=-=-=-=-= {harvard,uunet,ucbvax}!b.gp.cs.cmu.edu!ralf =-=-=-=-=-=-=-=-
ARPAnet: RALF@B.GP.CS.CMU.EDU           BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA
AT&Tnet: (412) 268-3053 (school)        FIDOnet: Ralf Brown at 129/31
	        DISCLAIMER?  Who ever said I claimed anything? 
"I do not fear computers.  I fear the lack of them..." -- Isaac Asimov

chris@mimsy.UUCP (Chris Torek) (10/14/87)

In article <156@PT.CS.CMU.EDU> ralf@B.GP.CS.CMU.EDU (Ralf Brown) writes:
>... ~7000 dhrystones with ... the Green Hills C 386 compiler.

Be careful with these numbers.  A *really good* optimising compiler
will reduce Dhrystone to two `time' system calls and one bit of
arithmetic and a printf: approximately infinity dhrystones.  The
Green Hills compilers are not *that* good (yet?), but they are
good.

<insert story about FORTRAN optimiser that reduced a benchmark to
`print 3.1415926...'>
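
A rough C illustration of the effect, not taken from Dhrystone itself:
all names here are hypothetical.  When the result of a side-effect-free
loop is never used, an optimiser may delete the work being timed;
storing the result in a volatile object is one common way to keep the
result live.

```c
#include <assert.h>

/* The loop itself has no side effects.  If the caller ignores the
 * result, an optimiser is free to delete the loop entirely. */
static long busy_work(long n)
{
    long i, sum = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        sum += i % 7;
    return sum;
}

/* A store to a volatile object must actually be performed, so the
 * value has to be produced somehow. */
volatile long benchmark_sink;

/* Result discarded: the "benchmark" may compile to almost nothing. */
void timed_region_naive(long n)
{
    (void)busy_work(n);
}

/* Result stored: the value must be produced, though a compiler may
 * still fold it at compile time when n is a known constant. */
void timed_region_kept(long n)
{
    benchmark_sink = busy_work(n);
}
```

Timing timed_region_naive with a good optimiser measures little more
than the two clock reads around it.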
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

mph@rover.UUCP (Mark Huth) (10/15/87)

In article <90@piring.cwi.nl> steven@cwi.nl (Steven Pemberton) writes:
>
>Well, surely this is the purpose of the Dhrystone benchmark. Of
>course, the quality of the compiler distorts the figure, but at least

It seems to me that if one programs in C, then the C compiler is part
of the environment.  The fact that the compiler distorts the raw
machine power to some extent is true, but unless you are an assembly
guru (not just think that you are, since the CISC machines are quite
complex and have timings that are no longer obvious due to caches and
pipelines) you cannot generate code to fully utilize a given
architecture's power.  Therefore, the high-level language benchmarks
are very useful.

We are able to improve the Dhrystone ratings of our systems by as much
as 33% by improving the compiler.  This is real good news, as all
programs get some considerable performance gain by recompilation as
better compilers become available.

A couple of comments about RISC - 

Usually RISC is indicative of a design philosophy which uses little or
no microcode.  Most instructions are 2 or 3 address register to
register instructions, with memory accesses limited to a few simple
addressing modes of a load or store instruction.  The simple
instructions allow them to be organized to require the same length
pipeline.  Often pipeline interlocks are left for the compiler to worry
about.  As a result, once the pipe is full, RISC will complete one
instruction per clock.  Normally, the instruction after a branch is
executed whether the branch is taken or not, leading to a significant
performance improvement (by keeping the pipe full) provided the
compiler can find a useful instruction to execute regardless of
whether the branch is taken or not.  This appears to be possible about
90% of the time.  The other 10% is a nop - which is no loss, as the
pipe would have otherwise been disrupted anyway.
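
As a rough sketch of that accounting in C, under a deliberately simple
model that is an assumption here rather than a description of any
particular machine: every branch costs exactly one cycle unless its
delay slot is filled with useful work.  Both function names are
hypothetical.

```c
#include <assert.h>

/* Cycles for `useful` instructions containing `branches` branches when
 * every branch loses one cycle (assumed model, no delay slot). */
unsigned long cycles_plain_branch(unsigned long useful, unsigned long branches)
{
    assert(branches <= useful);
    return useful + branches;
}

/* With a delayed branch, a filled slot holds work already counted in
 * `useful`; only the unfilled slots add a nop cycle. */
unsigned long cycles_delayed_branch(unsigned long useful, unsigned long branches,
                                    unsigned fill_percent)
{
    assert(branches <= useful);
    assert(fill_percent <= 100);
    return useful + branches * (100 - fill_percent) / 100;
}
```

For 1000 useful instructions with 200 branches (the branch count is
made up for illustration), the plain model takes 1200 cycles, while a
90% fill rate takes 1020.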

It is argued that the simpler instruction sets allow the compiler a
better shot at optimization during code generation than trying to find
exactly the right CISC instruction for a particular purpose.  In
essence, the compiler works at a level similar to the microcode level
of a CISC architecture.  Complex addressing modes are generated by
multiple simple instructions.

For example, a compiler for a CISC machine can generate
MOVE.L ([ptr],offset),D0 to load a value given by the C statements

register int D0;
struct TMP *ptr;

D0 = ptr -> offset;

while the RISC machine might need to do

LOAD #ptr,R20       Load immediate (use value from instruction stream)
LOAD (R20),R24      Load indirect (use address in R20) to get the pointer
LOAD #offset,R21    Load immediate: the member's offset
ADD R24,R21,R22     Add R24 and R21, leaving value in R22
LOAD (R22),R0       Now get actual value (previous stuff was address)

Of course, due to the (normally) large register set of the RISC
machine, the constants and variables may already be in the registers,
considerably reducing the number of instructions needed.  The compiler
is supposed to make this choice.
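
The address arithmetic in the load sequence above can be spelled out in
C.  This is only a sketch: the layout of struct TMP is made up for
illustration, and the two function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout for the TMP structure in the example. */
struct TMP {
    int other;
    int offset;     /* the member being loaded */
};

/* What the programmer writes: ptr -> offset. */
int load_member(const struct TMP *ptr)
{
    assert(ptr != NULL);
    return ptr->offset;
}

/* The same load as explicit steps: take the pointer value as a base
 * address, add the member's constant offset, then do one load. */
int load_member_by_hand(const struct TMP *ptr)
{
    const unsigned char *base = (const unsigned char *)ptr;
    int value;

    assert(ptr != NULL);
    memcpy(&value, base + offsetof(struct TMP, offset), sizeof value);
    return value;
}
```

Both functions return the same value.  On a load/store machine with
ptr already in a register, the second form is one add and one load, or
a single load if the machine has a base-plus-displacement addressing
mode.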

Of course, since RISC often requires more instructions to accomplish
its task, it is common to find RISC machines belonging to the Harvard
class (separate instruction and data memory streams).

Mark Huth

steven@cwi.nl (Steven Pemberton) (10/20/87)

In article <557@rover.UUCP> mph@rover.UUCP (Mark Huth) writes:
> It seems to me that if one programs in C, then the C compiler is part
> of the environment.  The fact that the compiler distorts the raw
> machine power to some extent is true, but unless you are an assembly
> guru [...] you cannot generate code to fully utilize a given
> architecture's power.  Therefore, the high-level language benchmarks
> are very useful.
> 
> We are able to improve the Dhrystone ratings of our systems by as much
> as 33% by improving the compiler.  This is real good news, as all
> programs get some considerable performance gain by recompilation as
> better compilers become available.

Well, this is exactly what I meant. The problem with a MIPS rating is
that you know little about what sort of instructions are involved. At
least a Dhrystone rating gives you an objective number, but it is
still not a completely good indication of pure machine power because
the compiler distorts the figure (although it does at least give a
lower bound). Just look at the figures for different C compilers on
different makes of 8 MHz 68000: they run from 330 to 1370! The fact
that you could tune your compiler to give 33% better performance only
goes to show that the Dhrystone figure is only loosely related to the
machine performance.

Steven Pemberton, CWI, Amsterdam; steven@cwi.nl