[comp.sys.amiga] Dhrystone

scott@applix.UUCP (Scott Evernden) (08/03/88)

fyi,

Dhrystone v2.1 results - Manx 3.6b vs Lattice 4.02
==================================================

No stack checks, inline string functions, in fast mem. (A1000/68010)
Averaged 5 runs (varying loop counts 10000 to 30000) each.

-----
SMALLCODE & SMALLDATA 16BIT INTS:
==================================
lc -cw -b -r -w -v -DREG=register -Lcdm dhry_?.c
	Dhrystones per Second:                      1219.5

lc -cw -b -r -w -v -Lcdm dhry_?.c
	Dhrystones per Second:                      1146.1

cc +x3 +x5 -DREG=register dhry_1.c
cc +x3 +x5 -DREG=register dhry_2.c
ln dhry_?.o -lm -lc
	Dhrystones per Second:                      1053.9

cc +x3 +x5 dhry_1.c
cc +x3 +x5 dhry_2.c
ln dhry_?.o -lm -lc
	Dhrystones per Second:                       959.2

-----
LARGECODE & LARGEDATA 16BIT INTS:
==================================
lc -cw -w -v -DREG=register -Lm dhry_?.c
	Dhrystones per Second:                      1202.4

lc -cw -w -v -Lm dhry_?.c
	Dhrystones per Second:                      1124.8

cc +x3 +x5 +r +C +D -DREG=register dhry_1.c
cc +x3 +x5 +r +C +D -DREG=register dhry_2.c
ln dhry_?.o -lml -lcl
	Dhrystones per Second:                      1039.0

cc +x3 +x5 +r +C +D dhry_1.c
cc +x3 +x5 +r +C +D dhry_2.c
ln dhry_?.o -lml -lcl
	Dhrystones per Second:                       941.9

-----
SMALLCODE & SMALLDATA 32BIT INTS:
==================================
lc -cw -b -r -v -DREG=register -Lcdm dhry_?.c
	Dhrystones per Second:                       978.1

lc -cw -b -r -v -Lcdm dhry_?.c
	Dhrystones per Second:                       914.6

cc +x3 +x5 +l -DREG=register dhry_1.c
cc +x3 +x5 +l -DREG=register dhry_2.c
ln dhry_?.o -lm32 -lc32
	Dhrystones per Second:                       744.1

cc +x3 +x5 +l dhry_1.c
cc +x3 +x5 +l dhry_2.c
ln dhry_?.o -lm32 -lc32
	Dhrystones per Second:                       683.0

-----
LARGECODE & LARGEDATA 32BIT INTS:
==================================
lc -cw -v -DREG=register -Lm dhry_?.c
	Dhrystones per Second:                       954.7

lc -cw -v -Lm dhry_?.c
	Dhrystones per Second:                       892.0

cc +x3 +x5 +l +r +C +D -DREG=register dhry_1.c
cc +x3 +x5 +l +r +C +D -DREG=register dhry_2.c
ln dhry_?.o -lml32 -lcl32
	Dhrystones per Second:                       733.0

cc +x3 +x5 +l +r +C +D dhry_1.c
cc +x3 +x5 +l +r +C +D dhry_2.c
ln dhry_?.o -lml32 -lcl32
	Dhrystones per Second:                       674.5


-scott
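
To put the figures above in relative terms, a small helper can turn two Dhrystones-per-second readings into a percentage difference. This is only a sketch: the name `percent_faster` is made up for illustration and is not part of Dhrystone.

```c
#include <assert.h>

/*
 * Percentage by which one Dhrystones-per-second reading exceeds another.
 * percent_faster() is a hypothetical helper, not part of Dhrystone.
 */
double percent_faster(double faster, double slower)
{
    assert(slower > 0.0);   /* a zero or negative rate is meaningless */
    return (faster / slower - 1.0) * 100.0;
}
```

Calling `percent_faster(1219.5, 1053.9)` compares the Lattice and Manx small-model, 16-bit, `-DREG=register` builds and comes out a little under 16 percent; `percent_faster(978.1, 744.1)` does the same for the 32-bit small-model builds and comes out a little over 31 percent.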

scott@applix.UUCP (Scott Evernden) (08/10/88)

In article <755@applix.UUCP> scott@applix.UUCP (Scott Evernden) writes:
> .....
>Dhrystone v2.1 results - Manx 3.6b vs Lattice 4.02
>LARGECODE & LARGEDATA 16BIT INTS:
>==================================
>lc -cw -w -v -DREG=register -Lm dhry_?.c
... etc.,

Your (my) large-model Lattice tests are hosed.
(Heady compiler junk ahead- "n" to skip...)

I just stumbled across the following facts:

o I should have remembered but forgot, that specifying -b and not specifying
  it are exactly the same.  I need to add -b0 in order to get large-data
  addressing.

o Something I didn't know: specifying -r and not specifying it are the exact
  same thing- I need to add -r0 in order to get large-code addressing.
  (apparently not in the docs; is this a secret?)

o the -Lm would _seem_ to imply that I _don't_ want SMALLCODE or SMALLDATA;
  however, the .lnk file produced by "lc" references lc.lib (the -b lib) and
  _not_ lcnb.lib (the -b0 lib) as I might have expected.  In order to
  perform a truly fair comparison, I need to explicitly modify the .lnk to
  link from the "nb" libraries.

o Having said the above, preliminary indications are that the numbers don't
  change too significantly if I do the tests properly.  If any radical
  differences appear, I will post...

-scott

scott@applix.UUCP (Scott Evernden) (08/15/88)

In article <583@faui44.informatik.uni-erlangen.de> mlelstv@faui44.UUCP (Michael van Elst (kdebugger)) writes:
>I would like to tell you that these Dhrystone results
>depend much on the ability of Lattice to use inline code
>for string functions.

Manx 3.6 offers inline string function handling.

-scott

dillon@CORY.BERKELEY.EDU (Matt Dillon) (08/15/88)

>In article <583@faui44.informatik.uni-erlangen.de> mlelstv@faui44.UUCP (Michael van Elst (kdebugger)) writes:
>>I would like to tell you that these Dhrystone results
>>depend much on the ability of Lattice to use inline code
>>for string functions.
>
>Manx 3.6 offers inline string function handling.
>
>-scott

	This is why such benchmarks are ludicrous, when people fine-tune
the benchmark and/or compiler to make the benchmark look better.  I won't
even talk about what Intel did.  Well, maybe I will.  They took IBM's
MIPS benchmark and gave themselves a MIPS rating timing NOP's. ... IBM
defines MIPS as testing *all* the instructions of a processor with the
estimated percent-usage for those instructions.  Suddenly, everybody's 
brother's processor was doing better than an IBM mainframe!

					-Matt

walker@sas.UUCP (Doug Walker) (08/17/88)

In article <8808150554.AA14630@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
>
>	This is why such benchmarks are ludicrous, when people fine-tune
>the benchmark and/or compiler to make the benchmark look better.  I won't

I think your comment is the ludicrous thing here.  Think about it and tell
me you don't think in-line string handling improves performance on ANY
program.  Many of the in-line string routines in Lattice are both smaller
AND faster than pushing parameters on the stack, branching, returning and
cleaning up the stack.  Benchmarks need never be considered.  Once the
feature is in, of course it should be taken into account when the benchmarks
are determined.  You are tarring Lattice and Manx with Intel's
brush, when there is no evidence whatsoever that either is guilty.
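
The difference being argued about can be pictured in C. The sketch below is an illustration only: `copy_by_call` and `COPY_INLINE` are hypothetical names, and the macro merely stands in for what a compiler's inline expansion might emit. It is not the code Lattice or Manx actually generates.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-line copy: every use pays for argument passing, a call and a return. */
char *copy_by_call(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);
    while ((*d++ = *src++) != '\0')
        ;                               /* copy up to and including the NUL */
    return dst;
}

/*
 * Hypothetical stand-in for an inline expansion: the same copy loop is
 * placed directly at the point of use, so no call or return is executed.
 */
#define COPY_INLINE(dst, src)                   \
    do {                                        \
        char *d_ = (dst);                       \
        const char *s_ = (src);                 \
        while ((*d_++ = *s_++) != '\0')         \
            ;                                   \
    } while (0)
```

Both forms produce the same result in the destination buffer; what differs is the per-use overhead of pushing parameters, branching, returning, and cleaning up the stack.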

dillon@CORY.BERKELEY.EDU (Matt Dillon) (08/18/88)

:In article <8808150554.AA14630@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
:>
:>	This is why such benchmarks are ludicrous, when people fine-tune
:>the benchmark and/or compiler to make the benchmark look better.  I won't
:
:I think your comment is the ludicrous thing here.  Think about it and tell
:me you don't think in-line string handling improves performance on ANY
:program.  Many of the in-line string routines in Lattice are both smaller
:AND faster than pushing parameters on the stack, branching, returning and
:cleaning up the stack.  Benchmarks need never be considered.  Once the
:feature is in, of course it should be taken into account when the benchmarks
:are determined.  You are tarring Lattice and Manx with Intel's
:brush, when there is no evidence whatsoever that either is guilty.

	Uh huh, you haven't thought the thing through have you?  Let me
explain it more carefully:  How large a percentage increase in speed do you
think you will get by replacing a subroutine-strcpy() with an inline-strcpy()
(etc...) ???

	Now, does the benchmark give a 'faster' value that agrees with the
relative speed increase of your program?

	Properly, the idea is to repeat the test on a whole shitload of
programs (that were not designed specifically to make a benchmark look good),
get the mean/average/whatever, and compare that relative speed increase to
the relative speed increase in the benchmark by the inline code.
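
One way to make the question concrete is an Amdahl's-law style estimate: if a fraction f of a program's run time is spent in string routines, and inlining makes those routines s times faster, the whole program speeds up by 1 / ((1 - f) + f / s). The sketch below assumes that model. `overall_speedup` is a hypothetical helper, and the fractions used in the note after it are invented for illustration, not measured from Dhrystone or any real program.

```c
#include <assert.h>

/*
 * Amdahl's-law estimate of whole-program speedup.
 *   fraction      - share of run time spent in the code being improved (0..1)
 *   local_speedup - how many times faster that code becomes
 * overall_speedup() is a hypothetical helper used only for illustration.
 */
double overall_speedup(double fraction, double local_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

If string copying were half of a benchmark's time and inlining doubled its speed, `overall_speedup(0.5, 2.0)` gives about 1.33; for a program that spends only 5 percent of its time there, `overall_speedup(0.05, 2.0)` gives about 1.03. A gap of that kind between benchmark and program is what the question above is probing.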

				--

	Does that answer your question?  I am not tarring either Lattice or
Manx, but pointing out two things people do not seem to understand about
benchmarks.  (1) Never fine tune a benchmark, and (2) Benchmarks are 
incredibly difficult to write if written properly.

	Going a little deeper:  A properly written benchmark should be 
difficult to fine-tune (anybody else catch that inference?)

						-Matt

news@amdcad.AMD.COM (Network News) (08/18/88)

In article <8808180050.AA05343@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
| 	Does that answer your question?  I am not tarring either Lattice or
| Manx, but pointing out two things people do not seem to understand about
| benchmarks.  (1) Never fine tune a benchmark, and (2) Benchmarks are 
| incredibly difficult to write if written properly.
| 
| 	Going a little deeper:  A properly written benchmark should be 
| difficult to fine-tune (anybody else catch that inference?)
| 
| 						-Matt

No.  A properly written benchmark should reflect the intended use of the
system.  If my system is constantly being used to solve systems of
linear equations, then LINPACK is a great benchmark, and if a compiler
can automatically unroll the inner-loop of saxpy or whatever, so much
the better.

	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)
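
For readers unfamiliar with the example, saxpy is the single-precision BLAS routine that computes y = a*x + y over vectors, and it dominates LINPACK's run time. The sketch below shows the plain loop next to a hand-unrolled version of the kind a compiler might produce automatically. The unroll factor of four and the function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: y[i] = a * x[i] + y[i] for each element. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    size_t i;

    assert(n == 0 || (x != NULL && y != NULL));
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Same computation with the inner loop unrolled four times. */
void saxpy_unrolled4(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;

    assert(n == 0 || (x != NULL && y != NULL));
    for (; i + 4 <= n; i += 4) {        /* four elements per iteration */
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; i++)                  /* leftover elements when n % 4 != 0 */
        y[i] += a * x[i];
}
```

The unrolled version does the same arithmetic with fewer loop-control branches per element, which is why it tends to help on workloads that really are dominated by this loop.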

dillon@CORY.BERKELEY.EDU (Matt Dillon) (08/19/88)

:In article <8808180050.AA05343@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
:| 	Does that answer your question?  I am not tarring either Lattice or
:| Manx, but pointing out two things people do not seem to understand about
:| benchmarks.  (1) Never fine tune a benchmark, and (2) Benchmarks are 
:| incredibly difficult to write if written properly.
:| 
:| 	Going a little deeper:  A properly written benchmark should be 
:| difficult to fine-tune (anybody else catch that inference?)
:| 
:| 						-Matt
:
:No.  A properly written benchmark should reflect the intended use of the
:system.  If my system is constantly being used to solve systems of
:linear equations, then LINPACK is a great benchmark, and if a compiler
:can automatically unroll the inner-loop of saxpy or whatever, so much
:the better.

	You mean:  "Also, a properly written benchmark should....".  When
I say fine-tune, I mean modify the 'standard' language, compiler, or benchmark
itself to make the perceived results look better.  A benchmark has never meant
'The best that can be done', but more a relative comparison against other
languages and machines.  Those who take the time to write incredibly optimized
code will get better results for their product, but that is not comparable to
other languages/machines unless they take the time to optimize each one.

						-Matt

david@ms.uky.edu (David Herron -- One of the vertebrae) (08/27/88)

In article <8808180050.AA05343@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
>
>:In article <8808150554.AA14630@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
>:>        This is why such benchmarks are ludicrous, when people fine-tune
>:>the benchmark and/or compiler to make the benchmark look better.  I won't
>:I think your comment is the ludicrous thing here.
>        Uh huh, you haven't thought the thing through have you?  Let me
>explain it more carefully:  How large a percentage increase in speed do you
>think you will get by replacing a subroutine-strcpy() with an inline-strcpy()
>(etc...) ???

Matt, Matt, Matt... calm down ...

Have you ever read the original Dhrystone article?  The guy who came up
with Dhrystone did all the right things to make sure it used a proper mix
of statement types.  (Well ... he surveyed the programs written by first
semester Pascal students to get his data ...)

>        Properly, the idea is to repeat the test on a whole shitload of
>programs (that were not designed specifically to make a benchmark look good),
>get the mean/average/whatever, and compare that relative speed increase to
>the relative speed increase in the benchmark by the inline code.

I don't think the art of benchmarking has improved to the point where
someone can properly say what you just said.  With larger programs
it's harder to be able to say things about them with confidence.

>        Does that answer your question?  I am not tarring either Lattice or
>Manx, but pointing out two things people do not seem to understand about
>benchmarks.  (1) Never fine tune a benchmark, and (2) Benchmarks are
>incredibly difficult to write if written properly.

It didn't sound to me as if anybody changed the benchmark.  Merely
that Lattice put a global optimization into their compiler to make
strcpy and friends into inline functions.  Maybe they had in mind
solely the idea of improving their Dhrystone rating and saw that
a lot of Dhrystone was string copying.  (It's been a while since I
read Dhrystone so I don't remember what the percentage was for string
manipulation).  But so what?  Strings get copied all the time in
programs, so this is also a useful general improvement.
-- 
<---- David Herron -- The E-Mail guy                         <david@ms.uky.edu>
<---- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<---- Problem: how to get people to call ...; Solution: Completely reconfigure 
<---- your mail system then leave for a weeks vacation when 90% done.