[comp.arch] FORTRAN Dhrystone for i860? NO; NEW PLAUSIBLE ANSWER; FLAMES

mash@mips.COM (John Mashey) (03/16/89)

In article <471@estevax.UUCP> wck353@estevax.UUCP (HrDr Weicker Reinhold ) writes:
>
>In the discussion about Intel's new chip (i860, alias N10),
>there has been some confusion about the Dhrystone number for this chip.
...
>"Fortran Dhrystone" sounds very strange, and first I couldn't believe it since
>I have never made a Fortran version of Dhrystone, nor seen one
>made by somebody else....
>In this paper, on page 11, they say that Dhrystone was compiled with
>"Green Hills Fortran 1.8.5", and on page 12: ".. developed in ADA ..
>Fortran and C versions are more commonly used". ....

Ken Shoemaker posted later that it was indeed C, despite the multiple
references to FORTRAN.

>I encourage everyone to use Dhrystone version 2 since it gives more
>realistic results than version 1. However, I know that manufacturers
>tend to publish whatever results make their product look better.
......
>The difference between the versions varies with the compiler and
>optimization level; I have seen differences between 0 and 15 %.
>Again, I can only cite what I wrote in the SIGPLAN Notices paper:
>"For serious performance evaluation, users are advised to ask for
>code listings and to check them carefully."
>Critical points for Dhrystone are separate compilation of the two modules
>and the rule "no procedure inlining".

OK, now I need some help.  We all doubted the i860 numbers the instant we
both knew what the machine looked like and saw the numbers.  A knowledgeable
person (who may or may not wish to be identified!) mailed me the following;
NOTE THAT THIS IS A CONJECTURE THAT NEEDS PROOF...:

"Several years ago I saw the 386 assembly language output generated by the
Greenhills C compiler for Dhrystone.  They inlined all the string functions,
and turned constant strings into fixed length copies (no NUL-character
recognition).

I wouldn't be surprised if you're seeing the same thing for the i860
results."

Now, according to the letter of the law of Herr Doktor Weicker's Dhrystone 2.1
writeup, it's OK to in-line strcpy and strcmp.  Unfortunately, in this
particular case, a set of conditions exists that is not particularly frequent:
	a) The source of the strcpy is a constant string, so the compiler
		knows how long it is.
	b) The target of the strcpy is something whose alignment is known
		at compile time, or even better, can be aligned as the
		compiler chooses.  [i.e., NOT a pointer to some unknown place.]
Given these conditions, you can easily turn the copy into a structure-assignment
equivalent, that need not inspect any bytes at all.
(How do I know this isn't particularly frequent? .... grep is a good thing.
I grepped for strcpy amongst UNIX commands.  Out of a sample of about 800
strcpy's, about 150 had constants as sources.  However, very few of them had
targets whose alignments would be known at compile time (from quick eyeball
inspection, and I'm sick enough of this not to look harder).
I'd guess maybe 10-20% of those were alignable, and they were typically short
copies of a few characters.  What this says is that about 2-3% of strcpy's
are of this form.  In my experience, a program that spent 10% of its
time in strcpy would be amazing; let's assume 1-10%.  This says that
somewhere between .02% and .3% of a typical program's run time might be
attacked by this optimization.)
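
The back-of-envelope estimate above can be redone mechanically.  What
follows is only a sketch in C: the function names are made up for
illustration, and the inputs are the rough guesses quoted above (about
150 of 800 strcpy's with constant sources, 10-20% of those alignable,
1-10% of run time spent in strcpy), not measurements.

```c
#include <assert.h>

/* Share of strcpy calls that have a constant source AND an alignable
   target.  Both inputs are fractions in [0, 1]. */
double eligible_strcpy_share(double const_src_share, double alignable_share)
{
    assert(const_src_share >= 0.0 && const_src_share <= 1.0);
    assert(alignable_share >= 0.0 && alignable_share <= 1.0);
    return const_src_share * alignable_share;
}

/* Rough upper bound on the share of run time the trick can touch,
   assuming (generously) that every strcpy call costs about the same. */
double run_time_share(double eligible_share, double strcpy_time_share)
{
    assert(eligible_share >= 0.0 && eligible_share <= 1.0);
    assert(strcpy_time_share >= 0.0 && strcpy_time_share <= 1.0);
    return eligible_share * strcpy_time_share;
}
```

With 150.0/800.0 and 0.10 to 0.20, eligible_strcpy_share() gives roughly
1.9% to 3.8%, which the text rounds to 2-3%.  run_time_share(0.02, 0.01)
and run_time_share(0.03, 0.10) then give the .02% and .3% bounds.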

NOW, THE BIG QUESTION.  How much does this optimization improve your Dhrystones?
Charlie Price earlier posted some of MIPS' "regular" variations.
What he didn't try was faking up the inlining of strcpy (via a macro
that turns it into a structure assignment), which Earl did.
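
For readers who want to see what such a macro might look like, here is
a minimal sketch.  It is NOT the macro actually used for the numbers
below, which was not posted; the names Str_Block, STRCPY_CONST and
copies_agree are hypothetical, and the 31-byte buffer (30 characters
plus NUL) is an assumption about the string size involved.

```c
#include <assert.h>
#include <string.h>

#define STR_BYTES 31            /* assumed: 30 characters plus the NUL */

/* Hypothetical wrapper type: putting the array inside a struct lets C
   copy all of it with a single assignment. */
typedef struct { char s[STR_BYTES]; } Str_Block;

/* "Inlined strcpy" for a string-literal source and a Str_Block target.
   The length is sizeof(Str_Block), fixed at compile time, so no byte is
   ever tested for NUL.  Not usable when dst is reached through a pointer
   to some unknown place, or when the source is not a literal. */
#define STRCPY_CONST(dst, lit)                          \
    do {                                                \
        static const Str_Block src_ = { lit };          \
        assert(sizeof(lit) <= STR_BYTES);               \
        (dst) = src_;                                   \
    } while (0)

/* Returns 1 if the structure assignment leaves the same string in the
   target as a real strcpy does. */
int copies_agree(void)
{
    Str_Block a, b;

    strcpy(a.s, "DHRYSTONE PROGRAM, 2'ND STRING");
    STRCPY_CONST(b, "DHRYSTONE PROGRAM, 2'ND STRING");
    return strcmp(a.s, b.s) == 0;
}
```

The point of the sketch is that conditions a) and b) above are both
baked into the macro: the source must be a literal, and the target must
be an object the compiler lays out itself.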

Lo and behold!  What we used to call about a 43K Dhrystone (1.1) machine
is now in the 58-60K Dhrystone range, A 30-40% HIGHER NUMBER!!!!!!!
(for an optimization that would be lucky to achieve even 1% in realistic
programs.)  Reinhold has worked HARD to get something a little more
immune to stuff like this, but......if gimmicks like this can
make a 30-40% difference in the numbers, and if there are many compilers
out there doing this (without reporting it, i.e., by just saying
that they used XYZ compilers with undecipherable options).....then
a lot of people have wasted an INCREDIBLE amount of time believing there
was even the slightest use to these numbers. Two machines can have the same
overall integer performance, but one can be made to look 30-40% better.  ARGH!!

IF this is going on, it will certainly go into the Hennessy/Patterson
RISC book's "rogues' gallery of misbenchmarking", or whatever they call
that chapter, along with some other gems (like making 100,000 getpid()
calls go real fast by doing the syscall just once and saving the value; etc.)
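
The getpid() gem is easy to picture.  A sketch, assuming a POSIX system;
cached_getpid, getpid_loop and real_syscalls are names invented here,
not taken from any vendor's library:

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

static long real_syscalls = 0;  /* counts calls that reach the real getpid() */

/* Hypothetical "fast" getpid: ask once, then replay the saved answer.
   The saved value goes stale in a child process after fork(). */
pid_t cached_getpid(void)
{
    static pid_t saved;
    static int have_saved = 0;

    if (!have_saved) {
        saved = getpid();
        have_saved = 1;
        real_syscalls++;
    }
    return saved;
}

/* Run the "benchmark" loop and report how many real calls were made. */
long getpid_loop(long iterations)
{
    long i;

    assert(iterations >= 0);
    for (i = 0; i < iterations; i++)
        (void)cached_getpid();
    return real_syscalls;
}
```

A loop of 100,000 iterations ends up timing 99,999 loads from memory and
one real call, which says nothing about system call overhead.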

--------------------------------------------
Now, we have the part you all out there in netland can help with.
Remember that this all started with a conjecture....

1) Does anybody KNOW if the Green Hills compilers do this, in general?
(or specifically:
	the 386?
	the 68020?
	the 88000?  (we never have been able to duplicate the #s....)
	etc.)

2) Could you post the assembly code, if there is something like this?

3) Is anybody from Intel or Green Hills willing to say if this
conjecture is right, specifically for the i860, or not?
(I understand that you may not wish to answer ....:-)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086