mash@mips.COM (John Mashey) (03/16/89)
In article <471@estevax.UUCP> wck353@estevax.UUCP (HrDr Weicker Reinhold ) writes:
>
>In the discussion about Intel's new chip (i860, alias N10),
>there has been some confusion about the Dhrystone number for this chip.
...
>"Fortran Dhrystone" sounds very strange, and first I couldn't believe it since
>I have never made a Fortran version of Dhrystone, nor seen one
>made by somebody else....
>In this paper, on page 11, they say that Dhrystone was compiled with
>"Green Hills Fortran 1.8.5", and on page 12: ".. developed in ADA ..
>Fortran and C versions are more commonly used".
....

Ken Shoemaker posted later that it was indeed C, despite the multiple references to FORTRAN.

>I encourage everyone to use Dhrystone version 2 since it gives more
>realistic results than version 1. However, I know that manufacturers
>tend to publish whatever results make their product look better.
......
>The difference between the versions varies with the compiler and
>optimization level; I have seen differences between 0 and 15 %.
>Again, I can only cite what I wrote in the SIGPLAN Notices paper:
>"For serious performance evaluation, users are advised to ask for
>code listings and to check them carefully."
>Critical points for Dhrystone are separate compilation of the two modules
>and the rule "no procedure inlining".

OK, now I need some help. We all doubted the i860 numbers the instant we both knew what the machine looked like, and saw the numbers. A knowledgeable person (who may or may not wish to be identified!) mailed me the following; NOTE THAT THIS IS A CONJECTURE THAT NEEDS PROOF...:

"Several years ago I saw the 386 assembly language output generated by the Greenhills C compiler for Dhrystone. They inlined all the string functions, and turned constant strings into fixed length copies (no NUL-character recognition). I wouldn't be surprised if you're seeing the same thing for the i860 results."

Now, according to the letter of the law of Herr Doktor Weicker's Dhrystone 2.1 writeup, it's OK to in-line strcpy and strcmp. Unfortunately, this particular case depends on a set of conditions that does not hold very often:

a) The source of the strcpy is a constant string, so the compiler knows how long it is.

b) The target of the strcpy is something whose alignment is known at compile time, or even better, can be aligned as the compiler chooses. [i.e., NOT a pointer to some unknown place.]

Given these conditions, you can easily turn the copy into a structure-assignment equivalent that need not inspect any bytes at all.

(How do I know this isn't particularly frequent? .... grep is a good thing. I grepped for strcpy amongst UNIX commands. Out of a sample of about 800 strcpy's, about 150 had constants as sources. However, very few of them had targets whose alignments would be known at compile time (from quick eyeball inspection, and I'm sick enough of this not to look harder.) I'd guess maybe 10-20% of those were alignable, and they were typically short copies of a few characters. What this says is that about 2-3% of strcpy's are of this form. In my experience, a program that spent 10% of its time in strcpy would be amazing; let's assume 1-10%. This says that somewhere between 0.02% and 0.3% of a typical program's time might be attacked by this optimization.)

NOW, THE BIG QUESTION. How much does this optimization improve your Dhrystones? Charlie Price earlier posted some of MIPS' "regular" variations. What he didn't try was faking up the inlining of strcpy (via a macro that turns it into a structure assignment), which earl did. Lo and behold! What we used to call about a 43K Dhrystone (1.1) machine is now in the 58-60K Dhrystone range, A 30-40% HIGHER NUMBER!!!!!!! (for an optimization that would be lucky to achieve even 1% in realistic programs.)
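
For anybody who hasn't seen the trick, here is a minimal sketch in C of what "turning strcpy into a structure assignment" can look like. It is NOT the macro used for the numbers above, nor anything known to come out of any particular compiler. The type Str30, the constant SOME_STRING and the function names are made up for illustration, and the 30-character string is just an example of the kind of constant Dhrystone copies. The wrapper is padded to 32 bytes on the assumption that the compiler will then copy it in whole words.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: 30 characters + NUL, padded to 32 bytes. */
#define STR30_BYTES 32
typedef struct { char s[STR30_BYTES]; } Str30;

/* Condition (a): the source is a constant, so its length is known
   at compile time. Unused trailing bytes are zero-filled. */
static const Str30 SOME_STRING = { "DHRYSTONE PROGRAM, SOME STRING" };

/* Ordinary copy: strcpy, as written, has to look at every byte
   in order to find the terminating NUL. */
void copy_with_strcpy(Str30 *dst)
{
    strcpy(dst->s, "DHRYSTONE PROGRAM, SOME STRING");
}

/* Structure-assignment copy: condition (b) holds because dst is a
   Str30 object, not a pointer into some unknown buffer. The copy is
   a fixed 32 bytes and never tests a byte for NUL. */
void copy_with_assignment(Str30 *dst)
{
    assert(sizeof(Str30) == STR30_BYTES);  /* no hidden padding */
    *dst = SOME_STRING;
}

/* Returns 1 if both copies leave the same C string in the target. */
int copies_agree(void)
{
    Str30 a, b;
    copy_with_strcpy(&a);
    copy_with_assignment(&b);
    return strcmp(a.s, b.s) == 0;
}
```

Both functions leave the same string in the target, which is why the substitution looks harmless in a benchmark listing. The difference is only in how much work the copy does, and the assignment form is available only when both (a) and (b) hold.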

Reinhold has worked HARD to get something a little more immune to stuff like this, but......if gimmicks like this can make a 30-40% difference in the numbers, and if there are many compilers out there doing this (without reporting it, i.e., by just saying that they used XYZ compilers with (undecipherable options)).....then a lot of people have wasted an INCREDIBLE amount of time believing there was even the slightest use to these numbers. Two machines can have the same overall integer performance, but one can be made to look 30-40% better. ARGH!!

IF this is going on, it will certainly go into the Hennessy/Patterson RISC book's "rogue's gallery of misbenchmarking", or whatever they call that chapter, along with some other gems (like making 100,000 getpid() calls go real fast by doing the syscall just once and saving the value; etc.)

--------------------------------------------

Now, we have the part you all out there in netland can help with. Remember that this all started with a conjecture....

1) Does anybody KNOW if the Green Hills compilers do this, in general? Or specifically: the 386? the 68020? the 88000? (we never have been able to duplicate the #s....) etc.

2) Could you post the assembly code, if there is something like this?

3) Is anybody from Intel or Green Hills willing to say if this conjecture is right, specifically for the i860, or not? (I understand that you may not wish to answer ....:-)

--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:	408-991-0253 or 408-720-1700, x253
USPS:	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086