[comp.sys.intel] Dhrystones/Whetstones for i860

rfg@riunite.ACA.MCC.COM (Ron Guilmette) (05/18/89)

Does anybody out there have any Dhrystone or Whetstone numbers for
the i860?  If not, why not?

Intel... are you out there?  I'm calling your bluff.

I see where Intel has gotten a lot of publicity from their claim of
80 Mflops "peak" (single precision) and 60 Mflop "peak" (double
precision).  Well, I've finally had a chance to look over the
instruction set, and it is bloody obvious that it is going to take
the compiler writers a LONG LONG TIME before they can get normal
applications to compile down to code that will even approach 2/3
of these "peak" figures.  The problem of filling branch delay slots
shouldn't be too hard (and this is an old and previously solved
problem anyway), but to fully exploit the floating-point pipeline,
the double-wide instruction mode, and the floating-point multiply
and add/subtract instructions, not to mention taking account of
all of the dozen or so weird one-cycle (in some cases multi-cycle)
freeze conditions, presents a new and very difficult set of
code optimization problems.

Now considering the fact that Intel is claiming that they have
(1) a C compiler, (2) a FORTRAN compiler, (3) an assembler,
(4) a linker, (5) a simulator, (6) a (symbolic?) debugger,
and (last but not least) UNIX, all available for this chip,
it seems to me that somewhere along the way (you would think that)
they would have run both Dhrystone and Whetstone benchmarks.
Right?

So why don't those numbers appear on the first page of the
"data sheet" alongside the (blatantly misleading) "peak" Mflop
numbers?  Who is Intel trying to kid?  Does anybody out there
who actually makes design-in decisions still (in this day and age)
fall for this s**t of quoting MIPS and Mflops?  If so, then it is
a sad comment on the business as a whole, and on hardware engineers
in particular.

One last question.  Considering all of the i860's (complicated)
parallelism that *could* be productively used, I'd like to know
if Intel (or anybody else) is building (or has built, or is planning
to build) some software tool which can gobble up mediocre code and
re-mangle it into excellent code.

Motorola has a tool like this for the 88000 (all it has to do is to
try to fill branch delay slots because the 88000 is both simpler and
less "parallel").  Unfortunately, I have heard that early versions
were fairly buggy.

-- 
// Ron Guilmette  -  MCC  -  Experimental Systems Kit Project
// 3500 West Balcones Center Drive,  Austin, TX  78759  -  (512)338-3740
// ARPA: rfg@mcc.com
// UUCP: {rutgers,uunet,gatech,ames,pyramid}!cs.utexas.edu!pp!rfg

lfm@fpssun.fps.com (Larry Meadows ) (05/24/89)

In article <212@riunite.ACA.MCC.COM> rfg@riunite.UUCP (Ron Guilmette) writes:
>Does anybody out there have any Dhrystone or Whetstone numbers for
>the i860?  If not, why not?

From Intel's published material:
Dhrystones:
	Measured 69,000 Dhrystones (V1.1) @ 33.3 MHz
	  Green Hills C V1.8.5
	  Estimated 90K (V1.1) and 85K (V2.1) @ 40 MHz w/ improved compiler
Whetstones:
	Measured 20 Kwhets (double) and 25.6 Kwhets (double) @ 33.3 MHz
	  Green Hills Fortran 1.8.5
	  Estimated 25 and 32 Kwhets @ 40 MHz w/ improved compiler
Linpack 100x100 (Double precision):
	6.1 Mflops (compiled), 11 Mflops (coded BLAS) @ 33.3 MHz
	  Green Hills Fortran 1.8.5 and VAST 2.25N1 (vectorizer)
	  Estimated 10 and 13.2 Mflops @ 40 MHz w/ improved compiler

Several compiler optimization switches were used when compiling these
benchmarks.  Note that these numbers have been available for some time.
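
As a rough cross-check on the figures above, the 40 MHz estimates can
be split into a pure clock-speed part and whatever is left over for the
"improved compiler".  This is only a sketch: it assumes performance
scales linearly with clock rate, which the published material does not
state, and the function names are made up for illustration.

```python
# Measured (33.3 MHz) and estimated (40 MHz) figures as quoted above.
MEASURED_MHZ = 33.3
TARGET_MHZ = 40.0

results = {
    "Dhrystone V1.1 (Dhrystones/s)": (69_000.0, 90_000.0),
    "Whetstone, first figure (Kwhets)": (20.0, 25.0),
    "Whetstone, second figure (Kwhets)": (25.6, 32.0),
    "Linpack compiled (Mflops)": (6.1, 10.0),
    "Linpack coded BLAS (Mflops)": (11.0, 13.2),
}


def clock_scaled(measured, from_mhz=MEASURED_MHZ, to_mhz=TARGET_MHZ):
    """Scale a measured score to another clock rate (ASSUMES linear scaling)."""
    return measured * to_mhz / from_mhz


def compiler_factor(measured, estimated):
    """Part of the estimate that the clock increase alone does not explain."""
    return estimated / clock_scaled(measured)


for name, (measured, estimated) in results.items():
    print(f"{name}: clock-only {clock_scaled(measured):.1f}, "
          f"estimate {estimated:.1f}, "
          f"implied compiler factor {compiler_factor(measured, estimated):.2f}")
```

Under that assumption the implied compiler factors come out to roughly
1.09 for Dhrystone, 1.04 for both Whetstone figures, 1.36 for compiled
Linpack, and 1.00 for the coded-BLAS Linpack (i.e. that last estimate
is clock scaling only).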

>I see where Intel has gotten a lot of publicity from their claim of
>80 Mflops "peak" (single precision) and 60 Mflop "peak" (double
>precision).  Well, I've finally had a chance to look over the
>instruction set, and it is bloody obvious that it is going to take
>the compiler writers a LONG LONG TIME before they can get normal
>applications to compile down to code that will even approach 2/3
>of these "peak" figures.

You are correct in saying that this is a difficult compiler problem; however,
there are several companies that have been writing compilers for these
kinds of machines for a long time.
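
For scale, here is a small sketch that sets the quoted Linpack figures
against the quoted double-precision "peak" and the "2/3 of peak" mark.
It assumes the 60 Mflops peak refers to a 40 MHz part (the thread does
not say) and scales it linearly down to the 33.3 MHz used for the
measurements; both are assumptions, not published figures.

```python
PEAK_DOUBLE_MFLOPS = 60.0    # "peak" double-precision claim quoted in the thread
PEAK_CLOCK_MHZ = 40.0        # ASSUMPTION: clock rate the peak claim refers to
MEASURED_CLOCK_MHZ = 33.3    # clock rate of the measured Linpack runs

linpack_mflops = {"compiled": 6.1, "coded BLAS": 11.0}


def fraction_of_peak(mflops, clock_mhz=MEASURED_CLOCK_MHZ):
    """Fraction of the peak rate reached, with peak scaled to clock_mhz."""
    peak_at_clock = PEAK_DOUBLE_MFLOPS * clock_mhz / PEAK_CLOCK_MHZ
    return mflops / peak_at_clock


for label, mflops in linpack_mflops.items():
    frac = fraction_of_peak(mflops)
    verdict = "reaches" if frac >= 2.0 / 3.0 else "is below"
    print(f"Linpack {label}: {frac:.1%} of scaled peak, {verdict} the 2/3 mark")
```

With these assumptions the compiled run lands at about 12% of the
scaled peak and the coded-BLAS run at about 22%.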

>One last question.  Considering all of the i860's (complicated)
>parallelism that *could* be productively used, I'd like to know
>if Intel (or anybody else) is building (or has built, or is planning
>to build) some software tool which can gobble up mediocre code and
>re-mangle it into excellent code.

Good compilers that include vectorizing and parallelizing transformations
and software pipelining can do this for source code.
If you are talking about doing this for assembly code, it is probably not
worth the effort -- much more improvement is possible by incorporating these
sorts of transformations into the compiler, since more information is
available there.

Actually I think that the i860 is quite an achievement; it will perform
especially well on vectorizable scientific Fortran.  The Linpack number
is excellent, especially for a chip that costs $750.
-- 
Larry Meadows @ FPS			...!tektronix!fpssun!lfm
					...!nosun!fpssun!lfm

mash@mips.COM (John Mashey) (05/28/89)

In article <494@sns4.fpssun.fps.com> lfm@sns4.UUCP (Larry Meadows ) writes:

>From Intel's published material:
>Dhrystones:
>	Measured 69,000 Dhrystones (V1.1) @ 33.3 MHz
>	  Green Hills C V1.8.5
>	  Estimated 90K (V1.1) and 85K (V2.1) @ 40 MHz w/ improved compiler
>Whetstones:
>	Measured 20 Kwhets (double) and 25.6 Kwhets (double) @ 33.3 MHz
>	  Green Hills Fortran 1.8.5
>	  Estimated 25 and 32 Kwhets @ 40 MHz w/ improved compiler
>Linpack 100x100 (Double precision):
>	6.1 Mflops (compiled), 11 Mflops (coded BLAS) @ 33.3 MHz
>	  Green Hills Fortran 1.8.5 and VAST 2.25N1 (vectorizer)
>	  Estimated 10 and 13.2 Mflops @ 40 MHz w/ improved compiler
>
>Several compiler optimization switches were used when compiling these
>benchmarks.  Note that these numbers have been available for some time.

Note that Green Hills C compilers sometimes include a "Dhrystone" switch that
inlines strcpy [this was a long topic of discussion in comp.arch].
Whether or not Intel used this is unclear.  [The March 89 i860 performance
document says it uses "-OLM -X405 -X370 -X393 -X422": if anybody knows
what that means, please post!]

However, it's an optimization that might be worth at most 1% on real
programs, but that happens, on this one, to boost your Dhrystone numbers
by about 30%.  The author of the benchmark states that you cannot make good
interpretations of Dhrystone results without seeing the output of the
compiler.  I.e., given that there is an optimization that 1) most people
would not turn on in normal code, and that 2) boosts performance 30%, the
number has lost all predictive power in the absence of looking at the code.
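
To make the size of that effect concrete, here is a small sketch.  It
does NOT claim Intel used the switch (that is unclear, as noted above);
it only shows what a 30% boost would mean if it applied to the quoted
69,000 figure, and how much of the run time an optimization must touch
to produce a given boost.  The helper names are illustrative only.

```python
def without_boost(score, boost=0.30):
    """Score with a fractional boost backed out (hypothetical 'what if')."""
    return score / (1.0 + boost)


def min_fraction_of_runtime(boost):
    """Smallest share of run time an optimization must touch to yield `boost`.

    If the optimized part took fraction f of the time and became free,
    the speedup would be 1 / (1 - f); solving 1 / (1 - f) = 1 + boost
    gives this lower bound on f.
    """
    return 1.0 - 1.0 / (1.0 + boost)


quoted = 69_000.0  # Dhrystones/s (V1.1) @ 33.3 MHz, as quoted
print(f"If the 30% boost applied: about {without_boost(quoted):,.0f} without it")
print(f"Share of Dhrystone time affected: at least {min_fraction_of_runtime(0.30):.1%}")
print(f"Share of a 'real program' for a 1% gain: at least {min_fraction_of_runtime(0.01):.1%}")
```

The gap between those two shares (roughly 23% of the run time versus
about 1%) is the reason the raw number says little about ordinary code.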

Given the fact that Dhrystone numbers are now meaningless without the
generated code, Michael Slater of Microprocessor Report is collecting
the code from vendors and hopefully will publish his analysis of what's
going on.

re: Linpack: you may have missed that LINPACK was simulated (which is OK),
but with zero-wait-state external memory (@ 40 MHz).  Since LINPACK does not
fit in the on-chip caches (unlike the others), it remains to be seen what the
actual performance will be in buildable machines...

Anyway, the Whetstone numbers look pretty reasonable, and not too surprising;
it's hard to tell what the Dhrys and Linpacks really mean, yet.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086