rfg@riunite.ACA.MCC.COM (Ron Guilmette) (05/18/89)
Does anybody out there have any Dhrystone or Whetstone numbers for the i860? If not, why not? Intel... are you out there? I'm calling your bluff.

I see where Intel has gotten a lot of publicity from their claim of 80 Mflops "peak" (single precision) and 60 Mflop "peak" (double precision). Well, I've finally had a chance to look over the instruction set, and it is bloody obvious that it is going to take the compiler writers a LONG LONG TIME before they can get normal applications to compile down to code that will even approach 2/3 of these "peak" figures. The problem of filling branch delay slots shouldn't be too hard (and this is an old and previously solved problem anyway), but to fully exploit the floating-point pipeline, the double-wide instruction mode, and the floating-point multiply and add/subtract instructions, not to mention taking account of all of the dozen or so weird one-cycle (in some cases multi-cycle) freeze conditions, presents a new and very difficult set of code optimization problems.

Now considering the fact that Intel is claiming that they have (1) a C compiler, (2) a FORTRAN compiler, (3) an assembler, (4) a linker, (5) a simulator, (6) a (symbolic?) debugger, and (last but not least) UNIX, all available for this chip, it seems to me that somewhere along the way (you would think that) they would have run both Dhrystone and Whetstone benchmarks. Right?

So why don't those numbers appear on the first page of the "data sheet" alongside the (blatantly misleading) "peak" Mflop numbers? Who is Intel trying to kid? Does anybody out there who actually makes design-in decisions still (in this day and age) fall for this s**t of quoting MIPS and Mflops? If so, then it is a sad comment on the business as a whole, and on hardware engineers in particular.

One last question. Considering all of the i860's (complicated) parallelism that *could* be productively used, I'd like to know if Intel (or anybody else) is building (or has built, or is planning to build) some software tool which can gobble up mediocre code and re-mangle it into excellent code. Motorola has a tool like this for the 88000 (all it has to do is try to fill branch delay slots, because the 88000 is both simpler and less "parallel"). Unfortunately, I have heard that early versions were fairly buggy.

--
// Ron Guilmette - MCC - Experimental Systems Kit Project
// 3500 West Balcones Center Drive, Austin, TX 78759 - (512)338-3740
// ARPA: rfg@mcc.com
// UUCP: {rutgers,uunet,gatech,ames,pyramid}!cs.utexas.edu!pp!rfg
lfm@fpssun.fps.com (Larry Meadows ) (05/24/89)
In article <212@riunite.ACA.MCC.COM> rfg@riunite.UUCP (Ron Guilmette) writes:
>Does anybody out there have any Dhrystone or Whetstone numbers for
>the i860? If not, why not?

From Intel published material:

Dhrystones:
    Measured 69,000 Dhrystones (V1.1) @ 33.3 MHz
    Green Hills C V1.8.5
    Estimated 90K (V1.1) and 85K (V2.1) @ 40 MHz w/ improved compiler
Whetstones:
    Measured 20 Kwhets (double) and 25.6 Kwhets (double) @ 33.3 MHz
    Green Hills Fortran 1.8.5
    Estimated 25 and 32 Kwhets @ 40 MHz w/ improved compiler
Linpack 100x100 (Double precision):
    6.1 Mflops (compiled), 11 (coded BLAS) @ 33.3 MHz
    Green Hills Fortran 1.8.5 and VAST 2.25N1 (vectorizer)
    Estimated 10 and 13.2 Mflops @ 40 MHz w/ improved compiler

Several compiler optimization switches were used when compiling these benchmarks. Note that these numbers have been available for some time.

>I see where Intel has gotten a lot of publicity from their claim of
>80 Mflops "peak" (single precision) and 60 Mflop "peak" (double
>precision). Well, I've finally had a chance to look over the
>instruction set, and it is bloody obvious that it is going to take
>the compiler writers a LONG LONG TIME before they can get normal
>applications to compile down to code that will even approach 2/3
>of these "peak" figures.

You are correct in saying that this is a difficult compiler problem; however, there are several companies that have been writing compilers for these kinds of machines for a long time.

>One last question. Considering all of the i860's (complicated)
>parallelism that *could* be productively used, I'd like to know
>if Intel (or anybody else) is building (or has built, or is planning
>to build) some software tool which can gobble up mediocre code and
>re-mangle it into excellent code.

Good compilers that include vectorizing and parallelizing transformations and software pipelining can do this for source code. If you are talking about doing this for assembly code, it is probably not worth the effort -- much more improvement is possible by incorporating these sorts of transformations into the compiler, since more information is available.

Actually I think that the i860 is quite an achievement; it will perform especially well on vectorizable scientific Fortran. The Linpack number is excellent, especially for a chip that costs $750.

--
Larry Meadows @ FPS
...!tektronix!fpssun!lfm
...!nosun!fpssun!lfm
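To put the posted figures next to the "peak" claims, here is a rough arithmetic sketch. It uses only numbers quoted in this thread (60 Mflops double-precision peak, 33.3 MHz measured, 40 MHz estimated). The function names are made up for illustration, and the assumption that scores scale linearly with clock rate is mine, not Intel's.

```python
# Figures as quoted in the thread from Intel's published material.
PEAK_MFLOPS_DOUBLE = 60.0   # claimed double-precision "peak"
MEASURED_MHZ = 33.3         # clock for the measured results
TARGET_MHZ = 40.0           # clock for the estimated results


def fraction_of_peak(measured_mflops, peak_mflops):
    """Measured throughput as a fraction of the claimed peak."""
    return measured_mflops / peak_mflops


def clock_scaled(score, from_mhz, to_mhz):
    """Score expected at a new clock, ASSUMING purely linear scaling."""
    return score * to_mhz / from_mhz


def implied_compiler_gain(measured, estimated,
                          from_mhz=MEASURED_MHZ, to_mhz=TARGET_MHZ):
    """Factor left over in an estimate once clock scaling is removed."""
    return estimated / clock_scaled(measured, from_mhz, to_mhz)


if __name__ == "__main__":
    # Linpack 100x100 double precision, compiled and coded BLAS.
    for label, mflops in (("compiled", 6.1), ("coded BLAS", 11.0)):
        share = fraction_of_peak(mflops, PEAK_MFLOPS_DOUBLE)
        print(f"Linpack {label}: {share:.1%} of claimed peak")

    # The 2/3-of-peak bar mentioned in the original post.
    print(f"2/3 of peak: {PEAK_MFLOPS_DOUBLE * 2 / 3:.1f} Mflops")

    # How much of each 40 MHz estimate is attributed to the compiler?
    for label, measured, estimated in (
        ("Dhrystone V1.1", 69000.0, 90000.0),
        ("Linpack compiled", 6.1, 10.0),
        ("Linpack coded BLAS", 11.0, 13.2),
    ):
        gain = implied_compiler_gain(measured, estimated)
        print(f"{label}: implied compiler factor {gain:.2f}")
```

Under the linear-scaling assumption, the coded-BLAS estimate is almost exactly the clock ratio, while the compiled Linpack estimate leans on a sizable compiler improvement.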
mash@mips.COM (John Mashey) (05/28/89)
In article <494@sns4.fpssun.fps.com> lfm@sns4.UUCP (Larry Meadows ) writes:
>From Intel published material:
>Dhrystones:
>    Measured 69,000 Dhrystones (V1.1) @ 33.3 MHz
>    Green Hills C V1.8.5
>    Estimated 90K (V1.1) and 85K (V2.1) @ 40 MHz w/ improved compiler
>Whetstones:
>    Measured 20 Kwhets (double) and 25.6 Kwhets (double) @ 33.3 MHz
>    Green Hills Fortran 1.8.5
>    Estimated 25 and 32 Kwhets @ 40 MHz w/ improved compiler
>Linpack 100x100 (Double precision):
>    6.1 Mflops (compiled), 11 (coded BLAS) @ 33.3 MHz
>    Green Hills Fortran 1.8.5 and VAST 2.25N1 (vectorizer)
>    Estimated 10 and 13.2 Mflops @ 40 MHz w/ improved compiler
>
>Several compiler optimization switches were used when compiling these
>benchmarks. Note that these numbers have been available for some time.

Note that Green Hills C compilers sometimes include a "Dhrystone" switch that inlines strcpy [this was a long topic of discussion in comp.arch]. Whether or not Intel used this is unclear. [The March 89 i860 performance document says it uses "-OLM -X405 -X370 -X393 -X422": if anybody knows what that means, please post!] However, it's an optimization that might be worth at most 1% on real programs, that happens, on this one, to boost your Dhrystone numbers by about 30%. The author of the benchmark states that you cannot make good interpretations of Dhrystone results without seeing the output of the compiler..... I.e., given that there is an optimization that 1) most people would not turn on in normal code, and that 2) boosts performance 30%, the number has lost all predictive power in the absence of looking at the code.

Given the fact that Dhrystone numbers are now meaningless without the generated code, Michael Slater of Microprocessor Report is collecting the code from vendors and hopefully will publish his analysis of what's going on.

re: Linpack: you may have missed that LINPACK was simulated (which is OK), but used zero-wait state external memory (@ 40 MHz). Since LINPACK does not fit in the on-chip caches (unlike the others) it remains to be seen what the actual performance will be in buildable machines......

Anyway, the Whetstone numbers look pretty reasonable, and not too surprising; it's hard to tell what the Dhrys and Linpacks really mean, yet.

--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
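The point about the strcpy-inlining switch can be turned into a quick back-of-the-envelope range. This is only a sketch: it assumes "about 30%" means the reported score is 1.30 times the score without the switch, and the thread itself says it is unclear whether Intel used the switch at all. The function names are hypothetical.

```python
def score_without_boost(reported_score, boost=0.30):
    """Score with the inlining boost divided back out.

    ASSUMPTION: the boost is multiplicative, i.e.
    reported = unboosted * (1 + boost).
    """
    return reported_score / (1.0 + boost)


def plausible_range(reported_score, boost=0.30):
    """(low, high) bounds when it is unknown whether the switch was on."""
    return (score_without_boost(reported_score, boost), reported_score)


if __name__ == "__main__":
    # Measured Dhrystone V1.1 figure at 33.3 MHz, as quoted above.
    low, high = plausible_range(69000.0)
    print(f"Dhrystones/s comparable to ordinary code: {low:.0f} to {high:.0f}")
```

Without the generated code there is no way to tell which end of that range applies, which is why the figure alone predicts so little.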