rmbult01@ulkyvx.BITNET (Robert M. Bultman) (10/30/90)
The i860 has been quoted as having a peak MFLOP rate of ~150 MFLOPS. (This is more of a guess.) People writing in comp.arch have suggested that this is somewhat optimistic. Without causing arguments about what constitutes an "average" or "typical" load on a processor, I would like to ask the question, "What is the average or typical MFLOP rate of the i860?" Please include the approximate cost of the system in which the i860 is used (not the chip itself).

All answers/comments are welcome. Please e-mail them to me. (Data for other computers/processors are also welcome, including CRAYx, CDC's, 680x0, 80x86, SPARC, MIPS, etc.)

Thanks in advance,
Robert Bultman
Speed Scientific School
University of Louisville
rstewart@megatek.UUCP (Rich Stewart) (10/31/90)
In article <9010291839.AA07178@lilac.berkeley.edu> rmbult01@ulkyvx.BITNET (Robert M. Bultman) writes:
>The i860 has been quoted as having a peak MFLOP rate of ~150
>MFLOPS. (This is more of a guess.) People writing in comp.arch
>have suggested that this is somewhat optimistic. Without causing
>arguments about what constitutes an "average" or "typical" load
>on a processor, I would like to ask the question, "What is the

For a 40 MHz part:

The peak is 80 MFLOPS.

If you are using a compiler to generate your code, expect 3 clocks per floating point instruction, so about 13 MFLOPS.

If you are writing assembly, it all depends on the problem you are trying to solve. The chip does real well with matrix ops, where you may get close to 80 MFLOPS. But there are common situations where you may not exceed 13 MFLOPS.

-Rich
rstewart@megatek.uucp
My opinions are just that.
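The arithmetic behind those two figures can be checked with a minimal sketch. The function name is made up for illustration, the 2 flops per clock peak is inferred from "80 MFLOPS at 40 MHz", and the compiled-code estimate assumes each floating point instruction counts as exactly one flop.

```python
def mflops(clock_mhz, flops_per_clock):
    """Rate in MFLOPS for a given clock (MHz) and flops completed per clock."""
    return clock_mhz * flops_per_clock

CLOCK_MHZ = 40.0

# Peak: assumed to be one add plus one multiply per clock (2 flops/clock).
peak = mflops(CLOCK_MHZ, 2.0)

# Compiler-generated code: one FP instruction every 3 clocks,
# assuming one flop per instruction.
compiled = mflops(CLOCK_MHZ, 1.0 / 3.0)

print(f"peak     = {peak:.1f} MFLOPS")      # peak     = 80.0 MFLOPS
print(f"compiled = {compiled:.1f} MFLOPS")  # compiled = 13.3 MFLOPS
```

The ratio between the two is a factor of 6, which is why the same chip can look either very fast or quite ordinary depending on who wrote the inner loop.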
chased@rbbb.Eng.Sun.COM (David Chase) (10/31/90)
rmbult01@ulkyvx.BITNET (Robert M. Bultman) writes:
>The i860 has been quoted as having a peak MFLOP rate of ~150
>MFLOPS. (This is more of a guess.) People writing in comp.arch
>have suggested that this is somewhat optimistic.

Note 1: that is probably a peak "MOP" rate, not MFLOP rate. The peak figures are typically 3 * OP * clock -- i.e., a 50 MHz chip can go no faster than 150 Million Operations Per Second. In this situation, the chip is performing the following mix of instructions per cycle:

1) 1 single-precision add
2) 1 single-precision multiply
3) 1 integer unit instruction (includes floating point fetches and stores)

In practice, all that people care about is FLOPS -- if you can issue enough floating point loads and stores to keep the FPU happy, then that's all that really matters.

Note 2: In theory, you can probably get pretty close to this, but in the near term hand-coding is a must, and the going is very very slow. Preston Briggs has opinions on this matter -- I quit worrying about the problem some time ago. The hard part about compilation is that your optimizer must realize that the stages in the floating point pipeline are really registers, and that it ought to use cached loads for certain operands, and pipelined loads for other operands. Since compilers typically don't do this, you're stuck with (extremely tedious) hand optimizations and a lot of gray hair.

Fielding traps on this chip is also not good for your mental health.

David Chase
Sun Microsystems
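The distinction in Note 1 between operations and floating point operations can be made concrete with a small sketch. The dictionary keys and the function name are illustrative only; the per-cycle mix is the one listed in the post.

```python
# Per-cycle instruction mix at peak, as listed in the post.
PEAK_MIX = {
    "sp_add": 1,        # single-precision add
    "sp_multiply": 1,   # single-precision multiply
    "integer_unit": 1,  # integer op, FP load, or FP store
}
FLOATING_POINT = {"sp_add", "sp_multiply"}

def peak_rates(clock_mhz):
    """Return (MOPS, MFLOPS) at peak for the given clock in MHz."""
    ops_per_cycle = sum(PEAK_MIX.values())
    flops_per_cycle = sum(
        count for name, count in PEAK_MIX.items() if name in FLOATING_POINT
    )
    return clock_mhz * ops_per_cycle, clock_mhz * flops_per_cycle

mops, mflops_peak = peak_rates(50)
print(mops, mflops_peak)  # 150 100
```

Under this counting, the ~150 figure for a 50 MHz part corresponds to 100 MFLOPS of actual floating point work, with the remaining third of the issue slots spent on the integer unit.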
alan@uf.msc.umn.edu (Alan Klietz) (10/31/90)
rmbult01@ulkyvx.BITNET (Robert M. Bultman) writes:
<The i860 has been quoted as having a peak MFLOP rate of ~150
<MFLOPS. (This is more of a guess.) People writing in comp.arch
<have suggested that this is somewhat optimistic.
You can get 60 Mflops, if
The pipeline instructions are used.
All data is on-chip (cache, registers).
The loop consists of exactly 2 FP adds and 1 FP multiply.
The loop is unrolled twice.
All outputs are fed back into the pipeline.
No more than one input comes from cache.
This is the quoted rate "guaranteed not to exceed".
--
Alan E. Klietz
Minnesota Supercomputer Center, Inc.
1200 Washington Avenue South
Minneapolis, MN 55415
Ph: +1 612 626 1737 Internet: alan@msc.edu
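To show the loop shape those conditions describe, here is a rough sketch in Python. It does not model the i860 pipeline or cache in any way. The particular formula, the function names, and the sample data are assumptions chosen only so that each element costs 2 adds and 1 multiply and the result is fed back into the next step.

```python
def feedback_loop(xs, c, k, s=0.0):
    """Reference loop: 2 adds and 1 multiply per element, result fed back into s."""
    for x in xs:
        s = s + (x + c) * k
    return s

def feedback_loop_unrolled2(xs, c, k, s=0.0):
    """Same computation, same operation order, with the body unrolled twice."""
    n = len(xs)
    i = 0
    while i + 1 < n:
        s = s + (xs[i] + c) * k
        s = s + (xs[i + 1] + c) * k
        i += 2
    if i < n:  # leftover element when the length is odd
        s = s + (xs[i] + c) * k
    return s

data = [0.5, 1.25, 2.0, 3.5, 4.75]  # made-up sample values
FLOPS_PER_ELEMENT = 3               # 2 adds + 1 multiply

print(feedback_loop(data, 1.0, 0.5))            # 8.5
print(feedback_loop_unrolled2(data, 1.0, 0.5))  # 8.5
print(FLOPS_PER_ELEMENT * len(data))            # 15
```

Unrolling leaves the flop count unchanged; on real hardware its purpose is to give the scheduler two independent bodies to overlap, which a Python interpreter does not do.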
apfiffer@admin.cse.ogi.edu (Andy Pfiffer) (10/31/90)
>In article <9010291839.AA07178@lilac.berkeley.edu> rmbult01@ulkyvx.BITNET (Robert M. Bultman) writes:
>The i860 has been quoted as having a peak MFLOP rate of ~150
>MFLOPS. (This is more of a guess.) People writing in comp.arch
>have suggested that this is somewhat optimistic.

Guaranteed Not To Exceed MFLOPS on i860's are in the neighborhood of two times the clock rate. That is based on performing an FP add and an FP multiply concurrently. Hand-tuned assembly can approach this, provided you understand the details of a given platform's memory system (page-mode, external pipeline depth, etc.).

Good compilers are now available from the Portland Group. Their compiler loves long, tight loops that pipeline well and I've seen firsthand just over 20 SP MFLOPS @ 33 MHz on one loop in sample dusty-deck Fortran from a customer; but that is not typical (4 to 12 is more often observed).

Compiler playthings and brain-damaged, orphaned development systems are available from Intel (please don't get me started on a Star860 tirade...).

The i860 exception handler didn't cause premature grey *on* my head, but I did notice the hair turning grey as it fell *off* my head.

--
Andy Pfiffer
apfiffer@admin.ogi.edu
Home: (503) 645-1886
"Work:" (503) 590-1450
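To put the observed compiler numbers next to the "two times the clock rate" ceiling, here is a small sketch. The function name is made up, and "just over 20" is rounded down to 20 for the calculation.

```python
def fraction_of_peak(observed_mflops, clock_mhz, flops_per_clock=2):
    """Observed rate as a fraction of the 'not to exceed' rate (2 flops/clock)."""
    return observed_mflops / (clock_mhz * flops_per_clock)

CLOCK_MHZ = 33  # ceiling is 2 * 33 = 66 MFLOPS

# Observed single-precision rates quoted in the post (20 stands in for "just over 20").
for label, observed in [("best loop seen", 20), ("typical low", 4), ("typical high", 12)]:
    share = fraction_of_peak(observed, CLOCK_MHZ)
    print(f"{label:>14}: {observed:2d} MFLOPS = {share:.0%} of peak")
```

By this measure the best compiled loop reaches roughly 30% of the ceiling, and the more commonly observed range of 4 to 12 MFLOPS sits at roughly 6% to 18%.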