[comp.arch] i860 MFLOP rate?

rmbult01@ulkyvx.BITNET (Robert M. Bultman) (10/30/90)

The i860 has been quoted as having a peak MFLOP rate of ~150
MFLOPS.  (This is more of a guess.)  People writing in comp.arch
have suggested that this is somewhat optimistic.  Without causing
arguments about what constitutes an "average" or "typical" load
on a processor, I would like to ask the question, "What is the
average or typical MFLOP rate of the i860?"  Please include the
approximate cost of the system in which the i860 is used (not the
chip itself).   All answers/comments are welcome.  Please e-mail
them to me.  (Data for other computers/processors are also
welcome, including CRAYx, CDC's, 680x0, 80x86, SPARC, MIPS, etc.)

Thanks in advance
Robert Bultman
Speed Scientific School
University of Louisville

rstewart@megatek.UUCP (Rich Stewart) (10/31/90)

In article <9010291839.AA07178@lilac.berkeley.edu> rmbult01@ulkyvx.BITNET (Robert M. Bultman) writes:
>The i860 has been quoted as having a peak MFLOP rate of ~150
>MFLOPS.  (This is more of a guess.)  People writing in comp.arch
>have suggested that this is somewhat optimistic.  Without causing
>arguments about what constitutes an "average" or "typical" load
>on a processor, I would like to ask the question, "What is the

For a 40 MHz part:

The peak is 80 MFLOP.
If you are using a compiler to generate your code, expect 3 clocks per
floating point instruction, so about 13 MFLOP.

If you are writing assembly, it all depends on the problem you
are trying to solve.  The chip does real well with matrix ops, where you
may get close to 80 MFLOP.  But there are common situations where you may
not exceed 13 MFLOP.
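
The arithmetic behind the 80 and 13 figures can be sketched as follows.
This is only an illustration: the function names are made up here, and the
peak assumes one FP add plus one FP multiply completing every clock.

```python
# Sketch of the peak vs. compiled-code arithmetic; names are illustrative.

def peak_mflops(clock_mhz, flops_per_clock=2):
    """Peak rate, assuming one FP add and one FP multiply finish per clock."""
    return clock_mhz * flops_per_clock


def sustained_mflops(clock_mhz, clocks_per_fp_instruction):
    """Rate when every FP instruction costs a fixed number of clocks."""
    return clock_mhz / clocks_per_fp_instruction


clock = 40  # MHz
print(peak_mflops(clock))                   # peak figure
print(round(sustained_mflops(clock, 3), 1)) # compiled code at 3 clocks each
```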

-Rich

rstewart@megatek.uucp

My opinions are just that.

chased@rbbb.Eng.Sun.COM (David Chase) (10/31/90)

rmbult01@ulkyvx.BITNET (Robert M. Bultman) writes:
>The i860 has been quoted as having a peak MFLOP rate of ~150
>MFLOPS.  (This is more of a guess.)  People writing in comp.arch
>have suggested that this is somewhat optimistic.

Note 1:  that is probably a peak "MOP" rate, not MFLOP rate.  The peak
figures are typically 3 * OP * clock -- i.e., a 50 MHz chip can go no
faster than 150 Million Operations Per Second.  In this situation, the
chip is performing the following mix of instructions per cycle:

1) 1 single-precision add
2) 1 single-precision multiply
3) 1 integer unit instruction (includes floating point fetches and
   stores)

In practice, all that people care about is FLOPS -- if you can issue
enough floating point loads and stores to keep the FPU happy, then
that's all that really matters.
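
A small sketch of how the per-cycle mix splits into a "MOP" figure and a
FLOP figure.  The function and dictionary names are hypothetical; the mix
itself is the three-item list above.

```python
# Sketch: separate peak MOPS from peak MFLOPS for the per-cycle mix above.

def peak_rates(clock_mhz):
    """Return (peak MOPS, peak MFLOPS) for a given clock in MHz."""
    per_cycle = {"fp_add": 1, "fp_multiply": 1, "integer_unit": 1}
    mops = clock_mhz * sum(per_cycle.values())
    # Only the add and the multiply count as floating point operations.
    mflops = clock_mhz * (per_cycle["fp_add"] + per_cycle["fp_multiply"])
    return mops, mflops


mops, mflops = peak_rates(50)
print(mops, mflops)
```

On these assumptions a 50 MHz part tops out at 150 MOPS but only 100
MFLOPS, since the third slot goes to the integer unit.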

Note 2: In theory, you can probably get pretty close to this, but in
the near term hand-coding is a must, and the going is very very slow.
Preston Briggs has opinions on this matter -- I quit worrying about
the problem some time ago.

The hard part about compilation is that your optimizer must realize
that the stages in the floating point pipeline are really registers,
and that it ought to use cached loads for certain operands, and
pipelined loads for other operands.  Since compilers typically don't
do this, you're stuck with (extremely tedious) hand optimizations and
a lot of gray hair.  Fielding traps on this chip is also not good for
your mental health.

David Chase
Sun Microsystems

alan@uf.msc.umn.edu (Alan Klietz) (10/31/90)

rmbult01@ulkyvx.BITNET (Robert M. Bultman) writes:
<The i860 has been quoted as having a peak MFLOP rate of ~150
<MFLOPS.  (This is more of a guess.)  People writing in comp.arch
<have suggested that this is somewhat optimistic.

You can get 60 Mflops, if

	The pipeline instructions are used.
	All data is on-chip (cache, registers).
	The loop consists of exactly 2 FP adds and 1 FP multiply.
	The loop is unrolled twice.
	All outputs are fed back into the pipeline.
	No more than one input comes from cache.

This is the quoted rate "guaranteed not to exceed".

--
Alan E. Klietz
Minnesota Supercomputer Center, Inc.
1200 Washington Avenue South
Minneapolis, MN  55415
Ph: +1 612 626 1737	       Internet: alan@msc.edu

apfiffer@admin.cse.ogi.edu (Andy Pfiffer) (10/31/90)

>In article <9010291839.AA07178@lilac.berkeley.edu> rmbult01@ulkyvx.BITNET (Robert M. Bultman) writes:
>The i860 has been quoted as having a peak MFLOP rate of ~150
>MFLOPS.  (This is more of a guess.)  People writing in comp.arch
>have suggested that this is somewhat optimistic.

Guaranteed Not To Exceed MFLOPS on i860's are in the neighborhood of
two times the clock rate.  That is based on performing an FP add and an
FP multiply concurrently.

Hand-tuned assembly can approach this, provided you understand the details
of a given platform's memory system (page-mode, external pipeline depth, etc.).

Good compilers are now available from the Portland Group. Their compiler
loves long, tight loops that pipeline well and I've seen firsthand just
over 20 SP MFLOPS @ 33MHz on one loop in sample dusty-deck Fortran from
a customer; but that is not typical (4 to 12 is more often observed).
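
To put those observed numbers next to the not-to-exceed figure, here is a
rough sketch.  The helper name is invented for illustration, and the peak
is taken as two times the clock rate, as stated above.

```python
# Sketch: observed MFLOPS as a fraction of a peak of 2 x clock rate.

def fraction_of_peak(observed_mflops, clock_mhz):
    """Observed rate divided by the 2-FLOPs-per-clock peak."""
    peak = 2 * clock_mhz
    return observed_mflops / peak


for observed in (4, 12, 20):
    share = fraction_of_peak(observed, 33)
    print(f"{observed:>2} MFLOPS at 33 MHz is {share:.0%} of peak")
```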

Compiler playthings and brain-damaged, orphaned development systems are
available from Intel (please don't get me started on a Star860 tirade...).

The i860 exception handler didn't cause premature grey *on* my head, but
I did notice the hair turning grey as it fell *off* my head.

--
Andy Pfiffer			apfiffer@admin.ogi.edu
Home: (503) 645-1886
"Work:" (503) 590-1450