[comp.arch] MIPS/MFLOPS ratio

mark@akbar.megatek.uucp (mark thompson) (06/28/89)

Lately, it seems that the integer performance of popularly available
(especially RISC) computers seems to be outrunning the floating point
performance. A little paper design for a SPARC system using the latest
Cypress IU and FPC/FPU gets me a MIPS/MFLOPS ratio of about 10.

This seems a little out of whack... it seems that older scientific
processors had ratios in the 3-4 range.

Looking at published info on MIPS, and some hand waving gets me a ratio
of about 5-6, better but still slow (an aside: what are the MIPS guys 
doing to get the speeds up higher than the SPARC guys? compilers?)

Why is the floating point lagging integer performance so much? What is
being done to get this back in balance?

-mark
-- 
mark thompson						uunet!megatek!mark
  <Opinions expressed herein are not necessarily those of Megatek Corp>
--

khb%chiba@Sun.COM (Keith Bierman - SPD Languages Marketing -- MTS) (06/28/89)

In article <596@megatek.UUCP> mark@megatek.UUCP () writes:

>This seems a little out of whack... it seems that older scientific
>processors had ratios in the 3-4 range.

Current SPARC implementations (chips and system) from Sun were
intended for "more general purpose use" hence the (relatively) narrow
gap between integer performance on a Cray and on a 4/330. While floating
point is fun (and is typically my reason for existing on a project) I
spend most of my day doing compiles, editing, running schedtool, and
other non-FP things. So using the 80-20 rule... the first machines
should be the ones we need 80% of the time.

>
>Looking at published info on MIPS, and some hand waving gets me a ratio
>of about 5-6, better but still slow (an aside: what are the MIPS guys 
>doing to get the speeds up higher than the SPARC guys? compilers?)

Compilers are often cited, but according to my weeks of staring at
huge volumes of data, it seems that the compiler differences are
minimal on large codes. The current Sun compilers are somewhat less
clever about certain operations, but not enough to explain the
difference in performance.

What is interesting is that the benchmarks which SPARC does worst on
are highly FP and memory intensive (say 30-50% loads and stores).
MIPSco built their own FPU and tightly coupled it to their IU. This
resulted in early units which were superior to those from the SPARC
implementation philosophy (let's buy whatever is lying around and
glue it in -- in the first implementations that meant a Weitek 1164
and 1165 and a controller ... "leftovers" from the sun3/fpa project).
At yesterday's IEEE HOT CHIPS conference, we were treated to three
papers about dedicated SPARC FPUs. In addition to the papers focused
on FPUs, BIT is already sampling ECL SPARC chips. So the FPU
integration/implementation variable is tilting towards SPARC (unless
one assumes that MIPSco is smarter than Ross, Fuji., BIT, LSI, TI,
Solb., Prisma and all the others).

As for loads and stores, current IMPLEMENTATIONS of SPARC use 2 and 3
cycle parts ... this is NOT part of the arch. but was a concession to
low cost system design. High performance SPARC systems (i.e. those
designed to use all the implementation tricks) are just now appearing
(not anything we have announced :<  but these "low performance" models
are actually quite snappy ... a 4/330GX makes a VERY nice personal
workstation).  

>
>Why is the floating point lagging integer performance so much? What is
>being done to get this back in balance?

Well, ld/sto is key ... linpack is about as kind as it gets, and that
is 1.5 memory references for every FLOP.
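
To make the 1.5 figure concrete: the LINPACK inner loop is a DAXPY,
y[i] = y[i] + a*x[i], which does two loads and one store for two
floating-point operations. The Python sketch below works out that ratio
and a crude memory-bound MFLOPS ceiling. The 25 MHz clock and the reading
of "2 and 3 cycle parts" as 2-cycle loads and 3-cycle stores are
assumptions for illustration only, and the model ignores FP latency,
instruction overlap, and cache misses.

```python
# Back-of-envelope model of a DAXPY inner loop: y[i] = y[i] + a * x[i].
# Per element: load x[i], load y[i], store y[i]; one multiply, one add.
DAXPY_LOADS = 2
DAXPY_STORES = 1
DAXPY_FLOPS = 2


def refs_per_flop(loads=DAXPY_LOADS, stores=DAXPY_STORES, flops=DAXPY_FLOPS):
    """Memory references issued per floating-point operation."""
    return (loads + stores) / flops


def memory_bound_mflops(clock_mhz, load_cycles, store_cycles,
                        loads=DAXPY_LOADS, stores=DAXPY_STORES,
                        flops=DAXPY_FLOPS):
    """Upper bound on MFLOPS if only load/store cycles limited the loop.

    Assumes memory operations do not overlap with each other and that
    the FP work itself is free, so this is a ceiling, not a prediction.
    """
    cycles_per_element = loads * load_cycles + stores * store_cycles
    return clock_mhz * flops / cycles_per_element


if __name__ == "__main__":
    print("memory refs per FLOP:", refs_per_flop())
    # Assumed values: 25 MHz clock, 2-cycle loads, 3-cycle stores.
    print("ceiling, 2/3-cycle ld/st:", memory_bound_mflops(25, 2, 3))
    # Same clock with hypothetical single-cycle loads and stores.
    print("ceiling, 1/1-cycle ld/st:", memory_bound_mflops(25, 1, 1))
```

Under these assumptions, shortening loads and stores to one cycle more
than doubles the ceiling without touching the FPU at all.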

Second, chips designed from the ground up to be SPARC FPU's rather
than random bits of sand are just now available (Weitek 3170 and 71,
TMS390C602, LSI 64814 to name just 3).

PRISMA delivered a system level paper about their 250 MIPS (native,
say 100 VAX MIPS) 100 MFLOPS SPARC machine. Not sampling just yet, but
probably 2nd qtr next year (my guess, from the presentation; no solid
info; last press release said a working machine in Jan ... but I
assume that they might want to test it before shipping it :>).

Keith H. Bierman      |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault     |	Marketing Technical Specialist    ! kbierman@sun.com
I Voted for Bill &    |   Languages and Performance Tools. 
Opus  (* strange as it may seem, I do more engineering now     *)

henry@utzoo.uucp (Henry Spencer) (06/28/89)

In article <596@megatek.UUCP> mark@megatek.UUCP () writes:
>... (an aside: what are the MIPS guys 
>doing to get the speeds up higher than the SPARC guys? compilers?)

No -- system designers who really care about floating-point performance.
-- 
NASA is to spaceflight as the  |     Henry Spencer at U of Toronto Zoology
US government is to freedom.   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

les@unicads.UUCP (Les Milash) (06/29/89)

In article <112807@sun.Eng.Sun.COM> khb@sun.UUCP (Keith Bierman - SPD Languages Marketing -- MTS) writes:
>PRISMA [...] Not sampling just yet, but...

i'd sure like to get on the list for one of the early "samples".

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (06/29/89)

In article <596@megatek.UUCP> mark@megatek.UUCP () writes:
>performance. A little paper design for a SPARC system using the latest
>Cypress IU and FPC/FPU gets me a MIPS/MFLOPS ratio of about 10.
>
>This seems a little out of whack... it seems that older scientific
>processors had ratios in the 3-4 range.
>
>Looking at published info on MIPS, and some hand waving gets me a ratio
>of about 5-6, better but still slow (an aside: what are the MIPS guys 
>doing to get the speeds up higher than the SPARC guys? compilers?)
>
>Why is the floating point lagging integer performance so much? What is
>being done to get this back in balance?
>
Question is: Is a MIPS/MFLOPS ratio of 3-4 balanced?

To avoid the eternal "it depends on the application" suppose we agree 
that for example the SPEC Benchmark suite is a representative model of
our application.

Can anybody give some sort of (simplistic, I know) rules-of-thumb about
MIPS/MFLOPS real estate ratios as a function of performance? Something like:
increasing MFLOPS performance by x% would mean y% more real estate needed, and
the corresponding real estate reduction for the IU (and rest) would probably
mean z% less MIPS.

By varying the MIPS/MFLOPS ratios (given a fixed amount of silicon) a
ratio best tuned to the Suite could be calculated using the agreed
upon weightings etc.  

Since we (one-sidedly) agreed that the suite was a representative model of 
our application world this could be a quasi-objective determination of
what is a "balanced" processor.
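
The tuning procedure described above can be sketched in a few lines of
Python. Everything numeric here is hypothetical: the square-root
relationship between silicon area and speed is an assumed stand-in for the
missing rule of thumb, and the suite entries (weight, integer work, FP work)
are made-up placeholders, not SPEC data.

```python
import math


def run_time(int_work, fp_work, fp_area_fraction):
    """Time for one benchmark given the share of silicon spent on the FPU.

    ASSUMPTION: both integer and FP speed scale as the square root of the
    area they are given. This is a placeholder model, not a measured law.
    """
    mips = math.sqrt(1.0 - fp_area_fraction)
    mflops = math.sqrt(fp_area_fraction)
    return int_work / mips + fp_work / mflops


def best_fp_area_fraction(suite, steps=99):
    """Sweep the FPU area share and return the one minimizing weighted time.

    `suite` is a list of (weight, int_work, fp_work) tuples.
    """
    candidates = [i / (steps + 1) for i in range(1, steps + 1)]

    def weighted_time(fraction):
        return sum(w * run_time(i, f, fraction) for (w, i, f) in suite)

    return min(candidates, key=weighted_time)


if __name__ == "__main__":
    # Hypothetical suite: one integer-heavy and one FP-heavy benchmark.
    suite = [(0.5, 10.0, 1.0), (0.5, 2.0, 6.0)]
    share = best_fp_area_fraction(suite)
    ratio = math.sqrt(1.0 - share) / math.sqrt(share)
    print("FPU area share:", share)
    print("resulting MIPS/MFLOPS ratio (model units):", ratio)
```

With a real area/performance curve and the agreed suite weightings in place
of the placeholders, the same sweep would yield the "balanced" ratio.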

Flames anyone?

Disclaimer: Opinions are really just onions and pi.
-- 
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (06/30/89)

In article <140@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:

>Question is: Is a MIPS/MFLOPS ratio of 3-4 balanced?

I personally like a balance of scalar MFLOPS = 1/3 of MIPS and vector MFLOPS =
3* MIPS.  The reasons are manifold, but I have found this to be a "cost 
effective" ratio on older ECL discrete/SSI/MSI vector mainframe systems.  

More recently, folks
in the RISC camp have been saying that this ratio is "obsolete", in the sense
that the "extra" scalar MIPS are almost free relative to the cost of providing
the extra MFLOPS, so a lower ratio is more appropriate.  It appears to me
that MIPSCO is doing about the best that can be done with the new R3xxx chips,
so I am *not* complaining.  But I still think that a vector instruction
set provides a cheap way to get the most out of the existing floating
point real estate, and can improve performance significantly, using the
*same* floating point units, over a machine with only a scalar instruction
set.  Usually a speedup of about 5 is possible in this case, so I would now
like to see MIPS = vector MFLOPS and scalar MFLOPS = 1/5 of MIPS.  

For the purpose of comparing microprocessors, I am satisfied to define MIPS 
as the ratio of harmonic means of many integer benchmarks relative to their 
times on a VAX 11/780, and vector MFLOPS as the rate measured on the standard
LINPACK benchmark.  
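
One possible reading of these two definitions, as a Python sketch. The
interpretation (harmonic mean of per-benchmark VAX-relative speedups, with
the VAX 11/780 taken as 1 MIPS) is an assumption, and the timings in the
example are invented. The LINPACK operation count 2n^3/3 + 2n^2 is the
conventional one used to turn a solve time into MFLOPS.

```python
from statistics import harmonic_mean


def vax_relative_mips(vax_seconds, machine_seconds):
    """Harmonic mean of per-benchmark speedups over a VAX 11/780.

    ASSUMPTION: the VAX 11/780 is rated at 1 MIPS, so a speedup of k
    on a benchmark is read as k "VAX MIPS" for that benchmark.
    """
    speedups = [v / m for v, m in zip(vax_seconds, machine_seconds)]
    return harmonic_mean(speedups)


def linpack_mflops(n, seconds):
    """MFLOPS for solving an n x n system, using the standard op count."""
    operations = 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2
    return operations / seconds / 1e6


if __name__ == "__main__":
    # Invented timings for three integer benchmarks, in seconds.
    vax = [10.0, 20.0, 40.0]
    machine = [1.0, 2.0, 8.0]
    print("VAX-relative MIPS:", vax_relative_mips(vax, machine))
    # Invented time for the n = 100 LINPACK case.
    print("LINPACK MFLOPS:", linpack_mflops(100, 0.5))
```

The harmonic mean weights the slow benchmarks more heavily than an
arithmetic mean would, which is why one poor result pulls the rating down.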

Thanks to Weitek, MIPSCO, Fairchild/Intergraph, et al. for raising the 
standard of floating point performance in the micro world.


  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117

prc@erbe.se (Robert Claeson) (06/30/89)

In article <596@megatek.UUCP> mark@megatek.UUCP () writes:

>Lately, it seems that the integer performance of popularly available
>(especially RISC) computers seems to be outrunning the floating point
>performance. A little paper design for a SPARC system using the latest
>Cypress IU and FPC/FPU gets me a MIPS/MFLOPS ratio of about 10.

>This seems a little out of whack... it seems that older scientific
>processors had ratios in the 3-4 range.

A 17 MIPS Motorola 88100 RISC CPU has a fp performance of about 12 MFLOPS.
That gives (at least) me a MIPS/MFLOPS ratio for that chip of only ~1.4.
-- 
          Robert Claeson      E-mail: rclaeson@erbe.se
	  ERBE DATA AB

rro@bizet.CS.ColoState.Edu (Rod Oldehoeft) (07/03/89)

In article <749@maxim.erbe.se> rclaeson@erbe.se (Robert Claeson) writes:
>
>A 17 MIPS Motorola 88100 RISC CPU has a fp performance of about 12 MFLOPS.
>That gives (at least) me a MIPS/MFLOPS ratio for that chip of only ~1.4.

I've usually heard MIPS/MFLOPS ratio discussed as the ratio between
the number of instructions (nonFP/FP) actually executed when one
runs an application program of interest.  This is a function of both
the architecture and compiler and is harder to measure than dividing
peak numbers.
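
A minimal Python sketch of that measurement, assuming an instruction trace
is already available as a list of mnemonics. The trace contents and the set
of mnemonics counted as FP are hypothetical; a real tool would classify
instructions from the architecture manual, and whether FP loads and stores
count as FP is a choice the sketch makes arbitrarily (they do not).

```python
from collections import Counter

# ASSUMPTION: only these arithmetic mnemonics count as FP instructions.
FP_MNEMONICS = {"faddd", "fsubd", "fmuld", "fdivd"}


def nonfp_to_fp_ratio(trace):
    """Ratio of non-FP to FP instructions actually executed in a trace."""
    counts = Counter("fp" if op in FP_MNEMONICS else "nonfp" for op in trace)
    if counts["fp"] == 0:
        raise ValueError("trace contains no FP instructions")
    return counts["nonfp"] / counts["fp"]


if __name__ == "__main__":
    # Hypothetical trace of one iteration of a DAXPY-style loop.
    one_iteration = ["ldd", "ldd", "fmuld", "faddd",
                     "std", "add", "cmp", "bl"]
    print("nonFP/FP executed:", nonfp_to_fp_ratio(one_iteration * 100))
```

Because the count comes from executed instructions, the result changes
with the compiler's code generation as well as with the architecture.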

Rod Oldehoeft                    rro@handel.CS.ColoState.EDU
Computer Science Department      303/491-5792
Colorado State University
Fort Collins, CO  80523

mash@mips.COM (John Mashey) (07/05/89)

In article <749@maxim.erbe.se> rclaeson@erbe.se (Robert Claeson) writes:
>In article <596@megatek.UUCP> mark@megatek.UUCP () writes:

>>Lately, it seems that the integer performance of popularly available
>>(especially RISC) computers seems to be outrunning the floating point
>>performance. A little paper design for a SPARC system using the latest
>>Cypress IU and FPC/FPU gets me a MIPS/MFLOPS ratio of about 10.

>>This seems a little out of whack... it seems that older scientific
>>processors had ratios in the 3-4 range.

>A 17 MIPS Motorola 88100 RISC CPU has a fp performance of about 12 MFLOPS.
>That gives (at least) me a MIPS/MFLOPS ratio for that chip of only ~1.4.

Whenever I've seen MIPS/MFLOPS ratios discussed, I don't ever remember
MFLOPS being peak-MFLOPS, but rather, LINPACK DP MFLOPS, usually FORTRAN,
I think.  Given the well-known fuzziness of mips-ratings, it's a little
harder.  If you take the currently-published numbers for a 20MHz
88K, they get 1.2MFLOPS FORTRAN DP LINPACK, and 2.2 Coded. If you use
17 MIPS, you're probably using Dhrystone-mips, which are usually 20-25%
higher (assuming no strcpy inlining) than what people see for
VAX-relative-versus-good-compilers-on-real-programs-mips.
To get a bound, assume the LINPACK numbers shown, and 13-17 mips.
This gives 13/2.2 = 6, and 17/1.2 = 14 as the limits of this.  If I had
to bet, as the compilers get better, I'd suspect a realistic number
might be 14/2 = 7.  All this is highly subject to cache configurations,
etc., and so one must be very careful not to over-interpret such
numbers. 
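
The bounding arithmetic above, restated as a small Python sketch using only
the figures quoted in this post (13-17 mips, 1.2 and 2.2 LINPACK MFLOPS).
The function name is just an illustrative choice.

```python
def ratio_bounds(mips_range, mflops_range):
    """Lowest and highest MIPS/MFLOPS ratio consistent with two ranges.

    The low bound pairs the smallest MIPS with the largest MFLOPS;
    the high bound pairs the largest MIPS with the smallest MFLOPS.
    """
    low = min(mips_range) / max(mflops_range)
    high = max(mips_range) / min(mflops_range)
    return low, high


if __name__ == "__main__":
    low, high = ratio_bounds((13, 17), (1.2, 2.2))
    print("MIPS/MFLOPS between %.1f and %.1f" % (low, high))
    # The "realistic" guess of 14 mips and 2 MFLOPS falls inside the range.
    print("guess:", 14 / 2)
```

The spread of more than a factor of two between the bounds is itself a
reminder of how soft these ratios are.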
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086