[comp.sys.super] How Fast Are Supercomputers?

achoi@cory.Berkeley.EDU (Andrew Choi) (12/20/90)

Hi.  First of all, my apologies for cross-posting;  I really don't
know which newsgroup is appropriate for a question like this.

Anyway, I've been hearing all kinds of numbers from all kinds of
people (the numbers differ by a factor of 100).  I am just wondering
whether anyone knows for sure.

Also, it would be nice if you could tell me the speed at which the
FASTEST supercomputer runs, and that of an average supercomputer.

BTW, I am interested in the number of floating point operations per
second.  Thanks a lot.

Name:  Andrew Choi	Internet Address:  achoi@cory.berkeley.edu
Tel:   (415)848-5658
#include <standard/disclaimer.h>

patrick@convex.COM (Patrick F. McGehearty) (12/20/90)

How fast are supercomputers, and what is an average number?

That question has many answers, and there is much dispute over which
of them is the truth.

The number most salespeople quote is the "peak rate", which is only
obtained if all processors, pipes, and computation units are busy
at the same time.  Most supercomputers approach these numbers when
used on larger problems with tuned assembly language, such as Linpack
1000x1000.  However, on smaller problems the time to get all the computing
units started can dominate the computation.  For code that does not
vectorize or parallelize, the numbers get worse.
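One way to see the startup effect is Hockney's classic (r_inf, n_1/2)
model: an operation on a vector of length n takes roughly
(n + n_1/2) / r_inf, where r_inf is the asymptotic rate and n_1/2 is
the vector length that achieves half of it.  The parameter values in
this little sketch are invented for illustration, not Convex
measurements:

#include <stdio.h>

/* Hockney's (r_inf, n_half) model of vector performance: an
 * operation on a vector of length n takes t = (n + n_half) / r_inf,
 * so the effective rate is r_inf * n / (n + n_half).  Both
 * parameter values below are invented for illustration.
 */
int main(void)
{
    double r_inf  = 50.0;  /* asymptotic rate, Mflops (invented) */
    double n_half = 30.0;  /* length reaching half of r_inf (invented) */
    int n;

    for (n = 10; n <= 1000; n *= 10)
        printf("n = %4d:  effective rate = %4.1f Mflops\n",
               n, r_inf * n / (n + n_half));
    return 0;
}

With those made-up parameters, a length-10 vector runs at a quarter of
peak while a length-1000 vector gets within a few percent of it, which
is the whole story of short-problem overhead in miniature.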

For example, consider the Linpack benchmarks.  I will use Convex
numbers, but the same pattern may be found in any of the parallel/vector
machines (the following numbers are approximate, as I do not have
the latest references right in front of me):

Convex                  Peak        Linpack 1000x1000   Linpack 100x100
C210 (1 processor):      50 Mflops     45 Mflops           17 Mflops
C240 (4 processors):    200 Mflops    165 Mflops           28 Mflops

Why are the Linpack 1000x1000 numbers so different from the Linpack
100x100 numbers?  Both use vectors, but the 100x100 run uses an algorithm
which is ill-suited to execution on a parallel-vector processor.  The
average vector length for each processor can be much less than 25 in the
most obvious division of work among the processors.  The 1000x1000 Linpack
has both longer vectors and an algorithm more suited for execution
on a parallel-vector processor.  There are other issues as well, but the
point is that the Mflop rating depends on the application and the compiler
technology.  Other benchmarks can be selected to give higher or lower Mflop
ratings, depending on how well the code is suited to the system
architecture.
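
To see how a Mflop rating falls out of a timing, here is a minimal
sketch of the arithmetic, using the conventional Linpack operation
count of 2/3*n^3 + 2*n^2 flops for an n x n solve.  The solve times
in main() are placeholders chosen to land near the Convex figures in
the table above; they are not measurements:

#include <stdio.h>

/* The Linpack benchmark conventionally credits 2/3*n^3 + 2*n^2
 * floating point operations for solving an n x n dense system;
 * dividing by the measured solve time gives the Mflops rating.
 */
static double linpack_mflops(int n, double seconds)
{
    double dn    = (double)n;
    double flops = (2.0 / 3.0) * dn * dn * dn + 2.0 * dn * dn;

    return flops / seconds / 1.0e6;
}

int main(void)
{
    /* placeholder times, not measurements */
    printf("100x100   in 0.04 s: %6.1f Mflops\n", linpack_mflops(100, 0.04));
    printf("1000x1000 in 4.00 s: %6.1f Mflops\n", linpack_mflops(1000, 4.0));
    return 0;
}

Note how the same formula rewards the larger problem: the flop count
grows as n^3, so a machine that keeps its pipes full on the big solve
posts a number close to peak.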

Beware of round numbers; they are likely to be peak rates :-)
Also, beware if the n-processor system is rated as n times
as fast as the one-processor system.  That is either a peak rate, or
it is likely to have been obtained by measuring a single processor
and multiplying by n, which is almost always incorrect.
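
The usual back-of-the-envelope argument here is Amdahl's law: if only
a fraction p of the work parallelizes, n processors can never deliver
an n-fold speedup.  A quick sketch (p = 0.9 is an arbitrary choice for
illustration):

#include <stdio.h>

/* Amdahl's law: if a fraction p of the work parallelizes perfectly
 * and the rest runs serially, n processors yield a speedup of
 * 1 / ((1 - p) + p / n) -- never the full factor of n.
 */
int main(void)
{
    double p = 0.9;  /* parallel fraction (arbitrary example) */
    int n;

    for (n = 1; n <= 16; n *= 2)
        printf("%2d processors: speedup %.2f\n",
               n, 1.0 / ((1.0 - p) + p / (double)n));
    return 0;
}

Even with 90% of the work parallel, 4 processors give about a 3.1x
speedup and 16 processors only about 6.4x, which is why a claimed
perfect n-fold rating should raise eyebrows.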

So, to answer the question of how fast a Convex C240 is, I could
say 200 Mflops, 165 Mflops, or 28 Mflops and be correct in all three
cases.  Most vendor advertisements will pick whichever benchmark
they do best on relative to the competition.  If you want good
info, contact each vendor, and ask them about themselves and their
competition.  Filter what you get appropriately.

As for the FASTEST, that also depends on whether you mean:
(1) Announced (many new announcements come years before delivery);
	always a peak rate :-)
(2) Delivered, peak rate comparison.
(3) Hand-coded assembly language results based on unrealistic data sets
	and problems.
(4) Hand-coded assembly language results based on real application code.
(5) Real application code as generated by a normal language compiler.

Problems with each of the above:
(1) is almost pure vaporware, except that some government procurements
	take years to process and require all components of the bid to be
	announced products.  But these should not be considered
	comparable.  Delivered hardware is doubling in performance
	every 2 years (the exact rate is debatable, but you get the idea).
(2) Peak rate is never obtained.
(3) Overly optimistic.
(4) Can require substantial code tuning; not representative of much
	day-to-day work, but can represent some specialized applications.
(5) Represents much day-to-day computation, but can yield significantly
	lower values than (4) if compiler technology is inadequate to
	handle architectural innovations.

I don't know who can claim the highest number in each of the above
categories, but I am sure it is greater than 1 Gflop for all of them.
For (1), I have heard of Japanese supers in the 4-5 Gflop range for
delivery in a year or two.  I have also heard of the Teraflop project,
in which DARPA wants to achieve 1,000,000,000,000 flops in the mid-1990s.

Personally, I prefer to have (4) and (5) with measures of
Peak rates,
Linpack (both kinds),
Livermore Loops (data for each loop),
and the time to execute a tuned 1kx1k complex real*4 FFT.
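
For what a measurement like that looks like in practice, here is a
minimal timing harness in the spirit of Livermore kernel 1 (the hydro
fragment).  N and REPS are arbitrary choices of mine, and a real
benchmark run would verify the results and report each kernel
separately:

#include <stdio.h>
#include <time.h>

/* Livermore kernel 1 does 5 flops per element:
 *     x[k] = q + y[k] * (r * z[k+10] + t * z[k+11]);
 * Time it, count the flops, divide.
 */
#define N    1001
#define REPS 10000

static float x[N], y[N + 11], z[N + 11];

int main(void)
{
    float q = 0.5f, r = 0.25f, t = 0.125f;
    int j, k;
    clock_t t0, t1;
    double secs, mflops;

    for (k = 0; k < N + 11; k++) { y[k] = 1.0f; z[k] = 2.0f; }

    t0 = clock();
    for (j = 0; j < REPS; j++)
        for (k = 0; k < N; k++)
            x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
    t1 = clock();

    secs   = (double)(t1 - t0) / CLOCKS_PER_SEC;
    mflops = 5.0 * (double)N * (double)REPS / secs / 1.0e6;

    /* print x[0] so the compiler cannot discard the loop */
    printf("x[0] = %g;  %.3f s, %.1f Mflops\n", x[0], secs, mflops);
    return 0;
}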

Given that much info, I can form a good idea of what a supercomputer
is good for.  Oh, the price tag is interesting too, but it is subject
to frequent change :-)