achoi@cory.Berkeley.EDU (Andrew Choi) (12/20/90)
Hi. First of all, my apologies for cross-posting; I really don't know which newsgroup is appropriate for a question like this. Anyway, I've been hearing all kinds of numbers from all kinds of people (the numbers differ in magnitude by a factor of 100), and I'm wondering whether anyone knows for sure. It would also be nice if you could tell me the speed at which the FASTEST supercomputer runs, and that of an average supercomputer. BTW, I am interested in the number of floating point operations per second. Thanks a lot.

Name: Andrew Choi
Internet Address: achoi@cory.berkeley.edu
Tel: (415)848-5658
#include <standard/disclaimer.h>
patrick@convex.COM (Patrick F. McGehearty) (12/20/90)
How fast are supercomputers, and what is an average number? That question has many answers, and much dispute is made over the truth.

The number most salespeople quote is the "peak rate," which is obtained only if all processors, pipes, and computation units are busy at the same time. Most supercomputers approach these numbers when used on larger problems with tuned assembly language, such as Linpack 1000x1000. On smaller problems, however, the time to get all the computing units started can dominate the computation. For code that does not vectorize or parallelize, the numbers get worse.

For example, consider the Linpack benchmarks. I will use Convex numbers, but the same pattern may be found in any of the parallel/vector machines (the following numbers are approximate, as I do not have the latest references right in front of me):

                         Peak         Linpack 1000x1000   Linpack 100x100
  C210 (1 processor):    50 Mflops    45 Mflops           17 Mflops
  C240 (4 processors):   200 Mflops   165 Mflops          28 Mflops

Why are the Linpack 1000x1000 numbers so different from the Linpack 100x100 numbers? Both use vectors, but the 100x100 benchmark uses an algorithm which is ill-suited to execution on a parallel-vector processor: in the most obvious division of work among the processors, the average vector length for each processor can be much less than 25. The 1000x1000 Linpack has both longer vectors and an algorithm better suited to a parallel-vector processor.

There are other issues as well, but the point is that the Mflop rating depends on the application and the compiler technology. Other benchmarks can be selected to give higher or lower Mflop ratings, depending on how well the code is suited to the system architecture. Beware of round numbers; they are likely to be peak rates :-) Also, beware if the n-processor system is rated to be n times as fast as the one-processor system.
That is either a peak rate, or it is likely to have been obtained by measuring a single processor and multiplying by n, which is almost always incorrect.

So, to answer the question of how fast a Convex C240 is, I could say 200 Mflops, 165 Mflops, or 28 Mflops and be correct in all three cases. Most vendor advertisements will pick whichever benchmark they do best on relative to the competition. If you want good info, contact each vendor and ask them about themselves and their competition. Filter what you get appropriately.

As for the FASTEST, that also depends on whether you mean:

(1) Announced (many announcements come years before delivery) -- always a peak rate :-)
(2) Delivered, peak rate comparison.
(3) Hand-coded assembly language results based on unrealistic data sets and problems.
(4) Hand-coded assembly language results based on real application code.
(5) Real application code as generated by a normal language compiler.

Problems with each of the above:

(1) Almost pure vaporware, except that some government procurements take years to process and require all components of the bid to be announced products. These should not be considered comparable: delivered hardware is doubling in performance every 2 years (the exact rate is debatable, but you get the idea).
(2) Peak rate is never obtained.
(3) Over-optimistic.
(4) Can require substantial code tuning; not representative of much day-to-day work, but can represent some specialized applications.
(5) Represents much day-to-day computation, but can yield significantly lower values than (4) if compiler technology is inadequate to handle architectural innovations.

I don't know who can claim the highest number in each of the above categories, but I am sure it is greater than 1 Gflop for all of them. For (1), I have heard of Japanese supers in the 4-5 Gflop range for delivery in a year or two. I have also heard of the Teraflop project, where Darpa wants to receive 1,000,000,000,000 Flops in the mid-1990s.
Personally, I prefer to have (4) and (5) along with measures of peak rates, Linpack (both kinds), Livermore Loops (data for each loop), and the time to execute a tuned 1kx1k complex real*4 FFT. Given that much info, I can form a good idea of what a supercomputer is good for. Oh, the price tag is interesting too, but subject to frequent change :-)