rpeglar@eta.ETA.COM (Rob Peglar) (11/05/87)
A recent posting read as follows:

> I have seen it many times in the Media.  The 10-P is rated
> at 375 Mflops, but does Linpack at 25.  Why the big difference?
> Do other supercomputers behave similarly?  What does a Cray 2
> do on Linpack?
> --
> Kevin Buchs  3500 Zycad Dr. Oakdale, MN 55109  (612)779-5548
> Zycad Corp.  {rutgers,ihnp4,amdahl,umn-cs}!meccts!nis!zycad!kjb

1)  The 10-P rating at 375 Mflops is PEAK rate.  This means long
(500 elements, for example) vectors of 32-bit precision running in a
linked triad computation (e.g. A = B*C + D, all vectors).  The quoted
example of Linpack does not perform at peak rates.

2)  Yes, other supercomputers behave similarly, in a loose sense.
The ETA architecture is quite different from, for example, a Cray X
model, but both families' Linpack ratings will be slower than their
peak ratings.

3)  What does a Cray-2 do on Linpack?  Dr. Dongarra's report dated
October 30, 1986 (about a year old now) shows the Cray-2 running at
15 Mflops in one processor using CFT2.7, rolled BLAS.  No listing for
coded BLAS.

*** It must be noted that the published rates on the ETA10-P (and all
other models of ETA processors) are for 100x100 Linpack, 64-bit
(full) precision, all Fortran. ***  (The Cray-2 number above is the
same calculation.)

There exist many articles in the literature which explain the various
supercomputer architectures.  Understanding Linpack results can only
be achieved with study of same.

BTW, the Cray X-MP single processor rates at 24 Mflops, using CFT
1.13, rolled BLAS.  Remember, these numbers vary with compilers'
astuteness.  Using CFT 1.15, for example, may give better numbers.
Try it out.  The ETA10-P rating used the ETA Fortran 77 compiler 1.0.

Rob Peglar
ETA Systems, Inc.

Disclaimer:  The above is not an opinion of ETA Systems, Inc.
It's the truth.
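For concreteness, a linked triad of the kind the peak rating assumes
is just a loop of the following shape.  This is a minimal
illustrative sketch, not any vendor's benchmark code; the 500-element
length and 32-bit REALs simply mirror the example above.

      PROGRAM TRIAD
C     Illustrative linked-triad kernel, A = B*C + D over long vectors.
C     A vector machine can chain the multiply and the add, and a peak
C     rating like the 375 Mflops figure assumes loops of this shape
C     on long 32-bit vectors.
      INTEGER N, I
      PARAMETER (N = 500)
      REAL A(N), B(N), C(N), D(N)
      DO 10 I = 1, N
         B(I) = REAL(I)
         C(I) = 2.0
         D(I) = 1.0
   10 CONTINUE
C     The triad itself: one multiply and one add per element.
      DO 20 I = 1, N
         A(I) = B(I)*C(I) + D(I)
   20 CONTINUE
      WRITE (*,*) 'A(N) = ', A(N)
      END

Linpack, by contrast, spends its time in rolled BLAS calls on much
shorter 64-bit vectors, which is a large part of why the two numbers
are so far apart.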
preston@titan.rice.edu (Preston Briggs) (10/18/89)
In article <9079@batcomputer.tn.cornell.edu> kahn@tcgould.tn.cornell.edu writes:

>Throw away ALL your copies of the LINPACK 100x100 benchmark if you
>are interested in supercomputers.  The 300x300 is barely big enough

Danny Sorenson mentioned recently that linpack is sort of intended
to show how *bad* a computer can be.  The sizes are kept
deliberately small so that the vector machines barely have a chance
to get rolling.

So, big if you're optimistic; small otherwise.

Preston Briggs
kahn@batcomputer.tn.cornell.edu (Shahin Kahn) (10/19/89)
In article <2203@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>In article <9079@batcomputer.tn.cornell.edu> kahn@tcgould.tn.cornell.edu writes:
>>Throw away ALL your copies of the LINPACK 100x100 benchmark if you
>>are interested in supercomputers.  The 300x300 is barely big enough
>Danny Sorenson mentioned recently that linpack is sort of intended
>to show how *bad* a computer can be.  The sizes are kept
>deliberately small so that the vector machines barely have a chance
>to get rolling.

It certainly is biased towards micros with limited memory and is
absolutely irrelevant as a *supercomputer* application.  Yes, it can
show how bad a supercomputer can be.  I don't believe, however, that
it was *intended* to do that.  My theory is that the size was set
smallish because a 100x100 matrix wasn't considered so small back
then, and the algorithm is suboptimal because that's what level-1
BLAS and the LINPACK library did back then.  There were people who
were implementing equivalents of level-2 BLAS and even the emerging
LAPACK kernels, but those didn't get blessed by the Linpack benchmark
until the 300x300 and finally the 1000x1000 were included.

DON'T compare supercomputers with the 100x100 Linpack.  If you must
use Linpack, at least use the 300x300 rather than the 100x100!  And
then submit 37 copies of the program at once and see which machine
does 37 copies the fastest, and which one allows you to use the
keyboard while it's running them!!

Read Jack Worlton's paper in Supercomputer Review of Dec. 88 or
Jan. 89, and then if you still want to use Linpack, at least you'll
do it with better knowledge.
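For readers who have not looked inside the benchmark, the 100x100
Linpack spends its time in column updates of the following shape,
written here as rolled, level-1-BLAS-style loops.  This is an
illustrative sketch, not the LINPACK source: the routine name ELIM1
is made up, and the assumption that column K already holds the
negated multipliers (as in LINPACK's DGEFA) is mine.

      SUBROUTINE ELIM1(A, LDA, N, K)
C     One elimination step written in level-1 BLAS style: for each
C     column J to the right of the pivot, do a DAXPY-length-(N-K)
C     update.  The vectors shrink as K grows, which is why the
C     100x100 case never runs at long-vector speeds.
      INTEGER LDA, N, K, I, J
      DOUBLE PRECISION A(LDA,*), T
      DO 20 J = K+1, N
         T = A(K,J)
         DO 10 I = K+1, N
            A(I,J) = A(I,J) + T*A(I,K)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END

A level-2 BLAS formulation hands the entire rank-1 update to a single
library call instead of one short vector operation per column, which
is the kind of restructuring the larger test sizes eventually allowed
to be counted.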
mccalpin@masig3.masig3.ocean.fsu.edu (John D. McCalpin) (10/19/89)
In article <9079@batcomputer.tn.cornell.edu> kahn@tcgould.tn.cornell.edu writes:
>Throw away ALL your copies of the LINPACK 100x100 benchmark if you
>are interested in supercomputers.  The 300x300 is barely big enough

In article <2203@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>Danny Sorenson mentioned recently that linpack is sort of intended
>to show how *bad* a computer can be.  The sizes are kept
>deliberately small so that the vector machines barely have a chance
>to get rolling.

In article <9089@batcomputer.tn.cornell.edu> kahn@batcomputer.tn.cornell.edu (Shahin Kahn) writes:
>It certainly is biased towards micros with limited memory and is
>absolutely irrelevant as a *supercomputer* application.  Yes, it
>can show how bad a supercomputer can be.

Well, I'll throw in my $0.02 of disagreement with this thread.  It
has been my experience that the poor performance of the LINPACK
100x100 test on supercomputers is *entirely typical* of what users
actually run on the things.  There are plenty of applications burning
up Cray, Cyber 205, and ETA-10 cycles which have average vector
lengths *shorter* than the average of 66 elements for the LINPACK
test, and which are furthermore loaded down with scalar code.

The 100x100 test case is not representative of *everyone's* jobs, but
it is not an unreasonable "average" case, either.  I think it is much
more representative of what most users will see than the 1000x1000
case, for example.  The 1000x1000 case is a very good indicator of
what the *best* performance will be with *careful optimization* on
codes that are essentially 100% vectorizable.  Most of the
supercomputer workload does not fall into that category....
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu
                   mccalpin@scri1.scri.fsu.edu
                   mccalpin@delocn.udel.edu
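The figure of 66 elements quoted above follows from the shrinking
update lengths during the factorization.  A small sketch of the
arithmetic, assuming step K performs (N-K) updates of length (N-K),
which is how the rolled-BLAS factorization behaves:

      PROGRAM AVGLEN
C     Call-weighted average DAXPY vector length for an N x N LU
C     factorization: step K performs (N-K) updates of length (N-K).
C     For N = 100 this comes out near 66, matching the figure above.
      INTEGER N, K
      DOUBLE PRECISION CALLS, WORK
      PARAMETER (N = 100)
      CALLS = 0.0D0
      WORK  = 0.0D0
      DO 10 K = 1, N-1
         CALLS = CALLS + DBLE(N-K)
         WORK  = WORK  + DBLE(N-K)*DBLE(N-K)
   10 CONTINUE
      WRITE (*,*) 'average vector length = ', WORK/CALLS
      END

Whether that call-weighted average of roughly 66 is representative of
real workloads is exactly what is being argued in this thread.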
mcdonald@uxe.cso.uiuc.edu (10/20/89)
>Well, I'll throw in my $0.02 of disagreement with this thread.  It
>has been my experience that the poor performance of the LINPACK
>100x100 test on supercomputers is *entirely typical* of what users
>actually run on the things.  There are plenty of applications burning up
>Cray, Cyber 205, and ETA-10 cycles which have average vector lengths
>*shorter* than the average of 66 elements for the LINPACK test, and
>which are furthermore loaded down with scalar code.

>The 100x100 test case is not representative of *everyone's* jobs, but
>it is not an unreasonable "average" case, either.  I think it is much
>more representative of what most users will see than the 1000x1000
>case, for example.  The 1000x1000 case is a very good indicator of

I heartily agree with that:  I have two projects that I tried on a
big supercomputer: one diagonalized zillions of 20x20 matrices, the
other wants to diagonalize a single 1000000x1000000 matrix.  Neither
was suitable for a Cray!

Doug McDonald
swarren@eugene.uucp (Steve Warren) (10/20/89)
In article <2203@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>In article <9079@batcomputer.tn.cornell.edu> kahn@tcgould.tn.cornell.edu writes:
>
>>Throw away ALL your copies of the LINPACK 100x100 benchmark if you
>>are interested in supercomputers.  The 300x300 is barely big enough
>
>Danny Sorenson mentioned recently that linpack is sort of intended
>to show how *bad* a computer can be.  The sizes are kept
>deliberately small so that the vector machines barely have a chance
>to get rolling.
>
>So, big if you're optimistic; small otherwise.
>
>Preston Briggs

That is sort of like testing a dump-truck on a slalom course.  A
vector machine should have balanced performance on scalar code, but
you are buying vector performance primarily.  So, big if you want to
see if it can do what you are paying for, small if you want to see if
it can do anything else.

--Steve
-------------------------------------------------------------------------
          {uunet,sun}!convex!swarren; swarren@convex.COM
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (10/21/89)
In article <MCCALPIN.89Oct19090641@masig3.masig3.ocean.fsu.edu> mccalpin@masig3.masig3.ocean.fsu.edu (John D. McCalpin) writes:
>In article <9079@batcomputer.tn.cornell.edu> kahn@tcgould.tn.cornell.edu
>writes:
>>Throw away ALL your copies of the LINPACK 100x100 benchmark if you
>>are interested in supercomputers.  The 300x300 is barely big enough
>
>In article <2203@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs)
>writes:
>>Danny Sorenson mentioned recently that linpack is sort of intended
>>to show how *bad* a computer can be.  The sizes are kept
>>deliberately small so that the vector machines barely have a chance
>>to get rolling.
>
>In article <9089@batcomputer.tn.cornell.edu> kahn@batcomputer.tn.cornell.edu
>(Shahin Kahn) writes:
>>It certainly is biased towards micros with limited memory and is
>>absolutely irrelevant as a *supercomputer* application.  Yes, it
>>can show how bad a supercomputer can be.

I found this particularly amusing.  As a longtime defender of
Linpack, I have often been accused of being biased towards big vector
machines, because of the sensitivity of Linpack to memory and FPU
bandwidth, and, particularly, the ability to stream from memory to
FPU and back to memory.  Now, this happens to be a very important
property of a CPU to effectively run many codes which I have seen
over the years.  I never rate machines on the basis of Linpack in
absolute terms, but you can tell a lot about a machine with low
Linpack numbers.  I never could understand why people bought
11/780's, for example :-)

>Well, I'll throw in my $0.02 of disagreement with this thread.  It
>has been my experience that the poor performance of the LINPACK
>100x100 test on supercomputers is *entirely typical* of what users
>actually run on the things.

I agree that vector startup time is extremely important, and Linpack
is a fairly "nice" program with respect to average vector length, so
if vector startup time is so long as to slow it down significantly,
this is significant to users.  On the other hand, the performance is
not so poor as it once was.  See below.

>There are plenty of applications burning up
>Cray, Cyber 205, and ETA-10 cycles which have average vector lengths
>*shorter* than the average of 66 elements for the LINPACK test, and
>which are furthermore loaded down with scalar code.

I note, at this point, that the ~7 ns (~142 MHz) ETA10-G achieved the
fastest single-processor Linpack score of 93 MFLOPS, or .65
FLOPs/cycle.  The Cyber 205, using earlier compilers, achieved only
17 MFLOPS on a 20 ns clock, or .34 FLOPs/cycle.  The Cray Y-MP gets
.50 FLOPs/cycle, while the Cray 1/S (in 1983) got only .15
FLOPs/cycle.  The same Cray 1/S today gets .34 FLOPs/cycle.  (It has
less memory bandwidth than the Cray X-MP and Y-MP, so you can see
this effect clearly.)  The Cray X-MP and Y-MP and the ETA machines
are capable of achieving around 2 FLOPs/cycle in hardware.  My point
is that there has been considerable improvement in both hardware and
software, and startup time penalties have been correspondingly
reduced.  (A short sketch of the FLOPs/cycle arithmetic follows this
article.)

What is the relevance of Linpack today?  Well, it still has *some* of
the same significance that it always had, but it tells less than it
used to.  When caches were small, you could extrapolate the 100x100
results to bigger jobs without worrying.  On the big iron, your
performance went *up* with larger problem sizes, so even if
300x300x300 was typical of your problem, you knew what to expect.
Now, with 100x100 fitting in some small caches, you need to run a
bigger job to make sure performance doesn't go *down* dramatically.
(Which it does on some micro-based systems, of course.)  On the other
hand, if you switch to 300x300, you lose the information contained in
the 100x100 case wrt startup time.  So, good numbers tell you even
less than they did before, but bad numbers, in a sense, tell you even
more, for the same reason.  I wouldn't buy a machine with a bad
Linpack result to do these kinds of problems, but I would look hard
at the set of machines with good results, and would look further, to
see which one was the best for the job at hand.

Sometimes I use a "grep" benchmark just for fun.  The Cray Y-MP still
greps faster than any other machine I have tested, but, I agree, it
isn't the world's most cost-effective grepper out there :-)  As with
all benchmarks, you have to be careful not to fool yourself...  I
would guess that an amd29000-based system might be the fastest on
that particular test.

  Hugh LaMaster, m/s 233-9,   UUCP ames!lamaster
  NASA Ames Research Center   ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     Phone:  (415)694-6117
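The FLOPs/cycle figures in the article above are simply the measured
Linpack MFLOPS divided by the clock rate in MHz (the reciprocal of
the cycle time).  A small sketch of that arithmetic, using only the
two machine/clock pairs quoted there:

      PROGRAM FLOPCY
C     FLOPs per clock cycle = MFLOPS / clock rate in MHz.
C     Numbers below are the ones quoted in the preceding article:
C       ETA10-G   at ~142 MHz (~7 ns)  doing 93 MFLOPS  -> ~0.65
C       Cyber 205 at   50 MHz (20 ns)  doing 17 MFLOPS  -> ~0.34
      REAL MFLOPS(2), MHZ(2)
      INTEGER I
      DATA MFLOPS / 93.0, 17.0 /
      DATA MHZ    / 142.0, 50.0 /
      DO 10 I = 1, 2
         WRITE (*,*) 'FLOPs/cycle = ', MFLOPS(I)/MHZ(I)
   10 CONTINUE
      END

Dividing a peak rating by the clock in the same way shows how far
below the roughly 2 FLOPs/cycle hardware limit even the best 100x100
Linpack results still sit.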