mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (11/21/90)
I collecting the performance results on this code of mine for several
years, and since there is now a comp.benchmarks group, I offer them
for your edification.

The model is somewhat interesting from the benchmarking point of view
because the operations are about 70% vectorizable and 30%
non-vectorizable (in a recursive routine dominated by divides). The
profiles of the runs (not shown here) show significant variability:
most of the scalar machines are limited by the main part of the code,
while most of the vector and parallel machines are limited by the
scalar bottleneck.

This is one of the few cases I have seen that shows the IBM 3090/VF in
a good light relative to the Crays and relative to the IBM 3090 in
scalar mode.

=============================================================================
November 20, 1990   Quasigeostrophic Ocean Model Benchmark   John D. McCalpin
=============================================================================

The following timings are for the compilation and execution of a
floating-point-intensive Fortran 77 program. The program is typical of
test runs of numerical models used in meteorology/oceanography/
geophysical fluid dynamics.

The model executed integrates a time-dependent equation for 60 time
steps. At each time step, an elliptic equation must be solved on the
101x41 finite difference grid. The FISHPACK routine HWSCRT is used for
this purpose. Therefore, these benchmarks test the floating-point
hardware and the array access/manipulation efficiency of the hardware
and software.
-------------------------------------------------------------------------------
machine                         execute       ratio to         notes
  compiler model                           VAX 780 (VMS)
                                 (sec)     compile  execute
-------------------------------------------------------------------------------
Cray Y/MP (UNICOS, CFT77)           1.9 *     4.76    63.15    9
Cray X/MP (UNICOS, CFT77)           1.9 *     5.02    65.16    9
Cray 2 (CFT77)                      2.6 *     2.54    46.40    9
Cray X/MP (8.5 ns, CFT 1.15)        2.3 *    24.25    52.80    8
Cray 1S (CFT 1.14)                  3.3 *    20.78    37.58    8

ETA-10G (7 ns)                      3.7 *     3.05    33.11
ETA-10E (10.5 ns, UNIX)             5.7 *     2.28    21.64
Cyber 205 (FTN200/670)              5.6 *     7.73    21.88    7
Cyber 760                          14.0 *    10.08     8.75
Cyber 850                          30.2 *        -     4.06
Cyber 835                          79.9 *     3.46     1.53
Cyber 730                         128.3 *     2.31     0.95

IBM 3090/VF (vector)                3.6 *    14.38    34.31    12
IBM 3090 (scalar)                   5.0 *    19.59    24.65    12
IBM RS/6000 Model 530               4.8 *       --    25.52
IBM RS/6000 Model 320               6.3 *       --    19.44
IBM RS/6000 Model 320               8.7       1.04    14.08

Convex C240B                        5.4 *     1.36    22.69    10
Convex C220B                        5.9 *     1.36    20.76    10
Convex C210B                        6.5 *     1.36    18.85    10
Convex C220                         7.9 *     1.36    15.51    10,11
Convex C210                         7.5 *     1.36    16.33    10,11
Convex C120                        16.4 ?     0.54     7.47    10

Alliant FX/8                       22.7 ?     0.31     5.40    2,7
Alliant FX/1                       33.2 ?     0.29     3.69    9

SGI IRIS 4D/220 (1 cpu)             8.8       1.57    13.92    9
SGI IRIS 4D/120 (1 cpu)            16.2       0.88     7.57
SGI Personal IRIS                  22.7       0.67     5.40
SGI 4D-60 Turbo                    25.2       0.78     4.86    2
SGI 3030 (w/Weitek)               151.4       4.08     0.81

DECstation 3100                    15.5          -     7.90
VAX 8700        VMS 4.6            28.0       6.36     4.37
VAX 6210        VMS                49.4       2.68     2.48
VAX 11/780      VMS 4.2           122.5       1.00     1.00
VAX 11/780      Ultrix            208.1       0.25     0.59
VAX 11/750      VMS 4.1           495.5       0.63     0.25
Micro VAX II    VMS               195.4       0.97     0.63
Micro VAX II    Ultrix            356.1       0.25     0.34
VAXstation 2000 VMS               197.8       0.96     0.62

Apollo DN 10000                    10.9 *     1.30    11.24
HP 835 SRX turbo                   19.6       0.79     6.25
NeXT (Sun f77)                    158.4        N/A     0.77    13

-------------------------------------------------------------------------------
machine                         execute       ratio to         notes
  compiler model                           VAX 780 (VMS)
                                 (sec)     compile  execute
-------------------------------------------------------------------------------
SUN 4/260                          37.0       0.72     3.31    2
SUN 3/260 (Weitek)                 55.3       0.49     2.19    2
SUN 3/260 (68881)                 275.2          "     0.45
SUN 3/160 (Weitek)                 80.9       0.26     1.51    2
SUN 3/160 (68881)                 324.0       0.28     0.38    2
SUN 3/50 (no fpa)                1692.3       0.17     0.07    4

Ridge 32                           81.6       0.92     1.50
Masscomp 5400 (68881)             420.3       1.20     0.29
Masscomp 5400 (Weitek)             135.          "     0.91
Compaq 386/20 (Weitek)             59.6       2.51     2.05
IBM PC/AT (80287)                3640.0       0.10     0.03    5
-------------------------------------------------------------------------------

COMMENTS:
The code executed consists of a user program which is essentially 100%
vectorizable, and a call to the hwscrt library routine. About 30% of
the operations are in the library routine TRIX, which is not
vectorizable, and which uses about 75% of the total time on vector
machines. Parallelization can only help on the 15% of the cpu time
spent in vector code. The last 10% of the time is spent in formatted
I/O.

NOTES:
(1) All codes run were computationally identical. Some changes were
    required for I/O compatibility.
    Code lengths (with comments): model.f 1129 lines; hwscrt.f 2076 lines.
    Timings marked with '*' employed 64-bit arithmetic.
    Timings marked with '?' might have used 64-bit arithmetic.
    Unmarked timings employed 32-bit arithmetic.
(2) Timings in parentheses are for the minimum settings of the
    compiler optimizer. All other timings are with maximum
    optimization/vectorization/parallelization, where appropriate.
(3) Where possible (i.e., on the workstations), jobs were run alone on
    the machine. The performance on the UNIX machines tended to
    degrade slightly as the load increased. Typically the timings
    would increase by 5% for each additional CPU-intensive job in the
    system.
(4) The SUN 3/50 is an early version with a 12MHz clock. Current
    models run at 15MHz.
(5) The IBM PC/AT used Ryan-McFarlane Professional Fortran. Other PC
    compilers failed to generate executable code.
(6) The ratios shown are for compile time (including link), and
    execution time relative to the total time on the VAX 11/780 (VMS).
    On the Cyber 760, link time is included in the execution time.
(7) The tests on the Crays, Cyber 205, ETA-10, and Alliant were run
    with the vectorizers on, but the program spends ~75% of its time
    in an unvectorized subroutine in the hwscrt package. A vectorized
    solver has been produced for the ETA-10 which is about 10 times
    faster than the scalar code. Similar improvements could be made on
    each of these machines.
(8) These Cray results were provided by Mohan Ramamurthy, and utilized
    the Cray machines at the National Center for Atmospheric Research.
(9) These results were provided by Glenn Randers-Pehrson at the
    Ballistic Research Lab (May 9, 1989). He also provided results for
    the SGI IRIS 2500T, which matched the Silicon Graphics 3030
    results shown.
(10) Convex results provided by Howard Page of Convex on May 22, 1989.
    The B series machines have faster scalar divide and square root
    hardware.
(11) I have no immediate explanation for this reversal in orderings.
(12) These results were provided by Claudia Stelz at the University of
    Delaware.
(13) Code was compiled and linked on a Sun 3, using the Sun f77
    compiler.

Benchmarks executed and compiled by:
    John D. McCalpin
    Graduate College of Marine Studies
    The University of Delaware
    Robinson Hall
    Newark, DE 19716

    mccalpin@perelandra.cms.udel.edu  (Internet)
    DELOCN::MCCALPIN                  (SPAN)
    J.MCCALPIN/OMNET                  (Telemail)
    (302) 292-3686                    (voice)
    (302) 451-6838                    (fax)
--
John D. McCalpin mccalpin@perelandra.cms.udel.edu
Assistant Professor mccalpin@vax1.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET
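
[Editor's sketch] The time shares quoted in COMMENTS and in note (7) lend themselves to a quick Amdahl's-law estimate. The Python below is only an illustration: the shares (75% TRIX, 15% vector code, 10% formatted I/O) come from the post, while the function name and the assumption that the "about 10 times faster" solver figure applies to the whole TRIX share are mine, not the author's.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Shares of run time on the vector machines, as quoted in COMMENTS.
TRIX_SHARE = 0.75    # unvectorized solver routine
VECTOR_SHARE = 0.15  # vectorized user code
IO_SHARE = 0.10      # formatted I/O

# Assumption (not stated in the post): the ~10x gain of the vectorized
# ETA-10 solver in note (7) applies to the entire TRIX share.
solver_gain = amdahl_speedup(TRIX_SHARE, 10.0)

# Best case for parallelization: the vector code takes no time at all,
# everything else is unchanged.
parallel_bound = 1.0 / (1.0 - VECTOR_SHARE)

print(f"10x faster solver: {solver_gain:.2f}x overall")
print(f"parallel vector code, best case: {parallel_bound:.2f}x overall")
```

Under those assumptions a vectorized solver is worth roughly a factor of three overall, while parallelizing the vector code alone cannot gain even 20%, which is consistent with the remark that parallelization can only help on the 15% of the time spent in vector code.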
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (11/21/90)
> On 20 Nov 90 18:02:11 GMT, mccalpin@perelandra.cms.udel.edu I said:
John> I collecting the performance results on this code of mine for several
      ^^^^^^^^^^^^
John> years, and since there is now a comp.benchmarks group, I offer them
John> for your edification.
Sorry for the incompetence....
--
John D. McCalpin mccalpin@perelandra.cms.udel.edu
Assistant Professor mccalpin@vax1.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET
crispin@csd.uwo.ca (Crispin Cowan) (11/21/90)
In article <MCCALPIN.90Nov20130211@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
[some very interesting data]
>Compaq 386/20 (Weitek)     59.6    2.51    2.05
>IBM PC/AT (80287)        3640.0    0.10    0.03    5

The difference between these numbers suggests that the 287 on the AT
was not used. Is the Weitek chip really 60 times faster than a 287?
Ten, I'd believe, but 60 looks more like the AT was doing its FP
calculations in software.

Crispin
-----
Crispin Cowan, CS grad student, University of Western Ontario
Work: MC28-C, x3342 crispin@csd.uwo.ca
890 Elias St., London, Ontario, N5W 3P2, 432-7823
---> Support the GST: Canada's first fair tax <---
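
[Editor's sketch] The "60 times" figure can be checked directly from the two quoted execute times. The Python below is a minimal illustration; the variable names are mine, and it assumes the "execute" ratio column is simply the VAX 11/780 (VMS) time of 122.5 sec divided by each machine's time, which the posted figures appear to follow to within rounding.

```python
VAX_780_SECONDS = 122.5  # VAX 11/780 (VMS) execute time from the posted table

# Execute times in seconds, copied from the two quoted rows.
timings = {
    "Compaq 386/20 (Weitek)": 59.6,
    "IBM PC/AT (80287)": 3640.0,
}

# Assumed definition of the "execute" ratio column: VAX time / machine time.
for machine, seconds in timings.items():
    print(f"{machine}: {VAX_780_SECONDS / seconds:.2f} x VAX 11/780")

# How many times longer the PC/AT run took than the Compaq run.
slowdown = timings["IBM PC/AT (80287)"] / timings["Compaq 386/20 (Weitek)"]
print(f"PC/AT run takes {slowdown:.1f} times as long as the Compaq run")
```

The quotient comes out at about 61. It compares two whole machines and compilers, so it is not a measurement of the Weitek against the 287 in isolation.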