rick@cs.arizona.edu (Rick Schlichting) (03/25/91)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years under the auspices of the Office of Naval Research-Asia (ONR/Asia). The following is the professional opinion of David Kahaner and in no way has the blessing of the US Government or any agency of it. All information is dated and of limited life time. This disclaimer should be noted on ANY attribution.]

[Copies of previous reports written by Kahaner can be obtained from host cs.arizona.edu using anonymous FTP.]

To: Distribution
From: David Kahaner ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: SX3 Benchmarks from Swiss team tests.
24 March 1991

ABSTRACT. Benchmarks from Swiss team tests on NEC SX-3/12 & /14, made August and December 1990.

Several times over the past year,

     Dr. Armin Friedli
     Interdisciplinary Project for Supercomputing (IPS)
     ETH-Zentrum
     Fliederstrasse
     CH-8092, Zurich Switzerland
     Tel: +41 1 256 3440, Fax: +41 1 252 0185
     Email: FRIEDLI@IPS.ETHZ.CH

has been in Japan as the leader of a team evaluating supercomputers for possible use as a national supercomputer in Switzerland. (I've known Dr. Friedli for many years, since I spent a sabbatical at ETH.) A decision has now been made to acquire an SX-3 to address supercomputing needs in Swiss research universities and institutes. A two-processor SX-3/22 is to be running by 10/91, with plans to upgrade it to a four-processor model afterwards. The SX-3 being installed will have 1 GByte of main memory, 4 GBytes of extended memory, and 70 GBytes of disk storage, including 20 GBytes of high speed disks. A 1 TByte cartridge system and tapes will be available for archiving. The computer center will be located in the small city of Manno, near Lugano, but operated by ETH Zurich. Access will be provided by the Swiss national university research network, SWITCH.

As part of the decision-making process, a number of real application programs were collected from supercomputer users at Swiss universities in early 1990.
They were to serve as a basis for extended tests to measure single program and throughput performance. Friedli has provided me with summaries of their benchmark tests, which are further summarized below. (A more complete description will appear in IPS Windows, No 2, 1991, which can be obtained by contacting Friedli.)

The tests were based on unaltered versions of the programs. No changes were permitted except those necessary to run the program on one processor; compiler options were permitted, but compiler directives were not permitted except to replace those directives already contained in the program. Similarly, program library routines were not permitted except to replace those already contained in the program. (Some of the programs had been running on Cray computers and hence had some directives included or had been optimized to a greater or lesser degree.)

In the table below, the NEC SX-3 is compared with a Cray Y-MP/8128, with eight processors, 128 MWords of main memory and a clock time of 6.0 nanoseconds. UNICOS(5.1) and CFT77(4.0) were used. The Cray tests took place at Cray Research in August 1990. The SX-3 tests took place in December 1990, on an SX-3/22, with two processors and two pipe sets each, 1 GByte main memory, 4 GBytes extended memory, and a clock time of 2.9 nanoseconds. A model /24 was also tested. In both cases only one processor was used. A pre-release version of SUPER-UX(R1.1) and a pre-release version of FORTRAN77/SX(R1.1) were used.

All results below are therefore single-processor results. The performance ratios given in the table were obtained by computing the ratio of CPU times measured on the two systems. Friedli noted that the SX-3 is at the beginning of its life cycle and that further improvements can be expected, and that hand tuning of these programs may significantly improve performance. A brief description of the programs follows the table.
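
As an illustration of how a performance ratio of this kind can be formed from two CPU times, here is a minimal Python sketch. It assumes the speedup is the Y-MP CPU time divided by the SX-3 CPU time, which is how the "speedup vs YMP/1" column headings read; the report does not spell out the direction of the ratio. The function name `speedup` and the two timings are hypothetical and are not measured values from the tests.

```python
def speedup(reference_cpu_seconds: float, candidate_cpu_seconds: float) -> float:
    """Return how many times faster the candidate ran than the reference.

    Assumed definition: reference CPU time divided by candidate CPU time.
    """
    if reference_cpu_seconds <= 0 or candidate_cpu_seconds <= 0:
        raise ValueError("CPU times must be positive")
    return reference_cpu_seconds / candidate_cpu_seconds


# Hypothetical CPU times in seconds, for illustration only (not from the report).
ymp_seconds = 120.0
sx3_seconds = 48.0

ratio = speedup(ymp_seconds, sx3_seconds)
print(f"SX-3 speedup vs Y-MP/1: {ratio:.1f}")
```

With this definition a value above 1.0 means the SX-3 needed less CPU time than one Y-MP processor for the same program and input.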
For reference, several Linpack results are also given: 100S refers to 100 equations in scalar mode, 100V refers to 100 equations in vector mode, and 1000 refers to 1000 equations, best effort test. The same program may appear several times, such as FANTOM-1 and FANTOM-2, referring to different cases, i.e., different input data.

     Program Name       SX-3/12       SX-3/14
                        speedup       speedup
                        vs YMP/1      vs YMP/1

     CHEASE-3             1.3           1.3
     FANTOM-2             1.5           1.5
     BBINTER-1            2.0           2.0
     LINPACK100-S         2.1           2.1
     SECOND/S-4           1.8           2.1
     PWSCF-1              2.0           2.2
     ZETA-1               2.3           2.3
     CYL3D-2              2.3           2.3
     CHEASE-2             2.3           2.5
     LINPACK100-V         2.5           2.5
     FANTOM-1             2.6           2.6
     MCRG16-1             2.3           3.3
     CYL3D-3              3.3           3.4
     T1XY-1               2.6           3.5
     GLOBE-1              3.5           4.3
     MCRG32-2             3.0           4.6
     CORES-1              3.8           5.5
     CONV3D/C-4           3.7           5.6
     CONV3D/R-4           2.6           5.6
     CONV3D/C-3           4.3           7.1
     TERPSICHORE-1        5.3           8.5
     LINPACK1000          6.7         >10.0

BBINTER: High Energy Physics. Tracking of particles in a high energy colliding beam storage ring.

CHEASE: Plasma Physics. Cubic Hermite Element Axisymmetric Static Equilibrium. Solves the Grad-Shafranov equation in axisymmetric geometry and produces a mapping for the stability code ERATO.

CIRC: Computational Fluid Dynamics. Computational Investigation on Rotational Couette flow. Unsteady turbulent flow through plane or curved channels.

CONV3D: Image Analysis (CT and MR). 3D convolution using a separable symmetric kernel for medical data. Image enhancement, smoothing, edge detection.

CORES: Elementary Particle Physics (Hadrons). Inversion of Fermion matrices in lattice gauge theory by the conjugate residual method.

CYL3D: Computational Fluid Dynamics. Finite volume solution of unsteady incompressible Navier-Stokes equations.

FANTOM: Molecular Biology. Energy refinement of polypeptides and proteins.

GLOBE: Meteorology. Numerical model for the simulation of an incompressible fluid on a rotating sphere.

MCRG32: Elementary Particle Physics. Monte Carlo Renormalization Group calculation for 4D SU(2) lattice gauge theory on 32^4 lattices.

PWSCF: Solid State Physics. Plane wave, pseudopotential, local density self-consistent calculation of the band structure and total energy for a solid.

SECOND: Integrated Device Simulation. 3D semiconductor device simulation by the finite element method.

TERPSICHORE: Plasma Physics. 3D ideal magnetohydrodynamics stability program.

T1XY: Molecular Spectroscopy. Flexible models for intramolecular motion, a versatile treatment and its applications to Glyoxal.

ZETA: Integer arithmetic. Computing with arbitrary precision.

-----------------END OF REPORT------------------------------------------