rick@cs.arizona.edu (Rick Schlichting) (03/25/91)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years
under the auspices of the Office of Naval Research-Asia (ONR/Asia).
The following is the professional opinion of David Kahaner and in no
way has the blessing of the US Government or any agency of it. All
information is dated and of limited lifetime. This disclaimer should
be noted on ANY attribution.]
[Copies of previous reports written by Kahaner can be obtained from
host cs.arizona.edu using anonymous FTP.]
To: Distribution
From: David Kahaner ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: SX3 Benchmarks from Swiss team tests.
24 March 1991
ABSTRACT. Benchmarks from Swiss team tests on NEC SX-3/12 & /14, made
in August and December 1990.
Several times over the past year,
Dr. Armin Friedli
Interdisciplinary Project for Supercomputing (IPS)
ETH-Zentrum
Fliederstrasse
CH-8092, Zurich
Switzerland
Tel: +41 1 256 3440, Fax: +41 1 252 0185
Email: FRIEDLI@IPS.ETHZ.CH
has been in Japan as the leader of a team to evaluate supercomputers for
possible use in Switzerland as a national supercomputer. (I've known Dr.
Friedli for many years since I spent a sabbatical at ETH.) Now a decision
has been made to acquire an SX-3 to address supercomputing needs in Swiss
research universities and institutes. A two-processor SX-3/22 is to be
running by October 1991, with plans to upgrade it to a four-processor
model afterwards. The SX-3 that is being installed will have 1 GByte of main
memory, 4 GBytes of extended memory, 70 GBytes of disk storage, including
20 GBytes of high speed disks. A 1 TByte cartridge system and tapes will
be available for archiving. The computer center will be located in the
small city of Manno, near Lugano, but operated by ETH Zurich. Access will
be provided by the Swiss national university research network, SWITCH.
As part of the decision-making process, a number of real application
programs were collected from supercomputer users at Swiss universities
in early 1990. They were to serve as a basis for extended tests to
measure single program and throughput performance. Friedli has provided
me with summaries of their benchmark tests, which are further summarized
below. (A more complete description will appear in IPS Windows, No 2,
1991, which can be obtained by contacting Friedli.) The tests were based
on unaltered versions of the programs. No changes were permitted except
those necessary to run the program on one processor; compiler options
were permitted, but compiler directives were not permitted except to
replace those directives already contained in the program. Similarly,
program library routines were not permitted except to replace those
already contained in the program. (Some of the programs had been running
on Cray computers and hence had some directives included or had been
optimized in a greater or lesser way.)
In the table below, the NEC SX-3 is compared with a Cray Y-MP/8128, with
eight processors, 128 MWords of main memory and a clock time of 6.0
nanoseconds. UNICOS(5.1) and CFT77(4.0) were used. The Cray tests took
place at Cray Research in August 1990. The SX-3 tests took place in
December 1990, on an SX-3/22, with two processors and two pipe sets each,
1 GByte main memory, 4 GBytes extended memory, and a clock time of 2.9
nanoseconds. A model /24 was also tested. In both cases only one processor
was used. A pre-release version of SUPER-UX(R1.1) and a pre-release
version of FORTRAN77/SX(R1.1) were used.
The results below were measured on a single processor. The performance
ratios given in the table were obtained by computing the ratio of CPU
times measured on the two systems. Friedli noted that the SX-3 is at the
beginning of its life cycle and that further improvements can be
expected, and that hand tuning of these programs may significantly
improve performance. A brief description of the programs follows the
table.
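
The ratio calculation described above can be sketched as follows. This is
a minimal illustration, not the team's actual procedure: the function name
`speedup` and the sample timings are hypothetical, and the direction of the
ratio (Y-MP CPU time divided by SX-3 CPU time) is an assumption based on
the "speedup vs YMP/1" column headings, under which values above 1.0 mean
the SX-3 ran the program faster.

```python
def speedup(cpu_ymp_seconds: float, cpu_sx3_seconds: float) -> float:
    """Return the Y-MP CPU time divided by the SX-3 CPU time.

    Assumption: the table's "speedup vs YMP/1" is this ratio, so a value
    greater than 1.0 means the SX-3 needed less CPU time.
    """
    if cpu_ymp_seconds <= 0 or cpu_sx3_seconds <= 0:
        raise ValueError("CPU times must be positive")
    return cpu_ymp_seconds / cpu_sx3_seconds


# Hypothetical CPU times in seconds, for illustration only.
# These are NOT measured values from the report.
example_ymp = 120.0
example_sx3 = 48.0

print(f"speedup vs Y-MP/1: {speedup(example_ymp, example_sx3):.1f}")
```

Because both timings are single-processor CPU times, a ratio of this kind
says nothing about throughput or multiprocessor performance.
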
For reference, several Linpack results are also given: 100-S refers to 100
equations in scalar mode, 100-V refers to 100 equations in vector mode, and
1000 refers to 1000 equations, best-effort test. The same program may
appear several times, such as FANTOM-1 and FANTOM-2, referring to
different cases, i.e., different input data.
Program Name       SX-3/12      SX-3/14
                   speedup      speedup
                   vs YMP/1     vs YMP/1
CHEASE-3             1.3          1.3
FANTOM-2             1.5          1.5
BBINTER-1            2.0          2.0
LINPACK100-S         2.1          2.1
SECOND/S-4           1.8          2.1
PWSCF-1              2.0          2.2
ZETA-1               2.3          2.3
CYL3D-2              2.3          2.3
CHEASE-2             2.3          2.5
LINPACK100-V         2.5          2.5
FANTOM-1             2.6          2.6
MCRG16-1             2.3          3.3
CYL3D-3              3.3          3.4
T1XY-1               2.6          3.5
GLOBE-1              3.5          4.3
MCRG32-2             3.0          4.6
CORES-1              3.8          5.5
CONV3D/C-4           3.7          5.6
CONV3D/R-4           2.6          5.6
CONV3D/C-3           4.3          7.1
TERPSICHORE-1        5.3          8.5
LINPACK1000          6.7        >10.0
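
One way to condense the table into a single figure per machine is the
geometric mean of the speedups, sketched below. This summary is not part
of Friedli's report. The choice of the geometric mean, the function name
`geometric_mean`, and the decision to leave out the three Linpack rows
(which are given for reference only) are all assumptions made for
illustration. The numbers themselves are copied from the table.

```python
import math

# (program, SX-3/12 speedup, SX-3/14 speedup), copied from the table above.
# The three Linpack reference rows are deliberately omitted.
APPLICATION_SPEEDUPS = [
    ("CHEASE-3", 1.3, 1.3),
    ("FANTOM-2", 1.5, 1.5),
    ("BBINTER-1", 2.0, 2.0),
    ("SECOND/S-4", 1.8, 2.1),
    ("PWSCF-1", 2.0, 2.2),
    ("ZETA-1", 2.3, 2.3),
    ("CYL3D-2", 2.3, 2.3),
    ("CHEASE-2", 2.3, 2.5),
    ("FANTOM-1", 2.6, 2.6),
    ("MCRG16-1", 2.3, 3.3),
    ("CYL3D-3", 3.3, 3.4),
    ("T1XY-1", 2.6, 3.5),
    ("GLOBE-1", 3.5, 4.3),
    ("MCRG32-2", 3.0, 4.6),
    ("CORES-1", 3.8, 5.5),
    ("CONV3D/C-4", 3.7, 5.6),
    ("CONV3D/R-4", 2.6, 5.6),
    ("CONV3D/C-3", 4.3, 7.1),
    ("TERPSICHORE-1", 5.3, 8.5),
]


def geometric_mean(values):
    """Geometric mean of a collection of positive ratios."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm_12 = geometric_mean(row[1] for row in APPLICATION_SPEEDUPS)
gm_14 = geometric_mean(row[2] for row in APPLICATION_SPEEDUPS)

print(f"SX-3/12 geometric-mean speedup vs Y-MP/1: {gm_12:.2f}")
print(f"SX-3/14 geometric-mean speedup vs Y-MP/1: {gm_14:.2f}")
```

A geometric mean is used here because the entries are ratios; an
arithmetic mean would give extra weight to the few programs with the
largest speedups.
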
BBINTER: High Energy Physics. Tracking of particles in a high
energy colliding beam storage ring.
CHEASE: Plasma Physics. Cubic Hermite Element Axisymmetric Static
Equilibrium. Grad-Shafranov equation in axisymmetric geometry, produces a
mapping for stability code ERATO.
CIRC: Computational Fluid Dynamics. Computational Investigation on
Rotational Couette flow. Unsteady turbulent flow through plane or curved
channels.
CONV3D: Image Analysis (CT and MR). 3D convolution using a separable
symmetric kernel for medical data. Image enhancement, smoothing, edge
detection.
CORES: Elementary Particle Physics (Hadrons). Inversion of Fermion
matrices in lattice gauge theory by the conjugate residual method.
CYL3D: Computational Fluid Dynamics. Finite volume solution of unsteady
incompressible Navier-Stokes equations.
FANTOM: Molecular Biology. Energy refinement of polypeptides and
proteins.
GLOBE: Meteorology. Numerical model for the simulation of an
incompressible fluid on a rotating sphere.
MCRG32: Elementary Particle Physics. Monte Carlo Renormalization Group
calculation for 4D SU(2) lattice gauge theory on 32^4 lattices.
PWSCF: Solid State Physics. Plane-wave pseudopotential local density
self-consistent calculation of the band structure and total energy for a
solid.
SECOND: Integrated Device Simulation. 3D semiconductor device simulation
by the finite element method.
TERPSICHORE: Plasma Physics. 3D ideal magnetohydrodynamics stability
program.
T1XY: Molecular Spectroscopy. Flexible models for intramolecular motion,
a versatile treatment and its applications to Glyoxal.
ZETA: Integer arithmetic. Computing with arbitrary precision.
-----------------END OF REPORT------------------------------------------