[comp.sys.super] GF11 Parallel Computer

jsexton@arnor.watson.ibm.com (Jim Sexton) (06/07/90)

We've noted the following recent submission to "comp.sys.super"
describing the QCDPAX project.

>        ===QCDPAX attained 12.25 GFLOPS peak speed===
> 
> Parallel Computer QCDPAX has reached the world-fastest(probably)
> effective speed in scientific calculations.   If any computer can
> exceed the speed of QCDPAX, please let us know.
>
> ...
 
We include here a brief description of the GF11 project for comparison. 

GF11 is a SIMD parallel computer being built at IBM's T. J. Watson
Research Center in Yorktown Heights, New York.  Components of the
machine include 566 processors, a programmable Benes switch network
connecting those processors, a file server to provide disk capacity,
and a central controller.  Each processor has 2 megabytes of dynamic
RAM, 64 kilobytes of cache, and 256 32-bit registers.  Each processor
has a peak performance of 20 MegaFlops and sustains over 80% of that
peak on typical lattice QCD (Quantum Chromodynamics) calculations.
Currently GF11 has 400 processors installed.  Thus its current peak
performance is 8 GigaFlops.  When all 566 processors are installed the
peak speed will be 11.3 GigaFlops.
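
The peak figures above are simple products, and a short Python sketch
can check them.  The function and constant names are illustrative only
and are not part of any GF11 software; the sketch assumes 1 GigaFlop =
1000 MegaFlops.

```python
# Per-processor peak quoted in the post, in MegaFlops.
PEAK_MFLOPS_PER_PROCESSOR = 20.0


def peak_gflops(processors, mflops_each=PEAK_MFLOPS_PER_PROCESSOR):
    """Aggregate peak in GigaFlops (assumes 1 GigaFlop = 1000 MegaFlops)."""
    return processors * mflops_each / 1000.0


if __name__ == "__main__":
    # 400 processors installed now, 566 when the machine is complete.
    for n in (400, 566):
        print(f"{n} processors: {peak_gflops(n):.1f} GigaFlops peak")
```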

The initial architecture design for GF11 is due to Monte Denneau and
John Beetem from Computer Science and to Don Weingarten from Physical
Sciences at T. J. Watson.  Currently the project includes:

    Computer Science               Physical Sciences

        Yurij Baranski                 Jim Sexton
        Mike Cassera                   Don Weingarten
        Molly Elliot                   
        Dave George
        Manoj Kumar
        Randy Moulic
        Ed Nowicki
        Micky Tsao
        
> [QCDPAX Benchmark]
>
> The machine was benchmarked by the QCD model.   In the most time
> consuming part, 3 by 3 unitary matrix product, QCDPAX with 432 PU's
> recorded the speed nearly 4 times as fast as that of CM-2, (CM-2's
> measurement was reported in Supercomputing '89 in Reno by C. F. Baillie,
> pp.2-9).  Single link update time for the subspace heat bath method
> with 8 hits is 1.8 micro second, which is three times faster than
> the HITAC S820/80 at KEK (peak 3 Gflops).

GF11 is currently being used to study the finite temperature deconfining
transition in pure gauge lattice QCD.  The link update method we employ
is a 3 hit modified SU(2) heatbath + Metropolis accept/reject update.
This is a slightly different algorithm to that used on QCDPAX but
operation counts per link update are comparable.  On GF11, the wall clock
time to update a single link is 200 microsecs per processor, which
corresponds to a sustained performance per processor of 90% of peak.
Since all 400 processors update links concurrently, our effective link
update time is therefore 0.5 microsecs (200/400).
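
As a rough check on these numbers, the Python sketch below divides the
per-processor time by the processor count and also works out the
operation count per link update that the quoted figures imply.  That
count (about 3600 floating point operations) is derived here from the
200 microsecs, 20 MegaFlops and 90% figures; it is not stated in the
post, and the function names are illustrative.

```python
def machine_link_update_us(per_processor_us, processors):
    """Effective time per link when every processor updates links at once."""
    return per_processor_us / processors


def flops_per_link_update(per_processor_us, peak_mflops, efficiency):
    """Implied operations per link update.

    microseconds x MegaFlops = floating point operations, so no unit
    conversion factor is needed.
    """
    return per_processor_us * peak_mflops * efficiency


if __name__ == "__main__":
    t = machine_link_update_us(200.0, 400)
    ops = flops_per_link_update(200.0, 20.0, 0.9)
    print(f"effective link update time: {t:.1f} microsecs")
    print(f"implied operations per link update: {ops:.0f}")
```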

> [QCDPAX Performance for Poisson equation by Red-Black point-SOR method]
>
> ...  The overall effective speed is 2.04 GFLOPS.
> ...
> If any computer can exceed this speed, please let us know.
> We would like to know if our machine is really the world-fastest
> or not.
 
The link update times for GF11 quoted above translate to a sustained
performance for the pure gauge lattice QCD algorithm of 7.2 GigaFlops
(400 processors x 90% efficiency x 20 MegaFlops/Processor Peak).
When all 566 processors are installed GF11's sustained performance will
jump to 10.2 GigaFlops.
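
The sustained figures follow the formula in parentheses above.  A
minimal Python sketch, with illustrative names and the assumption that
the 90% efficiency measured on 400 processors carries over unchanged to
566:

```python
PEAK_MFLOPS_PER_PROCESSOR = 20.0  # quoted per-processor peak
EFFICIENCY = 0.90                 # quoted fraction of peak sustained


def sustained_gflops(processors,
                     efficiency=EFFICIENCY,
                     peak_mflops=PEAK_MFLOPS_PER_PROCESSOR):
    """processors x efficiency x per-processor peak, in GigaFlops."""
    return processors * efficiency * peak_mflops / 1000.0


if __name__ == "__main__":
    for n in (400, 566):
        print(f"{n} processors: {sustained_gflops(n):.1f} GigaFlops sustained")
```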

QCDPAX quotes a sustained performance of 2 GigaFlops for the Poisson
equation solver.  For lattice QCD, they only quote a 1.8 microsecs link
update time.  If we presume the QCDPAX and GF11 operation counts are
comparable, then 1.8 microsecs per link update translates to about
2 GigaFlops sustained also.
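
This estimate can be reproduced by scaling the GF11 sustained rate by
the ratio of link update times.  The Python sketch below does exactly
that; it rests entirely on the assumption stated above that both
machines perform the same number of operations per link update, and the
function name is illustrative.

```python
def estimate_sustained_gflops(ref_gflops, ref_link_us, other_link_us):
    """Scale a reference sustained rate by the ratio of link update times.

    Only valid if both machines do the same work per link update.
    """
    return ref_gflops * ref_link_us / other_link_us


if __name__ == "__main__":
    # GF11 on 400 processors: 7.2 GigaFlops at 0.5 microsecs per link.
    # QCDPAX quotes 1.8 microsecs per link.
    qcdpax = estimate_sustained_gflops(7.2, 0.5, 1.8)
    print(f"estimated QCDPAX sustained rate: {qcdpax:.1f} GigaFlops")
```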


   Jim Sexton (jsexton@watson.ibm.com)