jsexton@arnor.watson.ibm.com (Jim Sexton) (06/07/90)
We've noted the following recent submission to "comp.sys.super" describing the QCDPAX project.

> ===QCDPAX attained 12.25 GFLOPS peak speed===
>
> Parallel Computer QCDPAX has reached the world-fastest(probably)
> effective speed in scientific calculations. If any computer can
> exceed the speed of QCDPAX, please let us know.
>
> ...

We include here a brief description of the GF11 project for comparison.

GF11 is a SIMD parallel computer being built at IBM's T. J. Watson Research Center in Yorktown Heights, New York. Components of the machine include 566 processors, a programmable Benes switch network connecting those processors, a file server to provide disk capacity, and a central controller. Each processor has 2 megabytes of dynamic RAM, 64 kilobytes of cache, and 256 32-bit registers. Each processor has a peak performance of 20 MegaFlops and sustains over 80% of that peak on typical lattice QCD (Quantum Chromodynamics) calculations.

GF11 currently has 400 processors installed, so its current peak performance is 8 GigaFlops (400 x 20 MegaFlops). When all 566 processors are installed, the peak speed will be 11.3 GigaFlops.

The initial architecture design for GF11 is due to Monte Denneau and John Beetem of Computer Science and Don Weingarten of Physical Sciences at T. J. Watson. The project currently includes:

Computer Science        Physical Sciences
Yurij Baranski          Jim Sexton
Mike Cassera            Don Weingarten
Molly Elliot
Dave George
Manoj Kumar
Randy Moulic
Ed Nowicki
Micky Tsao

> [QCDPAX Benchmark]
>
> The machine was benchmarked by the QCD model. In the most time
> consuming part, 3 by 3 unitary matrix product, QCDPAX with 432 PU's
> recorded the speed nearly 4 times as fast as that of CM-2, (CM-2's
> measurement was reported in Supercomputing '89 in Reno by C. F. Baillie,
> pp.2-9). Single link update time for the subspace heat bath method
> with 8 hits is 1.8 micro second, which is three times faster than
> the HITAC S820/80 at KEK (peak 3 Gflops).
GF11 is currently being used to study the finite-temperature deconfining transition in pure gauge lattice QCD. The link update method we employ is a 3-hit modified SU(2) heatbath + Metropolis accept/reject update. This algorithm differs slightly from the one used on QCDPAX, but the operation counts per link update are comparable. On GF11, the wall-clock time to update a single link is 200 microsecs per processor, which corresponds to a sustained performance per processor of 90% of peak. On 400 processors our link update time is therefore 0.5 microsecs (200/400).

> [QCDPAX Performance for Poisson equation by Red-Black point-SOR method]
>
> ... The overall effective speed is 2.04 GFLOPS.
> ...
> If any computer can exceed this speed, please let us know.
> We would like to know if our machine is really the world-fastest
> or not.

The link update times for GF11 quoted above translate to a sustained performance for the pure gauge lattice QCD algorithm of 7.2 GigaFlops (400 processors x 90% efficiency x 20 MegaFlops/processor peak). When all 566 processors are installed, GF11's sustained performance will rise to 10.2 GigaFlops.

QCDPAX quotes a sustained performance of 2 GigaFlops for the Poisson equation solver. For lattice QCD, they quote only a 1.8 microsec link update time. If we presume that the QCDPAX and GF11 operation counts are comparable, then 1.8 microsecs per link update also translates to about 2 GigaFlops sustained: a GF11 link update costs roughly 3600 flops (200 microsecs x 18 MegaFlops sustained), and 3600 flops / 1.8 microsecs = 2 GigaFlops.

Jim Sexton (jsexton@watson.ibm.com)
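The back-of-envelope arithmetic above can be reproduced with a small Python sketch. It uses only the figures quoted in this post (20 MegaFlops peak per processor, 90% sustained efficiency, 200 microsecs per link update per processor, 1.8 microsecs per link for QCDPAX). The constant and function names are hypothetical. The machine-wide link time assumes links are updated independently in parallel (200/400). The QCDPAX estimate rests on the same presumption the post makes, namely that flops per link update are comparable on both machines.

```python
"""Back-of-envelope GF11 vs QCDPAX throughput estimates (hypothetical helper names)."""

PEAK_MFLOPS_PER_PROC = 20.0     # GF11 per-processor peak, from the post
EFFICIENCY = 0.90               # sustained fraction of peak on the link update, from the post
LINK_TIME_PER_PROC_US = 200.0   # wall-clock microsecs per link update on one processor


def peak_gflops(n_procs: int) -> float:
    """Aggregate peak speed in GigaFlops for n_procs processors."""
    return n_procs * PEAK_MFLOPS_PER_PROC / 1000.0


def sustained_gflops(n_procs: int, eff: float = EFFICIENCY) -> float:
    """Aggregate sustained speed in GigaFlops at the given efficiency."""
    return n_procs * eff * PEAK_MFLOPS_PER_PROC / 1000.0


def link_update_time_us(n_procs: int) -> float:
    """Effective machine-wide microsecs per link, assuming perfect parallelism."""
    return LINK_TIME_PER_PROC_US / n_procs


def flops_per_link() -> float:
    """Flops in one GF11 link update: MegaFlops x microsecs = flops."""
    return EFFICIENCY * PEAK_MFLOPS_PER_PROC * LINK_TIME_PER_PROC_US


def gflops_from_link_time(link_time_us: float, flops: float) -> float:
    """Sustained GigaFlops implied by a link time: flops / microsecs = MegaFlops."""
    return flops / link_time_us / 1000.0


if __name__ == "__main__":
    for n in (400, 566):
        print(f"{n} procs: peak {peak_gflops(n):.1f} GFLOPS, "
              f"sustained {sustained_gflops(n):.1f} GFLOPS, "
              f"link update {link_update_time_us(n):.2f} us")
    # QCDPAX estimate, assuming its flops per link update match GF11's.
    print(f"QCDPAX at 1.8 us/link: ~{gflops_from_link_time(1.8, flops_per_link()):.1f} GFLOPS")
```

Changing `EFFICIENCY` to 0.80 shows how sensitive the sustained figures are to the "over 80%" versus "90%" efficiency quoted in the post.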