gannon@iuvax.cs.indiana.edu (08/07/89)
Here is an item that should be of interest to this group:
From: George Cybenko <gc@uicsrd.csrd.uiuc.edu>
Date: Fri, 21 Jul 89 21:07:15 CDT
Subject: 1989 Bell Award for Perfect Benchmarks
Gordon Bell is sponsoring a new award for high performance scien-
tific computing that consists of five categories. Contestants can
compete in any number of the categories described below.
1989 Bell Award for Perfect Benchmark Rules
Four of the five new award categories are based on the Per-
fect Club benchmarks, 13 Fortran codes from a range of scientific
and engineering applications, including fluid dynamics, signal
processing, physical/chemical computation, and engineering
design. The codes have been collected and ported to a number of
computer systems by a group of applications experts from industry
and academia.
The $2,500 prize fund will be distributed appropriately, at
the discretion of the judges, among the winning entries in the
following five categories:
(1) Sixteen or Fewer Processors: The measure is the fastest
wall clock time (including I/O) for the entire Perfect suite
on any computer system that contains no more than 16 proces-
sors. The programs must be executed as a sequential job
stream, i.e., only one of the benchmarks may be executing at
any moment. "Computer system" includes distributed systems.
There are no constraints on modifications that may be made
to the codes to obtain the results as long as solutions are
sufficiently close to solutions obtained by the benchmark
codes.
(2) More than 16 Processors: The measure is the same as in 1.,
except that the computer system has more than 16 processors
and all processors must participate in the execution of each
benchmark.
(3) Perfect Suite Cost-effectiveness: The measure is the max-
imum cost-effectiveness run of the Perfect suite where
cost-effectiveness is defined as 1 divided by the product of
running time and the cost, where running time is defined in
1., and cost is the list price of the computer system and
software at the time of the run. This disqualifies noncom-
mercial machines from competing, unless they are a combina-
tion of commercially available computer systems.
(4) Algorithms Cost-effectiveness: The measure is the maximum
cost-effectiveness for the total running time of four algo-
rithms (not whole benchmarks) chosen each year. The same
rules apply here as in 3., except that the current list
price is based on the minimum configuration required to run
the algorithm. (Write to address below for more details for
1989.)
(5) Perfect Subset: The measure is the minimal running time,
defined as in 1., but with no restriction on the number of
processors, for two codes to be selected annually from the
Perfect suite. (Write to address below for more details for
1989.)
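The cost-effectiveness measure in categories 3 and 4 can be sketched as follows. This is only an illustration of the stated formula (1 divided by the product of running time and cost). The announcement does not specify units, so seconds and dollars are assumptions here, and the entry names, times, and prices are hypothetical, not real results.

```python
def cost_effectiveness(running_time: float, list_price: float) -> float:
    """Return 1 / (running time * cost), as defined in category 3.

    Units are an assumption (seconds and dollars); the announcement
    does not specify them.
    """
    if running_time <= 0 or list_price <= 0:
        raise ValueError("running time and list price must be positive")
    return 1.0 / (running_time * list_price)


# Hypothetical entries: (wall clock time in seconds, list price in dollars).
entries = {
    "system_a": (3600.0, 2_000_000.0),   # fast but expensive
    "system_b": (14400.0, 250_000.0),    # four times slower, eight times cheaper
}

scores = {name: cost_effectiveness(t, price) for name, (t, price) in entries.items()}

# The winning entry maximizes cost-effectiveness, i.e. minimizes time * cost.
winner = max(scores, key=scores.get)
print(winner)
```

Under this definition a slower machine can win if its price is lower by a larger factor than its running time is longer.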
Processor Definition
The processor divisions in 1. and 2., although somewhat
arbitrary, are intended to reflect the broad classes of extant
parallel systems: current systems range from small numbers of
powerful processors to large numbers of extremely simple proces-
sors. The division at 16 would move to a larger number over time.
The number of processors is defined as the number of simultaneous
program execution streams, i.e., in effect the number of program
counters in simultaneous operation. For example the Cray Y-MP in
operation today has 8 processors and the number is projected to
grow to 16 and 64 for the Cray 3 and 4. Similarly, the Thinking
Machines Corp. CM2 has up to 4 processors each with 16K process-
ing elements or is a uniprocessor with 64K processing elements.
The Perfect Club Benchmark Applications
The Perfect Club was formed with the purpose of developing
and applying a scientific methodology for the performance evalua-
tion of supercomputers. Club members were drawn from industry
and academic sectors and an initial suite of 13 Fortran codes
was designated as the "Perfect" Benchmark programs. These codes
were selected because they solved fundamental problems across a
variety of applications requiring supercomputing performance -
fluid dynamics, signal processing, physical/chemical computations
and engineering design. See [BCKK89] for more information about
the Perfect Club codes.
[BCKK89] Berry, M., Chen, D., Koss, P., Kuck, D., Lo, S., Pang,
Y., Pointer, L., Roloff, R., Sameh, A., Clementi, E.,
Chin, S., Schneider, D., Fox, G., Messina, P., Walker,
D., Hsiung, C., Schwarzmeier, J., Lue, K., Orszag, S.,
Seidl, F., Johnson, O., Swanson, G., Goodrum, R., Mar-
tin, J., The Perfect Club Benchmarks: Effective Per-
formance Evaluation of Supercomputers, CSRD Report No.
827, 1988. (To appear in the International Journal of
Supercomputing Applications, 1989.)
The deadline for contest submissions is December 31, 1989.
For more information about the Perfect Benchmark Suite and
the Bell Award for the Perfect Benchmarks write to:
Bell Awards for Perfect Benchmarks
Center for Supercomputing Research and Development
University of Illinois
Urbana, IL 61801
USA
Contact person: lpointer@uicsrd.csrd.uiuc.edu
NOTE: IEEE Software administers a separate prize sponsored by
Gordon Bell. Contact the IEEE Software office for information
about that prize.
gillies@m.cs.uiuc.edu (08/08/89)
I read about the new Gordon Bell award in SIAM News. The rules are awfully vague and complicated, especially concerning "Cost Effectiveness", and seem to have several loopholes. The idea seems half-baked.
david@june.cs.washington.edu (David Callahan) (08/08/89)
In article <107900005@iuvax> gannon@iuvax.cs.indiana.edu writes:
>Here is an item that should be of interest to this group:
>From: George Cybenko <gc@uicsrd.csrd.uiuc.edu>
>Date: Fri, 21 Jul 89 21:07:15 CDT
>Subject: 1989 Bell Award for Perfect Benchmarks
>Processor Definition
>The number of processors is defined as the number of simultaneous
>program execution streams, i.e., in effect the number of program
>counters in simultaneous operation. For example the Cray Y-MP in
>operation today has 8 processors and the number is projected to
>grow to 16 and 64 for the Cray 3 and 4. Similarly, the Thinking
>Machines Corp. CM2 has up to 4 processors each with 16K
>processing elements or is a uniprocessor with 64K processing elements.
This definition seems to be a little restrictive. Would a data-flow processor such as Arvind's Monsoon or the recently proposed P-RISC then have an unbounded number of processors, despite the fact that it might be able to issue only one arithmetic operation per tick per processor? What about a multi-stream architecture such as the HEP or the CHOPP, where a large number of "instruction streams" are multiplexed onto a single arithmetic pipeline?
(Aside: isn't the CM2 a single stream machine? I'd never heard that the 4 quadrants could be run independently.)
(A little out of order...)
>(2) More than 16 Processors: The measure is the same as in 1.,
> except that the computer system has more than 16 processors
> and all processors must participate in the execution of each
> benchmark.
What does this mean for a data flow machine? Must a multi-stream architecture have all streams firing, even when the pipeline saturates well below this level? Why should all processors be required? If the problem sizes in the benchmark set are fixed, Amdahl's law will make it very difficult for large machines to "compete" with small machines.
Disclaimer: I work for a company that is designing a multi-stream multiprocessor, so my concerns over Bell's definitions may appear less academic than I'd like to believe they are.
--
David Callahan (david@tera.com, david@june.cs.washington.edu, david@rice.edu)
Tera Computer Co. 400 North 34th Street Seattle WA, 98103
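The Amdahl's law point above can be illustrated with a small sketch. It assumes the standard form of the law, speedup = 1 / (s + (1 - s) / p), for a fixed problem size with serial fraction s on p processors. The serial fraction, processor counts, and relative per-processor speeds below are hypothetical values chosen for illustration, not measurements of any machine or of the Perfect codes.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup over one processor for a fixed problem size (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def normalized_time(serial_fraction: float, processors: int,
                    relative_speed: float) -> float:
    """Running time relative to one reference processor of speed 1.0.

    relative_speed is a hypothetical per-processor speed factor.
    """
    return 1.0 / (amdahl_speedup(serial_fraction, processors) * relative_speed)


s = 0.05  # hypothetical serial fraction of a benchmark

# No processor count can push the speedup past 1 / s (20 here).
print(amdahl_speedup(s, 16))
print(amdahl_speedup(s, 65536))

# A few powerful processors versus many simple ones (hypothetical speeds).
small_machine = normalized_time(s, 8, relative_speed=1.0)
large_machine = normalized_time(s, 65536, relative_speed=0.01)
print(small_machine, large_machine)
```

With these assumed numbers the serial part dominates on the large machine, so adding processors barely helps while the slower individual processors hurt.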
ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (08/09/89)
In article <8953@june.cs.washington.edu> david@tera.com (David Callahan) writes:
!>>(from the benchmark contest)
!>>The number of processors is defined as the number of simultaneous
!>>program execution streams, i.e., in effect the number of program
!>>counters in simultaneous operation. For example the Cray Y-MP in
!>>operation today has 8 processors and the number is projected to
!>>grow to 16 and 64 for the Cray 3 and 4. Similarly, the Thinking
!>>Machines Corp. CM2 has up to 4 processors each with 16K process-
!>>ing elements or is a uniprocessor with 64K processing elements.
!>(Aside: isn't the CM2 a single stream machine? I'd never heard that
!>the 4 quadrants could be run independently.)
Yes, depending on your installation, the CM can be split up into several
segments. We have a 16K CM2 which can be attached to as 16K processors
or 2 users @ 8K processors (we also have an 8K CM2 which can be used as
all 8K or two users @ 4K...).
And just because every processor gets the same instruction feed, one must not
think that every processor is "doing the same thing." Each CM processor
can hold an index to an array located in that processor, so with the
right software, the CM could become a MIMD machine.
-Thomas Edwards
tedwards@cmsun.nrl.navy.mil