rick@cs.arizona.edu (Rick Schlichting) (01/19/91)
[This duplicates a previous article in comp.research.japan, but
is provided for completeness. -- Rick]
[Dr. David Kahaner is a numerical analyst visiting Japan for two years
under the auspices of the Office of Naval Research-Asia (ONR/Asia).
The following is the professional opinion of David Kahaner and in no
way has the blessing of the US Government or any agency of it. All
information is dated and of limited lifetime. This disclaimer should
be noted on ANY attribution.]
[Copies of previous reports written by Kahaner can be obtained from
host cs.arizona.edu using anonymous FTP.]
To: Distribution
From: David K. Kahaner ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: Comments on ETL's parallel processing projects from M. Rosing.
18 Jan 1991
ABSTRACT. Matt Rosing (U Colorado) reports on work during
summer 1990 in the Computer Architecture Division of the
Electrotechnical Lab in Tsukuba, Japan.
Rosing's comments provide welcome detail to other reports on this subject
that I have distributed, including the following.
"etl" 2 July 1990
"dataflow" 16 August 1990
"parallel.904" 6 November 1990
"ricks" 17 January 1991
Matt Rosing
Department of Computer Science
Campus Box 430
Boulder, Colorado 80309
(ROSING@BOULDER.COLORADO.EDU)
Last summer I had the opportunity to spend two months at the
Electrotechnical Lab in Tsukuba, Japan. This was an NSF-sponsored trip for
American grad students to spend time in Japan learning about Japanese
research and culture. [See below for more details. (DKK)] I worked in
the Computer Architecture Division and spent most of my time learning
about the SIGMA-1, EM-4, and CODA projects. I have seen a few reports,
from Kahaner, Schlichting, and others, describing these projects and
would like to add a little more detail.
The SIGMA-1 machine is a 128-node, instruction-level data flow machine. The
project was started in 1982. It has measured rates of 170 MFLOPS. I cannot
really add any more information than Kahaner has already described. If you
have questions, the best person to ask is Satoshi Sekiguchi (sekiguti@etl.go.jp).
The project that I spent the most time learning about was the EM-4. The
EM-4 is an 80-node, distributed-memory, coarse-grain data flow machine which
is a prototype for a 1000-node version. By coarse grain, I mean that blocks
of von Neumann, register-based code are connected by data flow arcs. This
mixed model provides the advantage of data flow (flexible, dynamic
scheduling) at the upper level of a program and the advantages of von Neumann
machines (static register scheduling) at the lower levels. Because many
parallel programs have this kind of structure, the architecture appears
promising.
The low-level, register-based part of this architecture is RISC based. The
RISC pipeline is integrated with the data flow loop so the two parts work
quite well together (see experiment below). All of the hardware for one
node (including communications) is built into a single gate array chip. The
clock cycle is 80ns. Although there is no floating point in this
prototype, there will be in the next version.
The other part of the architecture is the communications network. Each
processor is one node in an omega network. The communications network is
designed to support data flow computations and therefore is designed for
very fine grain communications. A word can be sent or received in a single
instruction. Messages propagate through the network at one word per node
per clock cycle (80ns). The really nice feature of this is that the user
does not need to worry about packing up words into contiguous memory
buffers before sending a message. Messages are typed by both destination
memory address and category. Example categories are "data message" or
"create process." The user may define these categories.
I wrote a program to illustrate some of these features. The test was a
smoothing algorithm which worked on a vector. It is an iterative algorithm
in which the next value of each element of a vector is a function of the
previous value and the two neighboring values. This generates
communication patterns similar to those of many numerical problems (PDEs,
etc.). In a sense, the test I wrote is not ideally suited to data flow
machines. It is more of a data parallel algorithm, which is better suited
to a SIMD machine like the CM-2.
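A minimal sketch of this kind of smoothing iteration is given here in Python. The report does not give the actual update function, so the equal-weight three-point average and the fixed end points are assumptions made for illustration only, and the names smooth_step and smooth are hypothetical.

```python
def smooth_step(values):
    """Apply one smoothing iteration to a vector.

    Each interior element becomes the mean of its previous value and its
    two neighbors. The equal weights and the untouched end points are
    assumptions; the report only says the new value is "a function of"
    these three values.
    """
    new_values = list(values)
    for i in range(1, len(values) - 1):
        new_values[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return new_values


def smooth(values, iterations):
    """Apply smooth_step repeatedly and return the final vector."""
    for _ in range(iterations):
        values = smooth_step(values)
    return values
```

On a distributed memory machine each element (or block of elements) would live on its own node, so every iteration requires one exchange with each of the two neighbors, which is the communication pattern the test exercises.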
The program consisted of placing a "process" (or whatever the equivalent
idea is for data flow machines) on each node which iteratively (10000
iterations) received the values from the two neighboring processors, added
them together, and sent the result to the two neighboring processors. So
messages are one word long and there are lots of messages. The result of
the test was that each iteration took 36 clock ticks. (The EM-4 has a
wonderful timing facility which counts clock ticks.) During each iteration
two words were sent so, at 80 ns per clock tick, this corresponds to
1440 ns/word or 360 ns/byte (this also includes the integer add). This
compares quite favorably to 390 ns/byte on an iPSC/2 with messages long
enough to overcome message start-up costs (greater than 20k bytes).
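The arithmetic behind these figures can be checked with a few lines of Python. The clock period, tick count, and words per iteration come from the measurement above; the 4-byte word size is an assumption, inferred from the ratio between the per-word and per-byte figures.

```python
CLOCK_NS = 80              # EM-4 clock period in ns (from the report)
TICKS_PER_ITERATION = 36   # measured clock ticks per iteration (from the report)
WORDS_PER_ITERATION = 2    # one word sent to each of the two neighbors
BYTES_PER_WORD = 4         # assumption: 32-bit words, implied by 1440 -> 360
ITERATIONS = 10000         # iteration count used in the test

ns_per_iteration = TICKS_PER_ITERATION * CLOCK_NS
ns_per_word = ns_per_iteration / WORDS_PER_ITERATION
ns_per_byte = ns_per_word / BYTES_PER_WORD
total_ms = ITERATIONS * ns_per_iteration / 1e6

print(ns_per_iteration, ns_per_word, ns_per_byte, total_ms)
```

Under these assumptions one iteration takes 2880 ns, which gives the 1440 ns/word and 360 ns/byte figures, and the whole 10000-iteration run takes roughly 28.8 ms.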
The best person to ask questions about the EM-4 is Shuichi Sakai
(sakai@etl.go.jp).
The final project is called CODA and is still in the paper stage. CODA is a
distributed memory machine that is designed for real-time applications.
This machine looks much less like a data flow machine than the EM-4. The
interesting aspects of CODA are in the communications hardware. First of
all, message priorities have been added to support real-time applications.
Within the communications hardware, messages with higher priority are
routed before lower priority messages.
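As a software analogy only, priority routing can be sketched with a priority queue. CODA does this in hardware and the report gives no details of the mechanism, so the class name PriorityRouter, the convention that a larger number means higher priority, and the first-in first-out tie-breaking are all assumptions.

```python
import heapq
import itertools


class PriorityRouter:
    """Toy model of a router that forwards higher-priority messages first."""

    def __init__(self):
        self._queue = []
        # Sequence numbers keep arrival order among equal priorities (assumption).
        self._counter = itertools.count()

    def enqueue(self, priority, message):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._queue, (-priority, next(self._counter), message))

    def route_next(self):
        """Return the next message to forward, or None if nothing is waiting."""
        if not self._queue:
            return None
        _, _, message = heapq.heappop(self._queue)
        return message
```

A message with a real-time deadline would be enqueued with a larger priority value and would overtake ordinary traffic already waiting at the same router.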
To me, the more interesting aspect of CODA is in the synchronization and
communication hardware. Each memory location and register contains
full/empty bits which are used to implement produce/consume semantics on
all memory and register accesses. In conjunction with this, a send or
receive instruction can target any memory element or register on any
processor. Message packets to read or write memory or registers are
inserted into the RISC instruction pipeline along with instructions from
the local processor. The result should be very fine grained communications
with very little overhead.
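The produce/consume semantics of a full/empty bit can be illustrated in software with a small Python class. This is only an analogy for what CODA does in hardware on every memory word and register; the class name FullEmptyCell and the use of a condition variable are assumptions made for the sketch.

```python
import threading


class FullEmptyCell:
    """One storage cell guarded by a full/empty bit.

    produce() waits until the cell is empty, writes, and marks it full.
    consume() waits until the cell is full, reads, and marks it empty.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def produce(self, value):
        with self._cond:
            while self._full:          # block while the previous value is unread
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def consume(self):
        with self._cond:
            while not self._full:      # block until a value has been produced
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value
```

With one thread calling produce() and another calling consume() on the same cell, every value is read exactly once and in order, with no separate lock or message buffer visible to the program.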
To keep the processors busy, the hardware supports several instruction
streams. A process that blocks on a memory or register access can be
swapped out, and another instruction stream swapped in, within a few clock
cycles.
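The idea of switching streams whenever one blocks can be sketched with Python generators. This is a simplified software model, not a description of the CODA hardware: each stream gives up the processor at every blocking access, and the round-robin policy, in which a blocked stream is assumed ready again by its next turn, is an assumption. The names stream and run are hypothetical.

```python
from collections import deque


def stream(name, steps, trace):
    """An instruction stream that blocks after every unit of work."""
    for step in range(steps):
        trace.append((name, step))  # work done up to the next blocking access
        yield                       # blocking access: give up the processor


def run(streams):
    """Run streams round-robin, switching whenever the current one blocks."""
    ready = deque(streams)
    while ready:
        current = ready.popleft()
        try:
            next(current)           # run until the stream blocks or finishes
        except StopIteration:
            continue                # finished streams are dropped
        ready.append(current)       # assumed ready again by its next turn
```

Running two streams of two steps each interleaves their work, so the processor always has something to execute while the other stream waits.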
These three constructs (message priorities, register-level synchronization,
and multiple instruction stream support) should greatly improve
communications on distributed memory multiprocessors. If you have
questions, you should ask Kenji Toda (toda@etl.go.jp).
My stay at ETL was supported by the NSF Japan Summer Institute program. I
strongly recommend that any PhD students in science or engineering who are
interested in Japan apply for next year's program. Yes, this is an
advertisement, but the program was well worth it.
NOTE.
If you would like to apply for support to conduct research and/or
study in Japan (either graduate student, post doc, or faculty
appointments), please contact NSF's Japan Program, at Room 1214, NSF,
Washington, D.C. 20550, or by e-mail to NSFJinfo@nsf.gov (on Internet)
or NSFJinfo@nsf (BitNet). The telephone number is (202) 357-9558.
For more general information request the new Japan Program
announcement (NSF 90-144) from NSF's Publications Unit, Room 232, NSF,
Washington, D.C. 20550, or by e-mail to pubs@nsf.gov (Internet) or
pubs@nsf (BitNet).
NSF also provides free to libraries and reference collections a copy
of a catalogue of Japanese government laboratories.
The catalogue gives two-page descriptions of the organization and
major research activities of each of 110 laboratories in Japan which are
run by the national government, public corporations, and non-profit
organizations. The catalogue will be of most use to U.S. scientists and
engineers attempting to find Japanese research partners, whether for
research collaboration or for visits to Japan. NSF would like to place
the catalogue in collections readily accessible to scientists and
engineers at U.S. Ph.D.-granting institutions. Contact the Japan Program,
at the above address.
Citation:
Research Development Corporation of Japan (JRDC). "National
Laboratories and Research Public Corporations in Japan." 200 pp.
[Tokyo,] Japan, 1990. Revision of the first edition, 1987, and
supplement (Part II), 1988.
Another useful NSF contact is Dr. Douglas McNeal (DMCNEAL@NSF.GOV).
------------------END OF REPORT------------------------------------------