[comp.parallel] Kahaner Report: ETL Dataflow Project, Revised and Updated

e@Daisy.EE.UND.AC.ZA (06/27/90)

ugene@wilbur.nas.nasa.gov (Eugene N. Miya)
Date: 25 Jun 90 12:41:47 GMT
Message-ID: <9458@hubcap.clemson.edu>
Newsgroups: comp.parallel,comp.arch,soc.culture.japan
  
To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: ETL Dataflow update
25 June 1990
  
Some members of the ETL dataflow group read my report before it was
circulated, but recent updates arrived too late to be included.  Briefly,
the most important of these are
 (1) An 80 processor version of EM-4 is running at 997 MIPS
 (2) The DFC-II compiler is running
 (3) The full 1,024 processor version of EM-4 will have floating point
       hardware
I have revised my report to include this information. This revision is
available on request.
  
I think you can post this one. I asked the ETL group for more
comments but they never responded although I gave them a week so I
am going to assume it's okay. David
  
To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: ETL Dataflow Project.
19 June 1990
  
ABSTRACT. A visit to the Data Flow project at ETL is summarized. The
Dataflow SIGMA-1 computer project is ending. The new project, EM-4, will
have 1,024 processors and is designed to have less overhead. EM-4 was
originally proposed for symbolic rather than numeric computation, but the
designers now feel that with the inclusion of floating point hardware it
will also be used for numerical computation. Currently an 80 processor
version of EM-4 is running at 997 MIPS.
  
On 28 March 1990, Dr. Bill Buzbee (NCAR), Prof. Jack Dongarra (UTenn),
and I visited the Computer Architecture Section of the Computer Science
Division at the Electrotechnical Laboratory (ETL) in Tsukuba. I had been
to Tsukuba several times earlier, but never to ETL.  ETL is the largest
national research institute in Japan, belonging to the Agency of
Industrial Science and Technology (AIST) of the Ministry of International
Trade and Industry (MITI). For a national lab it is surprisingly small,
with only about 700 employees and a 1988 budget of around $70M U.S. More
surprising still, about 550 of the employees are members of the research
staff; only about 150 are general or administrative staff. In fact Buzbee
remarked that NCAR also has about 700 employees, but that the ratio of
scientific to nonscientific staff is almost exactly reversed there. I am
familiar with other U.S. laboratories and the NCAR ratio is similar to
theirs. Thus either the ETL scientists are remarkably self-sufficient or
their counting process differs from ours.
  
ETL will be 100 years old next year, having begun as a testing laboratory
under the Bureau of Electrocommunication. Its current charter is to
perform basic research and development in electronics, information
processing, energy technology and standards and measurements. Emphasis is
placed on the development of technologies which exert impact on society
and industry, and which are so advanced that they require a long time and
substantial risk to attain their goals.
  
The activities of ETL that are directly computer related are contained in
the Information Science Division, Computer Science Division, Machine
Understanding Division, and Intelligent Systems Division. Together these
groups contain 120 researchers. A major project of the Computer Science
Division has been the construction of the Dataflow computer, SIGMA-1, and
its follow-up, the EM-4.
  
Scientists who are working on this project include the following.
  
        Dr. Toshio Shimada
        Chief of Computer Architecture Section
        Computer Science Division
        Electrotechnical Laboratory
        1-1-4 Umezono, Tsukuba-shi
        Ibaraki 305, Japan
        Tel: (0298) 54-5443
        Email: SHIMADA@ETL.GO.JP
  
        Dr. Shuichi Sakai
        Tel: (0298) 58-5876, Fax: (0298) 58-5882
        Email: SAKAI@ETL.GO.JP
  
        Dr. Satoshi Sekiguchi
        Tel: (0298) 58-5877, Fax: (0298) 58-5882
        Email: SEKIGUCHI@ETL.GO.JP
  
        Dr. Yoshinori Yamaguchi
        Tel: (0298) 58-5873, Fax: (0298) 58-5882
        Email: YAMAGUTI@ETL.GO.JP
  
Technical reports about the ETL dataflow project are available by
contacting Dr. Sekiguchi.
  
In addition to the dataflow group, we also met with the ETL Director,
  
        Dr. Hiroshi Kashiwagi
        Director-General, ETL
        Tel: (0298) 54-5002, Fax: (0298) 55-1729
  
and the Directors of the Information Science and Intelligent Systems
Divisions,
  
        Dr. Koichiro Tamura
        Director, Information Science Division
        Tel: (0298) 54-5414, (0298) 58-5361-51560
        Fax: (0298) 58-5156
        Email: KTAMURA@ETL.GO.JP
  
        Dr. Toshitsugu Yuba
        Director, Intelligent Systems Division
        Tel: (0298) 54-5412
        Email: YUBA@ETL.GO.JP
  
  
SIGMA-1 has been extensively written about--it is not a new project and
it is ending this year--so we were mostly interested in learning about
their more recent work. An excellent summary of dataflow, specifically
oriented toward the ETL dataflow SIGMA-1 project, was written by Shimada,
Hiraki, and Sekiguchi of ETL and then translated to English by Dr. C.
Eoyang of the Institute of Supercomputing Research (ISR) in Tokyo.  This
translation, which I used heavily in writing this report, was published
in "Vector Register, 1988.11.15." Contact Eoyang at the address below.
         Dr. C. Eoyang
         Inst for Supercomputing Research
         15F Inui Building Kachidoki
         1-13-1 Kachidoki, Chuo-ku
         Tokyo 104, Japan
         Tel: 81-03-536-9661
         Fax: 81-03-536-9670
         Email: EOYANG@ISR.RECRUIT.CO.JP
  
In principle the dataflow idea is very simple: an instruction should
execute whenever its data are available.  As a prototypical example,
consider the computation (a+b)*(c+d).  Such a computation could be broken
into a "dataflow graph" which looks like the following.
  
   a     b      c     d
    \   /        \   /
     \ /          \ /
      +            +
        \         /
         \       /
          \     /
           \   /
             *
             |
             |
  
A computation begins at one of the + or * points (called nodes) whenever
a token appears, signifying that data are present on an input line.  The
node "fires" (arithmetic operation is executed) when all input data are
available. At that point all input tokens are removed and a token is
placed on the output line. Obviously in this model many other
computations are independent of this one; thus many nodes can fire
simultaneously.  Various enhancements are necessary for this model to
work in practice; these have to do with making best use of the dataflow
graph once it has been created and allowing parallel and loop operations
to be executed. In the SIGMA-1 project these enhancements are called the
"dynamic computational model."  The key research projects are associated
with implementing this model in practical hardware, and designing and
implementing user-usable software that can take advantage of the
hardware.
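
The firing rule described above can be sketched in a few lines of Python.
This is only an illustration of the basic token model for (a+b)*(c+d), not
of the SIGMA-1 hardware or its dynamic model; the node table, the arc names
(t1, t2, out), and the run function are all hypothetical names chosen for
the example.

```python
import operator

# Hypothetical node table for (a+b)*(c+d):
# node name -> (operation, input arcs, output arc).
NODES = {
    "add1": (operator.add, ("a", "b"), "t1"),
    "add2": (operator.add, ("c", "d"), "t2"),
    "mul": (operator.mul, ("t1", "t2"), "out"),
}


def run(initial_tokens):
    """Fire nodes until no node has a token on every input arc."""
    arcs = dict(initial_tokens)  # arc name -> value carried by its token
    trace = []                   # nodes fired at each step
    while True:
        # Every node whose inputs are all present is enabled; real
        # hardware could fire all of these at the same time.
        ready = [name for name, (_, ins, _) in NODES.items()
                 if all(arc in arcs for arc in ins)]
        if not ready:
            return arcs, trace
        results = {}
        for name in ready:
            op, ins, out = NODES[name]
            # Firing removes the input tokens and produces one output.
            results[out] = op(*(arcs.pop(arc) for arc in ins))
        arcs.update(results)
        trace.append(sorted(ready))


arcs, trace = run({"a": 1, "b": 2, "c": 3, "d": 4})
print(arcs["out"])  # 21
print(trace)        # [['add1', 'add2'], ['mul']]
```

The trace shows the point made in the text: the two additions are
independent and become enabled in the same step, while the multiplication
must wait for both of their output tokens.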
  
The SIGMA-1 project ends this year. The EM-4 project is a follow-on that
attempts to build on what was learned earlier. The main emphasis is to
simplify the total architecture, by putting several processing elements
onto a single chip with a simplified network structure. The ETL group
also decided that some modifications of the "pure" dataflow model were
necessary to efficiently match the machine with real programs and get
maximum performance from it. In the context of a dataflow graph, it was
observed that no strategy was proposed to permit maximum utilization of
the processing elements by detecting possible critical paths and
scheduling the computation so that work along these paths had priority.
The new model, called the "strongly connected arc model," attempts to remedy
this by allowing certain portions of the dataflow graph to be performed
in a more or less traditional way.
  
As part of the project the ETL group is developing a C-like language,
DFC-II, which is only partly a functional language. Using traditional
functional languages it is difficult to write programs for the utilities
that are necessary in practical programs, such as synchronization,
resource management, and global variables.  DFC-II was
not running when we visited although we are informed that it is now up on
the EM-4.  Lack of software has been criticized (see below), although
this year the group's emphasis has shifted toward software development.
Dr. Shimada explained that the group plans to complete the compiler work
by autumn, and then begin to evaluate the architecture on practical
application programs. He also noted that at some point the group will
also develop a functional language.
  
SIGMA-1 was specifically designed for numeric computation. The new
machine, EM-4, as originally described, had no hardware to support
floating point. In fact, in several papers Yamaguchi et al. say that their
field of interest has shifted away from numerical to symbolic
manipulations involving knowledge information processing. Thus this
machine was less interesting to me as a computer that would support
simulation research, and I sensed that Buzbee and Dongarra had similar
feelings. However, Dr. Sakai in the dataflow group has recently told us
that floating point will be included in the 1,024 processor version,
which will then be able to perform at about 20 Gigaflops. Their vision is
that EM-4 will also be suitable for numerical computation.
  
Neither Buzbee, Dongarra, nor I am an expert in computer architecture.
Dr. Olaf Lubeck, a researcher in the Computer Research and Applications
Division of Los Alamos National Laboratory, also visited ETL late last
year. His research is directly related to the work at ETL and this was
his second visit to the dataflow laboratory. I asked him to provide me
with a short assessment of that visit, a portion of which is included
below. I am indebted to him for this effort.  My personal impression is
more positive than his. Despite the difficulties that have slowed
progress one should realize that more than a decade of research was
necessary to reach the level we see today and that many fundamental
problems had to be solved. The group is remarkably productive. In fact,
we three (B/D/K) were quite astonished at how few people were actually
working on this project. It is a very small group working without the
assistance of many students, and based on my statistics above, with only
limited administrative support. Nevertheless, their research is of long
term interest.  They are building a "real" dataflow computer, neither a
software simulator nor a hardware simulator.  Currently, an 80 processor
version has been built; it performs at 997 MIPS.  Dr. Shimada told me
that preliminary evaluation on small benchmark programs shows that the
performance is 15-100 times faster than a Sparcstation 330.  The design
allows for  1,024 processors and this is the next project. I would not be
surprised to see some attempt to commercialize it.
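
For a rough sense of scale, the quoted figures can be turned into
per-processor rates. The short calculation below uses only numbers from
this report (997 MIPS on 80 processors, about 20 Gigaflops projected for
1,024 processors). The extrapolation to 1,024 processors assumes perfectly
linear scaling, which is my assumption for illustration and not a claim
made by the ETL group.

```python
# Figures quoted in the report.
mips_80_processors = 997         # measured on the 80 processor prototype
projected_gflops_1024 = 20       # projected for the 1,024 processor version

mips_per_processor = mips_80_processors / 80
print(round(mips_per_processor, 2))             # 12.46

# Assumption: linear scaling from 80 to 1,024 processors.
linear_mips_1024 = mips_per_processor * 1024
print(round(linear_mips_1024))                  # 12762

# Implied floating point rate per processor, in MFLOPS.
mflops_per_processor = projected_gflops_1024 * 1000 / 1024
print(round(mflops_per_processor, 1))           # 19.5
```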
  
-----------------
  
Dr. Olaf Lubeck
Computer Research and Applications
LANL
Los Alamos, New Mexico 87545
Tel: (505) 667-6017, Fax: (505) 665-3812
OML@LANL.GOV
  
Dec 10-13 1989:    ElectroTechnical Laboratory (ETL), Tsukuba, Japan
  
Purpose: Discuss progress of the ETL dataflow parallel computer project;
         Give two seminars at ETL
  
Summary: 
         1. The ETL dataflow group had previously built a 128 processor
tagged token dataflow machine called SIGMA-1. They now have designed and
are building a follow-on single-chip RISC processor (EM-4) that is a
hybrid between an instruction-level dataflow processor and a typical von
Neumann processor. The hybrid solves some of the inefficiency problems
associated with pure dataflow machines.  The single-chip processor will
eventually be integrated into a 1000 processor system.
  
         2. While at ETL, I gave two seminars - the first was entitled
"Resource Management in Dataflow: A Case Study of Two Numerical Codes"
and the second was "Vectorization of Monte Carlo Algorithms: An
Architectural Study of Vector Processors"
  
Funding: Funds for this trip were provided in whole by the Japanese
         ElectroTechnical Laboratory.
  
Detailed Trip Report
  
  
  
1. Visit to ETL (Dataflow Research Project)
  
        I visited ETL for three days. While there, I discussed mutual
interests in dataflow research with the group. The group consists of
three sections, each built around one of three different machines. The
first section is investigating the SIGMA-1 tagged-token dataflow machine.
The machine was completed about a year ago. It consists of 128 processors,
is a "pure" operational level dataflow machine, and is modeled after the
MIT Tagged-Token dataflow architecture. Since my last visit a year ago,
little progress has been made to allow the programming of the machine
from a high-level language. The intent of this effort has been and still
is the development of a C-like language called DFC II.  The language is
imperative and contains explicit synchronization to allow user-level
partitioning of programs. However, the compiler is still being written
and I was unable to execute any code using it. Funding that has been
explicitly targeted for this project is ending in six months. Although
the SIGMA-1 represents the largest dataflow machine to date, the overall
results of this project are disappointing due to a complete lack of
usable software.
  
        The second effort in the ETL dataflow group is led by Dr. Sakai.
The goals of this section are 1) the construction of a hybrid dataflow
single chip processor, 2) the development of an 80 processor prototype
system consisting of these single-chip processors, and 3) the
construction of a full 1000 processor machine. So far, they have designed
the single chip processor and have manufactured 5 chips. The 5 chips
along with 1 megaword of memory for each processor reside on a single
board. The CMOS chips have 50,000 gates each and are being manufactured
by the Japanese division of LSI Logic which has a plant in Tsukuba.  While
there, I saw a demonstration of a Fibonacci algorithm hand-coded in
assembly language execute on the 5 processors.
  
        The EM-4 section believes that "pure" operational dataflow is
inefficient and has, therefore, built the processor with a hybrid
dataflow model called the strongly connected block model. The main
feature of the model is to take an instruction level dataflow graph and
to collect nodes together into many single "strongly connected blocks".
Each strongly connected block will then be executed on a single processor
von Neumann style with a program counter and registers. Anything outside
of a block will be executed in a dataflow model where matching operands
fire an instruction.  Strongly connected blocks are macro nodes in a
dataflow graph and execute when all of their operands arrive. The model
incorporates the ability to execute variable size dataflow nodes from a
single instruction to an entire program.
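
        The block idea described above can be illustrated with a small
Python sketch. It is a toy under stated assumptions, not the EM-4
instruction set: the Block class, its receive method, and the
(dest, op, src1, src2) instruction format are hypothetical names
introduced for this example. The block waits dataflow-style until all of
its operands have arrived, then steps a program counter through its
instructions using registers.

```python
import operator


class Block:
    """A macro node: waits for all operands, then runs sequentially."""

    def __init__(self, inputs, instructions, output):
        self.inputs = set(inputs)         # operands that must all arrive
        self.instructions = instructions  # list of (dest, op, src1, src2)
        self.output = output              # register sent out as the result
        self.waiting = {}                 # operands matched so far

    def receive(self, operand, value):
        """Store one incoming token; fire only when every operand is present."""
        self.waiting[operand] = value
        if set(self.waiting) != self.inputs:
            return None                   # still waiting, nothing fires
        registers = dict(self.waiting)    # operands become register contents
        self.waiting = {}
        # Inside the block, execution is ordinary sequential code.
        for pc in range(len(self.instructions)):
            dest, op, src1, src2 = self.instructions[pc]
            registers[dest] = op(registers[src1], registers[src2])
        return registers[self.output]


# The whole expression (a+b)*(c+d) treated as one block.
expr = Block(
    inputs=["a", "b", "c", "d"],
    instructions=[
        ("t1", operator.add, "a", "b"),
        ("t2", operator.add, "c", "d"),
        ("r", operator.mul, "t1", "t2"),
    ],
    output="r",
)
results = [expr.receive(k, v) for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]]
print(results)  # [None, None, None, 21]
```

Only the arrival of the last operand triggers any work, and the three
operations inside the block need no further token matching. A block with
a single instruction behaves like an ordinary dataflow node.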
  
        The multiprocessor machine has an Omega interconnection network
where each node of the network is a processor-memory pair.  One of the
more unusual features of the system is a built-in hardware capability to
collect processor activity. A token is circulated around the network that
collects activity information about the amount of allocated heap storage
and the number of unmatched tokens for the least active node.  Load
balancing mechanisms can then use this information.
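
        A software analogue of the circulating token might look like the
Python sketch below. The report does not say how the two measurements are
combined or in what order nodes are visited, so the load function (a plain
sum) and the list order are assumptions made only for illustration;
NodeStatus, load, and circulate are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    node_id: int
    heap_words: int        # allocated heap storage on this node
    unmatched_tokens: int  # tokens still waiting for a matching operand


def load(status):
    # Assumed metric: the report does not give the real combination rule.
    return status.heap_words + status.unmatched_tokens


def circulate(nodes):
    """Pass one token past every node, keeping the least active node's record."""
    token = None
    for status in nodes:  # list order stands in for the path through the network
        if token is None or load(status) < load(token):
            token = status
    return token


nodes = [
    NodeStatus(0, heap_words=500, unmatched_tokens=12),
    NodeStatus(1, heap_words=120, unmatched_tokens=3),
    NodeStatus(2, heap_words=300, unmatched_tokens=40),
]
least_active = circulate(nodes)
print(least_active.node_id)  # 1
```

A load balancer could then direct new work to the node named in the token
after each full circuit.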
  
        There seem to be two major weaknesses in the effort currently.
The first is that no floating point hardware is incorporated on the
single processor chip. The machine was funded as a symbolic processor,
not as a numeric processor.  The designers view this aspect of the
project as politically motivated to ensure uniqueness compared to the
SIGMA-1 machine. The second weakness of the effort is the same problem
that the SIGMA-1 effort had - all hardware, no software.  Only time will
tell whether the software will come. The plan seems to be to use whatever
the SIGMA-1 project had in terms of software, but it is not currently
usable.
  
      The third section in the ETL dataflow group is designing a coarse
grain dataflow machine called CODA. Because of language difficulties, I
understood little of what was unique about this project. The effort is
based around multi-threaded architectural ideas and is early in its
design stages.  Concepts have not been finalized and no hardware has been
built.
---------End of Lubeck's report--------
  
References
  
1. Shimada T., Hiraki K., Sekiguchi S., "A Dataflow Supercomputer for
Scientific Computations: The SIGMA-1 System", translated into English by
Eoyang, C, and published in The Institute for Supercomputing Research
Vector Register, 1988.11.15 pp3-9.
  
2. Sakai S., Yamaguchi Y., Hiraki K., Kodama Y., Yuba T., "An
Architecture of a Dataflow Single Chip Processor" ACM 0884-
7495/89/0000/0046, 1989.
  
3. Sekiguchi S., Shimada T., Hiraki K., "A Design of Practical Dataflow
Language, DFCII and Its Data Structures", ETL Technical Report, ETL-TR-
90-16, 1990.
  
4. Yamaguchi Y., Sakai S., Hiraki K., Kodama Y., "An Architectural Design
of a Highly Parallel Dataflow Machine" Information Processing '89, G. X.
Ritter (ed), Elsevier Science Publishers (North Holland), pp1155-1160.
  
5. Shimada T., Hiraki K., Nishida K., Sekiguchi S., "Evaluation of a
Prototype Dataflow Processor of the SIGMA-1 for Scientific Computations",
12th Int Symp on Computer Arch, Tokyo Japan, IEEE Computer Society, 1986,
pp226-234.
  
6. Hiraki K., Sekiguchi S., Shimada T., "System Architecture of a
Dataflow Supercomputer", Tencon '87-IEEE Region 10 Conference, 1987,
pp1044-1049.
  
7. Sekiguchi S., Shimada T., Hiraki K., "A Design of a Dataflow Language
DFCII for New Generation Supercomputers", (in Japanese) Vol 30, No. 12
1989, pp1639-1645.
  
8. Tamura K., "Outline of the Project--High Speed Computing System for
Scientific and Technological Uses", draft 1990.
  
  
-------------END OF REPORT-----------------------------------
  
  