e@Daisy.EE.UND.AC.ZA (06/27/90)
ugene@wilbur.nas.nasa.gov (Eugene N. Miya)
Date: 25 Jun 90 12:41:47 GMT
Message-ID: <9458@hubcap.clemson.edu>
Newsgroups: comp.parallel,comp.arch,soc.culture.japan

To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: ETL Dataflow update
25 June 1990

Some members of the ETL dataflow group read my report before it was circulated, but recent updates arrived too late to be included. Briefly, the most important of these are
(1) An 80 processor version of EM-4 is running at 997 MIPS
(2) The DFC-II compiler is running
(3) The full 1,024 processor version of EM-4 will have floating point hardware
I have revised my report to include this information. This revision is available on request.

I think you can post this one. I asked the ETL group for more comments but they never responded although I gave them a week so I am going to assume it's okay.
David

To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: ETL Dataflow Project.
19 June 1990

ABSTRACT. A visit to the Data Flow project at ETL is summarized. The Dataflow SIGMA-1 computer project is ending. The new project, EM-4, will have 1,024 processors and is designed to have less overhead. EM-4 was originally proposed for symbolic rather than numeric computation, but the designers now feel that with the inclusion of floating point hardware it will also be used for numerical computation. Currently an 80 processor version of EM-4 is running at 997 MIPS.

On 28 March 1990, Dr. Bill Buzbee (NCAR), Prof. Jack Dongarra (UTenn), and I visited the Computer Architecture Section of the Computer Sciences Division at the Electrotechnical Laboratory (ETL) in Tsukuba. I had been to Tsukuba several times earlier, but never to ETL.

ETL is the largest national research institute in Japan, belonging to the Agency of Industrial Science and Technology (AIST) of the Ministry of International Trade and Industry (MITI).
As a national lab it is surprising that it has only about 700 employees, and a 1988 budget of around $70M U.S. More surprising still is that about 550 of the employees are members of the research staff; only about 150 are general or administrative staff. In fact Buzbee remarked that NCAR also has about 700 employees, but the ratio of scientific to nonscientific staff was almost exactly reversed there. I am familiar with other U.S. laboratories and the NCAR ratio is similar to those. Thus the ETL scientists must be remarkably self-sufficient or their counting process differs from ours.

ETL will be 100 years old next year, having begun as a testing laboratory under the Bureau of Electrocommunication. Its current charter is to perform basic research and development in electronics, information processing, energy technology, and standards and measurements. Emphasis is placed on the development of technologies which exert impact on society and industry, and which are so advanced that they require long time and substantial risk to attain their goals.

The activities of ETL that are directly computer related are contained in the Information Sciences Division, Computer Science Division, Machine Understanding Division, and Intelligent Systems Division. Together these groups contain 120 researchers. A major project of the Computer Science Division has been the construction of the Dataflow computer, SIGMA-1, and its follow-up, the EM-4. Scientists who are working on this project include the following.

     Dr. Toshio Shimada
     Chief of Computer Architecture Section
     Computer Science Division
     Electrotechnical Laboratory
     1-1-4 Umezono, Tsukuba-shi
     Ibaraki 305, Japan
     Tel: (0298) 54-5443
     Email: SHIMADA@ETL.GO.JP

     Dr. Shuichi Sakai
     Tel: (0298) 58 5876, Fax: (0298) 58 5882
     Email: SAKAI@ETL.GO.JP

     Dr. Satoshi Sekiguchi
     Tel: (0298) 58-5877, Fax: (0298) 58-5882
     SEKIGUCHI@ETL.GO.JP

     Dr. Yoshinori Yamaguchi
     Tel: (0298) 58-5873, Fax: (0298) 58-5882
     YAMAGUTI@ETL.GO.JP

Technical reports about the ETL dataflow project are available by contacting Dr. Sekiguchi.

In addition to the dataflow group, we also met with the ETL Director,

     Dr. Hiroshi Kashiwagi
     Director-General, ETL
     Tel: (0298) 54-5002, Fax: (0298) 55-1729

and the Directors of the Information Science and Intelligent Systems Divisions,

     Dr. Koichiro Tamura
     Director, Information Science Division
     Tel: (0298) 54-5414, (0298) 58-5361-51560
     Fax: (0298) 58-5156
     Email: KTAMURA@ETL.GO.JP

     Dr. Toshitsugu Yuba
     Director, Intelligent Systems Division
     Tel: (0298) 54-5412
     Email: YUBA@ETL.GO.JP

SIGMA-1 has been extensively written about--it is not a new project and it is ending this year--so we were mostly interested in learning about their more recent work.

An excellent summary of dataflow, specifically oriented toward the ETL dataflow SIGMA-1 project, was written by Shimada, Hiraki, and Sekiguchi of ETL and then translated to English by Dr. C. Eoyang of the Institute of Supercomputing Research (ISR) in Tokyo. This translation, which I used heavily in writing this report, was published in "Vector Register, 1988.11.15." Contact Eoyang at the address below.

     Dr. C. Eoyang
     Inst for Supercomputing Research
     15F Inui Building
     Kachidoki 1-13-1
     Kachidoki, Chuo-ku
     Tokyo 104, Japan
     Tel: 82-03-536-9661
     Fax: 82-03-536-9670
     Email: EOYANG@ISR.RECRUIT.CO.JP

In principle the dataflow idea is very simple: an instruction should execute whenever its data are available. As a prototypical example, consider the computation (a+b)*(c+d). Such a computation could be broken into a "dataflow graph" which looks like the following.

      a     b     c     d
       \   /       \   /
        \ /         \ /
         +           +
          \         /
           \       /
            \     /
             \   /
               *
               |
               |

A computation begins at one of the + or * points (called nodes) whenever a token appears, signifying that data are present on an input line. The node "fires" (its arithmetic operation is executed) when all input data are available.
At that point all input tokens are removed and a token is placed on the output line. Obviously in this model many other computations are independent of this one; thus many nodes can fire simultaneously.

Various enhancements are necessary for this model to work in practice; these have to do with making best use of the dataflow graph once it has been created and allowing parallel and loop operations to be executed. In the SIGMA-1 project these enhancements are called the "dynamic computational model." The key research projects are associated with implementing this model in practical hardware, and designing and implementing user-usable software that can take advantage of the hardware.

The SIGMA-1 project ends this year. The EM-4 project is a follow-on that attempts to build on what was learned earlier. The main emphasis is to simplify the total architecture by putting several processing elements onto a single chip with a simplified network structure. The ETL group also decided that some modifications of the "pure" dataflow model were necessary to efficiently match the machine with real programs and get maximum performance from it. In the context of a dataflow graph, it was observed that no strategy had been proposed to permit maximum utilization of the processing elements by detecting possible critical paths and scheduling the computation so that work along these paths had priority. The new model, called the "strongly connected arc model," attempts to remedy this by allowing certain portions of the dataflow graph to be performed in a more or less traditional way.

As part of the project the ETL group is developing a C-like language, DFC-II, which is only partly a functional language. Using traditional functional languages it is difficult to write the utilities that are necessary in practical programs, such as synchronization, resource management, and global variables. DFC-II was not running when we visited, although we are informed that it is now up on the EM-4.
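
To make the firing rule described earlier for (a+b)*(c+d) concrete, here is a minimal software sketch in Python. It is only an illustration of the general idea (a node fires once all of its input tokens have arrived, consumes them, and emits one output token); it is not SIGMA-1 or EM-4 code, and the names Node, run, and evaluate are hypothetical ones chosen for this example. Tokens are processed one at a time from a queue, so the sketch shows the ordering constraint, not the parallelism.

```python
import operator
from collections import deque


class Node:
    """A two-input dataflow node (hypothetical helper, not ETL code)."""

    def __init__(self, name, op):
        self.name = name
        self.op = op
        self.inputs = {}   # port number (0 or 1) -> waiting token value
        self.dest = None   # (node, port) that receives the output token

    def ready(self):
        # The node may fire only when a token is present on every input.
        return len(self.inputs) == 2


def run(tokens):
    """Deliver (node, port, value) tokens until none remain.

    Returns the final result and the order in which nodes fired.
    """
    queue = deque(tokens)
    fired = []
    result = None
    while queue:
        node, port, value = queue.popleft()
        node.inputs[port] = value
        if node.ready():
            out = node.op(node.inputs[0], node.inputs[1])
            node.inputs.clear()            # input tokens are removed
            fired.append(node.name)
            if node.dest is None:
                result = out               # token leaves the graph
            else:
                queue.append((*node.dest, out))  # token on the output line
    return result, fired


def evaluate(a, b, c, d):
    """Build the graph for (a+b)*(c+d) and feed it tokens out of order."""
    add1 = Node("add1", operator.add)
    add2 = Node("add2", operator.add)
    mul = Node("mul", operator.mul)
    add1.dest = (mul, 0)
    add2.dest = (mul, 1)
    # Arrival order is deliberately scrambled: only data availability matters.
    return run([(add2, 1, d), (add1, 0, a), (add2, 0, c), (add1, 1, b)])


if __name__ == "__main__":
    print(evaluate(1, 2, 3, 4))
```

The two + nodes have no dependence on each other, so on a real dataflow machine they could fire at the same time; the * node can never fire before both of them have produced their tokens.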
Lack of software has been criticized (see below), although this year the group's emphasis has shifted toward software development. Dr. Shimada explained that the group plans to complete the compiler work by autumn, and then begin to evaluate the architecture on practical application programs. He also noted that at some point the group will also develop a functional language.

SIGMA-1 was specifically designed for numeric computation. The new machine, EM-4, as originally described, had no hardware to support floating point. In fact, in several papers Yamaguchi et al. say that their field of interest has shifted away from numerical to symbolic manipulations involving knowledge information processing. Thus this machine was less interesting to me as a computer that would support simulation research, and I sensed that Buzbee and Dongarra had similar feelings. However, Dr. Sakai in the dataflow group has recently told us that floating point will be included in the 1,024 processor version, which will then be able to perform at about 20 Gigaflops. Their vision is that EM-4 will also be suitable for numerical computation.

Neither Buzbee, Dongarra, nor I are experts in computer architecture. Dr. Olaf Lubeck, a researcher in the Computer Research and Applications Division of Los Alamos National Laboratory, also visited ETL late last year. His research is directly related to the work at ETL and this was his second visit to the dataflow laboratory. I asked him to provide me with a short assessment of that visit, a portion of which is included below. I am indebted to him for this effort.

My personal impression is more positive than his. Despite the difficulties that have slowed progress, one should realize that more than a decade of research was necessary to reach the level we see today and that many fundamental problems had to be solved. The group is remarkably productive. In fact, we three (B/D/K) were quite astonished at how few people were actually working on this project.
It is a very small group working without the assistance of many students and, based on my statistics above, with only limited administrative support. Nevertheless, their research is of long term interest. They are building a "real" dataflow computer, neither a software simulator nor a hardware simulator. Currently, an 80 processor version has been built; it performs at 996 MIPS. Dr. Shimada told me that preliminary evaluation on small benchmark programs shows that the performance is 15-100 times faster than a Sparcstation 330. The design allows for 1,024 processors and this is the next project. I would not be surprised to see some attempt to commercialize it.

-----------------

     Dr. Olaf Lubeck
     Computer Research and Applications
     LANL
     Los Alamos, New Mexico 87545
     Tel: (505) 667-6017, Fax: (505) 665-3812
     OML@LANL.GOV

Dec 10-13 1989: ElectroTechnical Laboratory (ETL), Tsukuba, Japan

Purpose: Discuss progress of the ETL dataflow parallel computer project; Give two seminars at ETL

Summary:

1. The ETL dataflow group had previously built a 128 processor tagged token dataflow machine called SIGMA-1. They now have designed and are building a follow-on single-chip RISC processor (EM-4) that is a hybrid between an instruction-level dataflow processor and a typical von Neumann processor. The hybrid solves some of the inefficiency problems associated with pure dataflow machines. The single-chip processor will eventually be integrated into a 1000 processor system.

2. While at ETL, I gave two seminars - the first was entitled "Resource Management in Dataflow: A Case Study of Two Numerical Codes" and the second was "Vectorization of Monte Carlo Algorithms: An Architectural Study of Vector Processors"

Funding: Funds for this trip were provided in whole by the Japanese ElectroTechnical Laboratory.

Detailed Trip Report

1. Visit to ETL (Dataflow Research Project)

I visited ETL for three days. While there, I discussed mutual interests in dataflow research with the group.
The group consists of three sections built around one of three different machines.

The first section is investigating the SIGMA-1 tagged-token dataflow machine. The machine was completed about a year ago. It consists of 128 processors, is a "pure" operational level dataflow machine, and is modeled after the MIT Tagged-Token dataflow architecture. Since my last visit a year ago, little progress has been made to allow the programming of the machine from a high-level language. The intent of this effort has been and still is the development of a C-like language called DFC II. The language is imperative and contains explicit synchronization to allow user-level partitioning of programs. However, the compiler is still being written and I was unable to execute any code using it. Funding that has been explicitly targeted for this project is ending in six months. Although the SIGMA-1 represents the largest dataflow machine to date, the overall results of this project are disappointing due to a complete lack of usable software.

The second effort in the ETL dataflow group is led by Dr. Sakai. The goals of this section are 1) the construction of a hybrid dataflow single chip processor, 2) the development of an 80 processor prototype system consisting of these single-chip processors and finally, 3) the construction of a full 1000 processor machine. So far, they have designed the single chip processor and have manufactured 5 chips. The 5 chips along with 1 megaword of memory for each processor reside on a single board. The CMOS chips have 50,000 gates each and are being manufactured by the Japanese division of LSI Logic, which has a plant in Tsukuba. While there, I saw a demonstration of a Fibonacci algorithm hand-coded in assembly language execute on the 5 processors.

The EM-4 section believes that "pure" operational dataflow is inefficient and has, therefore, built the processor with a hybrid dataflow model called the strongly connected block model.
The main feature of the model is to take an instruction level dataflow graph and to collect nodes together into many single "strongly connected blocks". Each strongly connected block will then be executed on a single processor von Neumann style with a program counter and registers. Anything outside of a block will be executed in a dataflow model where matching operands fire an instruction. Strongly connected blocks are macro nodes in a dataflow graph and execute when all of their operands arrive. The model incorporates the ability to execute variable size dataflow nodes, from a single instruction to an entire program.

The multiprocessor machine has an Omega interconnection network where each node of the network is a processor-memory pair. One of the more unusual features of the system is a built-in hardware capability to collect processor activity. A token is circulated around the network that collects activity information about the amount of allocated heap storage and the amount of unmatched tokens for the least active node. Load balancing mechanisms can then use this information.

There seem to be two major weaknesses in the effort currently. The first is that no floating point hardware is incorporated on the single processor chip. The machine was funded as a symbolic processor, not as a numeric processor. The designers view this aspect of the project as politically motivated to ensure uniqueness compared to the SIGMA-1 machine. The second weakness of the effort is the same problem that the SIGMA-1 effort had - all hardware, no software. Only time will tell whether the software will come. The plan seems to be to use whatever the SIGMA-1 project had in terms of software, but it is not currently usable.

The third section in the ETL dataflow group is designing a coarse grain dataflow machine called CODA. Because of language difficulties, I understood little of what was unique about this project.
The effort is based around multi-threaded architectural ideas and is early in its design stages. Concepts have not been finalized and no hardware has been built.

---------End of Lubeck's report--------

References

1. Shimada T., Hiraki K., Sekiguchi S., "A Dataflow Supercomputer for Scientific Computations: The SIGMA-1 System", translated into English by Eoyang, C., and published in The Institute for Supercomputing Research Vector Register, 1988.11.15, pp3-9.

2. Sakai S., Yamaguchi Y., Hiraki K., Kodama Y., Yuba T., "An Architecture of a Dataflow Single Chip Processor", ACM 0884-7495/89/0000/0046, 1989.

3. Sekiguchi S., Shimada T., Hiraki K., "A Design of Practical Dataflow Language, DFCII and Its Data Structures", ETL Technical Report, ETL-TR-90-16, 1990.

4. Yamaguchi Y., Sakai S., Hiraki K., Kodama Y., "An Architectural Design of a Highly Parallel Dataflow Machine", Information Processing '89, G. X. Ritter (ed), Elsevier Science Publishers (North Holland), pp1155-1160.

5. Shimada T., Hiraki K., Nishida K., Sekiguchi S., "Evaluation of a Prototype Dataflow Processor of the SIGMA-1 for Scientific Computations", 12th Int Symp on Computer Arch, Tokyo Japan, IEEE Computer Society, 1986, pp226-234.

6. Hiraki K., Sekiguchi S., Shimada T., "System Architecture of a Dataflow Supercomputer", Tencon '87-IEEE Region 10 Conference, 1987, pp1044-1049.

7. Sekiguchi S., Shimada T., Hiraki K., "A Design of a Dataflow Language DFCII for New Generation Supercomputers", (in Japanese) Vol 30, No. 12, 1989, pp1639-1645.

8. Tamura K., "Outline of the Project--High Speed Computing System for Scientific and Technological Uses", draft 1990.

-------------END OF REPORT-----------------------------------

--- QM v1.00
 * Origin: Bink of an Aye - Portland, OR US - PEP/V32 (1:105/42.0)