rick@cs.arizona.edu (Rick Schlichting) (01/19/91)
[This duplicates a previous article in comp.research.japan, but is provided for completeness. -- Rick]

[Dr. David Kahaner is a numerical analyst visiting Japan for two years under the auspices of the Office of Naval Research-Asia (ONR/Asia). The following is the professional opinion of David Kahaner and in no way has the blessing of the US Government or any agency of it. All information is dated and of limited life time. This disclaimer should be noted on ANY attribution.]

[Copies of previous reports written by Kahaner can be obtained from host cs.arizona.edu using anonymous FTP.]

To: Distribution
From: David K. Kahaner ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: Comments on ETL's parallel processing projects from M. Rosing.
18 Jan 1991

ABSTRACT. Matt Rosing (U Colorado) reports on work during summer 1990 in the Computer Architecture Division of the Electrotechnical Lab in Tsukuba, Japan.

Rosing's comments provide welcome detail to other reports on this subject that I have distributed, including the following.

    "etl"           2 July 1990
    "dataflow"      16 August 1990
    "parallel.904"  6 November 1990
    "ricks"         17 January 1991

Matt Rosing
Department of Computer Science
Campus Box 430
Boulder, Colorado 80309
(ROSING@BOULDER.COLORADO.EDU)

Last summer I had the opportunity to spend two months at the Electrotechnical Lab in Tsukuba, Japan. This was an NSF-sponsored trip for American grad students to spend time in Japan learning about its research and culture. [See below for more details. (DKK)] I worked in the Computer Architecture Division and spent most of my time learning about the SIGMA-1, EM-4, and CODA projects. I have seen a few reports, from Kahaner, Schlichting, and others, describing these projects and would like to add a little more detail.

The SIGMA-1 machine is a 128-node, instruction-level data flow machine. The project was started in 1982. It has measured rates of 170 MFLOPS. I can't really add any more information than what Kahaner has described.
If you have questions, the best person to ask is Satoshi Sekiguchi (sekiguti@etl.go.jp).

The project that I spent the most time learning about was the EM-4. The EM-4 is an 80-node, distributed-memory, coarse-grain data flow machine which is a prototype for a 1000-node version. By coarse grain, I mean that blocks of von Neumann, register-based code are connected by data flow arcs. This mixed model provides the advantage of data flow (flexible, dynamic scheduling) at the upper level of a program and the advantages of von Neumann machines (static register scheduling) at the lower levels. As many parallel programs have this type of structure, this appears to be a promising architecture.

The low-level, register-based part of this architecture is RISC based. The RISC pipeline is integrated with the data flow loop, so the two parts work quite well together (see experiment below). All of the hardware for one node (including communications) is built into a single gate array chip. The clock cycle is 80ns. Although there is no floating point in this prototype, there will be in the next version.

The other part of the architecture is the communications network. Each processor is one node in an omega network. The communications network is designed to support data flow computations and is therefore designed for very fine grain communications. A word can be sent or received in a single instruction. Messages propagate through the network at one word per node per clock cycle (80ns). The really nice feature of this is that the user does not need to worry about packing up words into contiguous memory buffers before sending a message. Messages are typed by both destination memory address and category. Example categories are "data message" or "create process." The user may define these categories.

I wrote a program to illustrate some of these features. The test was a smoothing algorithm which worked on a vector.
It is an iterative algorithm in which the next value of each element of a vector is a function of the previous value and the two neighboring values. This generates communications which are similar to those of many numerical problems (PDEs etc). In a sense, the test I wrote is not ideally suited to data flow machines. It is more of a data parallel algorithm, which is better suited to a SIMD machine like the CM-2. The program consisted of placing a "process" (or whatever the equivalent idea is for data flow machines) on each node which iteratively (10000 cycles) received the values from the two neighboring processors, added them together, and sent the results to the two neighboring processors. So messages are one word long and there are lots of messages.

The results of the test were that it took 36 clock ticks to perform each iteration. (The EM-4 has a wonderful timing facility which counts clock ticks.) At 80 ns per clock tick, an iteration takes 36 x 80 = 2880 ns. Two words were sent during each iteration, so this corresponds to 1440ns/word or, taking a word as 4 bytes (as the two figures imply), 360ns/byte. (This figure also includes the integer add.) This compares quite favorably to 390ns/byte on an iPSC/2 with messages long enough to overcome message start up costs (greater than 20k bytes).

The best person to ask questions about the EM-4 is Shuichi Sakai (sakai@etl.go.jp).

The final project is called CODA and is still in the paper stage. CODA is a distributed memory machine that is designed for real time applications. This machine looks much less like a data flow machine than the EM-4. The interesting aspects of CODA are in the communications hardware. First of all, message priorities have been added to support real time applications. Within the communications hardware, messages with higher priority are routed before lower priority messages. To me, the more interesting aspect of CODA is in the synchronization and communication hardware.
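
The synchronization side can be pictured in software terms. What follows is only a minimal Python sketch, not ETL's design: it models a single memory word guarded by a full/empty bit with produce/consume semantics, where a write waits until the word is empty and a read waits until the word is full and then marks it empty again. The names `FullEmptyCell`, `produce`, `consume`, and `run_demo` are hypothetical and chosen for illustration; the real mechanism is in hardware and applies to registers as well as memory.

```python
import threading


class FullEmptyCell:
    """One memory word guarded by a full/empty bit (hypothetical software model)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False   # the full/empty bit; the word starts out empty
        self._value = None

    def produce(self, value):
        """Write the word and set the bit to full; wait while it is still full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def consume(self):
        """Read the word and reset the bit to empty; wait while it is empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_demo(n=5):
    """Pass n values from a producer thread to the caller through one cell."""
    cell = FullEmptyCell()

    def producer():
        for i in range(n):
            cell.produce(i)

    thread = threading.Thread(target=producer)
    thread.start()
    received = [cell.consume() for _ in range(n)]
    thread.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

Because each write must wait for the previous value to be read, the producer and consumer alternate strictly and no value is lost or read twice. A blocked thread in this sketch is a software stand-in only and says nothing about hardware cost.
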
Each memory location and register contains full/empty bits, which are used to implement produce/consume semantics on all memory and register accesses. In conjunction with this, a send or receive instruction can target any memory element or register on any processor. Message packets to read or write memory or registers are inserted into the RISC instruction pipeline along with instructions from the local processor. The result should be very fine grained communications with very little overhead.

In order to keep the processors busy, there are several instruction streams supported by the hardware. Processes that block on memory or register accesses can be swapped out and another instruction stream swapped in within a few clock cycles.

These three constructs (message priorities, register level synchronization, and multiple instruction stream support) should greatly improve communications on distributed memory multiprocessors.

If you have questions, you should ask Kenji Toda (toda@etl.go.jp).

My stay at ETL was supported by the NSF Japan Summer Institute program. I strongly recommend that any PhD students in science or engineering who are interested in Japan apply for next year's program. Yes, this is an advertisement, but the program was well worth it.

NOTE. If you would like to apply for support to conduct research and/or study in Japan (either graduate student, post doc, or faculty appointments), please contact NSF's Japan Program, at Room 1214, NSF, Washington, D.C. 20550, or by e-mail to NSFJinfo@nsf.gov (on Internet) or NSFJinfo@nsf (BitNet). The telephone number is (202) 357-9558. For more general information, request the new Japan Program announcement (NSF 90-144) from NSF's Publications Unit, Room 232, NSF, Washington, D.C. 20550, or by e-mail to pubs@nsf.gov (Internet) or pubs@nsf (BitNet).

NSF also provides, free to libraries and reference collections, a copy of a catalogue of Japanese government laboratories.
The catalogue gives two-page descriptions of the organization and major research activities of each of 110 laboratories in Japan which are run by the national government, public corporations, and non-profit organizations. The catalogue will be of most use to U.S. scientists and engineers attempting to find Japanese research partners, whether for research collaboration or for visits to Japan. NSF would like to place the catalogue in collections readily accessible to scientists and engineers at U.S. Ph.D.-granting institutions. Contact the Japan Program, at the above address.

Citation: Research Development Corporation of Japan (JRDC). "National Laboratories and Research Public Corporations in Japan." 200 pp. [Tokyo,] Japan, 1990. Revision of the first edition, 1987, and supplement (Part II), 1988.

Another useful NSF contact is Dr. Douglas McNeal (DMCNEAL@NSF.GOV).

------------------END OF REPORT------------------------------------------