rick@cs.arizona.edu (Rick Schlichting) (06/13/91)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years under the auspices of the Office of Naval Research-Asia (ONR/Asia). The following is the professional opinion of David Kahaner and in no way has the blessing of the US Government or any agency of it. All information is dated and of limited life time. This disclaimer should be noted on ANY attribution.]

[Copies of previous reports written by Kahaner can be obtained from host cs.arizona.edu using anonymous FTP.]

To: Distribution
From: David K. Kahaner, ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: Joint Symposium on Parallel Processing '91, Kobe Japan, 14-16 May 1991
13 June 1991
This file is named "jspp.91"

ABSTRACT. An overview is given of the Joint Symposium on Parallel Processing '91, held in Kobe, Japan, 14-16 May 1991, together with the titles and some abstracts of the papers. Also appended are the titles and authors from IFIP Vol 33 #4, a special issue on massively parallel computers.

INTRODUCTION. The Joint Symposium on Parallel Processing is an annual research conference on parallel processing. Approximately 250 people attended this year's conference, which was held on an artificial island in Kobe harbor. (Kobe is an important port city near Osaka.) There were 59 half-hour papers in three parallel sessions, one panel discussion on the future of parallel processing, and two invited lectures, by C. Polychronopoulos (Illinois) and D. Gannon (Indiana). The cross section of topics was as follows.

    Architecture       25 papers
    Applications       10
    Systems             9
    Neurocomputing      4
    Fundamentals        6
    Operating systems   3
    Invited papers      2

Except for the lectures by the two invited speakers, all the presentations were in Japanese. A few papers are printed in English in the bound Proceedings. The titles and authors of all the papers are appended to the end of this report, as are the electronic mail addresses of many of the authors.
I wish to thank the many Japanese scientists who took the time and effort to provide me with English translations of their abstracts; these are included below, along with some comments. This report also contains the titles of papers published in a special 1990 issue of the Japan IFIP journal, Vol 33 #4, entirely devoted to massively parallel computers. The organizers told me that they made extra efforts to encourage papers with more software and application content, but that the resulting mix was still heavily weighted toward hardware.

SUMMARY. I concentrated on the applications papers and found only a very few surprises; perhaps being here a year and a half helps. One surprise was the paper on the Super Database Computer being developed by

    Dr. Masaru Kitsuregawa
    Institute of Industrial Science
    University of Tokyo
    Roppongi, Minato-ku, Tokyo, Japan
    Tel: +81-3-3402-6231 x2356, Fax: +81-3-3479-1706
    Email: a80509@tansei.cc.u-tokyo.ac.jp

especially since I was part of a JTEC team here in March to study Japanese activities in the database area. Another surprise was the paper on the next generation of the ETL parallel computer (EM-5), in which it was stated emphatically that this would not be a dataflow machine in any sense. I reported on this earlier (see data.eng, 30 May 1991), where Dr. Sakai, one of the designers, explained that this comment was an error in the English translation.

While I have reported on Japanese parallel computing in the past, it is worth repeating that there are a number of highly capable parallel (MIMD) machines being used here for real science applications. There are also some SIMD machines, typically associated with even more specialized applications such as image, text, or speech processing. Most Japanese parallel computers are in the hands of very friendly users, or in prototype form. They have from 64 to about 1000 processors, and peak performance of several tens of gigaflops (perhaps more when fully configured).
However, thus far I have not seen any general-purpose parallel computers in the sense of the CM, Hypercube, etc. An exception is the PIE (Parallel Inference Engine) computers being developed by ICOT, but these have not been used for numerical computation. Instead, parallel computers in Japan have been developed by Japanese companies with very specific applications in mind. Some examples follow.

It seems to me that these companies are being very conservative about marketing parallel computers. Senior administrators in two different organizations told me that they were not sure about the market size for highly parallel machines. They felt that it was necessary to have an active research effort but would be tentative about going further. In my opinion, parallel computers from NEC and Fujitsu could easily be commercialized. At the same time, these two companies are very aggressively pursuing the traditional supercomputer market. In fact, while I was at this meeting, NEC announced that their one-processor SX-3/14 had taken first place in Dongarra's LINPACK benchmarks, with 314 MFLOPS for n=100 and 4.2 GFLOPS for n=1000, mostly through tuning and enhancements in the Fortran system.

The list of examples of parallel computing given below is definitely not exhaustive, but is simply meant to suggest the level of activity. There is one Connection Machine in Japan, at the ATR lab between Kyoto and Osaka. Researchers there have been using it for speech-processing-related research; while there were no papers about that work presented at this meeting, one paper appeared in the IFIP journal whose titles are listed at the end of this report.

Hitachi: Developing the 64-node H2P and the parallel programming language Paragram (see parallel.903, 6 Nov 1990). A Hitachi researcher gave a talk comparing Multigrid, Jacobi, Red-Black SOR, ADI, PCG-ICCG, and Gaussian elimination for solving the PDE div(-k grad U) = Q on a rectangle.
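To illustrate why iterative methods such as Jacobi map so naturally onto parallel machines for this kind of PDE, here is a minimal sketch (my own illustration, not from the talk; the grid, the zero boundary values, and the choice k = 1 are all assumptions): every point in a sweep is updated only from the previous sweep, so all updates are independent, whereas Gaussian elimination imposes a strict ordering.

```python
import numpy as np

def jacobi_poisson(Q, h, iters):
    """Jacobi iteration for -div(grad U) = Q on a rectangular grid with
    spacing h and U = 0 on the boundary (k = 1 for simplicity).  Every
    interior point is updated only from the *previous* sweep, so the
    updates within a sweep are independent and can run in parallel."""
    U = np.zeros_like(Q, dtype=float)
    for _ in range(iters):
        U_new = U.copy()                      # boundary rows/cols stay 0
        U_new[1:-1, 1:-1] = 0.25 * (U[:-2, 1:-1] + U[2:, 1:-1] +
                                    U[1:-1, :-2] + U[1:-1, 2:] +
                                    h * h * Q[1:-1, 1:-1])
        U = U_new
    return U
```

Jacobi parallelizes trivially but converges slowly, which is presumably the trade-off the talk's comparison with Red-Black SOR, ADI, PCG-ICCG, and multigrid was exploring.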
Hitachi also has a general-purpose neurocomputer with a peak performance of 2.3 GCUPS, the world's fastest. Practical applications such as stock prediction are expected in 2-3 years.

Fujitsu: A 1024-PE version of the AP1000 is to be available in 1991. At this meeting Fujitsu researchers described performing molecular dynamics on the 64-node AP1000 using an adaptation of AMBER (Assisted Model Building with Energy Refinement), developed by P. Kollman at the University of California, San Francisco. Speedup with 64 processors was about 55 (86% efficiency), and they predict that with 128 processors the efficiency will still be about 80%. The AP1000 is the most "general purpose" of the Japanese parallel computers; see my remarks about this machine in the report (parallel.902, 6 Nov 1990). An AP1000 is installed at the Australian National University in Canberra, where I will be visiting next month, so I hope to have additional details at that time. Fujitsu also described their work on a non-numeric parallel processor, MAPLE-RP (routing processor), for laying out IC designs. In one benchmark (a 384x256 grid) known as the "Burstein switch box problem," the 4096-PE MAPLE-RP ran 300 times faster than a Sun 4/1. Fujitsu is responsible for the parallel inference machine of the 5th Generation project. This year Fujitsu will complete a neural computer to rival Hitachi's.

NEC: Steady preparations for super parallel machines, including trials of in-house semiconductor design on the 64-processor Cenju; see my report on Cenju in (spice, 2 July 1990). At this meeting NEC presented a nice application of Cenju to a completely different problem, plasma simulation in magnetohydrodynamics (MHD). The major issue here is solving the specially block-structured linear equations that arise after discretization. For this problem a speedup of about 40 with 64 PEs was reported. The authors also suggest that a version of Cenju with 512 processors is somewhere in the development stage. An NEC neurocomputer board is being sold for PC applications.
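As a rough sanity check on figures like these (speedup 55 on the 64-PE AP1000, 40 on the 64-PE Cenju), parallel efficiency and an Amdahl-style extrapolation are easy to compute. The helper functions below are my own illustration, not anything presented at the meeting.

```python
def efficiency(speedup, p):
    """Parallel efficiency: achieved speedup divided by processor count."""
    return speedup / p

def amdahl_speedup(serial_fraction, p):
    """Speedup predicted by Amdahl's law for serial fraction s on p processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def implied_serial_fraction(speedup, p):
    """Serial fraction implied by an observed speedup, inverting Amdahl's law."""
    return (p / speedup - 1.0) / (p - 1.0)
```

efficiency(55, 64) is about 0.86, matching the quoted 86%. Inverting Amdahl's law gives an implied serial fraction near 0.26%, which would predict a speedup of roughly 96 (about 75% efficiency) on 128 processors; Fujitsu's 80% prediction is slightly more optimistic, plausible if the overheads are communication costs that do not grow like a fixed serial fraction.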
Matsushita: Developing ADENA with Kyoto University; see my report (parallel.904, 6 Nov 1990). At this meeting a description was given of the Fortran compiler and the preprocessor for the special-purpose language ADETRAN. Matsushita has also worked on OHM256, with 25 GFLOPS peak performance, and may combine four of them to reach 100 GFLOPS. Matsushita is also marketing a vacuum sweeper built using neural-network technology.

Anritsu: Markets the commercial version of Tsukuba University's PAX. At this meeting one talk analyzed the number of computations for a parallel implementation of Gaussian elimination on PAX. We reported in (chep.91, 22 May 1991) that support for a new version of PAX has been approved by the Ministry of Education. A very early version of PAX was also marketed by Mitsubishi. Prof. Y. Oyanagi, one of the principal investigators from Tsukuba, has just moved to Tokyo University.

    Professor Yoshio Oyanagi
    Department of Information Science
    Faculty of Science, University of Tokyo
    Hongo 7-3-1, Bunkyo, Tokyo, 113 JAPAN
    Tel: +81-3-3812-2111 ex. 4115, Fax: +81-3-3818-1073
    Email: OYANAGI@IS.S.U-TOKYO.AC.JP

Toshiba: The 512-PE Prodigy.

NTT: Research on using the 256-PE SIMD computer LISCAR for Japanese full-text retrieval. NTT is also doing research on applications of neurocomputers to voice recognition and automatic translation systems. NTT has also developed a 4-Kbit content-addressable memory (CAM), which is being used by Waseda University, ETL, and NTT itself as part of a string-search chip.

The universities are busy too. Several of the parallel computing projects now supported in companies began as university projects, including PAX and ADENA. We reported on Kyushu University's reconfigurable parallel computer in (parallel.904, 6 Nov 1990); that is still moving forward, although the main investigator, Professor Tomita, has just transferred to Kyoto University.
    Professor Shinji Tomita
    Dept. of Information Science
    Kyoto University
    Yoshidahonmachi, Sakyo-ku, Kyoto 606, Japan
    Tel: +81-75-753-5373
    Email: TOMITA@KUIS.KYOTO-U.AC.JP

Kyushu also reported on several other projects, including a parallel rendering machine for high-speed ray tracing, a streaming/FIFO processor, and a hyperscalar architecture. (This department supports an extremely large variety of projects.) Waseda University has two interesting independent projects, directed by Prof. Muraoka (the Harray system and its Fortran compiler) and Prof. Kasahara (the Oscar system). Keio University described the experimental system ATTEMPT 10 (A Typical Testing Environment of MultiProcessing Systems) for evaluating the communication performance of multiprocessors; this work should be followed by those in the performance evaluation area. Keio's Professor Boku presented a paper on DISTRAN (Distributed Systems Translator), a language for discretizing partial differential equations via explicit differencing, first into Prolog and then into other languages so that they can be run on parallel machines. Finally, the government labs ETL and ICOT are very active, with ICOT presenting five papers on diverse topics. See my report on ICOT (data.eng, 30 May 1991).

Because there are (as yet) no general-purpose parallel computers from Japan, universities here are far behind in the kind of algorithmic work that is common in Western universities. There are also very few Western commercial general-purpose parallel computers at Japanese universities. There is an iPSC/2 in the Information Science Department at the University of Tokyo, Alliants at the Universities of Tsukuba and Hiroshima, one or two BBN machines at other universities, and perhaps a few other machines scattered about, but these are the exceptions. (There may be more at industrial research labs.)
Reliable machines like these are very useful for experimentation without having to worry too much about the system staying up. Naturally, those headaches reduce the time and resources available for development of algorithms, system software, and tools, and ultimately the time available for solving real problems. There is a great deal of tool building on Unix workstations, however, and much of that is directly related to parallel processing. On the other hand, there is much more system building (hardware) here than in the West, and this is reflected in the mix of accepted papers for this conference.

------------------JOINT SYMPOSIUM ON PARALLEL PROCESSING '91-------------
May 14-16, 1991

INVITED LECTURES----------------------------
alpha-Coral: A Control/Data Flow Multiprocessor and its Compiler
Constantine D. Polychronopoulos (Center for Supercomputing Research and Development and Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign) E-mail: cdp@csrd.uiuc.edu

Object-Oriented Parallelism: pC++ Ideas and Experiments
Dennis Gannon, Jenq Kuen Lee (Department of Computer Science, Indiana University, Bloomington, Indiana 47401) E-mail: gannon@iuvax.cs.indiana.edu

PANEL DISCUSSION----------------------------
Research Trends on Parallel Processing
Hiromu Hayashi (Information Processing Division, Fujitsu Laboratories, Ltd.)

Expected Features of the Future Parallel Processing - What to do now -
T. Hiraki (Tokyo University/Electrotechnical Laboratory)

Future Parallel Processing Systems
Hironori Kasahara (Dept. of Information & Computer Sciences, Waseda University)

Expected Features of the Future Parallel Processing - What to do now -
M.
Kitsuregawa (Institute of Industrial Science, Tokyo University)

Expected Features of the Future Parallel Processing - What to do now -
Kazuo Taki (Institute for New Generation Computer Technology) E-mail: taki@icot.or.jp

Future Operating Systems
Yutaka Ishikawa (Electrotechnical Laboratory)

DATA BASE & MEMORY---------------------------------
A Scheduling-Based Cache Coherence Scheme
Masaru Takesue (NTT Software Laboratories) E-mail: takesue@lucifer.ntt.jp

Implementation and Evaluation of a Coherency Protocol for Virtual Shared Memory in a Network-Connected Parallel Computer
Hironori Nakajo, Newton K. Miura, Yukio Kaneda (Department of Systems Engineering, Faculty of Engineering, Kobe University) Koichi Wada (Institute of Information Science and Electronics, University of Tsukuba)

Parallel logic simulation is treated as parallel event simulation, in which time keeping is important. There are two time-keeping algorithms: the conservative method and the virtual time method. Since the conservative method may introduce deadlock, a means of avoiding deadlock is important. With the virtual time method deadlock never takes place, but a rollback operation is needed whenever a time discrepancy occurs. The authors have implemented a parallel logic simulation program based on the virtual time method on their parallel computer Multi-PSI, which has 64 PSI computers interconnected by an orthogonal bus. The performance observed in experiments is 60 kilo-events per second, and the speedup obtained is more than 40 using 64 processors. A comment by Prof. Yasuura of Kyoto University, however, pointed out that even a single workstation can attain as much as 100 kilo-events per second.

Multiple Processing Module Control on SDC, The Super Database Computer
S. Hirano, M. Harada, M. Nakamura, Y. Aiba, K. Suzuki, M. Kitsuregawa, M. Takagi, W.
Yang (Institute of Industrial Science, University of Tokyo) E-mail: hirano@tkl.iis.u-tokyo.ac.jp

SDC, the Super Database Computer, is a highly parallel relational database server which accepts SQL. In this paper we describe SDC's process model, which is a basic framework for parallel data processing, and the multiple-module control scheme built on that framework. We have developed a two-module version of SDC for a feasibility study; those results are also presented. SDC achieved about 30 times the performance of the Teradata DBC/1012.

Full-Text Retrieval System using a SIMD Parallel Processor
Sueharu Miyahara, Toshio Kondo (NTT Human Interface Laboratories, Yokosuka, Kanagawa) Syunkichi Tada (NTT Intelligent Technology Corp., Naka-ku, Yokohama, Kanagawa)

PARALLEL INFERENCE MACHINE---------------------------
The Architecture of the Parallel Processing Management Kernel of PIE64
Yasuo Hidaka, Hanpei Koike, Hidehiko Tanaka (Department of Electrical Engineering, Faculty of Engineering, The University of Tokyo) E-mail: {hidaka,koike,tanaka}@mtl.t.u-tokyo.ac.jp

We have noticed that the overhead of parallel processing is mainly caused by communication, synchronization, and parallel processing management. Therefore, we have introduced a network interface processor and a management processor into the processing element (PE) of the parallel inference engine PIE64. In this paper, the architecture of the "parallel processing management kernel" executed by the management processor is described, focusing on how it treats parallel processing management, e.g. load distribution and scheduling, which becomes significant in fine-grained, highly parallel processing. The parallel processing management kernel performs dynamic load partitioning, a part of the general load distribution process. The partitioning decision is based on parallelism, so that it eliminates excessive concurrency and reduces communication.
The scheduling strategy of the kernel introduces dynamic priorities based on parallelism and the room available in heap memory, in order to avoid the exhaustion of resources caused by explosive parallelism, and also to increase parallelism when it is insufficient. Thus a programmer need not be concerned with parallelism explosion. It also introduces a respite time before starting execution of each thread, in order to reduce the cost of suspension and context switching. The paper also presents a comparison of static partitioning by the compiler and dynamic partitioning by the kernel. When the parallelism exceeds the number of PEs to a high degree, the simple dynamic method, with its small overhead, is more effective than the sophisticated static method. However, dynamic partitioning becomes ineffective when the parallelism and the number of PEs are of comparable magnitude. We conclude that the most promising method is a composite of the static and dynamic methods.

Evaluation of Instruction Level Parallelism on Parallel Inference Machine PIM/i
Teruhiko Oohara, Koichi Takeda, Masatoshi Sato (Oki Electric Industry Co., Ltd.)

The Inference Processor UNIRED II: Evaluation by Simulation
Kentaro Shimada, Hanpei Koike, Hidehiko Tanaka (Department of Electrical Engineering, Faculty of Engineering, University of Tokyo) E-mail: {shimada,koike,tanaka}@mtl.t.u-tokyo.ac.jp

UNIRED II is the high-performance inference processor of the parallel inference machine PIE64. It is designed for the committed-choice language Fleng, and for use as an element processor of parallel machines. Its main features are: 1) a tag architecture; 2) three independent memory buses (instruction fetching, data reading, and data writing); 3) multi-context processing for reducing pipeline interlocking and the cost of context switching for inter-processor synchronization. In this paper, several architectural features of UNIRED II are evaluated by register-transfer-level simulation.
High performance (over 1 MLIPS) was attained, as predicted from its design, and it was shown that the three memory buses and multi-context processing yield improved performance.

DEDICATED MACHINE-------------------------------
Image Logic Algebra (ILA) and its Optical Implementations
Masaki Fukui, Kenichi Kitayama (NTT Transmission Systems Laboratories)

A Single-Chip Vector-Processor Prototype Based on Streaming/FIFO Architecture - Evaluation of Macro Operation, Vector-Scalar Cooperation and Terminating Vector Operations
Takashi Hashimoto, Keizou Okazaki, Tetsuo Hironaka, Kazuaki Murakami (Interdisciplinary Graduate School of Engineering Sciences, Kyushu University) Shinji Tomita (Kyoto University) E-mail: {hashimot,keizo,hironaka,murakami}@is.kyushu-u.ac.jp

A Parallel Rendering Machine for High Speed Ray-Tracing - Instruction-Level Parallelism in the Macropipeline Stages
Seiji Murata, Oubong Gwun, Kazuaki Murakami (Interdisciplinary Graduate School of Engineering Sciences, Kyushu University) Shinji Tomita (Kyoto University) E-mail: {murata,gwun,murakami}@is.kyushu-u.ac.jp

SUPERSCALAR ARCHITECTURE----------------------------
A Pipeline Architecture for Parallel Processing Across Basic Blocks
Toshikazu Marushima, Naoki Nishi, Ryosei Nakazaki (NEC Corporation) Kenji Ohsawa (NEC Scientific Information System Development Ltd.)

DSNS Processor Prototype - Evaluation of the Architecture and the Effect of Static Code Scheduling
Akira Noudomi, Morihiro Kuga, Kazuaki Murakami (Interdisciplinary Graduate School of Engineering Sciences, Kyushu University) Tetsuya Hara (Mitsubishi Electric Co.)
Shinji Tomita (Kyoto University) E-mail: {noudomi,kuga,murakami}@is.kyushu-u.ac.jp

Hyperscalar Processor Architecture - The Fifth Approach to Instruction-Level Parallel Processing
Kazuaki Murakami (Interdisciplinary Graduate School of Engineering Sciences, Kyushu University) E-mail: murakami@is.kyushu-u.ac.jp

DATA FLOW MACHINE-----------------------------
Evaluation of Parallel Performance on the Highly Parallel Computer EM-4
Yuetsu Kodama, Shuichi Sakai, Yoshinori Yamaguchi (Electrotechnical Laboratory) E-mail: saka@etl.go.jp

Architectural Design of a Parallel Supercomputer EM-5 (English)
Shuichi Sakai, Yuetsu Kodama, Yoshinori Yamaguchi (Electrotechnical Laboratory) Email: sakai@au-bon-pain.lcs.mit.edu (or) sakai@etl.go.jp

This paper describes the architecture of the parallel supercomputer EM-5. The EM-5 design objective is to construct a feasible parallel supercomputer whose target performance is over 1 TFLOPS. The design principles of the EM-5 are: (1) an object-oriented data-driven model; (2) an advanced direct matching scheme; (3) a highly fused pipeline; (4) a RISC processor, EMC-G, for a highly parallel computer; (5) a functional interconnection network; and (6) a maintenance architecture which can provide real-time monitoring facilities. After examining these features, the paper shows the architectural design of the EM-5, whose target structure will have 16,384 processing elements and whose peak performance is about 655 GIPS and 1.3 TFLOPS (double precision).

A Scheme to Reduce the Access Rate to Shared Memory for the Parallel Processing System - Harray
Hayato Yamana, Satoshi Ohdan, Yoichi Muraoka (School of Science and Engineering, Waseda University) Email: muraoka@jpnwas00.bitnet

INTERCONNECTION NETWORK------------------------
An Approach to Realizing a Reconfigurable Interconnection Network Using Field Programmable Gate Arrays
Toshinori Sueyoshi, Itsujiro Arita (Kyushu Institute of Technology) Kouhei Hano (Kyocera Inc.)
E-mail: sueyoshi@ai.kyutech.ac.jp

We present a new reconfigurable interconnection network utilizing the reconfigurability of the FPGA (Field Programmable Gate Array), a kind of programmable logic LSI. Reconfiguration to the desired connections on our proposed network is performed by programming the configuration data into each FPGA, so that both static networks, such as mesh and hypercube networks, and dynamic networks, such as baseline and omega networks, can be implemented directly, without simulation. Consequently, the optimum connections for the interprocess communication or memory reference patterns of the application programs running on the reconfigurable multiprocessor can be configured adaptively by programming.

Integrated Parallelizing Compiler - Network Synthesizer
Hiroki Akaboshi, Kazuaki Murakami, Akira Fukuda (Interdisciplinary Graduate School of Engineering Sciences, Kyushu University) Shinji Tomita (Kyoto University) E-mail: {akaboshi,murakami,fukuda}@is.kyushu-u.ac.jp

Evaluation of Various Implementations of the base-m n-cube Network
Yasushi Kawakura, Noboru Tanabe, Takashi Suzuoka (Toshiba Research and Development Center)

MULTIPROCESSOR I---------------------------
A Node Processor for the A-NET Multicomputer and its Execution Scheme
Tsutomu Yoshinaga, Mitsuru Suzuki, Takashi Teraoka, Hisashi Mogi, Takanobu Baba (Department of Information Science, Faculty of Engineering, Utsunomiya University) E-mail: yoshi@infor.utsunomiya-u.ac.jp

The node processor of the A-NET parallel object-oriented computer consists of a 40-bit processing element (PE) which executes the methods of allocated objects, a router which determines the path of a message or transfers object code, and 320 KB of local memory. We chose a high-level machine instruction set and a tagged architecture for the PE, which therefore includes supporting hardware units such as an instruction preprocessing unit and a tag processing unit.
The organization of the router is independent of the network topology, so the message routing algorithm is programmable. Another feature of the router is that it uses adaptable cut-through routing for packet switching, as well as circuit switching for object code transfer.

Performance Comparison of Parallel Wire-Routing on Distributed Multiprocessors and Shared Memory Multiprocessors
Masahiko Sano, Yoshizo Takahashi (Department of Information Science and Intelligent Systems, Faculty of Engineering, Tokushima University) E-mail: {sano,taka}@n30.is.tokushima-u.ac.jp

The Performance Evaluation of the Communication Mechanism of the Multiprocessor Test Bed ATTEMPT
Sunao Torii, Hideharu Amano (Department of Computer Science, Keio University)

MULTIPROCESSOR II------------------------------
Functional Memory Type Parallel Processors FMPP on a CAM and its Applications
Hiroto Yasuura, Akihiro Watanabe, Ryugo Sadachi, Keikichi Tamaru (Department of Electronics, Kyoto University)

Demand/Accept Control Mechanism and Hardware of a Parallel Computer
Masaki Tomisawa (Department of Computer Science, Faculty of Technology, Tokyo Univ. of Agr. and Tech.)

KRPP: Kyushu University Reconfigurable Parallel Processor
Naoya Tokunaga, Shinichiro Mori, Kazuaki Murakami, Akira Fukuda (Interdisciplinary Graduate School of Engineering Sciences, Kyushu University) Tomoo Ueno (Kyushu Nippon Electric Co.) Eiji Iwata (Sony Co.) Koji Kai (Matsushita Electric Ind. Co.)
Shinji Tomita (Kyoto University) E-mail: {tokunaga,mori,murakami,fukuda}@is.kyushu-u.ac.jp

PARALLEL LANGUAGE------------------------------
Distributed Implementation of Stream Communication in A'UM-90
Koichi Konishi, Tsutomu Maruyama, Akihiko Konagaya (C&C Systems Research Laboratories, NEC Corporation) Kaoru Yoshida, Takashi Chikayama (Institute for New Generation Computer Technology)

Intra-object Parallelism on Parallel Object Oriented Languages
Minoru Yoshida, Hidehiko Tanaka (Faculty of Engineering, University of Tokyo) E-mail: {minoru,tanaka}@mtl.t.u-tokyo.ac.jp

Intra-object parallelism is important because server objects must process many messages in a short time, and because concurrency within an object makes its implementation easier. The paper presents a model in which messages are interpreted in parallel and instance variables are accessed instantaneously; these two points were the chief sources of sequentiality in intra-object parallel processing. Using single-assignment variables, instance variables can be accessed instantaneously. A language based on the model is also introduced. Because the order of messages does not matter, it has the expressive power for natural concurrent programming using atomic access to instance variables.

HyperDEBU: A Multiwindow Debugger for Parallel Logic Programs and Committed-Choice Language
Junichi Tatemura, Hanpei Koike, Hidehiko Tanaka (Faculty of Engineering, The University of Tokyo) E-mail: {tatemura,koike,tanaka}@mtl.t.u-tokyo.ac.jp

The debugging of parallel programs is more difficult than that of sequential programs. Since a Committed-Choice Language (CCL), a kind of parallel logic programming language, enables fine-grained, highly parallel execution, it is very hard to examine and manipulate its numerous complicated control/data flows. A debugger, whose role is to show users a model abstracted from the execution of a program, needs a model to represent the execution of a fine-grained, highly parallel program.
To represent the execution of a CCL program, we propose a communicating process model which has flexible levels and aspects of abstraction. Our debugger represents this model. A parallel program has multiple complicated control/data flows, which can be considered high-dimensional information; therefore, a high-dimensional interface is necessary to debug it. Since a user compares the model represented by a debugger with the expected behavior of the program in order to find a bug, the debugger must provide the kind of view he or she wants. Accordingly, the debugger must provide views with flexible levels and aspects of abstraction. We developed a multiwindow debugger, HyperDEBU, which provides such a high-dimensional interface. HyperDEBU provides windows flexible enough for programmers to examine and manipulate complicated structures composed of multiple control/data flows.

PARALLEL SYSTEM/EVALUATION----------------------------
On the Real Number Index Sparse Array in the Dataflow Stream Language VISDAL
Hirohisa Mori, Kazuhiko Kato, Hiroaki Takada (Dept. of Information Science, Faculty of Science, University of Tokyo)

Quantitative Evaluation of Several Synchronization Mechanisms Based on Static Scheduling and Fuzzy Barrier
Hiromitsu Takagi, Takaya Arita, Masahiro Sowa (Department of Electrical Engineering and Computer Science, Nagoya Institute of Technology) E-mail: takagi@craps.elcom.nitech.ac.jp

Parallel Garbage Collection on a Shared Memory Multi-Processor and its Evaluation
Akira Imai (Institute for New Generation Computer Technology) Evan Tick (Univ. of Oregon) Katsuto Nakajima (Mitsubishi Electric Co.)
Atsuhiro Goto (NTT)

PARALLELIZING COMPILER--------------------------
A Prototype FORTRAN-to-Dataflow Compiler for the Parallel Processing System - Harray
Toshiaki Yasue, Jun Kohdate, Hayato Yamana, Yoichi Muraoka (School of Science and Engineering, Waseda University) E-mail: yasu@muraoka.info.waseda.ac.jp

APARC: Parallelizing Compiler for the Parallel Computer ADENART
Koji Zaiki, Akiyoshi Wakatani, Tadashi Okamoto (Matsushita Electric Industrial Co., Ltd., Semiconductor Research Center) Shigeru Kuroda (Matsushita Softresearch, Inc.) E-mail: zaiki@vdrl.src.mei.co.jp

The parallelizing compiler APARC translates FORTRAN programs into ADETRAN programs; ADETRAN is a high-level parallel language for the parallel computer ADENART. APARC mainly changes DO loops into parallel executable code using control flow analysis and data dependence analysis. ADENART has a fast data communication network between PEs (processing elements) and a synchronization mechanism, and APARC uses these advantages in parallelization. In particular, even DO loops containing GOTO statements that branch out of the loop can be changed into parallel executable code by APARC, with exception handling routines inserted. A prototype version of APARC is now available, and some applications can be translated. In the near future we will make APARC available for many applications.

DISTRAN System (Distributed Systems Translator) Implementation on Parallel Computers
Kiyohiro Suzuki, Nobuyuki Yamasaki, Takao Yumiba, Kaoru Murata, Taisuke Boku (Faculty of Science and Technology, Keio University) Email: taisuke@kw.phys.keio.ac.jp

When solving problems described by partial differential equations, the most general method is to discretize the space and time domains and calculate all spatial domains step by step. This method requires a large amount of calculation if the density of the mesh is high enough to get accurate solutions.
However, all spatially discretized domains can be calculated in parallel, and it is possible to achieve high performance when calculating them on large-scale multiprocessors. DISTRAN is a partial differential equation solver for parallel processors that uses this method. With DISTRAN, a user can solve a problem by writing only a very simple problem specification, consisting of the original partial differential equations, the boundary and initial conditions, and the domain information. No actual programming by the user is needed. DISTRAN analyzes the given equations and checks their consistency. The problem domain is discretized automatically, and all spatial points and boundaries are calculated to satisfy the given conditions. Finally, DISTRAN generates a program to solve the problem on a sequential or parallel processor. Currently, we have implemented three versions of DISTRAN for three types of parallel processors: MiPAX-32 [a commercial version of U-Tsukuba's PAX], QCDPAX, and a Transputer system. The first two machines are based on shared memory and a global synchronization mechanism; the last is based on message-passing links. We calculated the same problem on each system and confirmed that DISTRAN achieves high performance in practice. In this paper, we describe how to design and implement such an automated programming and solving system on several types of multiprocessors. We also show the actual performance of each system and evaluate the calculation efficiency achieved by DISTRAN.
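The explicit-differencing approach that DISTRAN automates can be illustrated with a one-dimensional toy (my own sketch, not DISTRAN output; the equation, grid, and coefficient are assumptions): each spatial point at the new time level depends only on the previous level, so the entire sweep is data-parallel.

```python
import numpy as np

def explicit_heat_step(u, alpha, dx, dt):
    """One explicit finite-difference step for u_t = alpha * u_xx with
    fixed (Dirichlet) boundary values.  Each interior point is updated
    from the previous time level only, so all points can be computed in
    parallel -- the property an explicit-differencing translator relies on."""
    r = alpha * dt / dx**2        # stability requires r <= 0.5
    u_new = u.copy()              # boundary values are left untouched
    u_new[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u_new
```

On a machine like MiPAX-32 or QCDPAX the array would be split into contiguous blocks, one per PE, with only the block edges exchanged between steps.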
PARALLEL OS-------------------------------

A Testbed OS for Evaluation of Parallel Algorithms
Takahiro Yakoh, Yuichiro Anzai (Department of Computer Science, Keio University)

Parallel Processings in OS Kernel by the Process Network Architecture
Yasuichi Nakayama, Iwao Morishita (University of Tokyo)
Kazuya Tago (IBM Japan Tokyo Laboratories)
E-mail: yasu@meip7s.t.u-tokyo.ac.jp

A parallel operating system has been designed and implemented on a loosely-coupled multiprocessor system employing the process network architecture. The operating system consists of a number of lightweight processes interconnected by rendezvous communication and is compatible with the UNIX system. It has been shown that when this process network is distributed over multiple computer units with an optimal assignment, some processes can run in parallel with others. In this paper we consider parallel processing in the OS kernel in order to improve the response time of system calls.

On Paralleling Transaction Processes by Exchanging Messages
Haruo Yokota, Yasuo Noguchi, Riichiro Take (Fujitsu Laboratories, Ltd.)

NUMERIC PROCESSING------------------------------

Parallelizing Gaussian Elimination on PAX
Kimio Takahashi (Scientific Technology, Tsukuba Univ.)

Study on the Algorithms for Matrix Solver on Massively Parallel Computer
Mitsuyoshi Igai (Hitachi VLSI Engineering Corp.)
Toshio Okouchi, Chisato Konno (Central Research Lab, Hitachi, Ltd.)

Molecular Dynamics Simulation on a Highly Parallel Computer AP1000
Yoshiyuki Sato (Computer-Based Systems Lab., Fujitsu Labs Ltd.)
E-mail: hsat@flab.fujitsu.co.jp
Yasumasa Tanaka (Fujitsu Ltd.)
Hiroshi Iwama, Shigetsugu Kawakita, Minoru Saito, Kenji Morikami, Toru Yao (Protein Engineering Research Institute)
Shigenoru Tsutsumi, Hideaki Yoshijima (Fujitsu Kyushu System Engineering)

Parallel Nonlinear MHD Plasma Simulator
Satoshi Matsushita, Nobuhiko Koike (NEC Corporation)
Masaru Narusawa (NEC Scientific Information System Development Ltd.)
Genichi Kurita, Toshihide Tsunematsu, Tatsuoki Takeda (Japan Atomic Energy Research Institute)
Email: {matsushita, koike}@csl.cl.nec.co.jp

AEOLUS is a nonlinear plasma simulator for instability (so-called disruption) analysis of tokamak plasmas in a nuclear fusion reactor, a very time-consuming computation. Since most of AEOLUS's calculation is nonlinear, it employs explicit time integration; however, by applying an implicit method to the linear part, we have improved its convergence. We parallelized the AEOLUS code that had been developed and tuned for a vector machine at the Japan Atomic Energy Research Institute; the vector code ran 6 to 7 times faster than its scalar counterpart. The small amount of parallelism in the implicit part limits the speed-up. We propose a novel parallel algorithm for MIMD parallel machines and have successfully parallelized the implicit part of the simulation, achieving a speed-up of 42 on the 64-processor Cenju. (Cenju is a multiprocessor system with a distributed shared memory scheme, developed mainly for circuit simulation and designed for effective execution of modular circuit simulation algorithms.)

References:
1. T. Takeda, K. Tani, S. Matsushita, et al.: Plasma Simulator METIS and Tokamak Plasma Analysis, US-Japan Workshop on Advances in Computer Simulation Techniques Applied to Plasma and Fusion (1990).
2. T. Nakata et al.: Cenju: A Multiprocessor System with a Distributed Shared Memory Scheme for Modular Circuit Simulation, Proc. International Symposium on Shared Memory Multiprocessing, pp. 82-90, April 1991.

COMPUTER AIDED DESIGN FOR LARGE SCALE INTEGRATION--------------

Parallel Logic Simulation based on Virtual Time
Yukinori Matsumoto, Kazuo Taki (Institute for New Generation Computer Technology)
Email: yumatumo@icot.or.jp

Author's abstract: This paper focuses on parallel logic simulation. An efficient logic simulation system on a large-scale multiprocessor is targeted.
We experimented with and evaluated the Time Warp mechanism, an optimistic approach, although its rollback processing has been said to be costly. The system is implemented on the Multi-PSI, a distributed-memory multiprocessor, and includes several new ideas to enhance performance, such as a local message scheduler, an antimessage reduction mechanism, and a load distribution scheme. In our experiment, using 64 processors, about 48-fold speedup was attained, and the performance of the whole system amounted to about 60 k events/sec, which is fairly good for a full-software simulator. The paper then reports an empirical comparison between the Time Warp mechanism and two conservative mechanisms: an asynchronous approach using null messages and a synchronous approach. The comparison shows that the Time Warp mechanism is likely the most efficient of the three, and could be the most suitable for large-scale multiprocessors.

[Comment: Parallel logic simulation is treated here as parallel event simulation, in which time keeping is the central issue. There are two time-keeping algorithms: the conservative method and the virtual time method. Since the conservative method may introduce deadlock, a means of avoiding deadlock is important. The virtual time method never deadlocks, but needs a rollback operation whenever a time discrepancy occurs. The authors implemented a parallel logic simulation program based on the virtual time method on their parallel computer Multi-PSI, which has 64 PSI computers interconnected by an orthogonal bus. The performance observed by experiment is 60 kilo events per second, and the speed-up ratio obtained is more than 40 using 64 processors. A comment by Prof. Yasuura of Kyoto University, however, pointed out that even a single workstation can attain as much as 100 kilo events per second.]

Massively Parallel Layout Engine - Routing Processor
K. Kawamura, T. Shindo, T. Shibuya, H. Miwatari, Y. Ohki, T. Doi (Computer-Based Systems Lab., Fujitsu Laboratories Ltd.)

The authors have developed a new algorithm, called the constrained relaxational maze running algorithm, for automated wire routing. In this method, intersections of nets are allowed but are evaluated by a cost function; by iterating the routing while adjusting the intersection penalty, the optimal routings are finally obtained. They have built a massively parallel computer, MAPLE-RP, to implement this algorithm; it has 8K one-bit processing units connected in a lattice, operating in SIMD fashion, and the performance is 40 GOPS when 64K PUs are used. Both the routing completion rate and the routing speed were observed to be quite satisfactory.

A Parallel Router based on a Concurrent Object-oriented Model
Hiroshi Date, Yoshihisa Ohtake, Kazuo Taki (Institute for New Generation Computer Technology)
E-mail: date@icot.or.jp

Author's abstract: LSI routing is well known as a process that requires massive computational power, so speedup through parallel processing shortens the LSI design period. This paper presents a new parallel router based on a concurrent object-oriented model. Objects corresponding to line segments find paths between terminals by exchanging messages with each other; this method has high parallelism. The search algorithm of our model is based on a look-ahead line search algorithm. We implemented this algorithm in the KL1 language on the Multi-PSI. We have been verifying our router using real LSI data, and the initial results are described.

[Comment: This paper presents a parallel routing algorithm based on a look-ahead line-search algorithm and the speedup obtained by running the program on the parallel computer Multi-PSI. The algorithm is object-oriented in the sense that each net is considered an object which exchanges messages with others to avoid intersections. Although the speedup obtained was favorable, the routing completion rate was not.]
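The cost-driven idea shared by these routers (a cell already used by another net is not forbidden, only penalized, and the penalty steers later reroutes) can be sketched as a toy maze router. This Python sketch is in the spirit of penalty-based maze running, not an implementation of Fujitsu's constrained relaxational algorithm or the KL1 router; the grid, wall, and penalty values are invented for illustration:

```python
import heapq

def route(grid_w, grid_h, src, dst, occupied, penalty):
    """Dijkstra maze search: a cell used by another net costs 1 + penalty
    instead of 1, so intersections are allowed but discouraged."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if (x, y) == dst:
            break
        if d > dist[(x, y)]:
            continue  # stale queue entry
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < grid_w and 0 <= ny < grid_h:
                nd = d + 1 + (penalty if nxt in occupied else 0)
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = (x, y)
                    heapq.heappush(heap, (nd, nxt))
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return path[::-1]

# A rival net occupies a near-complete vertical wall; with no penalty the
# router cuts straight through it, with a high penalty it walks around.
wall = {(2, y) for y in range(4)}
p_free = route(5, 5, (0, 2), (4, 2), wall, penalty=0)
p = route(5, 5, (0, 2), (4, 2), wall, penalty=10)
```

Iterating such a search over all nets while raising the penalty is what gradually relaxes the layout toward an intersection-free routing.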
ARTIFICIAL INTELLIGENCE/DATA BASE------------------------

A Parallel Processing Feature of a DBMS with SCMP for OLTP
Kazumi Hayashi, Kazuhiko Saitoh, Tomohiro Hayashi, Masaaki Mitani, Hiroshi Ohsato, Takashi Obata, Yutaka Sekine, Mitsuhiro Ura, Takuji Ishii (2nd Software Division, Computer System Group, Fujitsu Ltd.)

Parallel Dynamic Map Construction and Navigation in Real-Time for Autonomous Robots (ENGLISH)
Martin Nilsson (Swedish Institute of Computer Science, Box 1263, S-164 28 Kista, Sweden)
E-mail: mn@sics.se

Real-time map construction and navigation are complex and computationally intensive tasks, but contain much potential parallelism. This paper describes how programming techniques based on committed-choice languages can be used both to express algorithms for such problems concisely and to extract their parallelism.

Parallel Processing of ATMS on the Heterogeneous Distributed System NueLinda
Hiroshi G. Okuno (NTT Basic Research Laboratories)
Osamu Akashi, Kenichiro Murakami, Yoshiji Amagi (NTT Software Laboratories)
E-mail: okuno@ntt-20.ntt.jp, murakami@ntt-20.ntt.jp, akashi@toshi.ntt.jp, amagi@nuesun.ntt.jp

We have proposed the NueLinda computation model, which integrates various heterogeneous distributed systems and provides computing and data resources in a transparent and uniform manner. Based on the NueLinda model, we have designed and implemented TAO-Linda on a Lisp machine. The ATMS (Assumption-based Truth Maintenance System) is an intelligent database in the sense that it maintains the support sets for each datum. A conventional database can contain only one consistent context of data, while the ATMS provides the inference engine with a multiple-context mechanism. The ATMS is considered one of the essential facilities for next-generation AI systems, and its execution speed needs to be improved drastically.
In this paper, we discuss the parallel processing of the ATMS with TAO-Linda and compare the resulting implementation with parallel processing of the ATMS on a shared-memory machine.

PARALLEL COMPUTING MODEL--------------------------

Message-flow: A New Computation Model for MIMD-type Parallel Machines
Hiroaki Fujii (Hitachi Ltd.)
Kiyoshi Shibayama (Faculty of Engineering, Kyoto University)

A Hybrid Group Reflective Architecture for Object-Oriented Concurrent Programming
Takuo Watanabe, Satoshi Matsuoka, Akinori Yonezawa (Department of Information Science, The University of Tokyo)
E-mail: {takuo,matsu,yonezawa}@is.s.u-tokyo.ac.jp

The benefits of computational reflection are the abilities to reason about and to alter the dynamic behavior of a computation from within the language framework. This is all the more beneficial in concurrent/distributed computing, where system complexity is much greater than in sequential computing; we have demonstrated various benefits in our past research on Object-Oriented Concurrent Reflective (OOCR) architectures. Unfortunately, attempts to formulate the reflective features provided in practical reflective systems, such as resource management, have led to difficulties in maintaining the linguistic lucidity necessary for computational reflection. The primary reason is that previous OOCR architectures lack the ingredients for group-wide object coordination. We present a new OOCR system architecture called the "Hybrid Group Reflective Architecture" (HGRA), and a new language, ABCL/R2, based on this architecture. The key features of ABCL/R2 are the notion of heterogeneous object groups and coordinated management of group-shared computational resources. We describe how such management can be effectively modeled and adaptively modified/controlled with the reflective features of ABCL/R2. We also show that this architecture is defined entirely in a meta-circular way (without ad-hoc primitives), embodying two directions of reflective towers.
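The core reflective idea here, that an object's message-handling machinery is itself an object that can be inspected and replaced at run time, can be shown in miniature. This is ordinary Python, not ABCL/R2, and the class and method names are invented for illustration:

```python
# Toy metaobject sketch: every message to a ReflectiveObject is routed
# through its metaobject, so "how messages are handled" can be changed
# from within the program while it runs.

class Metaobject:
    def handle(self, obj, message, *args):
        return getattr(obj, message)(*args)

class LoggingMetaobject(Metaobject):
    """Replacement metaobject that records every message it handles."""
    def __init__(self):
        self.log = []
    def handle(self, obj, message, *args):
        self.log.append(message)
        return super().handle(obj, message, *args)

class ReflectiveObject:
    def __init__(self):
        self.meta = Metaobject()      # default base-level behavior
    def send(self, message, *args):   # all messages pass through the meta level
        return self.meta.handle(self, message, *args)

class Counter(ReflectiveObject):
    def __init__(self):
        super().__init__()
        self.n = 0
    def inc(self):
        self.n += 1
        return self.n

c = Counter()
c.send("inc")                   # handled by the plain metaobject
c.meta = LoggingMetaobject()    # reflective change of behavior at run time
c.send("inc")
c.send("inc")
```

ABCL/R2 goes much further, coordinating whole groups of objects and their shared resources at the meta level, but the swap-the-metaobject step above is the basic reflective move.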
Towards Realistic Type Inference for Guarded Horn Clauses (ENGLISH)
Dongwook Shin (Fujitsu Laboratories, IIAS)
E-mail: shin@iias.flab.fujitsu.co.jp

This paper proposes a type inference system for Guarded Horn Clauses (GHC), based on the notions of value types and communication types. A value type is a type that a predicate can have, guaranteeing that a goal predicate of that value type does not raise type errors at run time. A communication type is a type under which several predicates communicate with one another. These types are obtained by constraint solving and, to some extent, pre-evaluation of a GHC program. We expect these types to contribute to the early detection of errors in GHC program development.

ALGORITHMS----------------------------

A Process Control Scheme for Distributed Processing Systems Using Weighted Throw Counting
Kazuaki Rokusawa (Systems Laboratory, OKI)
E-mail: rokusawa@okilab.oki.co.jp (or) rokusawa@icot.or.jp
Nobuyuki Ichiyoshi (Institute for New Generation Computer Technology)
E-mail: ichiyoshi@icot.or.jp

This paper proposes a new scheme for aborting/stopping/restarting (in general, changing the execution state of) a pool of processes in a distributed environment where there may be processes in transit. The scheme guarantees that all processes belonging to the pool change state, detects the completion of the state change, and works under both FIFO and non-FIFO communication. It uses broadcasting and weighted throw counting, and requires only a few words per processor per process pool.
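The weighted-counting idea that makes this work even with processes in transit can be simulated in a few lines. This Python sketch shows only the general weighted-counting principle under invented names (Controller, TOTAL, split), not the Rokusawa/Ichiyoshi protocol itself:

```python
# Weighted-counting sketch: every process, including one "thrown" and
# still in transit between processors, carries part of a weight whose
# total the controller knows.  The state change is complete exactly when
# all weight has been returned, with no per-process controller messages.

TOTAL = 1 << 16

class Controller:
    def __init__(self):
        self.outstanding = 0
    def new_process(self):
        self.outstanding += TOTAL
        return TOTAL
    def returned(self, w):
        self.outstanding -= w
    def all_done(self):
        return self.outstanding == 0

def split(weight):
    """Throwing a process to another processor splits its weight in two;
    the controller is not involved at all."""
    half = weight // 2
    return half, weight - half

ctl = Controller()
w_parent = ctl.new_process()
w_parent, w_child = split(w_parent)   # child thrown to a remote processor
ctl.returned(w_parent)                # parent finishes its state change
assert not ctl.all_done()             # in-transit child still holds weight
ctl.returned(w_child)                 # child arrives and finishes
```

The invariant is that the weights held by live and in-transit processes always sum to the controller's outstanding count, so completion detection needs only that single counter.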
Sort m Smallest Elements Problem on a Linearly Connected Processor Array with Multiple Buses
Satoshi Fujita, Masafumi Yamashita, Tadashi Ae (Faculty of Engineering, Hiroshima University)

Time Bounds for Sorting and Routing Problems on Mesh-Bus Computers
Kazuo Iwama, Eiji Miyano (Faculty of Engineering, Kyushu University)
Yahiko Kambayashi (Faculty of Engineering, Kyoto University)

SUPER PARALLEL APPROXIMATE COMPUTING MODEL--------------------

Fuzzy 0-1 Combinatorial Optimization through Neural Networks
Masatoshi Sakawa, Toru Mitani (Department of Industrial and Systems Engineering, Faculty of Engineering, Hiroshima University)
Kazuya Sawada (Information System Center, Matsushita Electric Works, Ltd.)
E-mail: sakawa@msl.sys.hiroshima-u.ac.jp

Dynamic Modification of the Free Energy Function Improves Ability to Find Good Solutions on a Hopfield Neural Network
Yutaka Akiyama, Tatsumi Furuya (Electrotechnical Laboratory)
E-mail: yakiyama@etl.go.jp

Four novel techniques for global optimization on a Hopfield neural network are proposed. The sharpening method dynamically modifies the gain of the neurons' input/output function. The excess bias method provides an excessive input bias to improve the energy "landscape". The emphasizing method dynamically changes the balance among constraints. And the annealing method controls randomness in the stochastic Hopfield model (the Gaussian Machine). By combining these techniques, the neural network shows an excellent ability to solve optimization problems.

The Chain Reaction in Adaptive Junction Networks
Yoshiaki Ajioka, Yuichiro Anzai (Department of Computer Science, Keio University)
E-mail: ajioka@aa.cs.keio.ac.jp

Although neural networks are useful for pattern recognition, they are not commonly used for sequential processing. We have built Adaptive Junction, a feedback-type neural network that recognizes spatio-temporal patterns.
This paper proves that Adaptive Junction networks can perform the chain reaction for any spatio-temporal pattern when each neuron has a 1-degree feature pattern. From this result, the order of the number of neurons needed to recognize a given spatio-temporal pattern becomes clear for Adaptive Junction networks.

A Genetic Algorithms Approach to How to Represent the Basin of an Associative Memory Model
Keiji Suzuki, Yukinori Kakazu (Department of Engineering, Hokkaido University)

------------------------------------------------------------------------
INFORMATION PROCESSING SOCIETY OF JAPAN Vol 32, No. 4
SPECIAL ISSUE ON MASSIVELY PARALLEL COMPUTERS AND APPLICATIONS

The Way to Massively Parallel Computers
Takanobu Baba (Department of Information Science, Utsunomiya University)

Realization Technologies for Massively Parallel Machines
Shigeru Oyanagi, Noboru Tanabe (Toshiba R&D Center)

Super-parallel Computer ADENA for Scientific Simulation
Tatsuo Nogi (Division of Applied Systems Science, Faculty of Engineering, Kyoto University)

Neural Network Model Processing on Massively Parallel Computers
Noboru Sonehara, Makoto Hirayama (ATR Auditory and Visual Research Laboratories)

Commercial Massively Parallel SIMD Computer and its Application
Masaru Kitsuregawa (Institute of Industrial Science, University of Tokyo)
Taiichi Yuasa (Toyohashi University of Technology)

Logic Programming Oriented Inference Machine
Hidehiko Tanaka (Department of Electrical Engineering, University of Tokyo)

Implementation for Sequential Logic Programming Languages
Minoru Yokota (Computer System Research Laboratory, C&C Systems Research Laboratories, NEC Corporation)

Parallel Implementation Schemes of Logic Programming Languages
Nobuyuki Ichiyoshi (Institute for New Generation Computer Technology)

Architecture of Sequential Inference Machine
Yukio Kaneda, Hideo Matsuda (Dept. of Systems Engineering, Faculty of Engineering, Kobe University)

Parallel Inference Machine Architecture
Atsuhiro Goto (Software Research Laboratory, NTT Software Laboratories)

-----------------------END OF REPORT------------------------------------
--
=========================== MODERATOR ==============================
Steve Stevenson                        {steve,fpst}@hubcap.clemson.edu
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906                 (803)656-5880.mabell