ABSTRACT. An overview is given of the Joint Symposium on Parallel 
Processing '91, held in Kobe, Japan, 14-16 May 1991, together with the 
titles and some abstracts of the papers.  Also appended are the 
titles/authors of IFIP Vol 33#4, which was a special issue on massively 
parallel computers.  

The Joint Symposium on Parallel Processing is an annual research 
conference devoted to parallel processing. Approximately 250 people 
attended this year's conference, which was held on an artificial island 
in the Kobe harbor. (Kobe is an important port city near Osaka.) There 
were 59 half hour papers in three parallel sessions, one panel discussion 
on the future of parallel processing, and two invited lectures, by C.  
Polychronopoulos (Illinois) and D. Gannon (Indiana).  The cross section 
of topics was as follows.  

      Architecture              25 papers
      Applications              10
      Systems                    9
      Neurocomputing             4
      Fundamentals               6
      Operating systems          3
      Invited papers             2

Except for the lectures by the two invited speakers, all the presentations 
were in Japanese. A few papers are printed in English in the bound 
Proceedings. The titles and authors of all the papers are appended to the 
end of this report, as are the electronic mail addresses of many of the 
authors. I wish to thank the many Japanese scientists who took the time 
and effort to provide me with English translations of their abstracts, 
and these are also included, as are some comments. This report also 
contains the titles of papers published in a special 1990 issue of the 
Japan IFIP Vol 33#4, entirely devoted to massively parallel computers.  

The organizers told me that they made extra efforts to encourage papers 
with more software and application content, but that the resulting mix 
was still heavily weighted toward hardware.  

I concentrated on the applications papers and discovered that there were 
only a very few surprises; perhaps being here a year and a half helps.  
One surprise was the paper on the Super Database Computer being developed by 
                Dr. Masaru Kitsuregawa
                Institute of Industrial Science
                University of Tokyo
                Roppongi, Minato-ku, Tokyo Japan
                Tel: +81-3-3402-6231x2356, Fax: +81-3-3479-1706
                Email: a80509@tansei.cc.u-tokyo.ac.jp
especially since I was part of a JTEC team here in March to study 
Japanese activities in the database area. Another surprise was the paper 
on the next generation of the ETL parallel computer (EM-5), in which it 
was stated emphatically that this would not be a dataflow machine in any 
sense. I reported on this earlier (see data.eng, 30 May 1991); Dr.  
Sakai, one of the designers, explained that this comment was an error in 
the English translation.  

While I have reported on Japanese parallel computing in the past, it is 
worth repeating that there are a number of highly capable parallel 
machines (MIMD) that are being used here for real science applications. 
There are also some SIMD machines, typically associated with even more 
specialized applications such as image, text, or speech processing.  Most 
Japanese parallel computers are in the hands of very friendly users, or 
in prototype form. They have from 64 to about 1000 processors, and have 
peak performance of several tens of gigaflops (perhaps more when fully 
configured).  However, thus far I have not seen any general purpose 
parallel computers in the sense of CM, Hypercube, etc.  An exception to 
this is the PIE (Parallel Inference Engine) computers being developed by 
ICOT, but these have not been used for numerical computation.  Instead 
parallel computers in Japan have been developed by Japanese companies 
with very specific applications in mind.  Some examples follow. It seems 
to me that these companies are being very conservative about marketing 
parallel computers.  Senior administrators in two different organizations 
told me that they were not sure about the market size for highly parallel 
machines. They felt that it was necessary to have an active research  
effort but would be tentative about going further. In my opinion parallel 
computers from NEC and Fujitsu could easily be commercialized. At the 
same time these two companies are very aggressively pursuing the 
traditional supercomputer market. In fact while I was at this meeting, 
NEC announced that their one processor SX-3/14 had taken first place in
Dongarra's LINPACK benchmarks with 314MFLOPs for n=100, and 4.2GFLOPs for 
n=1000, mostly through tuning and enhancements in the Fortran system.  
The list of examples of parallel computing given below is definitely not 
exhaustive, but simply meant to suggest the level of activity. There is 
one Connection Machine in Japan, at the ATR lab between Kyoto and Osaka. 
Researchers there have been using it for speech processing related 
research, and while there were no papers about that work presented at 
this meeting, one paper appeared in the IFIP journal whose titles are 
listed at the end of this report.  

   Hitachi: Developing the 64 node H2P and the parallel programming 
   language Paragram (see parallel.903, 6 Nov 1990). A Hitachi 
   researcher gave a talk describing various comparisons between 
   Multigrid, Jacobi, Red-Black SOR, ADI, PCG-ICCG, and Gaussian 
   elimination for solving the pde "div(-k gradU)=Q" on a rectangle.
   Hitachi also has a general purpose neurocomputer with peak performance 
   of 2.3GCUPS, the world's fastest.  Practical applications such as stock 
   prediction are expected in 2-3 years.  

   Fujitsu: 1024 PE version of AP1000 to be available in 1991. At this 
   meeting Fujitsu researchers described performing molecular dynamics 
   on the 64 node AP1000 using an adaptation of AMBER 
   (Assisted Model Building with Energy Refinement), developed by P. 
   Kollman at U-Cal San Francisco. Speedup with 64 processors was about 
   55 (86% efficiency), and they predict that with 128 processors the 
   efficiency will be about 80%. AP1000 is the most "general purpose" 
   of the Japanese parallel 
   computers.  See my remarks about this machine in the report 
   (parallel.902, 6 Nov 1990).  An AP1000 is installed at the Australian 
   National University in Canberra, where I will be visiting next month, 
   so I hope to have additional details at that time.  Fujitsu also 
   described their work on the non-numeric parallel processor, MAPLE-RP 
   (routing processor) for laying out IC designs. In one benchmark 
   (384x256 grid) known as the "Burstein switch box problem", the 4096 PE 
   MAPLE-RP ran 300 times faster than a Sun4/1. Fujitsu is responsible 
   for the parallel inference machine of the 5th generation project.  
   This year Fujitsu will complete a neural computer to rival Hitachi's.  

   NEC: Steady preparations for super parallel machines, including trials 
   for in-house semiconductor design via 64 processor Cenju. See my 
   report on Cenju in (spice, 2 July 1990). At this meeting NEC presented 
   a nice use of Cenju for a completely different application, 
   plasma simulation in magnetohydrodynamics (MHD). The major issue here 
   is solving the specially block structured linear equations that arise 
   after the discretization. For this problem a speedup of about 40 with 
   64 PEs was reported. The authors also suggest that a version of Cenju 
   with 512 processors is somewhere in the development stage.  NEC 
   Keyboarded neurocomputer being sold for PC applications.  

   Matsushita: Developing ADENA with Kyoto University, see my report 
   (parallel.904, 6 Nov 1990). At this meeting a description of the 
   Fortran compiler, and the preprocessor for the special purpose 
   language ADETRAN was given.  Matsushita also has worked on OHM256, 
   with 25GFLOPS peak performance, and may combine four of them to reach 
   100GFLOPS. Matsushita is also marketing a sweeper built with 
   neurotechnology.  

   Anritsu: The commercial version of Tsukuba University's PAX. At this 
   meeting one talk was given analyzing the number of computations for a 
   parallel implementation of Gaussian elimination on PAX. We reported in 
   (chep.91, 22 May 1991) that support for a new version of PAX has been 
   approved by the Ministry of Education. A very early version of PAX was 
   also marketed by Mitsubishi. Prof. Y. Oyanagi, one of the principal 
   investigators from Tsukuba, has just moved to Tokyo University.  
                Professor Yoshio Oyanagi
                Department of Information Science
                Faculty of Science, University of Tokyo
                Hongo 7-3-1, Bunkyo, Tokyo, 113 JAPAN
                Tel: +81-3-3812-2111 ex. 4115, Fax: +81-3-3818-1073
                Email: OYANAGI@IS.S.U-TOKYO.AC.JP

   Toshiba: 512 PE Prodigy.

   NTT: Research in using the 256 PE SIMD computer LISCAR for Japanese 
   full text retrieval.  Also NTT engages in research in applications of 
   neurocomputers to voice recognition and automatic translation systems.  
   NTT has also developed a 4-Kbit content addressable memory (CAM), 
   which is being used by Waseda University, ETL, as well as NTT itself 
   as part of a string-search chip.  
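
   The speedup figures quoted above for AP1000 and Cenju can be turned 
   into parallel efficiency (speedup divided by processor count). The 
   short sketch below only checks that arithmetic; the function name is 
   an assumption of this sketch, and the 128-processor speedup is derived 
   from the predicted 80% efficiency rather than reported at the meeting.

```python
def efficiency(speedup, processors):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / processors


# Figures as reported at the meeting.
ap1000_eff = efficiency(55, 64)   # Fujitsu AP1000, AMBER molecular dynamics
cenju_eff = efficiency(40, 64)    # NEC Cenju, MHD plasma simulation

# Speedup implied by the predicted 80% efficiency on 128 processors
# (derived here, not a reported measurement).
ap1000_128_speedup = 0.80 * 128

print(f"AP1000, 64 PEs: {ap1000_eff:.0%}")    # 86%
print(f"Cenju, 64 PEs: {cenju_eff:.1%}")      # 62.5%
print(f"AP1000, 128 PEs (predicted): speedup about {ap1000_128_speedup:.0f}")
```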

The universities are busy too. Several of the parallel computing projects 
that are now supported in companies began as university projects, 
including PAX and ADENA. We reported on Kyushu-U's reconfigurable 
parallel computer in (parallel.904, 6 Nov 1990) and that is still moving 
forward, although the main investigator, Professor Tomita, has just 
transferred to Kyoto University.  
                Professor Shinji Tomita
                Dept of Information Science
                Kyoto University
                606 Yoshidahonmachi, Sakyo-ku
                Kyoto, Japan
                Tel: +81 75 753-5373
                Email: TOMITA@KUIS.KYOTO-U.AC.JP
Kyushu also reported on several other projects, including a parallel 
rendering machine for high speed ray-tracing, a streaming FIFO processor, 
and a hyperscalar architecture. (This department supports an extremely 
large variety of projects.) Waseda University has two interesting 
independent projects directed by Prof. Muraoka (the Harray system and 
its Fortran compiler), and Prof. Kasahara (Oscar system). Keio University 
described the experimental system ATTEMPT 10 (A Typical Testing 
Environment of MultiProcessing Systems) for evaluation of the 
communication performance of multiprocessors, and this should be followed 
by those in the performance evaluation area.  Keio's Professor Boku 
presented a paper on DISTRAN (Distributed System Translator), a language 
for discretizing partial differential equations via explicit 
differencing, first into Prolog and then other languages so that they can 
be run on parallel machines.  Finally, the government labs ETL and ICOT 
are very active, with ICOT especially presenting five 
papers on diverse topics.  See my report on ICOT (data.eng, 30 May 

Because there are (as yet) no general purpose parallel computers from 
Japan, universities here are far behind in the kind of algorithmic work 
that is common in Western universities. There are also very few Western 
commercial general purpose parallel computers at Japanese universities.  
There is an iPSC/2 in the Information Science Department at the 
University of Tokyo, Alliants at the University of Tsukuba and Hiroshima, 
one or two BBN machines at other universities, and perhaps a few other 
machines scattered about, but these are the exceptions, and they are not 
common.  (There may be more at industrial research labs.)  Reliable 
machines like these are very useful for experimentation without having to 
worry too much about the system staying up.  Naturally, those headaches 
reduce the time and resources available for development of algorithms, 
system software and tools, and ultimately the time available for solving 
real problems.  There is a great deal of tool building on Unix 
workstations however, and much of that is directly related to parallel 
processing. On the other hand, there is much more system building 
(hardware) here than in the West and this is reflected in the mix of 
accepted papers for this conference.  

------------------JOINT SYMPOSIUM ON PARALLEL PROCESSING '91-------------
                                May 14-16, 1991

INVITED LECTURES----------------------------

alpha-Coral: A Control/Data Flow Multiprocessor and its Compiler
     Constantine D. Polychronopoulos (Center for Supercomputing Research and
     Development and Dept. of Electrical and Computer Engineering, University
     of Illinois at Urbana-Champaign)
          E-mail: cdp@csrd.uiuc.edu

Object Oriented Parallelism: pC++ Ideas and Experiments
     Dennis Gannon, Jenq Kuen Lee (Department of Computer Science, Indiana
     University, Bloomington Indiana 47401)
          E-mail: gannon@iuvax.cs.indiana.edu

PANEL DISCUSSION----------------------------
Research Trends on Parallel Processing 
     Hiromu Hayashi (Information Processing Division, Fujitsu Laboratories,
Expected Features of the Future Parallel Processing - What to do now -
     T. Hiraki (Tokyo University/Electro Technical Laboratory)
Future Parallel Processing Systems
     Hironori Kasahara (Dept. of Information & Computer Sciences, Waseda
Expected Features of the Future Parallel Processing - What to do now -
     M. Kitsuregawa (Institute of Industrial Science, Tokyo University)
Expected Features of the Future Parallel Processing: - What to do now -
     Kazuo Taki (Institute for New Generation Computer Technology)
          E-mail: taki@icot.or.jp
Future Operating Systems
     Yutaka Ishikawa (Electrotechnical Laboratory)

DATA BASE & MEMORY---------------------------------
A Scheduling-Based Cache Coherence Scheme
     Masaru Takesue (NTT Software Laboratories)
          E-mail: takesue@lucifer.ntt.jp

Implementation and Evaluation of Coherency Protocol for Virtual Shared Memory
         in the Network-connected Parallel Computer
     Hironori Nakajo, Newton K. Miura, Yukio Kaneda (Department of Systems
     Engineering, Faculty of Engineering, Kobe University)
     Koichi Wada (Institute of Information Science and Electronics, University
     of Tsukuba)
        Parallel logic simulation is treated as a parallel event 
     simulation. In parallel event simulation, time keeping is 
     important. There are two time keeping algorithms: the conservative 
     method and the virtual time method.  Because the conservative 
     method may introduce deadlock, a means of avoiding deadlock is 
     important. With the virtual time method deadlock never takes 
     place, but a rollback operation is needed when a time 
     discrepancy occurs. The authors have implemented a parallel logic 
     simulation program based on the virtual time method on their 
     parallel computer Multi-PSI, which has 64 PSI computers 
     interconnected with an orthogonal bus. The performance observed by 
     experiment is 60 kilo events per second and the speed-up ratio 
     obtained is more than 40 by using 64 
        A comment made by Prof. Yasuura of Kyoto University, however, 
     pointed out that even a single workstation can attain as high as 100 
     kilo events per second.  
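
     As a rough illustration of the rollback step in the virtual time 
     method, the sketch below models a single simulated process that 
     handles events optimistically and rolls back when an event arrives 
     with a timestamp in its past. It is not the authors' Multi-PSI 
     implementation: the class, its fields, and the toy state update are 
     assumptions of this sketch, and the cancellation of messages already 
     sent to other processes is left out.

```python
import bisect


class LogicalProcess:
    """One simulated process under the virtual time (optimistic) method.

    Hypothetical toy model: the state update is deliberately
    order-dependent, and cancelling messages already sent to other
    processes (antimessages) is left out.
    """

    def __init__(self):
        self.lvt = 0          # local virtual time
        self.state = 0
        self.processed = []   # (timestamp, value), kept in timestamp order
        self.snapshots = []   # snapshots[i] = state before processed[i]
        self.rollbacks = 0

    def _execute(self, ts, value):
        self.snapshots.append(self.state)
        self.processed.append((ts, value))
        self.state = 2 * self.state + value
        self.lvt = ts

    def receive(self, ts, value):
        if ts >= self.lvt:
            self._execute(ts, value)      # optimistic: no waiting
            return
        # Time discrepancy: a straggler arrived in this process's past.
        self.rollbacks += 1
        idx = bisect.bisect_right(self.processed, (ts, float("inf")))
        redo = self.processed[idx:]
        self.state = self.snapshots[idx]  # restore state before those events
        del self.processed[idx:]
        del self.snapshots[idx:]
        self._execute(ts, value)
        for old_ts, old_value in redo:
            self._execute(old_ts, old_value)


lp = LogicalProcess()
for ts, value in [(10, 1), (30, 3), (20, 2)]:   # (20, 2) arrives late
    lp.receive(ts, value)
print(lp.state, lp.rollbacks)   # 11 1
```

     After the rollback the final state matches what in-order processing 
     would have produced, which is the property the method relies on.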

Multiple Processing Module Control on SDC, The Super Database Computer
     S. Hirano, M. Harada, M. Nakamura, Y. Aiba, K. Suzuki, M. Kitsuregawa, M.
     Takagi, W. Yang (Institute of Industrial Science, University of Tokyo)
      E-mail: hirano@tkl.iis.u-tokyo.ac.jp
         SDC, the Super Database Computer, is a highly parallel relational 
     database server which serves SQL. In this paper we describe SDC's 
     process model, which is a basic framework for parallel data 
     processing, and a multiple module control scheme built on that 
     framework. We have developed a two module version of SDC as a 
     feasibility study, and the result is also presented. SDC achieved 
     about 30 times faster performance than Teradata DBC/1024.  

Full-Text Retrieval System using a SIMD Parallel Processor
     Sueharu Miyahara, Toshio Kondo (NTT Human Interface Laboratories,
     Yokosuka Kanagawa)
     Syunkichi Tada (NTT Intelligent Technology Corp, Naka-ku, Yokohama,

PARALLEL INFERENCE MACHINE---------------------------

The Architecture of the Parallel Processing Management Kernel of PIE64
     Yasuo Hidaka, Hanpei Koike, Hidehiko Tanaka (Department of Electrical
     Engineering, Faculty of Engineering, The University of Tokyo)
      E-mail: {hidaka,koike,tanaka}@mtl.t.u-tokyo.ac.jp
        We have noticed that the overhead of parallel processing is 
     mainly caused by communication, synchronization and parallel 
     processing management.  Therefore, we have introduced a network 
     interface processor and a management processor into the processing 
     element(PE) of the parallel inference engine PIE64.  
        In this paper, the architecture of the "parallel processing 
     management kernel" executed by the management processor will be 
     described, focusing on how to treat parallel processing management, 
     e.g. load distribution and scheduling, which becomes significant in 
     fine-grained highly parallel processing.  
        The parallel processing management kernel performs dynamic load 
     partitioning, a part of the general load distribution process.  The 
     partitioning decision is based on parallelism, so that it eliminates 
     excessive concurrency and reduces communication.  The scheduling 
     strategy of the kernel introduces dynamic priorities based on 
     parallelism and room in heap memory, in order to avoid exhaustion of 
     resources caused by explosive parallelism and also in order to 
     increase parallelism when it is insufficient.  Thus a programmer 
     need not be concerned with parallelism explosion.  It also 
     introduces respite time in starting execution of each thread in 
     order to reduce cost of suspension and context switching.  
       The paper also presents a comparison of static partitioning by the 
     compiler and dynamic partitioning by the kernel.  When the 
     parallelism exceeds the number of PEs to a high degree, the simple 
     dynamic method with little overhead is more effective than the 
     sophisticated static method.  However, dynamic partitioning becomes 
     ineffective if the parallelism and the number of PEs are of 
     comparable degree.  We conclude that the most promising method is a 
     composite of the static and dynamic methods.  

Evaluation of Instruction Level Parallelism on Parallel Inference Machine 
     Teruhiko Oohara, Koichi Takeda, Masatoshi Sato (Oki Electric Industry
     Co., Ltd.)

The Inference Processor UNIRED II: Evaluation by Simulation
     Kentaro Shimada, Hanpei Koike, Hidehiko Tanaka (Department of Electrical
     Engineering, Faculty of Engineering, University of Tokyo)
      E-mail: {shimada,koike,tanaka}@mtl.t.u-tokyo.ac.jp
        UNIRED II is the high performance inference processor of the 
     parallel inference machine PIE64. It is designed for the committed 
     choice language Fleng, and for use as an element processor of 
     parallel machines. Its main features are: 1) tag architecture, 2) 
     three independent memory buses (instruction fetching, data reading, 
     and data writing), 3) multi-context processing for reducing pipeline 
     interlocking and the cost of context switching for inter-processor 
     synchronization.  In this paper, several architectural features of 
     UNIRED II are evaluated by register transfer level simulation.  High 
     performance (over 1MLIPS) was attained, as predicted from its 
     design, and the results indicated that the three memory buses and 
     multi-context processing yield improved performance.  

DEDICATED MACHINE-------------------------------

Image Logic Algebra (ILA) and its Optical Implementations
     Masaki Fukui, Kenichi Kitayama (NTT Transmission Systems Laboratories)

A Single-Chip Vector-Processor Prototype Based on Streaming/FIFO
          Architecture - Evaluation of Macro Operation, 
          Vector-Scalar Cooperation and Terminating Vector Operations
     Takashi Hashimoto, Keizou Okazaki, Tetsuo Hironaka, Kazuaki Murakami
     (Interdisciplinary Graduate School of Engineering Sciences, Kyushu
     Shinji Tomita (Kyoto University)
          E-mail: {hashimot,keizo,hironaka,murakami}@is.kyushu-u.ac.jp

A Parallel Rendering Machine for High Speed Ray-Tracing - Instruction-
            Level Parallelism in the Macropipeline Stages 
     Seiji Murata, Oubong Gwun, Kazuaki Murakami (Interdisciplinary Graduate
     School of Engineering Sciences, Kyushu University)
     Shinji Tomita (Kyoto University)
          E-mail: {murata,gwun,murakami}@is.kyushu-u.ac.jp

SUPERSCALAR ARCHITECTURE----------------------------

A Pipeline Architecture for Parallel Processing Across Basic Blocks
     Toshikazu Marushima, Naoki Nishi, Ryosei Nakazaki (NEC Corporation)
     Kenji Ohsawa (NEC Scientific Information System Development Ltd.)

DSNS Processor Prototype - Evaluation of the Architecture and the Effect 
                 of Static Code Schedule 
     Akira Noudomi, Morihiro Kuga, Kazuaki Murakami (Interdisciplinary
     Graduate School of Engineering Sciences, Kyushu University)
     Tetsuya Hara (Mitsubishi Electric Co.)
     Shinji Tomita (Kyoto University)
          E-mail:  {noudomi,kuga,murakami}@is.kyushu-u.ac.jp

Hyperscalar Processor Architecture - The Fifth Approach to Instruction-Level
                 Parallel Processing
     Kazuaki Murakami (Interdisciplinary Graduate School of Engineering
     Sciences, Kyushu University)
          E-mail: murakami@is.kyushu-u.ac.jp

DATA FLOW MACHINE-----------------------------

Evaluation of Parallel Performance on Highly Parallel Computer EM-4
     Yuetsu Kodama, Shuichi Sakai, Yoshinori Yamaguchi (Electrotechnical
          E-mail: saka@etl.go.jp

Architectural Design of a Parallel Supercomputer EM-5 (English)
     Shuichi Sakai, Yuetsu Kodama, Yoshinori Yamaguchi (Electrotechnical
     Email: sakai@au-bon-pain.lcs.mit.edu (or) sakai@etl.go.jp
        This paper describes an architecture of a parallel supercomputer 
     EM-5.  The EM-5 design objective is to construct a feasible parallel 
     supercomputer whose target performance is over 1 TFLOPS.  The design 
     principles of the EM-5 are: (1) an object-oriented data-driven 
     model; (2) an advanced direct matching scheme; (3) a highly fused 
     pipeline; (4) a RISC processor EMC-G for a highly parallel computer; 
     (5) a functional interconnection network; and (6) a maintenance 
     architecture which can provide real-time monitoring facilities.  
     After examining these features, this paper shows the architectural 
     design of the EM-5, whose target structure will have 16,384 
     processing elements and whose peak performance is about 655 GIPS and 
     1.3 TFLOPS (double precision).  
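
     Dividing the quoted peak figures by the number of processing 
     elements gives the per-PE rates they imply. This is only a check of 
     the arithmetic in the abstract; the variable names are assumptions 
     of this sketch, and the per-PE values are derived, not stated in the 
     paper.

```python
PES = 16_384          # processing elements in the target structure
PEAK_GIPS = 655       # quoted peak, giga-instructions per second
PEAK_TFLOPS = 1.3     # quoted peak, double precision

mips_per_pe = PEAK_GIPS * 1e3 / PES
mflops_per_pe = PEAK_TFLOPS * 1e6 / PES

print(f"{mips_per_pe:.1f} MIPS per PE")      # 40.0 MIPS per PE
print(f"{mflops_per_pe:.1f} MFLOPS per PE")  # 79.3 MFLOPS per PE
```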

A Scheme to Reduce the Access Rate to Shared Memory for the Parallel
                 Processing System - Harray
     Hayato Yamana, Satoshi Ohdan, Yoichi Muraoka (School of Science and
     Engineering, Waseda University)
     Email: muraoka@jpnwas00.bitnet

INTERCONNECTION NETWORK------------------------

An Approach to Realizing a Reconfigurable Interconnection Network Using 
                 Field Programmable Gate Arrays 
     Toshinori Sueyoshi, Itsujiro Arita (Kyushu Institute of Technology) 
     Kouhei Hano (Kyocera Inc.) 
      E-mail: sueyoshi@ai.kyutech.ac.jp
        We present a new reconfigurable interconnection network utilizing 
     the reconfigurability of the FPGA (Field Programmable Gate 
     Array), a kind of programmable logic LSI. Reconfiguration for the 
     desired connections on our proposed reconfigurable interconnection 
     network is performed by programming the configuration data into each 
     FPGA, so that it can directly implement, without simulation, 
     both static networks such as mesh and hypercube networks and 
     dynamic networks such as baseline and omega networks. Consequently, 
     the optimum connections for interprocess communications or memory 
     reference patterns in executing application programs over the 
     reconfigurable multiprocessor can be configured adaptively by 

Integrated Parallelizing Compiler - Network Synthesizer
     Hiroki Akaboshi, Kazuaki Murakami, Akira Fukuda (Interdisciplinary
     Graduate School of Engineering Sciences, Kyushu University)
     Shinji Tomita (Kyoto University)
          E-mail: {akaboshi,murakami,fukuda}@is.kyushu-u.ac.jp

Evaluation of Various Implementations of base-m n-cube Network
     Yasushi Kawakura, Noboru Tanabe, Takashi Suzuoka (Toshiba Research and
     Development Center)

MULTIPROCESSOR I---------------------------

A Node Processor for the A-NET Multicomputer and its Execution Scheme
     Tsutomu Yoshinaga, Mitsuru Suzuki, Takashi Teraoka, Hisashi Mogi,
     Takanobu Baba (Department of Information Science, Faculty of 
     Engineering, Utsunomiya University) 
      E-mail: yoshi@infor.utsunomiya-u.ac.jp
        The node processor of the A-NET parallel object-oriented computer 
     consists of a 40-bit processing element (PE) which executes methods 
     of allocated objects, a router which determines the path of a 
     message or transfers an object code, and 320KB of local memory. We 
     chose a high-level machine instruction set and a tagged architecture 
     for the PE, so that it may include supporting hardware units like an 
     instruction preprocessing unit and a tag processing unit. The 
     organization of the router is independent of the network topology, 
     so that the message routing algorithm is programmable. The other 
     feature of the router is that it uses adaptable cut-through routing 
     for packet switching, and circuit switching for object code 
     transfer as well.  

Performance Comparison of Parallel Wire-routing on Distributed 
                 Multiprocessors and Shared Memory Multiprocessors 
     Masahiko Sano, Yoshizo Takahashi (Department of Information Science and
     Intelligent Systems, Faculty of Engineering, Tokushima University)
          E-mail: {sano,taka}@n30.is.tokushima-u.ac.jp

The Performance Evaluation of Communication Mechanism of Multiprocessor 
                 Test Bed ATTEMPT 
     Sunao Torii, Hideharu Amano (Department of Computer Science, Keio

MULTIPROCESSOR II------------------------------

Functional Memory Type Parallel Processors FMPP on a CAM and its 
     Hiroto Yasuura, Akihiro Watanabe, Ryugo Sadachi, 
     Keikichi Tamaru (Department of Electronics, Kyoto University) 

Demand/Accept Control Mechanism and Hardware of a Parallel Computer
     Masaki Tomisawa (Department of Computer Science, Faculty of 
     Technology, Tokyo Univ. of Agr. and Tech.) 

KRPP: Kyushu University Reconfigurable Parallel Processor
     Naoya Tokunaga, Shinichiro Mori, Kazuaki Murakami, Akira Fukuda
     (Interdisciplinary Graduate School of Engineering Sciences, Kyushu
     Tomoo Ueno (Kyushu Nippon Electric Co.)
     Eiji Iwata (Sony Co.)
     Koji Kai (Matsushita Electric Ind. Co.)
     Shinji Tomita (Kyoto University)
          E-mail: {tokunaga,mori,murakami,fukuda}@is.kyushu-u.ac.jp

PARALLEL LANGUAGE------------------------------

Distributed Implementation of Stream Communication in A'UM-90
     Koichi Konishi, Tsutomu Maruyama, Akihiko Konagaya (C&C Systems Research
     Laboratories, NEC Corporation)
     Kaoru Yoshida, Takashi Chikayama (Institute for New Generation Computer

Intra-object Parallelism on Parallel Object Oriented Languages
     Minoru Yoshida, Hidehiko Tanaka (Faculty of Engineering, University 
     of Tokyo) 
      E-mail: {minoru,tanaka}@mtl.t.u-tokyo.ac.jp
         Intra-object parallelism is important because server objects 
     must process many messages in a short time and because concurrency 
     within an object makes its implementation easy. The paper presents a 
     model in which messages are interpreted in parallel and instance 
     variables are accessed instantaneously. These two points were the 
     chief sources of sequentiality in intra-object parallel processing. 
     Using single-assignment variables, instance variables can be 
     accessed in an instant. A language based on the model is also 
     introduced. Because the order of messages does not matter, it has 
     the expressive power for natural concurrent programming using 
     atomic access to instance variables.  

HyperDEBU: A Multiwindow Debugger for Parallel Logic Programs and 
                 Committed-Choice Language 
     Junichi Tatemura, Hanpei Koike, Hidehiko Tanaka (Faculty of Engineering,
     The University of Tokyo)
      E-mail: {tatemura,koike,tanaka}@mtl.t.u-tokyo.ac.jp
         The debugging of parallel programs is more difficult than that 
     of sequential programs. Since a Committed-Choice Language (CCL), 
     which is a kind of parallel logic programming language, enables 
     fine-grained highly parallel execution, it is very hard to examine 
     and to manipulate its numerous complicated control/data flows. A 
     debugger, whose role is to show users a model abstracted from the 
     execution of a program, needs a model to represent the execution of 
     a fine-grained highly parallel program. To represent execution of a 
     CCL program, we propose a communicating process model which has 
     flexible levels and aspects of abstraction. Our debugger represents 
     this model. A parallel program has multiple complicated control/data 
     flows which are considered to be high-dimensional information.  
     Therefore, a high-dimensional interface is necessary to debug it.  
     Since a user compares a model represented by a debugger with 
     expected behavior of the program in order to find a bug in the 
     program, the debugger must provide the kind of view he/she wants.  
     Accordingly, the debugger must provide views which have flexible 
     levels and aspects of abstraction. We developed a multiwindow 
     debugger HyperDEBU which provides a high-dimensional interface.  
     HyperDEBU provides windows flexible enough for programmers to 
     examine and manipulate complicated structures composed of multiple 
     control/data flows.  

PARALLEL SYSTEM/ EVALUATION----------------------------

On the Real Number Index Sparse Array in the Dataflow Stream Language VISDAL
     Hirohisa Mori, Kazuhiko Kato, Hiroaki Takada (Dept. of Information
     Science, Faculty of Science, University of Tokyo)

Quantitative Evaluation of Several Synchronization Mechanisms Based on 
                  Static Scheduling and Fuzzy Barrier 
     Hiromitsu Takagi, Takaya Arita, Masahiro Sowa (Department of Electrical
     Engineering and Computer Science, Nagoya Institute of Technology)
          E-mail: takagi@craps.elcom.nitech.ac.jp

Parallel Garbage Collection on a Shared Memory Multi-Processor and its 
     Akira Imai (Institute for New Generation Computer Technology)
     Evan Tick (Univ. of Oregon)
     Katsuto Nakajima (Mitsubishi Electric Co.)
     Atsuhiro Goto (NTT)

PARALLELIZING COMPILER--------------------------

Prototype FORTRAN to Data Flow-Compile for Parallel Processing System - 
     Toshiaki Yasue, Jun Kohdate, Hayato Yamana, Yoichi Muraoka 
     (School of Science and Engineering, Waseda University) 
          E-mail: yasu@muraoka.info.waseda.ac.jp

APARC: Parallelizing Compiler for Parallel Computer ADENART
     Koji Zaiki, Akiyoshi Wakatani, Tadashi Okamoto (Matsushita Electric
     Industrial Co., Ltd., Semiconductor Research Center)
     Shigeru Kuroda (Matsushita Softresearch, Inc.)
      E-mail: zaiki@vdrl.src.mei.co.jp
        The parallelizing compiler APARC translates FORTRAN programs 
     into programs in ADETRAN, a high-level parallel language for the 
     parallel computer ADENART. APARC mainly converts DO loops into 
     code that can execute in parallel, using control flow analysis and 
     data dependence analysis.  
        ADENART has a fast data communication network between PEs 
     (Processing Elements) and a synchronization mechanism, and APARC 
     exploits these advantages in parallelization. In particular, even 
     DO loops containing GOTO statements that branch out of the loop 
     can be converted into parallel code by APARC, with exception 
     handling routines inserted. A prototype version of APARC is now 
     available, and some applications can be translated. In the near 
     future, we will make APARC usable for many applications.  
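
     [Illustration: The abstract does not say which data dependence 
     tests APARC applies. As a hedged sketch of what such an analysis 
     can look like, the classic GCD test below checks whether a write 
     to X(a*i + b) and a read of X(c*j + d) inside a DO loop can ever 
     touch the same element. The function names are hypothetical and 
     are not taken from APARC.]

```python
from math import gcd


def may_depend(a: int, b: int, c: int, d: int) -> bool:
    """GCD test for a write to X[a*i + b] and a read of X[c*j + d].

    An integer solution of a*i + b == c*j + d can exist only if
    gcd(a, c) divides (d - b). False proves the two accesses never
    overlap; True only means a dependence cannot be ruled out.
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants: they collide only if equal.
        return b == d
    return (d - b) % g == 0


def provably_independent(a: int, b: int, c: int, d: int) -> bool:
    """True when the GCD test proves the accesses never overlap."""
    return not may_depend(a, b, c, d)


# X(2*I) = X(2*I + 1): even and odd elements never meet.
print(provably_independent(2, 0, 2, 1))
# X(I) = X(I - 1): a dependence cannot be ruled out.
print(provably_independent(1, 0, 1, -1))
```

     The test is conservative: it ignores loop bounds, so a True result 
     from may_depend does not prove that a dependence actually exists.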

DISTRAN System (Distributed Systems Translator) Implementation on 
                    Parallel Computers 
     Kiyohiro Suzuki, Nobuyuki Yamasaki, Takao Yumiba, Kaoru Murata, Taisuke
     Boku (Faculty of Science and Technology, Keio University)
      Email: taisuke@kw.phys.keio.ac.jp
       When solving problems described by partial differential 
     equations, the most general method is to discretize the space and 
     time domains, and calculate all spatial domains step by step. This 
     method requires a large amount of calculation if the mesh is dense 
     enough to give accurate solutions. However, all spatially 
     discretized domains can be calculated in parallel, so it is 
     possible to achieve high performance when calculating them on 
     large-scale multiprocessors.  
       DISTRAN is a partial differential equation solver for parallel 
     processors that uses this method. With DISTRAN, a user can solve a 
     problem by writing only a very simple problem specification, 
     consisting of the original partial differential equations, boundary 
     and initial conditions, and domain information. No actual 
     programming by the user is needed.  
       DISTRAN analyzes the given equations and checks their consistency.  
     The problem domain is discretized automatically, and all spatial 
     points and boundaries are calculated to satisfy the given conditions.  
     Finally, DISTRAN generates a program to solve the problem on a 
     sequential or parallel processor. Currently, we have implemented 
     three versions of DISTRAN for three types of parallel processors, 
     MiPAX-32 [a commercial version of U-Tsukuba's PAX], QCDPAX and a 
     Transputer system. The first two machines are based on a shared 
     memory and global synchronization mechanism. The last one is based 
     on message passing links. We calculated the same problem on each 
     system, and confirmed that DISTRAN achieves high performance in 
     practice.  
       In this paper, we describe how to design and implement such an 
     automated programming and solving system on several types of 
     multiprocessors. We also show the actual performance of each system 
     and evaluate the calculation efficiency of DISTRAN.  
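
     [Illustration: The discretize-and-step method described in the 
     DISTRAN abstract can be sketched for the simplest case, the 1-D 
     heat equation with fixed boundary values. This is not DISTRAN 
     output. The function names and the explicit finite-difference 
     scheme are assumptions chosen for brevity.]

```python
def heat_step(u, alpha):
    """Advance the 1-D heat equation by one explicit time step.

    u     : values at the mesh points; u[0] and u[-1] are fixed boundaries
    alpha : k*dt/dx**2, which must not exceed 0.5 for stability

    Every interior point reads only the previous time level, so all
    iterations of the loop could run in parallel on separate processors.
    """
    new = u[:]  # boundary values are carried over unchanged
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def solve(u0, alpha, steps):
    """Apply heat_step repeatedly, starting from the initial condition u0."""
    u = list(u0)
    for _ in range(steps):
        u = heat_step(u, alpha)
    return u


# Rod held at 0 on the left and 1 on the right; it relaxes to a straight line.
print(solve([0.0, 0.0, 0.0, 0.0, 1.0], 0.25, 2000))
```

     On a distributed-memory machine each processor would own a block 
     of mesh points and exchange only the block edges with its 
     neighbours at every step.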

PARALLEL OS-------------------------------

A Testbed OS for Evaluation of Parallel Algorithms
     Takahiro Yakoh, Yuichiro Anzai (Department of Computer Science, Keio 

Parallel Processings in OS Kernel by the Process Network Architecture
     Yasuichi Nakayama, Iwao Morishita (University of Tokyo)
     Kazuya Tago (IBM Japan Tokyo Laboratories)
      E-mail: yasu@meip7s.t.u-tokyo.ac.jp
         A parallel operating system has been designed and implemented on 
     a loosely-coupled multiprocessor system employing the process 
     network architecture.  
         The operating system consists of a number of light-weight 
     processes interconnected by rendezvous communications and is 
     compatible with the UNIX system. It has been shown that when this 
     process network is distributed on multiple computer units with an 
     optimum assignment, some processes can run in parallel with the 
         In this paper we consider parallel processing in the OS kernel 
     in order to improve the response time of a system call.  

On Paralleling Transaction Processes by Exchanging Messages
     Haruo Yokota, Yasuo Noguchi, Riichiro Take (Fujitsu Laboratories, Ltd.)


Parallelizing Gaussian Elimination on PAX 
     Kimio Takahashi (Scientific Technology, Tsukuba Univ.) 

Study on the Algorithms for Matrix Solver on Massively Parallel Computer 
     Mitsuyoshi Igai (Hitachi VLSI Engineering Corp.) 
     Toshio Okouchi, Chisato Konno (Central Research Lab, Hitachi, Ltd.) 

Molecular Dynamics Simulation on a Highly Parallel Computer AP1000 
     Yoshiyuki Sato (Computer-Based Systems Lab., Fujitsu Labs Ltd.) 
          E-mail: hsat@flab.fujitsu.co.jp 
     Yasumasa Tanaka (Fujitsu Ltd.) 
     Hiroshi Iwama, Shigetsugu Kawakita, Minoru Saito, Kenji Morikami, 
     Toru Yao (Protein Engineering Research Institute) 
     Shigenoru Tsutsumi, Hideaki Yoshijima (Fujitsu Kyushu System Engineering) 

Parallel Nonlinear MHD Plasma Simulator
     Satoshi Matsushita, Nobuhiko Koike (NEC Corporation)
     Masaru Narusawa (NEC Scientific Information System Development Ltd.)
     Genichi Kurita, Toshihide Tsunematsu, Tatsuoki Takeda (Japan Atomic
     Energy Research Institute)
       Email: {matsushita, koike}@csl.cl.nec.co.jp
     AEOLUS is a non-linear plasma simulator for analyzing instability 
     (called disruption) of tokamak plasma in a nuclear fusion reactor, 
     a very time-consuming task. As most of AEOLUS's calculation is 
     non-linear, it employs explicit time integration. However, by 
     applying an implicit method to the linear part, we have improved its 
     convergence. We tried to parallelize the AEOLUS code developed and 
     tuned for a vector machine at the Japan Atomic Energy Research 
     Institute. The vector code ran 6 to 7 times faster than its scalar 
     counterpart. The small parallelism in the implicit part limits the 
     speed-up. We propose a novel parallel algorithm for MIMD parallel 
     machines, and successfully parallelized the implicit part of the 
     simulation. We have achieved a speed-up of 42 using the 64 processor 
     Cenju. (Cenju is a multiprocessor system with a distributed shared 
     memory scheme developed mainly for circuit simulation. Cenju is 
     designed for effective execution of our modular circuit simulation 
     algorithms.) (References follow.)
       1. T. Takeda, K. Tani, S. Matsushita, et al.: 
         Plasma Simulator METIS and Tokamak Plasma Analysis,
         US-Japan Workshop on Advances in Computer Simulation Techniques
         Applied to Plasma and Fusion, (1990).
       2. T. Nakata et al.: Cenju: A Multiprocessor System with
         a Distributed Shared Memory Scheme for Modular Circuit Simulation,
         Proc. International Symposium on Shared Memory Multiprocessing,
         pp. 82-90, April (1991).
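
     [Illustration: The AEOLUS abstract reports a speed-up of 42 on 64 
     processors and notes that limited parallelism in one part caps the 
     speed-up. Assuming a simple Amdahl's-law model, which is an 
     assumption and not the authors' analysis, the figures correspond 
     to about 66% parallel efficiency and an effective serial fraction 
     of under 1%. The function names below are illustrative.]

```python
def efficiency(speedup, processors):
    """Parallel efficiency: fraction of ideal linear speed-up achieved."""
    return speedup / processors


def serial_fraction(speedup, processors):
    """Serial fraction implied by Amdahl's law (the Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


def amdahl_speedup(f, processors):
    """Speed-up predicted by Amdahl's law for serial fraction f."""
    return 1.0 / (f + (1.0 - f) / processors)


f = serial_fraction(42, 64)
print(efficiency(42, 64), f, amdahl_speedup(f, 64))
```

     In this model the serial fraction also absorbs communication and 
     synchronization overhead, so it should be read as an upper bound 
     on the truly sequential work.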


Parallel Logic Simulation based on Virtual Time
     Yukinori Matsumoto, Kazuo Taki (Institute for New Generation 
             Computer Technology) 
      Email: yumatumo@icot.or.jp
     Author's abstract: This paper focuses on parallel logic simulation.  
     An efficient logic simulation system on a large-scale multiprocessor 
     is targeted. The Time Warp mechanism, an optimistic approach, was 
     tested and evaluated, even though rollback processing has been 
     said to be costly. The system is implemented on the Multi-PSI, a 
     distributed memory multiprocessor. It includes several new ideas to 
     enhance the performance, such as a local message scheduler, an 
     antimessage reduction mechanism and a load distribution scheme. In 
     our experiment, using 64 processors, about 48-fold speedup was 
     attained and the performance of the whole system reached about 60 k 
     events/sec, which is fairly good for a pure software simulator. 
     The paper then reports an empirical comparison between the Time 
     Warp mechanism and two conservative mechanisms: an asynchronous 
     approach using null messages and a synchronous approach. The 
     comparison shows that the Time Warp mechanism will be the most 
     efficient of the 
     three, and could be the most suitable for large-scale 
     [Comment: The parallel logic simulation is treated as a parallel 
     event simulation. In parallel event simulation, timekeeping is 
     important. There are two timekeeping algorithms: the conservative 
     method and the virtual time method. Because the conservative 
     method may introduce deadlock, a means of avoiding deadlock is 
     important. With the virtual time method deadlock never occurs, but 
     a rollback operation is needed whenever a time discrepancy 
     arises. The authors have implemented a parallel logic simulation 
     program based on the virtual time method on their parallel computer 
     Multi-PSI, which has 64 PSI computers interconnected with an 
     orthogonal bus. The performance observed by experiment is 60 kilo 
     events per second and the speed-up ratio obtained is more than 40 by using 64 
        A comment made by Prof. Yasuura of Kyoto University, however, 
     pointed out that even a single workstation can attain as high as 
     100 kilo events per second.]
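
     [Illustration: The rollback that the virtual time (Time Warp) 
     method needs when a message arrives "in the past" can be sketched 
     with a single toy logical process. This is not the Multi-PSI 
     implementation. The class name, the state-saving policy and the 
     state update rule are assumptions, and antimessages and global 
     virtual time are left out.]

```python
class LogicalProcess:
    """Toy Time Warp process with state saving and rollback.

    The state update (state * 2 + value) is arbitrary; it is chosen only
    because it is order-sensitive, so a missed rollback would show up.
    """

    def __init__(self):
        self.lvt = 0          # local virtual time
        self.state = 0
        self.processed = []   # executed events, in timestamp order
        self.snapshots = []   # (lvt, state) saved before each event
        self.rollbacks = 0

    def _execute(self, ts, value):
        self.snapshots.append((self.lvt, self.state))
        self.state = self.state * 2 + value
        self.lvt = ts
        self.processed.append((ts, value))

    def receive(self, ts, value):
        if ts >= self.lvt:
            # Optimistic case: the event is not in our past, run it now.
            self._execute(ts, value)
            return
        # Straggler: undo every event with a later timestamp ...
        self.rollbacks += 1
        redo = []
        while self.processed and self.processed[-1][0] > ts:
            redo.append(self.processed.pop())
            self.lvt, self.state = self.snapshots.pop()
        # ... then run the straggler and re-execute the undone events.
        self._execute(ts, value)
        for event in reversed(redo):
            self._execute(*event)


lp = LogicalProcess()
for ts, value in [(10, 1), (30, 3), (20, 2)]:   # (20, 2) arrives late
    lp.receive(ts, value)
print(lp.state, lp.rollbacks)
```

     After the rollback the state equals what strict timestamp order 
     would have produced. A full implementation would also cancel the 
     messages sent by the undone events.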

Massively Parallel Layout Engine - Routing Processor
     K. Kawamura, T. Shindo, T. Shibuya, H. Miwatari, Y. Ohki, T. Doi
     (Computer-Based Systems Lab., Fujitsu Laboratories Ltd.)
        The authors have developed a new algorithm for automated wire 
     routing, called the constrained relaxational maze running 
     algorithm. In this method intersections between nets are allowed 
     but are evaluated by a cost function. By iterating the routing 
     while decrementing the penalty cost, the optimum routings are 
     finally obtained.  
        They have built a massively parallel computer to implement this 
     algorithm. This machine, called MAPLE-RP, has 8K 1-bit PUs 
     connected in a lattice and operating in SIMD mode. The performance 
     is 40 GOPS when 64K PUs are used. Both the routing rate and the 
     routing speed were observed to be quite satisfactory.  
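
     [Illustration: The idea of allowing nets to cross but charging for 
     each crossing through a cost function can be sketched with a 
     sequential shortest-path search on a grid. This is not the 
     MAPLE-RP algorithm. The function name, the unit step cost and the 
     use of Dijkstra's algorithm are assumptions, and the schedule by 
     which the penalty changes between iterations is not modelled.]

```python
import heapq


def route(width, height, src, dst, occupied, penalty):
    """Cheapest 4-connected path from src to dst on a width x height grid.

    Stepping onto a cell costs 1, plus `penalty` if another net already
    uses that cell (a crossing). Returns (cost, path).
    """
    best = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == dst:
            break
        if cost > best[cell]:
            continue  # stale heap entry
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            step = 1 + (penalty if nxt in occupied else 0)
            if cost + step < best.get(nxt, float("inf")):
                best[nxt] = cost + step
                prev[nxt] = cell
                heapq.heappush(heap, (cost + step, nxt))
    # Walk the predecessor links back from the target.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]


other_net = {(2, 0), (2, 1)}   # cells already used by another net
print(route(5, 3, (0, 1), (4, 1), other_net, penalty=1))
print(route(5, 3, (0, 1), (4, 1), other_net, penalty=10))
```

     With a small penalty the new net cuts straight across the existing 
     one; with a large penalty it takes the longer detour through the 
     free cell instead.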

A Parallel Router based on a Concurrent Object-oriented Model
     Hiroshi Date, Yoshihisa Ohtake, Kazuo Taki (Institute for New 
            Generation Computer Technology) 
      E-mail: date@icot.or.jp
     Author's abstract: The design of LSI routing is well known as a 
     process that requires massive computational power, so speedup by 
     parallel processing shortens the LSI design period. 
     This paper presents a new parallel router based on a concurrent 
     object-oriented model. The objects corresponding to line segments 
     find the path between terminals by exchanging messages with each 
     other. This method offers a high degree of parallelism. The search 
     algorithm of our model is based on a look-ahead line search 
     algorithm. We implemented this algorithm using the KL1 language on 
     Multi-PSI. We have been verifying our router using real LSI data, 
     and the initial results are described.  
     [Comment. This paper presents a parallel routing algorithm based on 
     a look-ahead line-search algorithm and the speedup obtained by 
     running the program on the authors' parallel computer Multi-PSI. 
     The algorithm is based on the object-oriented model in the sense 
     that each net is considered an object that exchanges messages to 
     avoid intersections. Although the obtained speedup was favorable, 
     the routing rate was not.] 


A Parallel Processing Feature of a DBMS with SCMP for OLTP
     Kazumi Hayashi, Kazuhiko Saitoh, Tomohiro Hayashi, Masaaki Mitani,
     Hiroshi Ohsato, Takashi Obata, Yutaka Sekine, Mitsuhiro Ura, Takuji Ishii
     (2nd Software Division, Computer System Group, Fujitsu Ltd.)

Parallel Dynamic Map Construction and Navigation in Real-Time for 
                  Autonomous Robots (ENGLISH) 
     Martin Nilsson (Swedish Institute of Computer Science, Box 1263, S-164 28
     Kista, Sweden)
      E-mail: mn@sics.se
         Real-time map construction and navigation are complex and 
     computationally intensive tasks, but contain much potential 
     parallelism.  This paper describes how programming techniques based 
     on committed-choice languages can be used to both concisely express 
     algorithms for such problems, and extract their parallelism.  

Parallel Processing of ATMS on the Heterogeneous Distributed System 
     Hiroshi G. Okuno (NTT Basic Research Laboratories)
     Osamu Akashi, Kenichiro Murakami, Yoshiji Amagi (NTT Software
      E-mail: okuno@ntt-20.ntt.jp, murakami@ntt-20.ntt.jp,
              akashi@toshi.ntt.jp, amagi@nuesun.ntt.jp
         We have proposed the NueLinda computation model, which 
     integrates various heterogeneous distributed systems and provides 
     computing and data resources in a transparent and uniform manner. 
     On the NueLinda model, we have designed and implemented TAO-Linda on the Lisp 
         ATMS (Assumption-based Truth Maintenance System) is an 
     intelligent database in the sense that it maintains the support 
     sets for each datum. A conventional database can contain only one 
     consistent context of data, while the ATMS provides the inference 
     engine with a multiple-context mechanism. ATMS is considered one of 
     the essential facilities for AI systems of the next generation, and 
     its execution speed needs to be improved drastically.  
         In this paper, we discuss the parallel processing of ATMS 
     with TAO-Linda and compare the resulting implementation with the 
     parallel processing of ATMS on a shared-memory machine.  

PARALLEL COMPUTING MODEL--------------------------

Message-flow: A New Computation Model for MIMD-type Parallel Machines
     Hiroaki Fujii (Hitachi Ltd.)
     Kiyoshi Shibayama (Faculty of Engineering, Kyoto University)

A Hybrid Group Reflective Architecture for Object-Oriented Concurrent 
     Takuo Watanabe, Satoshi Matsuoka, Akinori Yonezawa (Department of
     Information Science, The University of Tokyo)
      E-mail: {takuo,matsu,yonezawa}@is.s.u-tokyo.ac.jp
        The benefits of computational reflection are the abilities to 
     reason about and alter the dynamic behavior of computation from 
     within the language framework. This is even more beneficial in 
     concurrent/distributed computing, where the complexity of the system 
     is much greater than in sequential computing; we have 
     demonstrated various benefits in our past research of Object-
     Oriented Concurrent Reflective (OOCR) architectures. Unfortunately, 
     attempts to formulate reflective features provided in practical 
     reflective systems, such as resource management, have led to some 
     difficulties in maintaining the linguistic lucidity necessary in 
     computational reflection. The primary reason is that previous OOCR 
     architectures lack the ingredients for group-wide object 
       We present a new OOCR system architecture called "Hybrid Group 
     Reflective Architecture (HGRA)", and a new language ABCL/R2 based on 
     this architecture. The key features of ABCL/R2 are the notion of 
     heterogeneous object groups and coordinated management of group 
     shared computational resources. We describe how such management can 
     be effectively modeled and adaptively modified/controlled with the 
     reflective features of ABCL/R2. We also illustrate that this 
     architecture is defined entirely in a meta-circular way (without 
     adopting ad-hoc primitives), embodying two directions of reflective 
     towers.  

Towards Realistic Type Inference for Guarded Horn Clauses (ENGLISH)
     Dongwook Shin (Fujitsu Laboratories, IIAS)
      E-mail: shin@iias.flab.fujitsu.co.jp
         This paper proposes a type inference system for Guarded Horn 
     Clauses (GHC), based on the notions of value type and communication 
     type. A value type is a type that a predicate can have, guaranteeing 
     that a goal predicate of the value type does not raise type errors 
     at run time.  A communication type is a type under which several 
     predicates communicate with one another. These types are obtained 
     by constraint solving and partial pre-evaluation of a GHC program. 
     We expect these types to contribute to the early detection of 
     errors in GHC program development.  


A Process Control Scheme for Distributed Processing Systems Using 
                  Weighted Throw Counting 
     Kazuaki Rokusawa (Systems Laboratory, OKI)
          E-mail: rokusawa@okilab.oki.co.jp (or) rokusawa@icot.or.jp
     Nobuyuki Ichiyoshi (Institute for New Generation Computer 
      E-mail: ichiyoshi@icot.or.jp
         This paper proposes a new scheme for 
     aborting/stopping/restarting (in general, changing the execution 
     state of) a pool of processes in a distributed environment where 
     there may be processes in transit. The scheme guarantees that all 
     processes belonging to the pool change state, detects the 
     completion of the state change, and works under both FIFO and 
     non-FIFO communication. It uses broadcasting and weighted throw 
     counting, and requires only a few words per processor per process 
     pool.  
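
     [Illustration: The abstract does not spell out the protocol, so 
     the following is only a sketch of the general weight-throwing idea 
     as it is commonly described. Every process and every message in 
     transit carries a share of the pool's weight, and the controller 
     knows that nothing is left outstanding once all weight has come 
     back. The class and method names are hypothetical, and the 
     broadcast and state-change parts of the scheme are not modelled.]

```python
from fractions import Fraction


class Pool:
    """Controller for one process pool; tracks weight it has handed out."""

    def __init__(self):
        self.outstanding = Fraction(0)  # weight held by workers or messages

    def grant(self, weight=Fraction(1)):
        self.outstanding += weight
        return weight

    def give_back(self, weight):
        self.outstanding -= weight

    def quiescent(self):
        # True only when no worker and no in-transit message holds weight.
        return self.outstanding == 0


class Worker:
    def __init__(self, pool):
        self.pool = pool
        self.weight = Fraction(0)

    def throw(self):
        """Send a process away; half of our weight travels with it."""
        half = self.weight / 2
        self.weight -= half
        return half

    def catch(self, weight):
        self.weight += weight

    def finish(self):
        """No work left here: return all held weight to the controller."""
        self.pool.give_back(self.weight)
        self.weight = Fraction(0)


pool = Pool()
a, b = Worker(pool), Worker(pool)
a.catch(pool.grant())
in_transit = a.throw()      # a process is now travelling from a to b
a.finish()
print(pool.quiescent())     # weight is still in flight
b.catch(in_transit)
b.finish()
print(pool.quiescent())
```

     Because the weight travels with the message itself, the count 
     stays correct regardless of the order in which messages arrive.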

Sort m Smallest Elements Problem on a Linearly Connected Processor Array 
                  with Multiple Buses 
     Satoshi Fujita, Masafumi Yamashita, Tadashi Ae (Faculty of Engineering,
     Hiroshima University)

Time Bounds for Sorting and Routing Problems on Mesh-Bus Computers
     Kazuo Iwama, Eiji Miyano (Faculty of Engineering, Kyushu University)
     Yahiko Kambayashi (Faculty of Engineering, Kyoto University)


Fuzzy 0-1 Combinatorial Optimization through Neural Networks
     Masatoshi Sakawa, Toru Mitani (Department of Industrial and Systems
     Engineering, Faculty of Engineering, Hiroshima University)
     Kazuya Sawada (Information System Center, Matsushita Electric Works,
          E-mail: sakawa@msl.sys.hiroshima-u.ac.jp

Dynamic Modification of the Free Energy Function Improves Ability to Find 
                Good Solutions on a Hopfield Neural Network 
     Yutaka Akiyama, Tatsumi Furuya (Electrotechnical Laboratory)
      E-mail: yakiyama@etl.go.jp
        Four novel techniques for global optimization on a Hopfield 
     neural network are proposed. The sharpening method dynamically 
     modifies the gain of the neuron's input/output function. The excess 
     bias method provides an excessive input bias to improve the energy 
     "landscape". The emphasizing method dynamically changes the balance 
     among constraints, and the annealing method controls randomness in 
     the stochastic Hopfield model (the Gaussian Machine). By combining 
     these techniques, the neural network shows excellent ability to 
     solve optimization problems.  
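
     [Illustration: The abstract says that the sharpening method 
     dynamically modifies the gain of the neuron's input/output 
     function. A minimal sketch of that single idea follows, assuming a 
     sigmoid input/output function and a geometric gain ramp. Both 
     choices, and the function names, are assumptions and are not 
     taken from the paper.]

```python
import math


def neuron_output(u, gain):
    """Sigmoid input/output function; a larger gain gives a steeper curve."""
    return 1.0 / (1.0 + math.exp(-gain * u))


def sharpening_schedule(gain_start, gain_end, steps):
    """Geometric ramp from gain_start to gain_end (an assumed schedule)."""
    if steps == 1:
        return [gain_end]
    ratio = (gain_end / gain_start) ** (1.0 / (steps - 1))
    return [gain_start * ratio ** k for k in range(steps)]


# The same input is pushed toward a binary decision as the gain rises.
for gain in sharpening_schedule(1.0, 100.0, 5):
    print(round(gain, 2), neuron_output(0.5, gain))
```

     At low gain the neuron outputs stay near the middle of their 
     range, and raising the gain gradually forces them toward 0 or 1.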

The Chain Reaction in Adaptive Junction Networks
     Yoshiaki Ajioka, Yuichiro Anzai (Department of Computer Science, Keio
      E-mail: ajioka@aa.cs.keio.ac.jp
         Although neural networks are useful for pattern recognition, 
     they are not commonly used for sequential processing. We developed 
     Adaptive Junction, a feedback-type neural network that recognizes 
     spatio-temporal patterns.  This paper proves that Adaptive Junction 
     networks can perform the chain reaction for any spatio-temporal 
     patterns when each neuron has a 1-degree feature pattern.  From this 
     result, the order of the number of neurons desired to recognize some 
     spatio-temporal patterns becomes clear in Adaptive Junction 

A Genetic Algorithms Approach to How to Represent the Basin of 
              Associative Memory Model 
     Keiji Suzuki, Yukinori Kakazu (Department of Engineering, Hokkaido


                                 Vol 32, No. 4

The Way to Massively Parallel Computers
     Takanobu Baba (Department of Information Science, Utsunomiya University)

Realization Technologies for Massively Parallel Machines
     Shigeru Oyanagi, Noboru Tanabe (Toshiba R&D Center)

Super-parallel Computer ADENA for Scientific Simulation
     Tatsuo Nogi (Division of Applied Systems Science, Faculty of Engineering,
     Kyoto University)

Neural Network Model Processing on Massively Parallel Computers
     Noboru Sonehara, Makoto Hirayama (ATR Auditory and Visual Research

Commercial Massive Parallel SIMD Computer and its Application
     Masaru Kitsuregawa (Institute of Industrial Science, University of Tokyo)
     Taiichi Yuasa (Toyohashi University of Technology)

Logic Programming Oriented Inference Machine
     Hidehiko Tanaka (Department of Electrical Engineering, University of

Implementation for Sequential Logic Programming Languages
     Minoru Yokota (Computer System Research Laboratory, C&C Systems Research
     Laboratories, NEC Corporation)

Parallel Implementation Schemes of Logic Programming Languages
     Nobuyuki Ichiyoshi (Institute for New Generation Computer Technology)

Architecture of Sequential Inference Machine
     Yukio Kaneda, Hideo Matsuda (Dept. of Systems Engineering, Faculty of
     Engineering, Kobe University)

Parallel Inference Machine Architecture
     Atsuhiro Goto (Software Research Laboratory, NTT Software Laboratories)

-----------------------END OF REPORT------------------------------------

