rick@cs.arizona.edu (Rick Schlichting) (06/13/91)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years
under the auspices of the Office of Naval Research-Asia (ONR/Asia).
The following is the professional opinion of David Kahaner and in no
way has the blessing of the US Government or any agency of it. All
information is dated and of limited life time. This disclaimer should
be noted on ANY attribution.]
[Copies of previous reports written by Kahaner can be obtained from
host cs.arizona.edu using anonymous FTP.]
To: Distribution
From: David K. Kahaner, ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: Joint Symposium on Parallel Processing '91, Kobe Japan, 14-16 May 1991
13 June 1991
This file is named "jspp.91"
ABSTRACT. An overview is given of the Joint Symposium on Parallel
Processing '91, held in Kobe Japan, 14-16 May 1991, as well as titles and
some abstracts. Also appended are the titles/authors of IFIP Vol 33#4,
which was a special issue on massively parallel computers.
INTRODUCTION.
The Joint Symposium of Parallel Processing is an annual research
conference associated with parallel processing. Approximately 250 people
attended this year's conference, which was held on an artificial island
in the Kobe harbor. (Kobe is an important port city near Osaka.) There
were 59 half hour papers in three parallel sessions, one panel discussion
on the future of parallel processing, and two invited lectures, by C.
Polychronopoulos (Illinois) and D. Gannon (Indiana). The cross section
of topics was as follows.
Architecture 25 papers
Applications 10
Systems 9
Neurocomputing 4
Fundamentals 6
Operating systems 3
Invited papers 2
Except for the lectures by the two invited speakers, all the presentations
were in Japanese. A few papers are printed in English in the bound
Proceedings. The titles and authors of all the papers are appended to the
end of this report, as are the electronic mail addresses of many of the
authors. I wish to thank the many Japanese scientists who took the time
and effort to provide me with English translations of their abstracts,
and these are also included, as are some comments. This report also
contains the titles of papers published in a special 1990 issue of the
Japan IFIP Vol 33#4, entirely devoted to massively parallel computers.
The organizers told me that they made extra efforts to encourage papers
with more software and application content, but that the resulting mix
was still heavily weighted toward hardware.
SUMMARY.
I concentrated on the applications papers and discovered that there were
very few surprises; perhaps being here a year and a half helps.
One surprise was the paper on Super Data Base Computer being developed by
Dr. Masaru Kitsuregawa
Institute of Industrial Science
University of Tokyo
Roppongi, Minato-ku, Tokyo Japan
Tel: +81-3-3402-6231x2356, Fax: +81-3-3479-1706
Email: a80509@tansei.cc.u-tokyo.ac.jp
especially since I was part of a JTEC team here in March to study
Japanese activities in the database area. Another surprise was the paper
on the next generation of the ETL parallel computer (EM-5), in which it
was stated emphatically that this would not be a dataflow machine in any
sense. I reported on this earlier (see data.eng, 30 May 1991), where Dr.
Sakai, one of the designers, explained that this comment was an error in
the English translation.
While I have reported on Japanese parallel computing in the past, it is
worth repeating that there are a number of highly capable parallel
machines (MIMD) that are being used here for real science applications.
There are also some SIMD machines, typically associated with even more
specialized applications such as image, text, or speech processing. Most
Japanese parallel computers are in the hands of very friendly users, or
in prototype form. They have from 64 to about 1000 processors, and have
peak performance of several tens of gigaflops (perhaps more when fully
configured). However, thus far I have not seen any general purpose
parallel computers in the sense of CM, Hypercube, etc. An exception to
this is the PIE (Parallel Inference Engine) computers being developed by
ICOT, but these have not been used for numerical computation. Instead
parallel computers in Japan have been developed by Japanese companies
with very specific applications in mind. Some examples follow. It seems
to me that these companies are being very conservative about marketing
parallel computers. Senior administrators in two different organizations
told me that they were not sure about the market size for highly parallel
machines. They felt that it was necessary to have an active research
effort but would be tentative about going further. In my opinion parallel
computers from NEC and Fujitsu could easily be commercialized. At the
same time these two companies are very aggressively pursuing the
traditional supercomputer market. In fact while I was at this meeting,
NEC announced that their one processor SX-3/14 had taken first place in
Dongarra's LINPACK benchmark with 314 MFLOPS for n=100, and 4.2 GFLOPS for
n=1000, mostly through tuning and enhancements in the Fortran system.
The list of examples of parallel computing given below is definitely not
exhaustive, but simply meant to suggest the level of activity. There is
one Connection Machine in Japan, at the ATR lab between Kyoto and Osaka.
Researchers there have been using it for speech processing related
research, and while there were no papers about that work presented at
this meeting, one paper appeared in the IFIP journal whose titles are
listed at the end of this report.
Hitachi: Developing the 64 node H2P and the parallel programming
 language Paragram (see parallel.903, 6 Nov 1990). A Hitachi
 researcher gave a talk describing various comparisons between
 Multigrid, Jacobi, Red-Black SOR, ADI, PCG-ICCG, and Gaussian
 elimination for solving the pde "div(-k gradU)=Q" on a rectangle.
 Hitachi also has a general purpose neurocomputer with a peak
 performance of 2.3 GCUPS, the world's fastest; practical applications
 like stock prediction are expected in 2-3 years.
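The report does not say how Hitachi parallelized these solvers, but it is
worth recalling why Red-Black SOR in particular suits a parallel machine:
all points of one color depend only on points of the other color, so each
half-sweep updates every point of a color independently. A minimal serial
sketch for the constant-coefficient case k=1 on the unit square follows;
the grid size, relaxation factor, and unit source term are illustrative
choices of mine, not Hitachi's.

```python
import numpy as np

def redblack_sor(n=33, omega=1.7, tol=1e-6, max_iter=5000):
    """Red-Black SOR for -laplace(u) = q on the unit square, u = 0 on the
    boundary (the k=1 special case of div(-k grad U) = Q)."""
    h = 1.0 / (n - 1)
    u = np.zeros((n, n))
    q = np.ones((n, n))                      # constant source term
    # checkerboard masks over the interior
    i, j = np.meshgrid(range(n), range(n), indexing="ij")
    red = ((i + j) % 2 == 0)
    interior = np.zeros((n, n), bool)
    interior[1:-1, 1:-1] = True
    for it in range(max_iter):
        for color in (red, ~red):
            mask = color & interior
            # all points of one color depend only on the other color,
            # so this half-sweep is the parallelizable unit of work
            nb = np.zeros_like(u)
            nb[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] +
                              u[1:-1, 2:] + u[1:-1, :-2])
            gs = 0.25 * (nb + h * h * q)     # Gauss-Seidel value
            u[mask] += omega * (gs[mask] - u[mask])
        # check the residual of the 5-point discretization
        res = np.zeros_like(u)
        res[1:-1, 1:-1] = ((4 * u[1:-1, 1:-1] - u[2:, 1:-1] - u[:-2, 1:-1]
                            - u[1:-1, 2:] - u[1:-1, :-2]) / (h * h)
                           - q[1:-1, 1:-1])
        if np.max(np.abs(res)) < tol:
            return u, it
    return u, max_iter
```

On a real machine each color's update would be distributed across
processors; here the boolean masks play that role.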
Fujitsu: 1024 PE version of AP1000 to be available in 1991. At this
 meeting Fujitsu researchers described performing molecular dynamics
 on the 64 node AP1000 using an adaptation of AMBER (Assisted Model
 Building with Energy Refinement), developed by A. Kollman at UC San
 Francisco. Speedup with 64 processors was about 55 (86% efficiency),
 and they predict that with 128 processors efficiency will be about
 80%. AP1000 is the most "general purpose" of the Japanese parallel
computers. See my remarks about this machine in the report
(parallel.902, 6 Nov 1990). An AP1000 is installed at the Australian
National University in Canberra, where I will be visiting next month,
so I hope to have additional details at that time. Fujitsu also
described their work on the non-numeric parallel processor, MAPLE-RP
(routing processor) for laying out IC designs. In one benchmark
(384x256 grid) known as the "Burnstein switch box problem" the 4096 PE
MAPLE-RP ran 300 times faster than a Sun4/1. Fujitsu is responsible
for the parallel inference machine of the 5th generation project.
This year Fujitsu will complete a neural computer to rival Hitachi's.
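The AMBER efficiency figures quoted above can be sanity-checked with
Amdahl's law. This is only a back-of-the-envelope model of mine (it
ignores communication costs, which generally grow with processor count),
but inferring a serial fraction from the reported 64-PE speedup of 55
predicts roughly 75% efficiency at 128 PEs, in the neighborhood of
Fujitsu's 80% prediction:

```python
def amdahl_serial_fraction(speedup, p):
    """Infer the serial fraction f from an observed speedup on p
    processors, assuming the pure Amdahl model S = 1/(f + (1 - f)/p)."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

def amdahl_speedup(f, p):
    """Predicted speedup on p processors for serial fraction f."""
    return 1.0 / (f + (1.0 - f) / p)

f = amdahl_serial_fraction(55.0, 64)   # from the reported 64-PE run
s128 = amdahl_speedup(f, 128)          # extrapolate to 128 PEs
eff128 = s128 / 128                    # ~0.75, near the predicted 80%
```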
NEC: Steady preparations for super parallel machines, including trials
 for in-house semiconductor design via the 64 processor Cenju. See my
 report on Cenju in (spice, 2 July 1990). At this meeting NEC presented
 a nice use of Cenju in a completely different area, plasma
 simulation in magnetohydrodynamics (MHD). The major issue here
 is solving the specially block structured linear equations that arise
 after the discretization. For this problem a speedup of about 40 with
 64 PEs was reported. The authors also suggest that a version of Cenju
 with 512 processors is in development. NEC is also selling a
 neurocomputer for PC applications.
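The paper did not state which solution method was used for the
block-structured systems. As one generic illustration of why such
structure maps well onto a machine like Cenju, here is a block-Jacobi
sketch in which each diagonal block (conceptually, one PE's share of the
system) is factored and solved independently within each sweep; the
matrix and block size below are invented for the example.

```python
import numpy as np

def block_jacobi(A, b, block, tol=1e-10, max_iter=500):
    """Block-Jacobi iteration: each 'PE' owns one diagonal block and
    solves it locally.  Off-diagonal coupling enters only through the
    previous iterate, so all block solves in a sweep are independent."""
    n = len(b)
    x = np.zeros(n)
    starts = range(0, n, block)
    # pre-factor each diagonal block (done once per PE)
    invs = {s: np.linalg.inv(A[s:s+block, s:s+block]) for s in starts}
    for it in range(max_iter):
        x_new = np.empty_like(x)
        for s in starts:                    # conceptually: one PE per block
            # r = b_i - sum_{j != i} A_ij x_j  (add the diagonal term back)
            r = (b[s:s+block] - A[s:s+block] @ x
                 + A[s:s+block, s:s+block] @ x[s:s+block])
            x_new[s:s+block] = invs[s] @ r
        if np.linalg.norm(x_new - x) < tol:
            return x_new, it
        x = x_new
    return x, max_iter
```

On a distributed-memory machine the only communication per sweep is the
exchange of the neighboring pieces of x.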
Matsushita: Developing ADENA with Kyoto University; see my report
 (parallel.904, 6 Nov 1990). At this meeting a description was given
 of the Fortran compiler and the preprocessor for the special purpose
 language ADETRAN. Matsushita also has worked on OHM256,
 with 25GFLOPS peak performance, and may combine four of them to reach
 100GFLOPS. Matsushita is also marketing a vacuum sweeper that
 incorporates neural-network technology.
Anritsu: Markets the commercial version of Tsukuba University's PAX. At
 this meeting one talk was given analyzing the number of computations
 for a parallel implementation of Gaussian elimination on PAX. We
 reported in (chep.91, 22 May 1991) that support for a new version of
 PAX has been approved by the Ministry of Education. A very early
 version of PAX was also marketed by Mitsubishi. Prof. Y. Oyanagi, one
 of the principal investigators from Tsukuba, has just moved to Tokyo
 University.
Professor Yoshio Oyanagi
Department of Information Science
Faculty of Science, University of Tokyo
Hongo 7-3-1, Bunkyo, Tokyo, 113 JAPAN
Tel: +81-3-3812-2111 ex. 4115, Fax: +81-3-3818-1073
Email: OYANAGI@IS.S.U-TOKYO.AC.JP
Toshiba: 512 PE Prodigy.
NTT: Research in using the 256 PE SIMD computer LISCAR for Japanese
 full text retrieval. NTT also does research on applications of
 neurocomputers to voice recognition and automatic translation systems.
 NTT has also developed a 4-Kbit content addressable memory (CAM),
 which is being used by Waseda University, ETL, and NTT itself
 as part of a string-search chip.
The universities are busy too. Several of the parallel computing projects
that are now supported in companies began as university projects,
including PAX and ADENA. We reported on Kyushu-U's reconfigurable
parallel computer in (parallel.904, 6 Nov 1990) and that is still moving
forward, although the main investigator, Professor Tomita, has just
transferred to Kyoto University.
Professor Shinji Tomita
Dept of Information Science
Kyoto University
606 Yoshidahonmachi, Sakyo-ku
Kyoto, Japan
Tel: +81 75 753-5373
Email: TOMITA@KUIS.KYOTO-U.AC.JP
Kyushu also reported on several other projects, including a parallel
rendering machine for high speed ray-tracing, a streaming FIFO processor,
and a hyperscalar architecture. (This department supports an extremely
large variety of projects.) Waseda University has two interesting
independent projects directed by Prof. Muraoka (the Harray system and
its Fortran compiler), and Prof. Kasahara (Oscar system). Keio University
described the experimental system ATTEMPT 10 (A Typical Testing
Environment of MultiProcessing Systems) for evaluation of the
communication performance of multiprocessors; this work should be followed
by those in the performance evaluation area. Keio's Professor Boku
presented a paper on DISTRAN (Distributed System Translator), a language
for discretizing partial differential equations via explicit
differencing, first into Prolog and then other languages so that they can
be run on parallel machines. Finally, the government labs ETL and ICOT
are very active, with ICOT especially presenting five
papers on diverse topics. See my report on ICOT (data.eng, 30 May
1991).
Because there are (as yet) no general purpose parallel computers from
Japan, universities here are far behind in the kind of algorithmic work
that is common in Western universities. There are also very few Western
commercial general purpose parallel computers at Japanese universities.
There is an iPSC/2 in the Information Science Department at the
University of Tokyo, Alliants at the University of Tsukuba and Hiroshima,
one or two BBN machines at other universities, and perhaps a few other
machines scattered about, but these are the exceptions. (There may be
more at industrial research labs.) Reliable machines like these allow
experimentation without much worry about whether the system will stay
up; such headaches reduce the time and resources available for
development of algorithms, system software, and tools, and ultimately
the time available for solving real problems. There is a great deal of
tool building on Unix
workstations however, and much of that is directly related to parallel
processing. On the other hand, there is much more system building
(hardware) here than in the West and this is reflected in the mix of
accepted papers for this conference.
------------------JOINT SYMPOSIUM ON PARALLEL PROCESSING '91-------------
May 14-16, 1991
INVITED LECTURES----------------------------
alpha-Coral: A Control/Data Flow Multiprocessor and its Compiler
Constantine D. Polychronopoulos (Center for Supercomputing Research and
Development and Dept. of Electrical and Computer Engineering, University
of Illinois at Urbana-Champaign)
E-mail: cdp@csrd.uiuc.edu
Object Oriented Parallelism: pC++ Ideas and Experiments
Dennis Gannon, Jenq-Kuen Lee (Department of Computer Science, Indiana
University, Bloomington, Indiana 47401)
E-mail: gannon@iuvax.cs.indiana.edu.
PANEL DISCUSSION----------------------------
Research Trends on Parallel Processing
Hiromu Hayashi (Information Processing Division, Fujitsu Laboratories,
Ltd.)
Expected Features of the Future Parallel Processing - What to do now -
T. Hiraki (Tokyo University/Electrotechnical Laboratory)
Future Parallel Processing Systems
Hironori Kasahara (Dept. of Information & Computer Sciences, Waseda
University)
Expected Features of the Future Parallel Processing - What to do now -
M. Kitsuregawa (Institute of Industrial Science, Tokyo University)
Expected Features of the Future Parallel Processing - What to do now -
Kazuo Taki (Institute for New Generation Computer Technology)
E-mail: taki@icot.or.jp
Future Operating Systems
Yutaka Ishikawa (Electrotechnical Laboratory)
DATA BASE & MEMORY---------------------------------
A Scheduling-Based Cache Coherence Scheme
Masaru Takesue (NTT Software Laboratories)
E-mail: takesue@lucifer.ntt.jp
Implementation and Evaluation of Coherency Protocol for Virtual Shared Memory
in the Network-connected Parallel Computer
Hironori Nakajo, Newton Kl Miura, Yukio Kaneda (Department of Systems
Engineering, Faculty of Engineering, Kobe University)
Koichi Wada (Institute of Information Science and Electronics, University
of Tsukuba)
    Parallel logic simulation is treated as parallel event
    simulation, in which time keeping is important. There are two
    time-keeping algorithms: the conservative method and the virtual
    time method. Since the conservative method may introduce deadlock,
    a means of avoiding deadlock is important. The virtual time method
    never deadlocks, but needs a rollback operation when a time
    discrepancy occurs. The authors have implemented a parallel logic
    simulation program based on the virtual time method on their
    parallel computer Multi-PSI, which has 64 PSI computers
    interconnected by an orthogonal bus. The performance observed by
    experiment is 60 kilo events per second, and the speedup obtained
    is more than 40 using 64 processors.
    A comment made by Prof. Yasuura of Kyoto University, however,
    pointed out that even a single workstation can attain as much as
    100 kilo events per second.
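The rollback mechanism of the virtual time method described above can be
sketched for a single logical process as follows. This is a toy
illustration of mine, not the Multi-PSI implementation: anti-messages and
global-virtual-time computation, which a real Time Warp system needs, are
omitted, and the state is just the sequence of event values applied.

```python
class TimeWarpLP:
    """Sketch of virtual-time (Time Warp) synchronization for one logical
    process: events execute optimistically, and a straggler -- an event
    whose timestamp is earlier than local virtual time -- forces a
    rollback to a saved state followed by re-execution."""

    def __init__(self):
        self.state = ()     # toy state: the event values applied, in order
        self.lvt = 0        # local virtual time
        self.log = []       # (timestamp, value, state_before_event)

    def receive(self, t, value):
        redo = []
        # rollback: undo every event executed at a later virtual time
        while self.log and self.log[-1][0] > t:
            ts, v, before = self.log.pop()
            redo.append((ts, v))
            self.state = before           # restore the saved state
        self.lvt = self.log[-1][0] if self.log else 0
        # execute the straggler, then re-execute the rolled-back events
        for ts, v in sorted([(t, value)] + redo):
            self.log.append((ts, v, self.state))
            self.state = self.state + (v,)
            self.lvt = ts
```

The conservative alternative would simply refuse to execute an event
until no earlier-timestamped event can still arrive, which is what makes
deadlock avoidance the central problem there.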
Multiple Processing Module Control on SDC, The Super Database Computer
S. Hirano, M. Harada, M. Nakamura, Y. Aiba, K. Suzuki, M. Kitsuregawa, M.
Takagi, W. Yang (Institute of Industrial Science, University of Tokyo)
E-mail: hirano@tkl.iis.u-tokyo.ac.jp
    SDC, the Super Database Computer, is a highly parallel relational
    database server which supports SQL. In this paper we describe SDC's
    process model, which is a basic framework for parallel data
    processing, and the multiple module control scheme built on that
    framework. We have developed a two-module version of SDC for a
    feasibility study; the results are also presented. SDC achieved
    about 30 times faster performance than the Teradata DBC/1024.
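The abstract gives no algorithmic detail, but a standard building block
for parallel relational servers in this line of work (Kitsuregawa's group
originated the GRACE hash-join) is the partitioned hash join: both
relations are split by hashing the join key, and each bucket pair can
then be joined on a separate processing module. A generic single-process
sketch, with relation layout and key functions invented for illustration:

```python
def partitioned_hash_join(r, s, key_r, key_s, n_parts=4):
    """GRACE-style partitioned hash join: hash-partition both relations
    on the join key, then join each bucket pair independently -- the
    bucket joins are the units that parallel hardware would distribute."""
    parts_r = [[] for _ in range(n_parts)]
    parts_s = [[] for _ in range(n_parts)]
    for row in r:                          # partitioning phase
        parts_r[hash(key_r(row)) % n_parts].append(row)
    for row in s:
        parts_s[hash(key_s(row)) % n_parts].append(row)
    out = []
    for pr, ps in zip(parts_r, parts_s):   # independent bucket joins
        table = {}
        for row in pr:                     # build a hash table on r's bucket
            table.setdefault(key_r(row), []).append(row)
        for row in ps:                     # probe with s's bucket
            for match in table.get(key_s(row), []):
                out.append((match, row))
    return out
```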
Full-Text Retrieval System using a SIMD Parallel Processor
Sueharu Miyahara, Toshio Kondo (NTT Human Interface Laboratories,
Yokosuka Kanagawa)
Syunkichi Tada (NTT Intelligent Technology Corp, Naka-ku, Yokohama,
Kanagawa)
PARALLEL INFERENCE MACHINE---------------------------
The Architecture of the Parallel Processing Management Kernel of PIE64
Yasuo Hidaka, Hanpei Koike, Hidehiko Tanaka (Department of Electrical
Engineering, Faculty of Engineering, The University of Tokyo)
E-mail: {hidaka,koike,tanaka}@mtl.t.u-tokyo.ac.jp
We have noticed that the overhead of parallel processing is
mainly caused by communication, synchronization and parallel
processing management. Therefore, we have introduced a network
interface processor and a management processor into the processing
element(PE) of the parallel inference engine PIE64.
In this paper, the architecture of the "parallel processing
management kernel" executed by the management processor will be
described, focusing on how to treat parallel processing management,
e.g. load distribution and scheduling, which becomes significant in
fine-grained highly parallel processing.
The parallel processing management kernel performs dynamic load
partitioning, a part of the general load distribution process. The
partitioning decision is based on parallelism, so that it eliminates
excessive concurrency and reduces communication. The scheduling
strategy of the kernel introduces dynamic priorities based on
parallelism and room in heap memory, in order to avoid exhaustion of
resources caused by explosive parallelism and also in order to
increase parallelism when it is insufficient. Thus a programmer
need not be concerned with parallelism explosion. It also
introduces respite time in starting execution of each thread in
order to reduce cost of suspension and context switching.
    The paper also presents a comparison of static partitioning by the
    compiler and dynamic partitioning by the kernel. When the
    parallelism exceeds the number of PEs to a high degree, the simple
    dynamic method with little overhead is more effective than the
    sophisticated static method. However, dynamic partitioning becomes
    ineffective when the parallelism and the number of PEs are of
    comparable magnitude. We conclude that the most promising method is
    a composite of the static and dynamic methods.
Evaluation of Instruction Level Parallelism on Parallel Inference Machine
PIM/i
Teruhiko Oohara, Koichi Takeda, Masatoshi Sato (Oki Electric Industry
Co., Ltd.)
The Inference Processor UNIRED II: Evaluation by Simulation
Kentaro Shimada, Hanpei Koike, Hidehiko Tanaka (Department of Electrical
Engineering, Faculty of Engineering, University of Tokyo)
E-mail: {shimada,koike,tanaka}@mtl.t.u-tokyo.ac.jp
    UNIRED II is the high performance inference processor of the
    parallel inference machine PIE64. It is designed for the committed
    choice language Fleng, and for use as an element processor of
    parallel machines. Its main features are: 1) tag architecture, 2)
    three independent memory buses (instruction fetching, data reading,
    and data writing), 3) multi-context processing for reducing pipeline
    interlocking and the cost of context switching for inter-processor
    synchronization. In this paper, several architectural features of
    UNIRED II are evaluated by register transfer level simulation. High
    performance (over 1 MLIPS) was attained, as predicted from its
    design, and it was shown that the three memory buses and multi-
    context processing yield improved performance.
DEDICATED MACHINE-------------------------------
Image Logic Algebra (ILA) and its Optical Implementations
Masaki Fukui, Kenichi Kitayama (NTT Transmission Systems Laboratories)
A Single-Chip Vector-Processor Prototype Based on Streaming/FIFO
Architecture - Evaluation of Macro Operation,
Vector-Scalar Cooperation and Terminating Vector Operations
Takashi Hashimoto, Keizou Okazaki, Tetsuo Hironaka, Kazuaki Murakami
(Interdisciplinary Graduate School of Engineering Sciences, Kyushu
University)
Shinji Tomita (Kyoto University)
E-mail: {hashimot,keizo,hironaka,murakami}@is.kyushu-u.ac.jp
A Parallel Rendering Machine for High Speed Ray-Tracing - Instruction-
Level Parallelism in the Macropipeline Stages
Seiji Murata, Oubong Gwun, Kazuaki Murakami (Interdisciplinary Graduate
School of Engineering Sciences, Kyushu University)
Shinji Tomita (Kyoto University)
E-mail: {murata,gwun,murakami}@is.kyushu-u.ac.jp
SUPERSCALAR ARCHITECTURE----------------------------
A Pipeline Architecture for Parallel Processing Across Basic Blocks
Toshikazu Marushima, Naoki Nishi, Ryosei Nakazaki (NEC Corporation)
Kenji Ohsawa (NEC Scientific Information System Development Ltd.)
DSNS Processor Prototype - Evaluation of the Architecture and the Effect
of Static Code Schedule
Akira Noudomi, Morihiro Kuga, Kazuaki Murakami (Interdisciplinary
Graduate School of Engineering Sciences, Kyushu University)
Tetsuya Hara (Mitsubishi Electric Co.)
Shinji Tomita (Kyoto University)
E-mail: {noudomi,kuga,murakami}@is.kyushu-u.ac.jp
Hyperscalar Processor Architecture - The Fifth Approach to Instruction-Level
Parallel Processing
Kazuaki Murakami (Interdisciplinary Graduate School of Engineering
Sciences, Kyushu University)
E-mail: murakami@is.kyushu-u.ac.jp
DATA FLOW MACHINE-----------------------------
Evaluation of Parallel Performance on Highly Parallel Computer EM-4
Yuetsu Kodama, Shuichi Sakai, Yoshinori Yamaguchi (Electrotechnical
Laboratory)
E-mail: saka@etl.go.jp
Architectural Design of a Parallel Supercomputer EM-5 (English)
Shuichi Sakai, Yuetsu Kodama, Yoshinori Yamaguchi (Electrotechnical
Laboratory)
Email: sakai@au-bon-pain.lcs.mit.edu (or) sakai@etl.go.jp
    This paper describes the architecture of the parallel supercomputer
    EM-5. The EM-5 design objective is to construct a feasible parallel
    supercomputer whose target performance is over 1 TFLOPS. The design
principles of the EM-5 are: (1) an object-oriented data-driven
model; (2) an advanced direct matching scheme; (3) a highly fused
pipeline; (4) a RISC processor EMC-G for a highly parallel computer;
(5) a functional interconnection network; and (6) a maintenance
architecture which can provide real-time monitoring facilities.
After examining these features, this paper shows the architectural
design of the EM-5, whose target structure will have 16,384
processing elements and whose peak performance is about 655 GIPS and
1.3 TFLOPS (double precision).
A Scheme to Reduce the Access Rate to Shared Memory for the Parallel
Processing System - Harray
Hayato Yamana, Satoshi Ohdan, Yoichi Muraoka (School of Science and
Engineering, Waseda University)
Email: muraoka@jpnwas00.bitnet
INTERCONNECTION NETWORK------------------------
An Approach to Realizing a Reconfigurable Interconnection Network Using
Field Programmable Gate Arrays
Toshinori Sueyoshi, Itsujiro Arita (Kyushu Institute of Technology)
Kouhei Hano (Kyocera Inc.)
E-mail: sueyoshi@ai.kyutech.ac.jp
    We present a new reconfigurable interconnection network utilizing
    the reconfigurability of the FPGA (Field Programmable Gate
    Array), a kind of programmable logic LSI. Reconfiguration for the
    desired connections on our proposed interconnection network is
    performed by programming the configuration data into each FPGA, so
    that both static networks, such as mesh and hypercube networks, and
    dynamic networks, such as baseline and omega networks, can be
    implemented directly, without simulation. Consequently, the optimum
    connections for interprocess communication or memory reference
    patterns in executing application programs on the reconfigurable
    multiprocessor can be configured adaptively by programming.
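For readers unfamiliar with the dynamic networks mentioned in the
abstract: the defining property of an omega network is that a simple
destination-tag rule routes any input to any output through log2(N)
shuffle-exchange stages. A small sketch of that rule (mine, not from the
paper) which traces the position of a message through the stages:

```python
def omega_route(src, dst, k):
    """Destination-tag routing through a k-stage omega network on 2^k
    ports.  Each stage is a perfect shuffle (left rotate of the k-bit
    address) followed by a 2x2 exchange that sets the low bit to the
    next bit of the destination address, MSB first."""
    n = 1 << k
    pos = src
    path = [pos]
    for stage in range(k):
        pos = ((pos << 1) | (pos >> (k - 1))) & (n - 1)  # perfect shuffle
        bit = (dst >> (k - 1 - stage)) & 1               # destination tag bit
        pos = (pos & ~1) | bit                           # exchange setting
        path.append(pos)
    return path                                          # ends at dst
```

After k stages every source bit has been rotated out and replaced by a
destination bit, which is why the rule always terminates at dst.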
Integrated Parallelizing Compiler - Network Synthesizer
Hiroki Akaboshi, Kazuaki Murakami, Akira Fukuda (Interdisciplinary
Graduate School of Engineering Sciences, Kyushu University)
Shinji Tomita (Kyoto University)
E-mail: {akaboshi,murakami,fukuda}@is.kyushu-u.ac.jp
Evaluation of Various Implementations of the base-m n-cube Network
Yasushi Kawakura, Noboru Tanabe, Takashi Suzuoka (Toshiba Research and
Development Center)
MULTIPROCESSOR I---------------------------
A Node Processor for the A-NET Multicomputer and its Execution Scheme
Tsutomu Yoshinaga, Mitsuru Suzuki, Takashi Teraoka, Hisashi Mogi,
Takanobu Baba (Department of Information Science, Faculty of
Engineering, Utsunomiya University)
E-mail: yoshi@infor.utsunomiya-u.ac.jp
The node processor of the A-NET parallel object-oriented computer
consists of a 40-bit processing element (PE) which executes methods
of allocated objects, a router which determines the path of a
message or transfers an object code, and 320KB of local memory. We
chose a high-level machine instruction set and a tagged architecture
for the PE, so that it may include supporting hardware units like an
instruction preprocessing unit and a tag processing unit. The
    organization of the router is independent of the network topology,
    so that the message routing algorithm is programmable. Another
    feature of the router is that it uses adaptable cut-through routing
    for packet switching, and circuit-switched object code transfer
    as well.
Performance Comparison of Parallel Wire-routing on Distributed
Multiprocessors and Shared Memory Multiprocessors
Masahiko Sano, Yoshizo Takahashi (Department of Information Science and
Intelligent Systems, Faculty of Engineering, Tokushima University)
E-mail: {sano,taka}@n30.is.tokushima-u.ac.jp
The Performance Evaluation of Communication Mechanism of Multiprocessor
Test Bed ATTEMPT
Sunao Torii, Hideharu Amano (Department of Computer Science, Keio
University)
MULTIPROCESSOR II------------------------------
Functional Memory Type Parallel Processors FMPP on a CAM and its
Applications
Hiroto Yasuura, Akihiro Watanabe, Ryugo Sadachi,
Keikichi Tamaru (Department of Electronics, Kyoto University)
Demand/Accept Control Mechanism and Hardware of a Parallel Computer
Masaki Tomisawa (Department of Computer Science, Faculty of
Technology, Tokyo Univ. of Agr. and Tech.)
KRPP: Kyushu University Reconfigurable Parallel Processor
Naoya Tokunaga, Shinichiro Mori, Kazuaki Murakami, Akira Fukuda
(Interdisciplinary Graduate School of Engineering Sciences, Kyushu
University)
Tomoo Ueno (Kyushu Nippon Electric Co.)
Eiji Iwata (Sony Co.)
Koji Kai (Matsushita Electric Ind. Co.)
Shinji Tomita (Kyoto University)
E-mail: {tokunaga,mori,murakami,fukuda}@is.kyushu-u.ac.jp
PARALLEL LANGUAGE------------------------------
Distributed Implementation of Stream Communication in A'UM-90
Koichi Konishi, Tsutomu Maruyama, Akihiko Konagaya (C&C Systems Research
Laboratories, NEC Corporation)
Kaoru Yoshida, Takashi Chikayama (Institute for New Generation Computer
Technology)
Intra-object Parallelism on Parallel Object Oriented Languages
Minoru Yoshida, Hidehiko Tanaka (Faculty of Engineering, University
of Tokyo)
E-mail: {minoru,tanaka}@mtl.t.u-tokyo.ac.jp
    Intra-object parallelism is important because server objects
    must process many messages in a short time, and because concurrency
    in an object makes its implementation easy. The paper presents a
    model in which messages are interpreted in parallel and instance
    variables are accessed instantaneously. These two points were the
    chief sources of sequentiality in intra-object parallel processing.
    Using single-assignment variables, instance variables can be
    accessed instantaneously. A language based on the model is also
    introduced. Because the order of messages does not matter, it has
    the expressive power for natural concurrent programming using
    atomic access to instance variables.
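The single-assignment variables mentioned in the abstract can be
illustrated with a write-once cell: readers block until the value is
bound, so every read returns the one value the variable will ever hold
and accesses are race-free. This is a generic sketch of the idea using
Python threads, not the authors' language.

```python
import threading

class SingleAssignment:
    """A write-once ('single-assignment') variable: readers block until
    the value is bound, and a second bind is an error.  With instance
    variables like this, messages can be interpreted in parallel inside
    one object without races on the variables."""

    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        if self._bound.is_set():
            raise ValueError("already bound")
        self._value = value
        self._bound.set()           # wake up all waiting readers

    def read(self):
        self._bound.wait()          # block until some thread binds a value
        return self._value
```

Whichever message handler runs first can bind the variable, and any
handler that reads it simply waits, which is what removes the ordering
constraint between messages.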
HyperDEBU: A Multiwindow Debugger for Parallel Logic Programs and
Committed-Choice Language
Junichi Tatemura, Hanpei Koike, Hidehiko Tanaka (Faculty of Engineering,
The University of Tokyo)
E-mail: {tatemura,koike,tanaka}@mtl.t.u-tokyo.ac.jp
The debugging of parallel programs is more difficult than that
of sequential programs. Since a Committed-Choice Language (CCL),
which is a kind of parallel logic programming language, enables
fine-grained highly parallel execution, it is very hard to examine
    and to manipulate its numerous complicated control/data flows. A
    debugger, whose role is to show users a model abstracted from the
    execution of a program, needs a model to represent the execution of
    a fine-grained, highly parallel program. To represent the execution of a
CCL program, we propose a communicating process model which has
flexible levels and aspects of abstraction. Our debugger represents
this model. A parallel program has multiple complicated control/data
flows which are considered to be high-dimensional information.
Therefore, a high-dimensional interface is necessary to debug it.
Since a user compares a model represented by a debugger with
expected behavior of the program in order to find a bug in the
program, the debugger must provide the kind of view he/she wants.
Accordingly, the debugger must provide views which have flexible
levels and aspects of abstraction. We developed a multiwindow
debugger HyperDEBU which provides a high-dimensional interface.
HyperDEBU provides windows flexible enough for programmers to
examine and manipulate complicated structures composed of multiple
control/data flows.
PARALLEL SYSTEM/ EVALUATION----------------------------
On the Real Number Index Space Array in the Dataflow Stream Language VISDAL
Hirohisa Mori, Kazuhiko Kato, Hiroaki Takada (Dept. of Information
Science, Faculty of Science, University of Tokyo)
Quantitative Evaluation of Several Synchronization Mechanisms Based on
Static Scheduling and Fuzzy Barrier
Hiromitsu Takagi, Takaya Arita, Masahiro Sowa (Department of Electrical
Engineering and Computer Science, Nagoya Institute of Technology)
E-mail: takagi@craps.elcom.nitech.ac.jp
Parallel Garbage Collection on a Shared Memory Multi-Processor and its
Evaluation
Akira Imai (Institute for New Generation Computer Technology)
Evan Tick (Univ. of Oregon)
Katsuto Nakajima (Mitsubishi Electric Co.)
Atsuhiro Goto (NTT)
PARALLELIZING COMPILER--------------------------
Prototype FORTRAN to Dataflow Compiler for Parallel Processing System -
Harray
Toshiaki Yasue, Jun Kohdate, Hayato Yamana, Yoichi Muraoka
(School of Science and Engineering, Waseda University)
E-mail: yasu@muraoka.info.waseda.ac.jp
APARC: Parallelizing Compiler for Parallel Computer ADENART
Koji Zaiki, Akiyoshi Wakatani, Tadashi Okamoto (Matsushita Electric
Industrial Co., Ltd., Semiconductor Research Center)
Shigeru Kuroda (Matsushita Softresearch, Inc.)
E-mail: zaiki@vdrl.src.mei.co.jp
    The parallelizing compiler APARC translates FORTRAN programs into
    ADETRAN, a high-level parallel language for the parallel computer
    ADENART. APARC mainly changes DO loops into parallel executable
    code by control flow analysis and data dependence analysis.
    ADENART has a fast data communication network between PEs
    (processing elements) and a synchronization mechanism. APARC uses
    these advantages in parallelization. In particular, even if DO
    loops have GOTO statements that branch out of the loop, they can be
    changed into parallel executable code by APARC, with exception
    handling routines inserted. A prototype version of APARC is now
    available, and some applications can be translated. In the near
    future, we will make APARC available for many applications.
DISTRAN System (Distributed Systems Translator) Implementation on
Parallel Computers
Kiyohiro Suzuki, Nobuyuki Yamasaki, Takao Yumiba, Kaoru Murata, Taisuke
Boku (Faculty of Science and Technology, Keio University)
Email: taisuke@kw.phys.keio.ac.jp
    When solving problems described with partial differential
    equations, the most general method is to discretize the space and
    time domains, and calculate all spatial domains step by step. This
    method requires a large amount of calculation if the density of the
    mesh is high enough to get accurate solutions. However, all
    spatially discretized domains can be calculated in parallel, and it
    is possible to achieve high performance when calculating them on
    large scale multiprocessors.
    DISTRAN is a partial differential equation solver on parallel
    processors using this method. With DISTRAN, a user can solve a
    problem by describing only a very simple form of problem
    specification, consisting of the original partial differential
    equations, boundary and initial conditions, and domain information.
    No actual programming by the user is needed.
DISTRAN analyzes the given equations and checks their consistency.
The problem domain is discritized automatically, and all spatial
points and boundaries are calculated to satisfy given conditions.
Finally, DISTRAN generates a program to solve the problem on a
sequential or parallel processor. Currently, we have implemented
three versions of DISTRAN for three types of parallel processors,
MiPAX-32 [a commercial version of U-Tsukuba's PAX], QCDPAX and a
Transputer system. The first two machines are based on a shared
memory and global synchronization mechanism. The last one is based
on message passing links. We calculated the same problem on each
system, and confirmed that DISTRAN achieves actual high performance.
In this paper, we describe how to design and implement such an
automated programming and solving system on several types of
multiprocessors. We also show the actual performance of each system
and evaluate the calculation efficiency by DISTRAN.
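The parallelism DISTRAN exploits is visible in the simplest possible
discretization: an explicit finite-difference step for the 1-D heat
equation u_t = u_xx, where every interior point depends only on the
previous time level and so all points can be updated simultaneously.
A hand-written sketch of the kind of code such a system generates
automatically (the function and problem are invented for illustration):

```python
def heat_step(u, dt, dx):
    """One explicit time step of u_t = u_xx on a 1-D grid.
    Boundary values are held fixed (Dirichlet conditions)."""
    new = u[:]
    for i in range(1, len(u) - 1):   # each i is independent -> parallelizable
        new[i] = u[i] + dt / dx**2 * (u[i-1] - 2.0 * u[i] + u[i+1])
    return new

u = [0.0, 0.0, 1.0, 0.0, 0.0]       # initial heat spike in the middle
u = heat_step(u, dt=0.1, dx=1.0)    # the spike diffuses outward
```

On a multiprocessor the interior loop is split across processors, with
communication only at subdomain boundaries.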
PARALLEL OS-------------------------------
A Testbed OS for Evaluation of Parallel Algorithms
Takahiro Yakoh, Yuichiro Anzai (Department of Computer Science, Keio
University)
Parallel Processings in OS Kernel by the Process Network Architecture
Yasuichi Nakayama, Iwao Morishita (University of Tokyo)
Kazuya Tago (IBM Japan Tokyo Laboratories)
E-mail: yasu@meip7s.t.u-tokyo.ac.jp
A parallel operating system has been designed and implemented on
a loosely-coupled multiprocessor system employing the process
network architecture.
The operating system consists of a number of light-weight
processes interconnected by rendezvous communications and is
compatible with the UNIX system. It has been shown that when this
process network is distributed over multiple computer units with an
optimum assignment, some processes can run in parallel with others.
In this paper we consider parallel processing in the OS kernel in
order to improve the response time of system calls.
On Paralleling Transaction Processes by Exchanging Messages
Haruo Yokota, Yasuo Noguchi, Riichiro Take (Fujitsu Laboratories, Ltd.)
NUMERIC PROCESSING------------------------------
Parallelizing Gaussian Elimination on PAX
Kimio Takahashi (Scientific Technology, Tsukuba Univ.)
Study on the Algorithms for Matrix Solver on Massively Parallel Computer
Mitsuyoshi Igai (Hitachi VLSI Engineering Corp.)
Toshio Okouchi, Chisato Konno (Central Research Lab, Hitachi, Ltd.)
Molecular Dynamics Simulation on a Highly Parallel Computer AP1000
Yoshiyuki Sato (Computer-Based Systems Lab., Fujitsu Labs Ltd.)
E-mail: hsat@flab.fujitsu.co.jp
Yasumasa Tanaka (Fujitsu Ltd.)
Hiroshi Iwama, Shigetsugu Kawakita, Minoru Saito, Kenji Morikami,
Toru Yao (Protein Engineering Research Institute) Shigenoru
Tsutsumi, Hideaki Yoshijima (Fujitsu Kyushu System Engineering)
Parallel Nonlinear MHD Plasma Simulator
Satoshi Matsushita, Nobuhiko Koike (NEC Corporation)
Masaru Narusawa (NEC Scientific Information System Development Ltd.)
Genichi Kurita, Toshihide Tsunematsu, Tatsuoki Takeda (Japan Atomic
Energy Research Institute)
Email: {matsushita, koike}@csl.cl.nec.co.jp
AEOLUS is a nonlinear plasma simulator for analyzing an instability
(called disruption) of tokamak plasma in a nuclear fusion reactor, an
analysis that is very time consuming. Since most of AEOLUS's
calculation is nonlinear, it employs explicit time integration.
However, by applying an implicit method to the linear part, we have
improved its convergence. We set out to parallelize the AEOLUS code,
which was developed and tuned for a vector machine at the Japan
Atomic Energy Research Institute; the vector code ran 6 to 7 times
faster than its scalar counterpart. The small amount of parallelism
in the implicit part limits the speed-up. We propose a novel parallel
algorithm for MIMD parallel machines and successfully parallelized
the implicit part of the simulation, achieving a speed-up of 42 using
the 64-processor Cenju. (Cenju is a multiprocessor system with a
distributed shared memory scheme developed mainly for circuit
simulation; it is designed for effective execution of our modular
circuit simulation algorithms.) (References follow.)
1. T. Takeda, K. Tani, S. Matsushita, et al.:
Plasma Simulator METIS and Tokamak Plasma Analysis,
US-Japan Workshop on Advances in Computer Simulation Techniques
Applied to Plasma and Fusion, (1990).
2. T. Nakata et al.: Cenju: A Multiprocessor System with
a Distributed Shared Memory Scheme for Modular Circuit Simulation,
Proc. International Symposium on Shared Memory Multiprocessing,
pp. 82-90, April (1991).
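The explicit/implicit splitting described in the abstract can be
illustrated on a scalar model equation du/dt = a*u + N(u), with the
stiff linear term a*u taken implicitly (backward Euler) and the
nonlinear term N(u) explicitly. This is a generic IMEX sketch, not
the AEOLUS discretization; all names are invented:

```python
def imex_step(u, dt, a, nonlinear):
    """One IMEX (implicit-explicit) Euler step for du/dt = a*u + N(u):
    the linear term is taken at the new time level (implicit), the
    nonlinear term at the old one (explicit):
        (u_new - u) / dt = a * u_new + N(u)
    so u_new = (u + dt*N(u)) / (1 - dt*a)."""
    return (u + dt * nonlinear(u)) / (1.0 - dt * a)

u = 1.0
for _ in range(10):
    # stiff linear decay (a = -5) is handled stably despite the
    # large step; only the mild nonlinear term is treated explicitly
    u = imex_step(u, dt=0.1, a=-5.0, nonlinear=lambda v: 0.1 * v * v)
```

The price of the implicit part is a (linear) solve each step, which
in a real simulator is the part with limited parallelism, as the
abstract notes.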
COMPUTER AIDED DESIGN FOR LARGE SCALE INTEGRATION--------------
Parallel Logic Simulation based on Virtual Time
Yukinori Matsumoto, Kazuo Taki (Institute for New Generation
Computer Technology)
Email: yumatumo@icot.or.jp
Author's abstract: This paper focuses on parallel logic simulation,
targeting an efficient logic simulation system on a large-scale
multiprocessor. The Time Warp mechanism, an optimistic approach, was
implemented and evaluated, even though its rollback processing has
been said to be costly. The system is implemented on the Multi-PSI, a
distributed-memory multiprocessor, and includes several new ideas to
enhance performance, such as a local message scheduler, an
antimessage reduction mechanism, and a load distribution scheme. In
our experiment, using 64 processors, about 48-fold speedup was
attained, and the performance of the whole system reached about 60 k
events/sec, which is fairly good for a full-software simulator. The
paper then reports an empirical comparison between the Time Warp
mechanism and two conservative mechanisms: an asynchronous approach
using null messages, and a synchronous approach. The comparison shows
that the Time Warp mechanism is the most efficient of the three, and
could be the most suitable for large-scale multiprocessors.
[Comment: Parallel logic simulation is treated here as parallel
event simulation, in which timekeeping is important. There are two
timekeeping algorithms: the conservative method and the virtual time
method. Since the conservative method may introduce deadlock, a means
of avoiding deadlock is important. The virtual time method never
deadlocks, but it needs a rollback operation whenever a time
discrepancy occurs. The authors implemented a parallel logic
simulation program based on the virtual time method on their parallel
computer Multi-PSI, which consists of 64 PSI computers interconnected
by an orthogonal bus. The performance observed in the experiment is
60 kilo-events per second, with a speed-up ratio of more than 40
using 64 processors.
A comment by Prof. Yasuura of Kyoto University, however, pointed
out that even a single workstation can attain as much as 100
kilo-events per second.]
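The rollback at the heart of the Time Warp (virtual time) mechanism
can be sketched for a single logical process. This is a toy model:
one process, saved-state rollback, and no antimessages (which the
real system needs to cancel messages already sent to other
processes); all names are invented:

```python
import heapq

class LogicalProcess:
    """One Time Warp logical process: executes events optimistically in
    timestamp order and rolls back when a straggler arrives in its past."""

    def __init__(self):
        self.lvt = 0        # local virtual time
        self.state = 0
        self.history = []   # (event_time, event_value, prev_lvt, prev_state)
        self.queue = []     # pending (timestamp, value) events

    def schedule(self, t, v):
        if t < self.lvt:    # straggler: undo every event at time >= t
            while self.history and self.history[-1][0] >= t:
                et, ev, self.lvt, self.state = self.history.pop()
                heapq.heappush(self.queue, (et, ev))   # re-execute later
        heapq.heappush(self.queue, (t, v))

    def run(self):
        while self.queue:
            t, v = heapq.heappop(self.queue)
            self.history.append((t, v, self.lvt, self.state))
            self.lvt, self.state = t, self.state + v   # "execute" the event

lp = LogicalProcess()
lp.schedule(10, 1); lp.schedule(20, 2); lp.run()   # optimistic execution
lp.schedule(15, 5)   # straggler: rolls back past virtual time 15
lp.run()             # re-executes events 15 and 20 in the right order
```

Conservative methods avoid this rollback machinery entirely, but at
the cost of blocking (or null messages) to guarantee no straggler can
ever arrive.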
Massively Parallel Layout Engine - Routing Processor
K. Kawamura, T. Shindo, T. Shibuya, H. Miwatari, Y. Ohki, T. Doi
(Computer-Based Systems Lab., Fujitsu Laboratories Ltd.)
The authors have developed a new algorithm for automated wire
routing, called the constrained relaxational maze running algorithm.
In this method, intersections between nets are allowed but are
penalized by a cost function. By iterating the routing while
adjusting the penalty cost, optimal routings are finally obtained.
They have built a massively parallel computer to implement this
algorithm. The machine, called MAPLE-RP, has 8K one-bit PUs connected
in a lattice and operating in SIMD fashion; the performance is 40
GOPS when 64K PUs are used. Both the routing completion rate and the
routing speed were observed to be quite satisfactory.
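The penalty idea can be sketched as a maze (Dijkstra-style) search in
which stepping onto a cell occupied by another net is allowed but
costs an extra penalty, so raising the penalty relaxes routes away
from intersections. This is a generic illustration, not Fujitsu's
algorithm; all names and the grid example are invented:

```python
import heapq

def route(width, height, src, dst, occupied, penalty):
    """Cheapest path on a grid; entering an occupied cell is allowed
    but costs `penalty` extra, so intersections are discouraged
    rather than forbidden."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, (x, y) = heapq.heappop(pq)
        if (x, y) == dst:
            return d
        if d > dist.get((x, y), float("inf")):
            continue                     # stale queue entry
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height:
                step = 1 + (penalty if (nx, ny) in occupied else 0)
                if d + step < dist.get((nx, ny), float("inf")):
                    dist[(nx, ny)] = d + step
                    heapq.heappush(pq, (d + step, (nx, ny)))
    return None

occupied = {(2, y) for y in range(4)}   # another net fills column x=2, rows 0..3
cheap  = route(5, 5, (0, 0), (4, 0), occupied, penalty=1)    # cuts through
costly = route(5, 5, (0, 0), (4, 0), occupied, penalty=100)  # detours via row 4
```

Iterating with a growing penalty lets early passes find rough routes
quickly and later passes untangle the remaining intersections.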
A Parallel Router based on a Concurrent Object-oriented Model
Hiroshi Date, Yoshihisa Ohtake, Kazuo Taki (Institute for New
Generation Computer Technology)
E-mail: date@icot.or.jp
Author's abstract: LSI routing is well known as a process that
requires massive computational power, so speedup through parallel
processing shortens the LSI design period. This paper presents a new
parallel router based on a concurrent object-oriented model: objects
corresponding to line segments find paths between terminals by
exchanging messages with each other. This method has a high degree of
parallelism. The search algorithm of our model is based on a
look-ahead line-search algorithm. We implemented this algorithm in
the KL1 language on the Multi-PSI. We have been verifying our router
using real LSI data, and the initial results are described.
[Comment: This paper presents a parallel routing algorithm based on
a look-ahead line-search algorithm, together with the speedup
obtained by running the program on the authors' parallel computer
Multi-PSI. The algorithm is based on an object-oriented model in the
sense that each net is treated as an object that exchanges messages
to avoid intersections. Although the speedup obtained was favorable,
the routing completion rate was not.]
ARTIFICIAL INTELLIGENCE/DATA BASE------------------------
A Parallel Processing Feature of a DBMS with SCMP for OLTP
Kazumi Hayashi, Kazuhiko Saitoh, Tomohiro Hayashi, Masaaki Mitani,
Hiroshi Ohsato, Takashi Obata, Yutaka Sekine, Mitsuhiro Ura, Takuji Ishii
(2nd Software Division, Computer System Group, Fujitsu Ltd.)
Parallel Dynamic Map Construction and Navigation in Real-Time for
Autonomous Robots (ENGLISH)
Martin Nilsson (Swedish Institute of Computer Science, Box 1263, S-164 28
Kista, Sweden)
E-mail: mn@sics.se
Real-time map construction and navigation are complex and
computationally intensive tasks, but contain much potential
parallelism. This paper describes how programming techniques based
on committed-choice languages can be used to both concisely express
algorithms for such problems, and extract their parallelism.
Parallel Processing of ATMS on the Heterogeneous Distributed System
NueLinda
Hiroshi G. Okuno (NTT Basic Research Laboratories)
Osamu Akashi, Kenichiro Murakami, Yoshiji Amagi (NTT Software
Laboratories)
E-mail: okuno@ntt-20.ntt.jp, murakami@ntt-20.ntt.jp,
akashi@toshi.ntt.jp, amagi@nuesun.ntt.jp
We have proposed the NueLinda computation model, which integrates
various heterogeneous distributed systems and provides computing and
data resources in a transparent and uniform manner. Based on the
NueLinda model, we have designed and implemented TAO-Linda on a Lisp
machine.
ATMS (Assumption-based Truth Maintenance System) is an intelligent
database in the sense that it maintains the support sets for each
datum. A conventional database can contain only one consistent
context of data, while the ATMS provides the inference engine with a
multiple-context mechanism. ATMS is considered one of the essential
facilities for AI systems of the next generation, and its execution
speed needs to be improved drastically.
In this paper, we discuss the parallel processing of ATMS with
TAO-Linda and compare the resulting implementation with the parallel
processing of ATMS on a shared-memory machine.
PARALLEL COMPUTING MODEL--------------------------
Message-flow: A New Computation Model for MIMD-type Parallel Machines
Hiroaki Fujii (Hitachi Ltd.)
Kiyoshi Shibayama (Faculty of Engineering, Kyoto University)
A Hybrid Group Reflective Architecture for Object-Oriented Concurrent
Programming
Takuo Watanabe, Satoshi Matsuoka, Akinori Yonezawa (Department of
Information Science, The University of Tokyo)
E-mail: {takuo,matsu,yonezawa}@is.s.u-tokyo.ac.jp
The benefits of computational reflection are the abilities to
reason and alter the dynamic behavior of computation from within the
language framework. This is even more beneficial in
concurrent/distributed computing, where the complexity of the system
is much greater than in sequential computing; we have demonstrated
various benefits in our past research on Object-Oriented Concurrent
Reflective (OOCR) architectures. Unfortunately,
attempts to formulate reflective features provided in practical
reflective systems, such as resource management, have led to some
difficulties in maintaining the linguistic lucidity necessary in
computational reflection. The primary reason is that previous OOCR
architectures lack the ingredients for group-wide object
coordination.
We present a new OOCR system architecture called "Hybrid Group
Reflective Architecture (HGRA)", and a new language ABCL/R2 based on
this architecture. The key features of ABCL/R2 are the notion of
heterogeneous object groups and coordinated management of group
shared computational resources. We describe how such management can
be effectively modeled and adaptively modified/controlled with the
reflective features of ABCL/R2. We also illustrate that this
architecture is defined in a totally meta-circular way (without
adopting ad-hoc primitives), embodying two directions of reflective
towers.
Towards Realistic Type Inference for Guarded Horn Clauses (ENGLISH)
Dongwook Shin (Fujitsu Laboratories, IIAS)
E-mail: shin@iias.flab.fujitsu.co.jp
This paper proposes a type inference system for Guarded Horn
Clauses, GHC, based on the notion of value and communication type.
A value type is a type that a predicate can have, guaranteeing that
a goal predicate of the value type does not raise type errors at run
time. A communication type is a type under which several predicates
communicate with one another. These types are obtained by constraint
solving and, to some extent, by pre-evaluation of a GHC program. We
expect these types to contribute to the early detection of errors in
GHC program development.
ALGORITHMS----------------------------
A Process Control Scheme for Distributed Processing Systems Using
Weighted Throw Counting
Kazuaki Rokusawa (Systems Laboratory, OKI)
E-mail: rokusawa@okilab.oki.co.jp (or) rokusawa@icot.or.jp
Nobuyuki Ichiyoshi (Institute for New Generation Computer
Technology)
E-mail: ichiyoshi@icot.or.jp
This paper proposes a new scheme for
aborting/stopping/restarting (in general, changing the execution
state of) a pool of processes in a distributed environment where
there may be processes in transit. The scheme guarantees that all
processes belonging to the pool change state and that completion of
the state change is detected, and it works under both FIFO and
non-FIFO communication. It uses broadcasting and weighted throw
counting, and requires only a few words per processor per process
pool.
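The weighted-counting idea can be sketched as follows: a controller
hands out weight to every process, weight is split onto messages
thrown between nodes, and the state change is known to be complete
exactly when all weight has returned, with no assumption about
message ordering. A toy single-controller model (names invented, not
the authors' protocol):

```python
class Controller:
    """Weight ledger for one process pool.  Every process and every
    in-transit message carries part of the total weight; when the
    controller has received all weight back, every member of the pool
    is known to have changed state -- no FIFO assumption needed."""

    def __init__(self):
        self.outstanding = 0

    def spawn(self, weight):
        self.outstanding += weight
        return weight                   # weight carried by the new process

    def returned(self, weight):
        self.outstanding -= weight
        return self.outstanding == 0    # True -> state change complete

ctl = Controller()
w1 = ctl.spawn(100)
w2 = ctl.spawn(50)
msg_w, w1 = 40, w1 - 40     # process 1 throws a goal: weight splits onto the message
done = ctl.returned(w1)     # process 1 stops: False, a message is still in transit
done = ctl.returned(w2)     # process 2 stops: still False
done = ctl.returned(msg_w)  # message arrives, its weight returns: True
```

Because the in-transit message holds weight of its own, the
controller cannot falsely declare completion while a process is still
"in flight" between nodes.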
Sort m Smallest Elements Problem on a Linearly Connected Processor Array
with Multiple Buses
Satoshi Fujita, Masafumi Yamashita, Tadashi Ae (Faculty of Engineering,
Hiroshima University)
Time Bounds for Sorting and Routing Problems on Mesh-Bus Computers
Kazuo Iwama, Eiji Miyano (Faculty of Engineering, Kyushu University)
Yahiko Kambayashi (Faculty of Engineering, Kyoto University)
SUPER PARALLEL APPROXIMATE COMPUTING MODEL--------------------
Fuzzy 0-1 Combinatorial Optimization through Neural Networks
Masatoshi Sakawa, Toru Mitani (Department of Industrial and Systems
Engineering, Faculty of Engineering, Hiroshima University)
Kazuya Sawada (Information System Center, Matsushita Electric Works,
Ltd.)
E-mail: sakawa@msl.sys.hiroshima-u.ac.jp
Dynamic Modification of the Free Energy Function Improves Ability to Find
Good Solutions on a Hopfield Neural Network
Yutaka Akiyama, Tatsumi Furuya (Electrotechnical Laboratory)
E-mail: yakiyama@etl.go.jp
Four novel techniques for global optimization on a Hopfield
neural network are proposed. The sharpening method dynamically
modifies the gain of the neuron's input/output function. The excess
bias method provides an excessive input bias to improve the energy
"landscape". The emphasizing method dynamically changes balance
among constraints. And the annealing method controls randomness in
the stochastic Hopfield model (the Gaussian Machine). By combining
these techniques, the neural network shows excellent ability to
solve optimization problems.
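The dynamics these techniques modify can be sketched with a tiny
stochastic Hopfield network: each neuron fires with a probability
given by a sigmoid of its net input, whose gain can be sharpened and
whose temperature can be annealed over sweeps. A toy two-neuron
example (invented for illustration, not the ETL implementation):

```python
import math, random

def sweep(state, W, bias, gain, temperature):
    """One asynchronous update sweep of a stochastic Hopfield network."""
    for i in range(len(state)):
        net = sum(W[i][j] * state[j] for j in range(len(state))) + bias[i]
        p_on = 1.0 / (1.0 + math.exp(-gain * net / temperature))
        state[i] = 1 if random.random() < p_on else 0
    return state

random.seed(0)
W = [[0, -2], [-2, 0]]   # mutual inhibition: exactly one neuron should end up on
bias = [1, 1]
state = [1, 1]
for k in range(20):
    # annealing lowers the temperature; sharpening raises the gain
    state = sweep(state, W, bias,
                  gain=1 + k, temperature=max(0.1, 2.0 - 0.1 * k))
```

Early high-temperature sweeps explore the energy landscape; as the
gain rises and the temperature falls, the updates become effectively
deterministic and the network settles into a low-energy state
satisfying the mutual-inhibition constraint.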
The Chain Reaction in Adaptive Junction Networks
Yoshiaki Ajioka, Yuichiro Anzai (Department of Computer Science, Keio
University)
E-mail: ajioka@aa.cs.keio.ac.jp
Although neural networks are useful for pattern recognition, they
are not commonly used for sequential processing. We have built
Adaptive Junction, a feedback-type neural network that recognizes
spatio-temporal patterns. This paper proves that Adaptive Junction
networks can perform the chain reaction for any spatio-temporal
pattern when each neuron has a 1-degree feature pattern. From this
result, the order of the number of neurons needed to recognize given
spatio-temporal patterns becomes clear for Adaptive Junction
networks.
A Genetic Algorithms Approach to How to Represent the Basin of
Associative Memory Model
Keiji Suzuki, Yukinori Kakazu (Department of Engineering, Hokkaido
University)
------------------------------------------------------------------------
INFORMATION PROCESSING SOCIETY OF JAPAN
Vol 32, No. 4
SPECIAL ISSUE ON MASSIVELY PARALLEL COMPUTERS AND APPLICATIONS
The Way to Massively Parallel Computers
Takanobu Baba (Department of Information Science, Utsunomiya University)
Realization Technologies for Massively Parallel Machines
Shigeru Oyanagi, Noboru Tanabe (Toshiba R&D Center)
Super-parallel Computer ADENA for Scientific Simulation
Tatsuo Nogi (Division of Applied Systems Science, Faculty of Engineering,
Kyoto University)
Neural Network Model Processing on Massively Parallel Computers
Noboru Sonehara, Makoto Hirayama (ATR Auditory and Visual Research
Laboratories)
Commercial Massive Parallel SIMD Computer and its Application
Masaru Kitsuregawa (Institute of Industrial Science, University of Tokyo)
Taiichi Yuasa (Toyohashi University of Technology)
Logic Programming Oriented Inference Machine
Hidehiko Tanaka (Department of Electrical Engineering, University of
Tokyo)
Implementation for Sequential Logic Programming Languages
Minoru Yokota (Computer System Research Laboratory, C&C Systems Research
Laboratories, NEC Corporation)
Parallel Implementation Schemes of Logic Programming Languages
Nobuyuki Ichiyoshi (Institute for New Generation Computer Technology)
Architecture of Sequential Inference Machine
Yukio Kaneda, Hideo Matsuda (Dept. of Systems Engineering, Faculty of
Engineering, Kobe University)
Parallel Inference Machine Architecture
Atsuhiro Goto (Software Research Laboratory, NTT Software Laboratories)
-----------------------END OF REPORT------------------------------------
--
=========================== MODERATOR ==============================
Steve Stevenson {steve,fpst}@hubcap.clemson.edu
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell