rick@cs.arizona.edu (Rick Schlichting) (10/02/90)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years
under the auspices of the Office of Naval Research-Far East (ONRFE).
The following is the professional opinion of David Kahaner and in no
way has the blessing of the US Government or any agency of it. All
information is dated and of limited life time. This disclaimer should
be noted on ANY attribution.]
To: Distribution
From: David K. Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
Tony F. Chan UCLA [chan@math.ucla.edu]
Re: 4th ISR Supercomputing Workshop 29-31 August 1990, Hakone, Japan.
Date: 27 Sept 1990
ABSTRACT. This report describes the 4th ISR Supercomputing Workshop: The
Road to Parallel Applications, held from August 29 to 31, 1990 in Hakone,
Japan. In addition, some observations on the trends and characteristics
of parallel supercomputing research in Japan are presented.
Most of the text of this report was prepared by Professor T. F. Chan,
Dept. of Mathematics, Univ. of Calif. at Los Angeles, CA 90024. In some
places I have inserted references to earlier reports of mine (DKK) when
these supplement Chan's comments. Chan's travel expenses were supported
by ISR and some local expenses were supported by my office, ONRFE.
INTRODUCTION.
The Institute for Supercomputing Research (ISR) is a private non-profit
research institute established in 1987 to "conduct research on issues in
supercomputing and parallel processing, ... , and to strengthen ties with
universities and research centers in Japan". It is funded by the
Recruit Corporation, a multi-billion dollar Japanese company whose main
business is recruiting college graduates for the major corporations,
though it also has a division that sells computer services. The director
is Dr. Raul Mendez, who has a Ph.D. from U.C. Berkeley under Alexander
Chorin and who is well-known for some of the earliest benchmark tests on
the Japanese supercomputers in the early 80's.
The ISR has been organizing a series of annual workshops on various
topics in supercomputing. Typically, both Japanese and US researchers
are invited. Last summer it was held in Hawaii and this year the venue
was Hakone, a resort about two hours from Tokyo, famous for its hot
springs and the view of Mt. Fuji. There were about 40 registered
participants, mostly Japanese, with three speakers from the US: Olaf
Lubeck of Los Alamos, John Levesque of Pacific-Sierra Research and
myself. There were 13 talks in total and a panel discussion on "The
Future and Evolution of Scientific Computing". A program is attached,
and informal proceedings were available at the conference. The
atmosphere was relaxed and intimate, and there were many lively
discussions both during and after the formal lectures.
LECTURES.
Four main themes of the conference can be identified: parallel algorithms
(with emphasis on PDEs), hardware (both general and special purpose) for
scientific computing, dataflow, and computing environments (languages,
networks, programming tools). This reflects the organizers' attempt to
cover the main issues in parallel supercomputing, and it largely
succeeded: there were many discussions throughout the workshop on how
these areas should interact.
Algorithms.
The numerical solution of partial differential equations (PDEs)
represents a major demand on supercomputing resources: PDEs are widely
employed in many areas of science and engineering, as a result of the
fundamental fact that most physical laws are expressed mathematically
as PDEs. It therefore makes sense to look at some of the basic
PDE algorithms more carefully, especially in view of the advent of
parallel computing. Several speakers addressed this issue. Prof.
Toshio Kawai of Keio University tried to convince the audience that
nature is the best parallel supercomputer and it also provides a very
powerful class of algorithms for these machines. He calls these
"natural algorithms" -- namely explicit in time algorithms which are
based on local interactions in space. He has produced a programming
system called DISTRAN (written in PROLOG and publicly available), an
ELLPACK-like system which allows the user to easily specify the PDE and
obtain reliable results quickly. (See also my report 11 April 1990 in
which this topic is also mentioned. At that time I thought the idea was
too good to be true. Perhaps someone can request the program and perform
a critical evaluation. DKK)
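(To give the flavor of a "natural algorithm", here is a minimal sketch
in Python, entirely my own notation rather than DISTRAN's, of an
explicit-in-time update for the one-dimensional heat equation. Each
point is updated from its nearest neighbors only, so all interior
points can in principle be updated simultaneously on separate
processors.

    # One explicit-in-time step for u_t = u_xx on a 1D grid. Each new
    # value depends only on a point and its two neighbors, so every
    # interior point can be updated independently, in parallel.
    def natural_step(u, dt, dx):
        lam = dt / dx**2     # explicit stability requires lam <= 0.5
        return ([u[0]] +
                [u[i] + lam * (u[i-1] - 2.0*u[i] + u[i+1])
                 for i in range(1, len(u) - 1)] +
                [u[-1]])

The price of this locality is the familiar stability restriction on the
time step, which is one reason explicit methods can be slow on stiff
problems. DKK)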
On the other hand, Chan's talk tried to argue that the most appropriate
class of algorithms for massively parallel computers are hierarchical
(multilevel) ones. He based his arguments on the observation that many
problems in nature are hierarchical (e.g. having many different scales
in time and space) and that the most efficient algorithms therefore
require some form of global communication. Hierarchical algorithms are
a reasonable compromise between explicit algorithms, which are highly
parallelizable but slowly convergent, and fully implicit algorithms,
which converge quickly but are difficult to parallelize. Moreover, they
can
be implemented efficiently on hierarchical parallel computers, such as
the CM-2, the hypercubes and clustered hierarchical shared memory
systems.
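(To make the hierarchical idea concrete, here is a minimal two-grid
sketch in Python, my own illustration rather than anything Chan
presented: a few purely local smoothing sweeps on the fine grid, then a
coarse-grid correction through which global information propagates
cheaply.

    # Two-grid cycle for -u'' = f on [0,1] with zero boundary values.
    # Local smoothing damps high-frequency error; the coarse grid
    # supplies the global coupling that purely explicit methods lack.
    def jacobi(u, f, h, sweeps=3, w=2.0/3.0):
        for _ in range(sweeps):
            u = ([0.0] +
                 [(1-w)*u[i] + w*0.5*(u[i-1] + u[i+1] + h*h*f[i])
                  for i in range(1, len(u)-1)] +
                 [0.0])
        return u

    def two_grid(u, f, h):
        u = jacobi(u, f, h)                    # pre-smooth (local work)
        r = ([0.0] +
             [f[i] + (u[i-1] - 2.0*u[i] + u[i+1])/(h*h)  # residual
              for i in range(1, len(u)-1)] +
             [0.0])
        m = (len(u) - 1)//2                    # coarse grid: m+1 points
        rc = ([0.0] +
              [0.25*(r[2*i-1] + 2.0*r[2*i] + r[2*i+1])   # restriction
               for i in range(1, m)] +
              [0.0])
        ec = jacobi([0.0]*(m+1), rc, 2*h, sweeps=50)     # coarse "solve"
        e = [0.0]*len(u)
        for i in range(m+1):                   # prolongation: even points
            e[2*i] = ec[i]
        for i in range(m):                     # prolongation: odd points
            e[2*i+1] = 0.5*(ec[i] + ec[i+1])
        u = [u[i] + e[i] for i in range(len(u))]
        return jacobi(u, f, h)                 # post-smooth

A real multilevel method recurses on the coarse problem rather than
smoothing it into submission, but even this two-level skeleton shows
the division of labor: local work on the fine grid, global coupling
through the coarse one. DKK)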
Very often, existing algorithms for a particular problem are not
naturally parallelizable and one has to devise novel parallel algorithms.
Prof. Yoshizo Takahashi of Tokushima University presented several such
algorithms for an automated wire-routing problem specifically adapted to
the Coral parallel computer, a binary tree distributed memory MIMD
machine based on the MC68000 chip. These algorithms are particularly
interesting because they are true MIMD algorithms for a realistic
unstructured problem running on a real parallel machine and they
outperform the best commercial software running on a SUN 3/260.
A central issue in the design of parallel algorithms for MIMD computers
is how to map the data into the processors so as to minimize data
communication. George Abe of ISR presented results on comparing a ring
mapping to a 2D mapping for a semiconductor device modelling problem on
the iPSC/1. Comparisons with similar results on an Alliant FX/8-4 were
also given. He concluded that in two dimensions the difference in
performance for the two mappings can be large, with the two dimensional
mapping being more efficient.
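(The underlying arithmetic is easy to illustrate. For a five-point
stencil on an n x n grid split across p processors, a ring (strip)
mapping makes each processor exchange two full rows per iteration,
while a 2D (block) mapping exchanges four edges of length n/sqrt(p).
The sketch below uses illustrative numbers of my own, not Abe's
measurements.

    # Grid points exchanged per processor per iteration under a
    # 5-point stencil, for strip (ring) versus block (2D) mappings.
    import math

    def halo(n, p):
        strip = 2 * n                    # ring mapping: two full rows
        block = 4 * n // math.isqrt(p)   # 2D mapping: four short edges
        return strip, block

    print(halo(512, 64))                 # -> (1024, 256)

The gap grows with the number of processors, which is consistent with
the conclusion that the two dimensional mapping wins. DKK)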
Hardware.
With the advent of multiprocessor systems with a relatively large number
of off-the-shelf inexpensive processors, it has become increasingly easy
and cost-effective to build special purpose hardware for specific
applications, as an alternative to conventional general purpose
mainframe supercomputers. Prof. Yoshio Oyanagi of the University of
Tsukuba calls these "multi-purpose" computers. Japan, long recognized
for its manufacturing prowess, especially in electronics and computers,
is well positioned to follow this approach.
Physics seems to be the primary field for which special purpose
computers have been built. Three machines of this kind were discussed at
the conference. The first is QCDPAX which is for QCD lattice
simulations. Apparently, the world-wide physics community has recognized
the potential of parallel computing and several countries (including
Italy, the USA and Japan) have initiated projects to build special purpose
hardware for this application. QCDPAX is a MIMD machine with 432
processing units, connected through a 2D nearest neighbor grid and a
common bus. Each processing element consists of a 32-bit MC68020
microprocessor, an L64133 floating point chip, an LSI for vector
operations, 2 MB of fast memory and 4 MB of slow memory. Measured peak
performance is
12.25 Gflops. For matrix vector multiplies, 5 Gflops is attainable. For
the QCD problem, a preconditioned conjugate gradient method is used. The
project was funded at a level of about two million US dollars from
FY87 to FY89. A commercial product is now being marketed by the Anritsu
Corporation (model DSV 6450, 4 sold). (See also reports on PAX and
Anritsu, 11, 12 April 1990, and 28 April 1990, DKK).
Another special purpose machine discussed (by J. Makino of the Dept. of
Earth Sciences and Astronomy of the Univ. of Tokyo and ISR) is the GRAPE-
1 (GRAvitational PipE) developed at the University of Tokyo for
gravitational N-body problems. It is not really a computer in the usual
sense because it is not programmable but instead is viewed as a backend
computational processor for performing only the N-body force
computations. Effective performance of 120 Mflops has been achieved.
The high performance derives from the use of three arithmetic pipelines
corresponding to the three spatial co-ordinates. An interesting feature
is the use of variable precision: 8 bits for force calculations, 16 bits
for positional data, and 48 bits for force additions. A General Purpose
Interface Bus (GPIB) connects the GRAPE-1 with the host (a Sony
workstation). This project is most impressive in its speed of
completion. The design started in March 89, the hardware was ready by
September 89 and production runs began at the same time. A follow-up
GRAPE-2 project is now in progress, with parallel pipelines, and improved
precisions (64/32 bits). Makino estimates that a 50 board, 15 Gflops
system can be built for US $100,000 and a 500 board, 150 Gflops system
for US $300,000. A GRAPE-3 system is also under design. Following
Makino, Junichi Ebisuzaki (Dept. of Earth Sciences and Astronomy of the
Univ. of Tokyo) talked about adapting other many body simulations for the
GRAPE system. The basic modification needed is to accommodate the
different forms of the force law. He discussed applications in plasma
physics and molecular dynamics.
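(To make concrete what is being hardwired: the kernel GRAPE removes
from the host is the direct O(N^2) force summation sketched below, in
Python and purely as my own illustration of the structure. Changing the
line marked "force law" to a Coulomb or Lennard-Jones kernel is the
kind of modification Ebisuzaki described.

    # Direct-summation gravitational accelerations (G = 1): the O(N^2)
    # kernel that GRAPE evaluates in hardware. One accumulation runs
    # per spatial coordinate, matching GRAPE-1's three pipelines.
    def accelerations(pos, mass, eps=1.0e-3):
        n = len(pos)
        acc = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = [pos[j][k] - pos[i][k] for k in range(3)]
                r2 = dx[0]**2 + dx[1]**2 + dx[2]**2 + eps*eps  # softened
                f = mass[j] / (r2 * r2**0.5)        # force law: m_j/r^3
                for k in range(3):
                    acc[i][k] += f * dx[k]
        return acc

The softening parameter eps is my addition, a standard device to avoid
the singularity at zero separation. DKK)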
Prof. Nobuyasu Ito of the Department of Physics at the University of
Tokyo gave a seemingly exciting and entertaining talk (judged only from
the reaction of the audience, since it was given in Japanese!), in which
he described the m-TIS (Mega spin per second University of Tokyo Ising
Spin) computer for simulating the many body problem arising from Ising
systems. A successor m-TISII system has also been built.
Lest you think the Japanese supercomputer field is only producing special
purpose hardware, rest assured that the really big boys have also been
doing their homework. Akihiro Iwaya of NEC described the NEC SX-3
computer, which was widely reported in the US press as the fastest
general purpose supercomputer today. He reported that the performance
ranges from 0.68 to 22 Gflops, depending on the particular computation
performed. The machine has a SIMD architecture (which he estimated is
sufficient to handle more than 80% of all applications), with shared
memory (because "FORTRAN is based on shared memory") and up to four
processors (he estimated that 16-32 such processors are within
practical limits), each with multiple pipelined arithmetic processors.
He also
discussed several system issues such as synchronization primitives,
ParallelDo and ParallelCase statements, and micro/macro-tasking. All in
all a very Cray-like machine with blazingly fast peak performance. (See
also reports on SX-3 25 April 1990, and 19 Sept 1990, DKK.)
Finally, Shin Hashimoto of Fujitsu described the High Speed Parallel
Processor (HPP), which has been developed under a joint project between
MITI and six computer companies (including Fujitsu, NEC and Hitachi) from
1981 to 1990. The main idea is to connect several conventional
supercomputers (e.g. Fujitsu VP2000) via a Common Storage Unit (CSU) and
a Large High-Speed Storage (LHS). The data transfer rate between the HPP
and the LHS is 1.5 Gbytes/sec. The peak performance is over 10 Gflops.
It comes with its own parallel language Phil, which has the usual
parallel-do, lock and barrier statements, and a very user-friendly
programming environment with execution viewers, a cost analyzer and a
parallel verifier. Surprisingly, there is as yet no plan for turning it
into a commercial product. (See report on the high-speed project, 3
July 1990, DKK.)
Dataflow.
One of the most difficult tasks in designing parallel programming systems
is the automatic detection and extraction of parallelism in programs.
The dataflow approach has long been advocated as one model for achieving
this goal, and in a fundamental way it is very attractive because it
operates at the most basic level of computation. While the dataflow
approach has not yet been demonstrated to be competitive in practice
(practical dataflow machines are not exactly proliferating at this
moment), we should aim for the ideal nonetheless, as Olaf Lubeck of the
Computing Division at the Los Alamos National Laboratory implored us to
do in his talk. He has been working closely with both the group led by
Arvind at MIT and the SIGMA-1 group at the Electrotechnical Laboratory
(ETL) of Japan. He claims that the main advantages of dataflow are that
it produces deterministic computations and extracts maximum
parallelism. In
addition to some general comments about dataflow, he also discussed a
more technical problem concerning how to "throttle" loop activations so
that loop statements do not generate a big demand on system resources
(i.e. memory) in the early iterations in a dataflow model. (See also
reports on ETL projects, 2 July 1990, 16 August 1990, DKK).
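(The flavor of the throttling problem can be suggested by a software
analogue, sketched below in Python as my own illustration rather than
Lubeck's formulation: if every iteration of a loop may begin as soon as
its inputs arrive, the early iterations flood the machine with tokens;
bounding the number of concurrently active iterations to k caps that
demand.

    # Software analogue of loop throttling: at most k iterations of
    # the loop body may be in flight at once. Real dataflow machines
    # bound token storage in hardware, not with OS threads.
    import threading

    def throttled_loop(body, n, k):
        slots = threading.BoundedSemaphore(k)
        threads = []
        def run(i):
            try:
                body(i)
            finally:
                slots.release()      # iteration retires, slot is freed
        for i in range(n):
            slots.acquire()          # blocks while k iterations active
            t = threading.Thread(target=run, args=(i,))
            threads.append(t)
            t.start()
        for t in threads:
            t.join()

Choosing k trades parallelism against resource usage, which is
precisely the tension Lubeck described. DKK)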
Toshio Sekiguchi, also from ETL, described his efforts in designing the
parallel dataflow language DFC II for the SIGMA-1 dataflow computer
currently being developed. The SIGMA-1 is an instruction-level dataflow
machine, with 128 processing elements, 640 MIPS, 427 MFlops and 330
Mbytes of memory. DFC II is C based (functional languages were
deliberately not chosen because they want the language to be useful "for
practical problems") and allows synchronization, global variables and, of
course, automatic detection of parallelism. The motto is: "sequential
description, parallel execution". Applications that have been run
include QCD, PIC, Keno and LINPACK.
Environment.
It is widely recognized that one of the potential stumbling blocks on the
road to the utopia of parallel computing for the masses is that parallel
programming is an order of magnitude more difficult than vector
programming, not to mention sequential programming. Without user-
friendly and yet powerful programming environments, parallel computing
may never reach the promised land. Accordingly, one of the main themes
of the workshop was environments.
John Levesque of the Pacific Sierra Research Corp. (PSR) was the main
speaker on this issue. John is one of the leaders in this field and he
had just published a book on optimization techniques for supercomputers.
He described the philosophy behind the FORGE and MIMDizer systems that
have been developed at PSR. FORGE is an integrated environment
consisting of program development modules, static and dynamic
performance monitors, sequential and parallel debuggers, memory mapping
modules, automatic optimization and a menu-driven interface. John
stressed the
importance of building a database of information about the program and
collecting both static and runtime statistics in order to optimize
performance. MIMDizer is a brand new system scheduled to be delivered
this October. As the name suggests, it is designed to ease the porting
of programs to distributed memory MIMD machines. The key idea is "array
decomposition": the user specifies the mapping of data arrays, and
MIMDizer automatically handles all communication interfaces. This
appears to be a practical middle ground between fully automatic
parallelizing compilers and fully explicit data mapping and message
passing by the user.
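(The array decomposition idea can be shown in miniature. In the sketch
below, mine and not PSR's code, the user supplies only a block mapping
of a one-dimensional array; the neighbor messages required by a 3-point
stencil then follow mechanically, and that bookkeeping is what a tool
like MIMDizer automates.

    # From a user-declared block decomposition, derive the ghost-cell
    # messages a 3-point stencil requires. Conceptual sketch only.
    def block_map(n, p):
        # (lo, hi) half-open index range owned by each of p processes
        size = (n + p - 1) // p
        return [(q*size, min(n, (q+1)*size)) for q in range(p)]

    def exchange_schedule(n, p):
        # list of (sender, receiver, element index) messages per sweep
        msgs = []
        for q, (lo, hi) in enumerate(block_map(n, p)):
            if q > 0:
                msgs.append((q, q - 1, lo))       # left edge leftward
            if q < p - 1:
                msgs.append((q, q + 1, hi - 1))   # right edge rightward
        return msgs

DKK)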
Anyone who uses electronic mail realizes the importance of networks.
But networks can also play a critical role in the computing
environment for supercomputing in the near future, according to Raul
Mendez in his banquet talk. His dream is "supercomputing from a laptop"
--- and the way to achieve that is through networks. He discussed the
existing networks in the US and Europe, as well as the several networks
being developed in Japan and over the Pacific.
PANEL DISCUSSION.
The most lively discussions of the whole workshop occurred during the
panel discussion, which should come as no surprise when one considers
that the theme was: "The Future and Evolution of Scientific Computing",
obviously a subject matter very dear to every participant's heart. The
panelists were: Genki Yagawa (Dept. of Nuclear Eng., Univ. of Tokyo),
Katsunobu Nishihara (Inst. of Laser Eng., Osaka Univ.), Kida (Kyoto
Univ.), D. Sugimoto (Univ. of Tokyo), and four of the speakers: Lubeck,
Levesque, Chan, and Oyanagi.
Mendez led off with the three main topics for discussion:
1. What will computational requirements be like in the next decade?
2. What is the outlook for SIMD and MIMD architectures?
Shared versus distributed memory?
3. What other trends will come to play a significant role: dedicated
machines, dataflow architectures, microprocessors, etc.?
Concerning Question 1 above, it is clear from the discussions that
everyone thinks there is no foreseeable upper bound to the computational
requirements for supercomputers; in fact, the demand at any moment is
limited only by the supercomputers then available. Even with a teraflop
machine, practical engineering computations (100^3 grids, with 3
variables per point) could still require one hour of CPU time. Such
computations will also require enormous amounts of memory. In fact, the
cost of
memory may be a major barrier to building a teraflop machine: assuming a
scaling law of 1 Mbyte per 1 Mflops, a teraflop machine would require
about 20 billion dollars today just for the memory! Developments in
algorithm design will also have to follow the pace of hardware and
architectural advances (as has been the case throughout the history of
computing).
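(The arithmetic behind the memory estimate is worth spelling out. At
the assumed scaling of 1 Mbyte per Mflops,

    10^12 flops x (1 Mbyte / 10^6 flops) = 10^6 Mbytes = 1 Tbyte,

so the quoted 20 billion dollars corresponds to roughly $20,000 per
Mbyte, a unit price the panel did not state explicitly but which their
figures imply. DKK)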
Concerning Question 2, an interesting consensus emerged. While some
panelists think that the SIMD architecture is sufficient for many
problems (e.g. QCD), many personally prefer MIMD machines for their
flexibility. The most likely trend will be hybrid (or cluster,
hierarchical) architectures, with MIMD at the higher levels and SIMD at
the lower levels. Concerning memory architecture (shared or
distributed), many believe that hiding the storage structure of data will
undoubtedly lead to performance degradation and therefore some user input
is essential. No one believes we'll see automatic and efficient
compilers for parallel machines in the foreseeable future.
Concerning Question 3, our representative from the dataflow camp (Lubeck)
said that ignoring dataflow would be settling for second best and we
should be "going for the gold", even though that may take some time.
Someone also pointed out that while current research has primarily
focused on the solution techniques, other aspects of the scientific
computing process, such as mesh generation and visualization, will be
playing a more important role in the future. And finally, while parallel
machines are much more difficult to use than vector machines, users are
willing to plunge in when given sufficient incentive (e.g. cost
effectiveness of the CM-2).
OBSERVATIONS (Chan).
As someone who works on parallel algorithms, I was most struck by the
small number of talks on this topic. I realize that this could be
just a feature of this particular workshop, but in general I have not
been aware of an active research community in parallel algorithms
development in Japan.
On the other hand, the hardware development in Japan has been truly
impressive, both in terms of raw power and the speed and low cost at
which special purpose machines are built. However, I did not see much
in the way of architectural innovation, and most of the designs follow trends
already established in the industry. During the banquet, I was informed
by a Fujitsu engineer that the company is building Japan's first
commercial distributed memory MIMD machine --- from the terse description
it resembles several US hypercubes (1K processors, SPARC chips, a grid
connection topology and "wormhole" routing).
Another observation that I made was that many of the talks were based on
work by interdisciplinary teams, consisting of physical scientists who
have real problems to solve, and hardware and software designers.
In fact, Japanese physicists seem to play a very active role in parallel
computing --- all the special machines mentioned were built for physics
problems. Even though there were several academic engineers on the
panel, I could not tell how big an influence they have had in this field
in Japan.
Overall, attending this workshop was a very pleasant experience for me.
I met many interesting people (and everyone was very friendly and open)
and my hosts Raul Mendez and Chris Eoyang were most gracious. I only
wish my knowledge of Japanese went beyond reading Kanji, so that I
could have understood all the jokes during the few talks delivered in
Japanese!
(The observations above are Chan's. Nevertheless they mostly echo my own
feelings and I have often made similar remarks in my reports. In fact,
readers should note that many of the presentations describe work very
close to that published or presented elsewhere. However, I do not agree
entirely with the comment about architectural innovation. There are only
a few really different computer organizations. Innovation (as opposed
to revolution) comes from figuring out how to design so that all the
pieces
work harmoniously. The Japanese researchers seem at least as capable as
those in the west in finding methods to do this. DKK)
PROGRAM: 4th ISR Supercomputing Workshop
Raul Mendez (ISR) Opening Remarks
Toshio Kawai (Keio University) "Standard Solutions to Partial
Differential Equations on Supercomputers"
Yoshizo Takahashi (Tokushima University) "Parallel Automated Wire-Routing
With a Number of Competing Processors"
George Abe (ISR) "Partial Differential Equation Solvers and Architectures
for Parallel Scientific Computing"
Toshio Sekiguchi (Electrotechnical Laboratory) "The Design of the
Practical Language DFC II and its Data Structures"
Olaf Lubeck (Los Alamos National Laboratory) "Resource Management in
Dataflow: A Case Study Using Two Numerical Applications"
Yoshio Oyanagi (University of Tsukuba) "QCD Lattice Simulations With the
QCDPAX"
Daiichiro Sugimoto & Junichi Ebisuzaki (University of Tokyo) "Project
GRAPE and The Development of a Specialized Computer for the N-body
Problem"
Nobuyasu Ito (University of Tokyo) "A Trial to Break Through the Many
Body Problem With a Computer"
Panel: G. Yagawa, K. Nishihara, D. Sugimoto, Y. Oyanagi, O. Lubeck, J.
Levesque, R. Mendez (moderator)
Akihiro Iwaya (NEC Corp) "Parallel Processing on the NEC SX-3
Supercomputer"
Shin Hashimoto (Fujitsu Ltd) "Parallel Application Development on the
HPP"
John Levesque (Pacific Sierra Research) "An Advanced Programming
Environment"
Raul Mendez (ISR) Closing Remarks
----------------------END REPORT-----------------------------------------