rick@cs.arizona.edu (Rick Schlichting) (06/02/91)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years
under the auspices of the Office of Naval Research-Asia (ONR/Asia).
The following is the professional opinion of David Kahaner and in no
way has the blessing of the US Government or any agency of it. All
information is dated and of limited life time. This disclaimer should
be noted on ANY attribution.]
[Copies of previous reports written by Kahaner can be obtained from
host cs.arizona.edu using anonymous FTP.]
To: Distribution:
From: David K. Kahaner, ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: Parallel processing research in Japan, supplement.
30 May 1991
ABSTRACT. Parallel processing research (mostly associated with the
dataflow model) and database research in Japan, based on visits to
various labs and attendance at the IEEE Data Engineering Conference in
Kobe (April 10-12, 1991), are summarized.
INTRODUCTION.
Professor Rishiyur Nikhil
MIT Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139, USA
Tel: (617)-253-0237, Fax: (617)-253-6652
Email: nikhil@lcs.mit.edu
spent two weeks in Japan (April 1991), visiting five research labs and
attending the IEEE Data Engineering Conference in Kobe. What follows
are Nikhil's observations, along with comments of my own when these are
relevant.
PARALLEL PROCESSING (ETL, ICOT, Terada's Osaka lab)
Nikhil writes as follows.
The projects at these labs are primarily focused on parallel processing
architectures and languages, the area closest to my own research work. I
continue to be highly impressed with the machine-building capabilities
of our Japanese colleagues, but I think that, with the exception of the
ICOT researchers, many of them are still very weak in software. It is
quite breathtaking to see how quickly new research machines are designed
and built, and by such small teams--- I wish we could do as well in the
US. However, once built, their machines do not
seem to be evaluated thoroughly--- good languages, compilers and run time
systems are not developed and (consequently, perhaps) very few
applications are written. The new designs, therefore, do not benefit
from any deep lessons learned from previous designs. Also, in my
opinion, another consequence of the "hardware-centric" nature of the
machine builders is that certain functions are built into hardware that I
would expect ought to be done in software (such as resource allocation
decisions and load monitoring in the ETL machines).
In my opinion, ETL's EM-4 and proposed EM-5 are the most exciting
machines in Japan (and the world). The reason is this: as first
elucidated in [Arv87], a large, general purpose parallel machine must be
able to perform multi-threading efficiently at a fine granularity,
because this is the only way to deal effectively with the long inter-node
latencies of large, parallel machines. Von Neumann processors are very
bad at this, while dataflow architectures have always excelled at it.
However, previous dataflow architectures (including MIT's TTDA and
Monsoon, and ETL's previous Sigma-1) were weak in single-thread
performance and control over scheduling, two areas that are the forte of
von Neumann processors. Recently, new architectures have been proposed
to obtain the best of both worlds: our *T architecture at MIT, and the
EM-4 and EM-5 in Japan. I believe that these machines are the first
truly viable parallel MIMD machines.
EM-4 [Sak90, Yam89] is a medium-sized machine (80 nodes), but does not
have any floating point arithmetic. However, the chief problem is the
lack of any good programming language or compiler. It is currently
programmed in DFC ("dataflow C"), a very simple subset of C with single
assignment semantics. Perhaps this situation will change in the future:
the ETL researchers said that they have just hired a compiler expert,
but they still do not expect a good programming environment for some
years. I also have my doubts regarding their choice of C as the
programming language for the EM-4 and EM-5.
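Kahaner adds: the single-assignment semantics attributed to DFC above
can be sketched as a write-once variable. The class below is purely
illustrative (DFC itself is a C subset; this is not ETL's
implementation), but it shows the discipline: every name is bound at
most once, so each use denotes one well-defined value and data
dependences become explicit --- which is what makes compilation to a
dataflow graph straightforward.

```python
class WriteOnce:
    """A variable that may be bound exactly once, as in
    single-assignment languages; a second write is an error."""
    _UNBOUND = object()

    def __init__(self):
        self._value = WriteOnce._UNBOUND

    def write(self, v):
        if self._value is not WriteOnce._UNBOUND:
            raise RuntimeError("single-assignment violation")
        self._value = v

    def read(self):
        if self._value is WriteOnce._UNBOUND:
            raise RuntimeError("read of unbound variable")
        return self._value

x = WriteOnce()
x.write(3)
print(x.read())          # 3
try:
    x.write(4)           # a second assignment is rejected
except RuntimeError as e:
    print(e)             # single-assignment violation
```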
Kahaner writes...
Dataflow research at ETL has a long history, including the Sigma-1, EM-4,
and the proposed EM-5. The EM-4 was designed to have 1024 processors. A
prototype with 80 processors is running and I am told that if the budget
is maintained then the full system will be built. See the reports (etl, 2
July 1990 and parallel.904, 6 Nov 1990).
My interpretation of the ETL research direction is that their evolving
designs are moving away from a pure dataflow model. At the same time,
interest in numerical applications, which had been ambivalent, seems to
have increased. Nikhil agrees that the ETL group is now more explicit about
this, but feels that they were always interested in general purpose
computing, including scientific applications. Perhaps in the atmosphere
of the 80's when there was so much emphasis in Japan on knowledge
processing, they may have emphasized symbolic aspects, but in technical
discussions, they usually compared their machines to vector and other
supercomputers, and never to "symbolic supercomputers" such as Lisp
machines or ICOT's machines. In other words, they may have always
considered Cray, NEC, and Fujitsu supercomputers and the Connection
Machine to be their real competition. It is interesting to note that
the Connection Machine
was also initially portrayed as a supercomputer for AI; the reality today
is that it is mostly used for scientific supercomputing.
Sigma-1 was pure dataflow, similar to MIT's Tagged Token Dataflow
Architecture. The EM-4 is based on what the ETL group called a strongly
connected arc model. Their description of that follows [Sak91]. "In a
dataflow graph, arcs are categorized into two types: normal arcs and
strongly connected arcs. A dataflow subgraph whose nodes are connected by
strongly connected arcs is called a strongly connected block (SCB).
There are two firing rules. One is that a node on a dataflow graph is
firable when all the input arcs have their own tokens (a normal data-
driven rule). The other is that after each SCB fires, all the processing
elements which will execute a node in the block should execute nodes in
the block exclusively....In the EM-4, each SCB is executed in a single PE
and tokens do not flow but are stored in a local register file. This
property enables fast-register execution of a dataflow graph, realizes an
advanced-control pipeline, and offers flexible resource management
facilities." The designers also wrote in 1989 that "the dataflow concept
can be applied not only to numerical computations involved in scientific
and technological applications but also to symbolic manipulations
involved in knowledge information processing. The application field of
the EM-4 is now focused on the latter." EM-4 was not originally designed
to have floating point support, but I was told that this was also a
budgetary issue.
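The normal data-driven firing rule quoted above --- a node is firable
once all of its input arcs hold tokens --- can be sketched as a toy
interpreter. The graph, node names, and operations below are invented
for illustration; this ignores SCBs, processing elements, and all EM-4
hardware detail.

```python
from collections import defaultdict

class DataflowGraph:
    def __init__(self):
        self.ops = {}                    # node -> (function, arity)
        self.edges = defaultdict(list)   # node -> [(dst, dst_port)]
        self.tokens = defaultdict(dict)  # node -> {port: value}
        self.results = {}

    def node(self, name, fn, arity):
        self.ops[name] = (fn, arity)

    def arc(self, src, dst, port):
        self.edges[src].append((dst, port))

    def put(self, node, port, value):
        self.tokens[node][port] = value
        fn, arity = self.ops[node]
        if len(self.tokens[node]) == arity:  # all input arcs have tokens
            out = fn(*[self.tokens[node][p] for p in range(arity)])
            self.results[node] = out
            for dst, dport in self.edges[node]:
                self.put(dst, dport, out)    # token flows along the arc

g = DataflowGraph()
g.node("add", lambda a, b: a + b, 2)
g.node("mul", lambda a, b: a * b, 2)
g.arc("add", "mul", 0)
g.put("mul", 1, 10)
g.put("add", 0, 2)
g.put("add", 1, 3)       # "add" becomes firable; its result feeds "mul"
print(g.results["mul"])  # 50
```

The EM-4's strongly connected blocks refine this rule: within an SCB,
tokens do not flow through the network at all but sit in a local
register file of a single PE, which is what enables the fast pipelined
execution described in the quotation.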
The objectives of the EM-5 are as follows [Sak91].
"..to develop a feasible parallel supercomputer including more than
16,384 processors for general use, e.g., for numerical computation,
symbolic computation, and large scale simulations. The target performance
is more than 1.3 TFLOPS, i.e. 1.3*10^(12) FLOPS (double precision) and
655 GIPS. Unlike the EM-4, the EM-5 is not a dataflow machine in any
sense. It exploits side-effects and it treats location-oriented
computation" (see note below). "In addition the EM-5 is a 64-bit machine
while the EM-4 is a 32-bit machine." The EM-5 will be based on a "layered
activation model", a further generalization of the strongly connected
arc model of the EM-4.
The machine will be highly pipelined, with a 25ns clock and 25ns pipeline
pitch. This is half the pitch of the EM-4, largely because of the use of
RISC technology. Each of the up to 16,384 processors (called EMC-G) is
64-bit, RISC, with global addressing and no embedded network switch.
Similarly the floating point unit will not be within the processor chip,
but separate, like a co-processor, because of limitations of pins and
space on the chip. At the present time the designers have not decided on
the topology of the interconnection network. Peak performance of the
floating point unit will be 80MFLOPS with maximum transfer rate of
335MB/sec. The EMC-G will be built in a CMOS standard-cell chip with 391
pins and 100K gates, using 1.0 micron rules. This processor will have
its logical design completed in 1991, and the gate design of the EMC-G
will be completed in 1992. A full 16,384 node system will be designed in
1993 and a prototype is planned to be operational by March 1994.
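As a back-of-the-envelope check, the stated per-node and system figures
are consistent: 16,384 nodes at 80 MFLOPS each gives roughly the quoted
"more than 1.3 TFLOPS" target.

```python
# Peak-performance consistency check for the EM-5 figures above.
nodes = 16_384            # full EM-5 configuration
per_node_mflops = 80      # peak of one floating point unit
peak_tflops = nodes * per_node_mflops * 1e6 / 1e12
print(round(peak_tflops, 2))   # 1.31
```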
With regard to languages, new work will emphasize DFC-II as Nikhil
explained. This will have sequential description and parallel execution,
and is not a pure functional language. DFC-II can break a single
assignment rule and programs can contain global variables. The group is
also planning to implement several other languages, such as Id and
Fortran. Finally, an object-oriented model is also being considered.
In Japan at least, the ETL research group is considered to have some of
the best (most creative, energetic, visionary, etc.) staff among all the
non-university research labs.
Readers may be interested to know that Dr. Shuichi Sakai of ETL (the
chief designer of EM-4) is now visiting the dataflow group at MIT for one
year, as of April 1, 1991. He will be assisting the group in the design
of the new *T machine, which Nikhil mentions above [Nik91]. *T is based
on Nikhil's previous work on the P-RISC architecture [Nik89], and is a
synthesis of dataflow and von Neumann architectures (Nikhil says that one
should think of it as a step beyond EM-4-like machines). The group plans
to build this machine in collaboration with Motorola, in a 3 year project
that will follow the current MIT-Motorola project to build the Monsoon
dataflow machine.
Concerning the remarks that the EM-5 will NOT be a dataflow machine, I
passed them on to Nikhil who was also quite surprised. He comments that
the EM-5 is not fundamentally different from the EM-4. In both those
machines, as well as in MIT's P-RISC and *T, the execution model is a
HYBRID of dataflow and von Neumann models. In MIT's terminology, a
program is a dataflow graph where each node is a "thread". ETL's
equivalent of MIT's "thread" is the SCB, or Strongly Connected Block.
Dataflow execution is used to trigger and schedule threads, just as in
previous dataflow machines. In MIT's *T, this scheduling happens in the
Start Coprocessor; in ETL's machines, it happens in the FMU (Fetching
and Matching Unit).
Within a thread, instructions are scheduled using a conventional program
counter, as in von Neumann machines. In MIT's *T, this happens in the
Data Coprocessor; in ETL's machines, it happens in the EXU (Execution
Unit).
In both the EM-4 and EM-5 the processor is organized as an IBU (Input
Buffer Unit) followed by a FMU (Fetching and Matching Unit) followed by
an EXU (Execution Unit). The overall execution strategy is the same in
both machines.
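The hybrid model just described --- dataflow synchronization between
threads, program-counter sequencing within a thread --- can be sketched
in a few lines. In this toy version a thread is enqueued only when all
of its input tokens have arrived (the FMU / Start Coprocessor role), and
its body then runs as straight-line code (the EXU / Data Coprocessor
role). All names and the two-thread example are invented.

```python
from collections import deque

class Thread:
    def __init__(self, arity, body):
        self.arity = arity   # input tokens needed to trigger the thread
        self.inputs = {}
        self.body = body     # straight-line code: a list of functions

def run(threads, initial_tokens):
    ready = deque()
    def deliver(tname, port, value):
        t = threads[tname]
        t.inputs[port] = value
        if len(t.inputs) == t.arity:   # dataflow firing rule
            ready.append(t)
    for tname, port, value in initial_tokens:
        deliver(tname, port, value)
    results = []
    while ready:
        t = ready.popleft()
        state = dict(t.inputs)
        for instr in t.body:           # sequential, PC-style execution
            instr(state, deliver)
        results.append(state.get("out"))
    return results

threads = {
    "t1": Thread(2, [lambda s, send: s.__setitem__("out", s[0] + s[1]),
                     lambda s, send: send("t2", 0, s["out"])]),
    "t2": Thread(1, [lambda s, send: s.__setitem__("out", s[0] * 10)]),
}
print(run(threads, [("t1", 0, 2), ("t1", 1, 3)]))   # [5, 50]
```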
The EM-5 and EM-4 differ in smaller details: EM-5 has newer chip
technology, a separate memory for packet buffers, a finer pitch pipeline,
a direct instruction pointer in packets, a floating point unit, a 64 bit
arch, etc., but the fundamental organization is the same.
Nikhil also asked Sakai about the statements in [Sak91]. Sakai claims
that what he meant was
"... the EM-5 is not a dataflow machine in SOME sense."
and faults his poor command of English for this error. With respect to
the second sentence: "It exploits side-effects and it treats location-
oriented computation", Nikhil is not sure what the authors meant by this.
He explains that
Dataflow architectures have never prohibited side-effects or enforced
single-assignment semantics. It is only dataflow languages that take
this position on side-effects. Dataflow architectures merely provided
support for this, while not enforcing it. Dataflow architectures are
equally appropriate for other languages, such as Fortran or C.
After visiting ICOT, Nikhil remarks that...
I got a sense of complementary strengths relative to ETL. ICOT
researchers seem to be very sophisticated with respect to parallel
languages, compilers and runtime systems; the parallel machines, on the
other hand, were not that exciting.
I do not think that anyone can claim any longer that the KL1 language
used extensively at ICOT is a logic programming language (ICOT
researchers themselves are quite frank about this). The main remaining
vestige of logic programming (albeit a very important one) is the "logic
variable" which is used for asynchronous communication. Logic variables
in KL1 are very similar (perhaps identical) to "I-structure variables" in
Id, the programming language developed at MIT over the last 6 years.
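Kahaner adds: the shared idea behind KL1's logic variables and Id's
I-structure variables can be sketched as a write-once cell on which
readers block until a producer binds it. This is only an illustration
of the synchronization idea, not KL1's or Id's actual runtime.

```python
import threading

class IVar:
    """Write-once synchronizing cell: readers wait until it is bound."""
    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, v):            # at most one producer may bind
        if self._bound.is_set():
            raise RuntimeError("variable already bound")
        self._value = v
        self._bound.set()         # wake any waiting consumers

    def value(self):              # consumers block until the bind
        self._bound.wait()
        return self._value

x = IVar()
out = []
consumer = threading.Thread(target=lambda: out.append(x.value() + 1))
consumer.start()                  # blocks inside x.value()
x.bind(41)                        # producer binds; consumer proceeds
consumer.join()
print(out[0])                     # 42
```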
Regardless of whether we label KL1 as a logic programming language or
not, it is certainly a very interesting and expressive language, and is
perhaps the largest and most heavily used parallel symbolic processing
language in existence anywhere. Because of the sheer volume of
applications that people are writing in KL1 and running on ICOT's
parallel machines (we saw 5 demos from a very impressive suite of demos),
I think that ICOT researchers are certainly as experienced and
sophisticated as anyone in the world about parallel implementations of
symbolic processing: compilation, resource allocation, scheduling,
garbage collection, etc.
ICOT's machines are not as exciting as ETL's. The original PSIs (130
KLIPS) were heavily horizontally microcoded sequential machines, and one
must wonder whether they will go the way of Lisp machines, i.e., be made
obsolete by improving compiling technology on modern RISC machines. PSIs
were not originally conceived of as nodes of a parallel machine. Thus,
ICOT's two Multi-PSIs, which are networks of PSIs (2D grid topology), are
just short term prototypes for experimentation. ICOT researchers want to
put one of the two Multi-PSIs on the Internet for open access, but they
are having trouble convincing MITI to allow this.
ICOT's real parallel targets are the PIM machines, the first of which (a
PIM/p) had just been delivered to ICOT during our visit (it was not yet
up and running). ICOT's machines are built by various industrial
partners, of course with heavy participation in the design by ICOT
researchers. There are 5 different PIM architectures (different node
architectures, different network architectures) with 5 different
industrial partners. I was surprised by this, since it will lead to
serious portability problems for the software. On the positive side, I
suppose they will gain a lot of experience on a variety of architectures,
and on portability, and can learn from the best of each! From what
little I know about the PIM architectures, they do not seem to be as
exciting as ETL's EM-4 and EM-5 machines.
The NEC C&C Systems Research Lab has also been involved with ICOT in the
Fifth Generation project. NEC's CHI machine (300 KLIPS), a single user
microcoded machine for logic programming, predates and outperformed
ICOT's PSI machine. However, like the PSI machine and Lisp machines, I
expect that this type of machine will become obsolete as compiling
technology on RISC machines improves. NEC has also started work on an
implementation of ICOT's A'UM programming language.
Kahaner notes that additional technical details of ICOT research are
given in the report (icot-sci, 17 May 1991). A number of Japanese
researchers have remarked that one of the most important aspects of the
ICOT project is that it gives many young Japanese researchers the
opportunity to meet informally (outside their individual corporate or
university environments) and assists in the networking that is so
prevalent in Japanese science.
Nikhil continues...
Prof. Terada's dataflow laboratory at Osaka University is remarkable in
the degree to which they collaborate with industry.
Professor Hiroaki Terada
Department of Information Systems Engineering,
Faculty of Engineering,
Osaka University,
Yamadaoka, 2-1 Suita, Osaka, Japan 565
Tel: +81(Japan)-6-877-5111, Fax: +81 6 875 0506
They are one of the notable exceptions to my prior image of research at
Japanese universities: starved of funds from the education ministry and
generally not very exciting. Prof. Terada has close collaborations with
Mitsubishi, Sharp, Matsushita and Sanyo. They developed a TTL dataflow
machine, the Q-p, in 1983-86 (2-4 MOPS); they now have the Q-pv, a
multi-chip VLSI version (20 MFLOPS), and they are planning to integrate
this further in the Q-v1, a single-chip version (50 MFLOPS). A unique
aspect of their
technology approach is that they have an asynchronous, self-timed design;
they have consciously avoided clock-synchronous circuits.
The architecture of all these machines is a ring, similar to the
Manchester dataflow machine. Like the ETL project, this project again
seems weak in software, with the result that no significant applications
are written, which in turn means that the hardware design is difficult to
evaluate. Prof. Nishikawa, a member of the group, is leading a project
to develop AESOP, a program development environment for these dataflow
machines.
Professor Hiroaki Nishikawa
Department of Information Systems Engineering
Faculty of Engineering
Osaka University
Yamadaoka, 2-1 Suita, Osaka, Japan 565
Tel: +81 (6) 877 5111 ext. 5018,
Fax: 81 (6) 875 0506 Telex: 5286-227 FEOUJ J
Email: nisikawa@oueln0.ouele.osaka-u.ac.jp
It appears that he is aiming for some kind of a visual programming style,
where one draws dataflow graphs on a screen and chooses a mapping of
these graphs onto the physical rings of the dataflow machine.
Personally, I am not very impressed with the visual programming languages
that I have seen to date: they are too complicated and inflexible.
7TH IEEE CONFERENCE ON DATA ENGINEERING, KOBE, APRIL 10-12, 1991
The conference has become very large, with at least 3 parallel sessions
at all times and over 700 pages in the proceedings, so it was impossible
for anyone to get a complete overview.
Object-oriented databases (OODBs) were the dominant topic. There were
papers on:
- notions of consistency
- declarative (or associative access) query languages
- storage and indexing
- models of views
- models of time
- user interfaces
- etc.
It is disappointing, however, that there is still very little work
on developing a simple and clear semantics for OODBs. Whereas the
relational model had a single, simple model that was agreed upon by a
large community of researchers, today each OODB seems to come with its
own, unique model, often described imprecisely or with arcane formalism.
Consequently, there is very little basis available for objective
comparisons of OODBs with each other or with relational DBs.
There were several papers on parallel implementations, including:
- Parallel transitive closure and join algorithms
- Scheduling on shared-memory and shared-nothing machines
- Data distribution strategies
- dataflow implementation
- etc.
Most parallel implementations are on stock parallel hardware. Parallel
database machines per se seem to have fallen out of vogue--- only one
such machine was described (FDS-R2 at Univ. of Tokyo; Kitsuregawa et al.).
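Kahaner adds: the hash-partitioned join idea that underlies much of this
parallel-database work can be sketched briefly. Tuples of both relations
are hashed on the join key into P partitions, and each partition is then
joined independently; on a shared-nothing machine each partition would
go to a different node, though the sketch below runs them sequentially.
The data and function names are invented for illustration.

```python
from collections import defaultdict

def partitioned_hash_join(R, S, key_r, key_s, partitions=4):
    # Phase 1: hash-partition both relations on the join key.
    buckets_r = defaultdict(list)
    buckets_s = defaultdict(list)
    for t in R:
        buckets_r[hash(t[key_r]) % partitions].append(t)
    for t in S:
        buckets_s[hash(t[key_s]) % partitions].append(t)
    # Phase 2: join each partition independently (parallelizable work).
    out = []
    for p in range(partitions):
        index = defaultdict(list)
        for t in buckets_r[p]:
            index[t[key_r]].append(t)
        for t in buckets_s[p]:
            for m in index[t[key_s]]:
                out.append(m + t)
    return out

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (3, "y"), (2, "z")]
print(sorted(partitioned_hash_join(R, S, 0, 0)))
# [(2, 'b', 2, 'x'), (2, 'b', 2, 'z')]
```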
Kahaner notes that Professor M. Kitsuregawa, from the U-Tokyo Institute
of Industrial Science, has published several papers on SDC, the "Super
Database Computer", for example [Kit91]. He also notes that the National
Science Foundation (NSF), in cooperation with other agencies, has funded
the Japanese Technology Evaluation Center (JTEC) at Loyola College in
Maryland to assess the status and trends of Japanese research and
development in selected technologies. In March 1991, a JTEC team headed
by Professor Gio Wiederhold (Stanford University)
[gio@earth.stanford.edu], visited Japan to evaluate Japanese database
technology, and the team presented a workshop on their preliminary
results 30 April 1991 at the NSF. A comprehensive report is currently in
preparation.
Genomic databases generated a lot of excitement--- the panel on this
topic drew a huge audience. Genomic databases will contain huge volumes
of data with unique requirements: inaccurate information, incomplete
information, retrieval using approximate matching and sophisticated
inference. Many people seem to view genomic databases as the new
frontier and driving force in DB research, a beautiful application with
lots of exciting research problems (and lots of funding?) for the DB
community.
Deductive databases (the marriage of logic programming languages and
databases) seem to be generating less interest than they did some years
ago. There were a few papers on query optimization.
The remaining papers reported steady, if unspectacular progress on a
variety of topics:
- Distributed DBMSs (optimization, voting protocols)
- Concurrency control (in high contention DBs, parallel DBs)
- Indexing and query languages for temporal databases (time attributes,
versions)
- Indexing and query languages for spatial databases (e.g. geographic maps)
- Incomplete information (formal models, approximate answers to queries)
- Heterogeneous databases (transaction protocols, serializability)
- Efficient post-failure restart algorithms
- Simultaneous optimization for multiple queries
REFERENCES
[Arv87] Two Fundamental Issues in Multiprocessing,
Arvind and R.A.Iannucci,
Proc. DFVLR - Conf. on Parallel Processing in Science and Engineering,
Bonn-Bad Godesberg, W. Germany, June 25-29, 1987,
Springer-Verlag LNCS 295
[Kit91] Multiple Processing Module Control on SDC, the Super Database
Computer,
S.Hirano, M.Harada, M.Nakamura, Y.Aiba, K.Suzuki, M.Kitsuregawa,
M.Takagi, and W.Yang,
Proc Japan Soc Parallel Proc, Kobe Japan, May 14-16, 1991, pp 53-
60.
[Nik89] Can dataflow subsume von Neumann computing?
R.S.Nikhil and Arvind
Proc 16th Intl Symp on Computer Architecture, Jerusalem, Israel,
May 29-31, 1989, pp 262-272.
[Nik91] *T: a Killer Micro for a Brave New World,
R.S.Nikhil, G.M.Papadopoulos and Arvind
CSG Memo 325, MIT Laboratory for Computer Science
545 Technology Square, Cambridge, MA 02139, USA
January 1991
[Sak90] An Architecture of a Dataflow Single Chip Processor,
S. Sakai, Y. Yamaguchi, K. Hiraki and T. Yuba
Proc. 16th Annual International Symposium on Computer Architecture,
Jerusalem, Israel, May 28-June 1, 1989, pp 46-53
[Sak91] Architectural Design of a Parallel Supercomputer EM-5,
S. Sakai, Y. Kodama, Y. Yamaguchi
Proc Japan Soc Parallel Proc, Kobe Japan, May 14-16, 1991, pp
149-156.
[Yam89] An Architectural Design of a Highly Parallel Dataflow Machine
Y. Yamaguchi, S. Sakai, K. Hiraki, Y. Kodama and T. Yuba
Proc. Information Processing 89, Aug 28-Sep 1, San Francisco,
pp 1155-1160.
-------------------END OF REPORT-------------------------------------