collberg@dna.lth.se (Christian S. Collberg) (03/14/91)
A while back I sent out a request for information about distributed
applications running on networks of workstations. I received quite a
few replies and also requests for copies. I have therefore summarized
the responses. Thomas Fahringer of Universitaet Wien reminded me about
a similar survey he had made a short while back (which I saved and then
forgot about...). His summary, which in turn contains the summary of
a survey conducted by Karsten Morisse, is included at the end of this
summary.
Thanks to everyone who replied!
Christian Collberg
Christian.Collberg@dna.lth.se
-----------------------------------------------------------------------
"The Benevolent Bandit Laboratory: A Testbed for Distributed Algorithms"
R.E. Felderman, E.M. Schooler, L. Kleinrock
IEEE Journal on Selected Areas in Communications
Feb 1989
Vol 7, No 2
pp 303-311
Describes a system we built at UCLA to run distributed applications on
a network of PC-ATs running DOS.
Bob Felderman feldy@cs.ucla.edu
UCLA Computer Science ...!{rutgers,ucbvax}!cs.ucla.edu!feldy
-----------------------------------------------------------------------
Check the v-kernel and the x-kernel developed at our CS department.
RAUL IZAHI
izahi@nova.stanford.edu
-----------------------------------------------------------------------
>From sven@tde.lth.se Tue Mar 5 21:49:08 1991
Check out the Cosmic Environment/Reactive Kernel
Sven
-----------------------------------------------------------------------
>From jonathan%maligne-lk.cs.UAlberta.CA@scapa.cs.UAlberta.CA
I have a chess program that runs on a network of workstations,
if that is the type of application you are looking for. The
details were published in the Journal of Parallel and Distributed
Computing, 1988 I think.
-----------------------------------------------------------------------
>From ht@cogsci.edinburgh.ac.uk Wed Mar 6 10:50:04 1991
For a parser, see
Thompson, Henry S. 1991, "Chart Parsing for Loosely Coupled Parallel
Systems", in Tomita ed. Current Issues in Parsing Technology, Kluwer,
ISBN 0-7923-9131-4 (from Proceedings of the International Workshop on
Parsing Technologies, 1989).
-----------------------------------------------------------------------
>From stratton@computing-maths.cardiff.ac.uk
Have you got in touch with Parasoft - they provide Express - which allows
running of programs across (say) networked SUNs, or transputers, or Ncube 1/2,
or Cray (soon) etc. They guarantee portability! They can probably give you
details of applications available (such as their own circuit router (I think),
and their debugger, profiler etc).
Hope this is of some use,
Andy.S.
-----------------------------------------------------------------------
You might check with: Scientific Computing Associates, Inc.
246 Church St. Suite 307
New Haven, CT 06510
David Gelernter and James Philbin wrote an article in BYTE, May 1990 pp 213..
"Spending Your Free Time" which describes a PC-network implementation of
C-Linda. Some of SCA's literature describes applications ported to this
system.
Jack Merner
Jack N. Merner | Jack.Merner@tortuga.SanDiego.NCR.COM
NCR Corporation, MS4440 | An otherwise reliable source !
16550 W. Bernardo Dr. | Ph: (619)485-2184
San Diego, CA 92127 | Fax: (619)485-2020
-----------------------------------------------------------------------
There is a distributed program that can run on networks
of NeXT computers. This program is called Zilla. It
can be used to distribute the computation of anything
you might want (if you can figure out a way to split
up the work). NeXT has used it to factor large numbers
(that have only a few factors). Zilla is unique in that
it can be set up only to use otherwise idle cycles.
I think that NeXT is currently testing an improved
version of Zilla.
You might also want to check into a new language called
PCN. PCN can run on networks of NeXT machines (and
possibly Suns if you are using the cosmic environment).
John Garnett
University of Texas at Austin
garnett@cs.utexas.edu Department of Computer Science
Austin, Texas
-----------------------------------------------------------------------
>From zhou@ifi.unizh.ch Thu Mar 7 09:41:30 1991
I am working on a distributed implementation (simulation) of a spreadsheet
program on our local network of Sparcstations.
Honbo Zhou.
-----------------------------------------------------------------------
We have modified a calibration program to run in a distributed
fashion, using ISIS on a network of Sun Sparc machines.
Try ftp'ing to water.ca.gov, and look for pub/pstscrpt.out.
The postscript output is ~ 250K.
Ralph Finch 916-445-0088
rfinch@water.ca.gov ...ucbvax!ucdavis!caldwr!rfinch
Any opinions expressed are my own; they do not represent the DWR
-----------------------------------------------------------------------
We (the distributed problem solving group at EDRC, CMU) are working
on distributed AI applications over a network of UNIX workstations.
The applications span various domains; the three active at present are:
1) Assistance to human decision-makers (operators) in real-time control
applications like electric power networks.
2) Design of High-Rise buildings.
3) Solution of a set of complex nonlinear algebraic equations.
A starting-point reference:
1) S. N. Talukdar, V. C. Ramesh, J. C. Nixon, "A distributed system of control
specialists for real-time operations", Proceedings of the 3rd International
Symposium on Expert Systems Applications to Power Systems, Tokyo, Japan,
April 1991.
Ramesh
vcr@cs.cmu.edu
-----------------------------------------------------------------------
I recently read "Modules, Objects and
Distributed Programming: Issues in RPC and Remote Object Invocation"
H.M.Levy, E.D.Tempero in Software- Practice and Experience, Vol21(1),
Jan. 1991. It's an interesting comparison between two programming styles.
-- Bruno BARON
-- CRI, Ecole des Mines de Paris, 35 rue Saint Honore', 77305 Fontainebleau,
-- France -- baron@ensmp.fr -- tel: (1)64.69.48.38 -- fax: (1)64.69.47.09
-----------------------------------------------------------------------
From: segall@caip.rutgers.edu (Ed Segall)
Organization: Rutgers Univ., New Brunswick, N.J.
Check out recent issues of IEEE Transactions on Parallel and
Distributed Systems, and of the Journal of Parallel and Distributed
Computing (Academic Press).
-----------------------------------------------------------------------
I know of a program ISIS, which can be used for distributed applications, and
it is available by anonymous ftp; you had better contact:
ken@cs.cornell.edu or
reen@cs.cornell.edu (administrative aide)
Geert v.d. Heijden heijden@prles6.prl.philips.nl
Philips Research Lab.
Eindhoven
the Netherlands
I can give you an information summary on the public version of ISIS, which I
received from Ken Birman:
ISIS is available in two forms. The description below is for the public
copy, which is provided in source form and at no fee (except postage and
our manual, but you can also obtain the system and manual by FTP).
There is also a commercially enhanced, supported release of the system
by a company I founded 3 years ago. I will have a copy of the product
announcement sent to you. This version is priced inexpensively and
supports everything the public version does, but has a number of useful
enhancements. And, needless to say, the possibility of support is
important to some potential users. We have several contacts in the
Netherlands and eventually hope to have a local support organization
close to you. We also have some contacts who could provide short
courses in distributed systems and specifically on programming with ISIS,
if you need this at some stage of your effort.
Please let me know if you have other questions. I hope you find the system
useful. And, if you run into any problems at all, don't hesitate to ask
us for help. We stand behind our software -- both the public release
and the commercial one. If you have problems setting things up, we will
be happy to help you work around them.
Ken Birman
--- ISIS V2.1 blurb ---
This is to announce the availability of a public distribution of
the ISIS System, a toolkit for distributed and fault-tolerant
programming. The initial version of ISIS runs on UNIX on SUN,
DEC, GOULD, AUX and HP systems; ports to other UNIX-like
systems are planned for the future. No kernel changes are needed
to support ISIS; you just roll it in and should be able to use it
immediately. The current implementation of ISIS performs well in
networks of up to about 100-200 sites. Most users, however, run on
a smaller number of sites (16-32 is typical) and other sites connect
as "remote clients" that don't actually run ISIS directly. In this
mode many hundreds of ISIS users can be clustered around a smaller
set of ISIS "mother sites"; many users with large networks favor
such an architecture.
--- Who might find ISIS useful? ---
You will find ISIS useful if you are interested in developing
relatively sophisticated distributed programs under UNIX (eventu-
ally, other systems too). These include programs that distribute
computations over multiple processes, need fault-tolerance, coor-
dinate activities underway at several places in a network,
recover automatically from software and hardware crashes, and/or
dynamically reconfigure while maintaining some sort of distri-
buted correctness constraint at all times. ISIS is also useful
in building certain types of distributed real time systems.
Here are examples of problems to which ISIS has been applied:
o On the factory floor, we are working with an industrial
research group that is using ISIS to program decentralized
cell controllers. They need to arrive at a modular, expand-
able, fault-tolerant distributed system. ISIS makes it pos-
sible for them to build such a system without a huge invest-
ment of effort. (The ISIS group is also working closely with
an automation standards consortium called ANSA, headed by
Andrew Herbert in Cambridge).
o As part of a network file system, we built an interface to
the UNIX NFS (we call ours "DECEIT") that supports tran-
sparent file replication and fault-tolerance. DECEIT
speaks NFS protocols but employs ISIS internally to maintain
a consistent distributed state. For most operations,
DECEIT performance is at worst 50-75% of that of a normal NFS
-- despite supporting file replication and fault-tolerance.
Interestingly, for many common operations, DECEIT substantially
outperforms NFS (!) and it is actually fairly hard to come up
with workloads that demonstrate replication-related degradation.
o A parallel "make" program. Here, ISIS was used within a
control program that splits up large software recompilation
tasks and runs them on idle workstations, tolerating
failures and dynamically adapting if a workstation is
reclaimed by its owner.
o A system for monitoring and reacting to sensors scattered around
the network, in software or in hardware. This system, Meta, is
actually included as part of our ISIS V2.1 release. We are adding
a high level language to it now, Lomita, in which you can specify
reactive control rules or embed such rules into your C or Fortran
code, or whatever.
o In a hospital, we have looked at using ISIS to manage repli-
cated data and to coordinate activities that may span multi-
ple machines. The problem here is the need for absolute
correctness: if a doctor is to trust a network to carry out
orders that might impact on patient health, there is no room
for errors due to race conditions or failures. At the same
time, cost considerations argue for distributed systems that
can be expanded slowly in a fully decentralized manner.
ISIS addresses both of these issues: it makes it far easier
to build a reliable, correct, distributed system that will
manage replicated data and provide complex distributed
behaviors. And, ISIS is designed to scale well.
o For programming numerical algorithms. One group at Cornell
used ISIS to distribute matrix computations over large
numbers of workstations. They did this because the worksta-
tions were available, mostly idle, and added up to a tremen-
dous computational engine. Another group, at LANL, uses ISIS
in a parallel plasma physics application.
o In a graphics rendering application. Over an extended period,
a Cornell graphics group (not even in our department) has used
ISIS to build distributed rendering software for image
generation. They basically use a set of machines as a parallel
processor, with a server that farms out rendering tasks and
a variable set of slave computing units that join up when their
host machine is fairly idle and drop out if the owner comes
back to use the machine again. This is a nice load sharing
paradigm and makes for sexy demos too.
o In a wide-area seismic monitoring system (i.e. a system that
has both local-area networks and wide-area connections between
them), developed by a company called SAIC on a DARPA contract.
The system gathers seismic data remotely, preprocesses it, and
ships event descriptions to a free-standing analysis "hub", which
must run completely automatically (their people in San Diego don't like
to be phoned in the middle of the night to debug problems in Norway).
The hub may request data transfers and other complex computations,
raising a number of wide-area programming problems. In addition, the
hub system itself has a lot of programs in various languages and
just keeping it running can be a challenge.
o On brokerage and banking trading floors. Here, ISIS tends to be
an adjunct to a technology for distributing quotes, because the
special solutions for solving that specific problem are so fast
that it is hard for us to compete with them (we normally don't
have the freedom of specifying the hardware... many "ticker plant
vendors" wire the whole floor for you). However, to the extent
that these systems have problems requiring fault-tolerance, simple
database integration mechanisms, dynamic restart of services,
and in general need "reactive monitoring and control" mechanisms,
ISIS works well. And, with our newer versions of the ISIS protocols,
performance is actually good enough to handle distribution of
stock quotes or other information directly in ISIS, although
one has to be a bit careful in super performance intensive settings.
(The commercial ISIS release should compete well with the sorts of
commercial alternatives listed above on a performance basis, but
more than 10 trading groups are using ISIS V2.1 despite the fact that
it is definitely slower!).
The problems above are characterized by several features. First,
they would all be very difficult to solve using remote procedure
calls or transactions against some shared database. They have
complex, distributed correctness constraints on them: what hap-
pens at site "a" often requires a coordinated action at site "b"
to be correct. And, they do a lot of work in the application
program itself, so that the ISIS communication mechanism is not
the bottleneck.
If you have an application like this, or are interested in taking
on this kind of application, ISIS may be a big win for you.
Instead of investing resources in building an environment within
which to solve your application, using ISIS means that you can
tackle the application immediately, and get something working
much faster than if you start with RPC (remote procedure calls).
On the other hand, don't think of ISIS as competing with RPC or
database transactions. We are oriented towards online control and
coordination problems, fault-tolerance of main-memory databases, etc.
ISIS normally co-exists with other mechanisms, such as conventional
streams and RPC, databases, or whatever. The system is highly portable
and not very intrusive, and many of our users employ it to control some
form of old code running a computation they don't want to touch at
any price.
--- What ISIS does ---
The ISIS system has been under development for several years at
Cornell University. After an initial focus on transactional
"resilient objects", the emphasis shifted in 1986 to a toolkit
style of programming. This approach stresses distributed con-
sistency in applications that manage replicated data or that
require distributed actions to be taken in response to events
occurring in the system. An "event" could be a user request on a
distributed service, a change to the system configuration result-
ing from a process or site failure or recovery, a timeout, etc.
The ISIS toolkit uses a subroutine call style interface similar
to the interface to any conventional operating system. The pri-
mary difference, however, is that ISIS functions as a meta-
operating system. ISIS system calls result in actions that may
span multiple processes and machines in the network. Moreover,
ISIS provides a novel "virtual synchrony" property to its
users. This property makes it easy to build software in which
currently executing processes behave in a coordinated way, main-
tain replicated data, or otherwise satisfy a system-wide correct-
ness property. Moreover, virtual synchrony makes even complex
operations look atomic, which generally implies that toolkit
functions will not interfere with one another. One can take
advantage of this to develop distributed ISIS software in a sim-
ple step-by-step style, starting with a non-distributed program,
then adding replicated data or backup processes for fault-
tolerance or higher availability, then extending the distributed
solution to support dynamic reconfiguration, etc. ISIS provides
a really unique style of distributed programming -- at least if
your distributed computing problems run up against the issues we
address. For such applications, the ISIS programming style is
both easy and intuitive.
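The practical value of this property is easiest to see in miniature.
The self-contained C fragment below is only an illustration (the names
are invented; it is not the ISIS API): it shows why delivering the same
updates in the same total order to every member of a group keeps
replicated data consistent.

    /* Why totally ordered delivery keeps replicas consistent:
     * every member applies the same updates in the same order.
     * Illustration only -- invented names, NOT the ISIS API. */
    #include <stdio.h>
    #include <assert.h>

    #define MEMBERS  3
    #define NUPDATES 4

    int main(void)
    {
        int replica[MEMBERS] = {0, 0, 0};         /* one copy per member */
        int updates[NUPDATES] = {+5, -2, +7, +1}; /* the delivery order  */

        for (int u = 0; u < NUPDATES; u++)        /* same op, same order */
            for (int m = 0; m < MEMBERS; m++)
                replica[m] += updates[u];

        for (int m = 1; m < MEMBERS; m++)         /* all copies agree    */
            assert(replica[m] == replica[0]);
        printf("all %d replicas agree: %d\n", MEMBERS, replica[0]);
        return 0;
    }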
ISIS is really intended for, and is good at, problems that draw
heavily on replication of data and coordination of actions by a
set of processes that know about one another's existence. For
example, in a factory, one might need to coordinate the actions
of a set of machine-controlled drills at a manufacturing cell.
Each drill would do its part of the overall work to be done,
using a coordinated scheduling policy that avoids collisions
between the drill heads, and with fault-tolerance mechanisms to
deal with bits breaking. ISIS is ideally suited to solving prob-
lems like this one. Similar problems arise in any distributed
setting, be it local-area network software for the office or a
CAD problem, or the automation of a critical care system in a
hospital.
ISIS is not intended for transactional database applications. If
this is what you need, you should obtain one of the many such
systems that are now available. On the other hand, ISIS would be
useful if your goal is to build a front-end in a setting that
needs databases. The point is that most database systems are
designed to avoid interference between simultaneously executing
processes. If your application also needs cooperation between
processes doing things concurrently at several places, you may
find this aspect hard to solve using just a database because
databases force the interactions to be done indirectly through
the shared data. ISIS is good for solving this kind of problem,
because it provides a direct way to replicate control informa-
tion, coordinate the actions of the front-end processes, and to
detect and react to failures.
ISIS itself runs as a user-domain program on UNIX systems sup-
porting the TCP/IP protocol suite. It currently is operational
on SUN, DEC, GOULD and HP versions of UNIX. Language interfaces
for C, C++, FORTRAN, and Common LISP (both Lucid and Allegro) are
included, and a new C-Prolog interface is being tested now. Recent
ports available in V2.1 include AUX for the Apple Mac. II, AIX on the
IBM RS/6000 and also the older PC/RT. A Cray UNICOS port is (still)
under development at LANL, and a DEC VMS port is being done by
ISIS Distributed Systems, Inc.
ISIS runs over Mach on anything that supports Mach but will probably
look a little unnatural to you if you use the Mach primitives. We
are planning a version of ISIS that would be more transparent in a
Mach context, but it will be some time before this becomes available.
Meanwhile, you can use ISIS but may find some aspects of the interface
inconsistent with the way that Mach does things.
The actual set of tools includes the following:
o High performance mechanisms supporting lightweight tasks in
UNIX, a simple message-passing facility, and a very simple
and uniform addressing mechanism. Users do not work
directly with things like ports, sockets, binding, connect-
ing, etc. ISIS handles all of this.
o A process "grouping" facility, which permits processes to
dynamically form and leave symbolically-named associations.
The system serializes changes to the membership of each
group: all members see the same sequence of changes. Group
names can be used as location-transparent addresses.
o A suite of broadcast protocols integrated with a group
addressing mechanism. This suite operates in a way that
makes it look as if all broadcasts are received "simultane-
ously" by all the members of a group, and are received in
the same "view" of group membership.
o Ways of obtaining distributed executions. When a request
arrives in a group, or a distributed event takes place, ISIS
supports any of a variety of execution styles, ranging from
a redundant computation to a coordinator-cohort computation
in which one process takes the requested actions while others
back it up, taking over if the coordinator fails (see the toy
sketch just after this list).
o Replicated data with 1-copy consistency guarantees.
o Synchronization facilities, based on token passing or
read/write locks.
o Facilities for watching for a process or site (computer)
to fail or recover, triggering execution of subroutines pro-
vided by the user when the watched-for event occurs. If
several members of a group watch for the same event, all
will see it at the same "time" with respect to arriving mes-
sages to the group and other events, such as group member-
ship changes.
o A facility for joining a group and atomically obtaining
copies of any variables or data structures that comprise its
"state" at the instant before the join takes place. The
programmer who designs a group can specify state information
in addition to the state automatically maintained by ISIS.
o Automatic restart of applications when a computer recovers
from a crash, including log-based recovery (if desired) for
cases when all representatives of a service fail simultane-
ously.
o Ways to build transactions or to deal with transactional
files and database systems external to ISIS. ISIS itself
doesn't know about files or transactions. However, as noted
above, this tool is pretty unsophisticated as transactional
tools go...
o Spooler/long-haul mechanism, for saving data to be sent to a
group next time it recovers, or for sending from one ISIS LAN
to another, physically remote one (e.g. from your Norway site
to your San Diego installation). Note: ISIS will not normally
run over communication links subject to frequent failures, al-
though this long-haul interface has no such restrictions.
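To make the coordinator-cohort style concrete, here is a toy C sketch.
It is a guess at the flavor of the mechanism, not the ISIS toolkit
calls: members are ranked, the lowest-ranked live member handles the
request, and failover amounts to re-evaluating the rank order.

    /* Toy coordinator-cohort failover (invented names, not ISIS). */
    #include <stdio.h>

    struct member { const char *name; int alive; };

    /* The coordinator is the first live member in rank order. */
    static int coordinator(const struct member g[], int n)
    {
        for (int i = 0; i < n; i++)
            if (g[i].alive)
                return i;
        return -1;                           /* total failure */
    }

    int main(void)
    {
        struct member group[] =
            {{"node-a", 1}, {"node-b", 1}, {"node-c", 1}};

        int c = coordinator(group, 3);
        printf("%s handles the request; the rest are cohorts\n",
               group[c].name);

        group[c].alive = 0;                  /* coordinator crashes */
        c = coordinator(group, 3);           /* failover = re-rank  */
        printf("takeover by %s\n", c >= 0 ? group[c].name : "(nobody)");
        return 0;
    }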
Everything in ISIS is fault-tolerant. Our programming manual has
been written in a tutorial style, and gives details on each of
these mechanisms. It includes examples of typical small ISIS
applications and how they can be solved. The distribution of the
system includes demos, such as the parallel make facility men-
tioned above; this large ISIS application program illustrates
many system features.
To summarize, ISIS provides a broad range of tools, including
some that require algorithms that would be very hard to support
in other systems or to implement by hand. Performance is quite
good: most tools require between 1/20 and 1/5 second to execute
on a SUN 3/60, although the actual numbers depend on how big
process groups get, the speed of the network, the locations of
processes involved, etc. Overall, however, the system is really
quite fast when compared with, say, file access over the network.
For certain common operations a five to ten-fold performance
improvement is expected within two years, as we implement a col-
lection of optimizations. The system scales well with the size
of the network, and system overhead is largely independent of
network size. On a machine that is not participating in any ISIS
application, the overhead of having ISIS running is negligible.
In certain communication scenarios, ISIS performance can be quite
good. These involve streaming data within a single group or certain
client-server interaction patterns, and make use of a new BYPASS
communication protocol suite. Future ISIS development is likely
to stress extensions and optimizations at this level of the system.
In addition, a lot of effort is going into scaling the system
to larger environments.
--- You can get a copy of ISIS now ---
Version V2.1 of ISIS is now fully operational and is being made
available to the public. This version consists of a C implementation
for UNIX, and has been ported to AIX, SUN UNIX, MACH, ULTRIX, Gould UNIX,
HP-UX, AUX and APOLLO UNIX (release 10.1). Performance is uniformly good.
A 400-page tutorial and system manual containing numerous programming
examples is also available. Online manual pages are also provided.
The remainder of this posting focuses on how to get ISIS, and how
to get the manual. Everything is free except bound copies of the
manual. Source is included, but the system is in the public
domain, and is released on condition that any ports to other sys-
tems or minor modifications remain in the public domain. The
manual is copyrighted by the project and is available in hard-
copy form or as a DVI file, with figures available for free on
request.
We have placed compressed TAR images in the following places:
* cu-arpa.cs.cornell.edu (anonymous login, binary mode pub/ISISV21.TAR.Z)
* Doc: cu-arpa.cs.cornell.edu (pub/ISISV21-DOC.TAR.Z)
* uunet.uu.net (anonymous login, binary mode networks/ISIS/ISISV21.TAR.Z)
* mcsun.eu.net (anonymous login, binary mode networks/ISIS/ISISV21.TAR.Z)
Also available are DVI and PS versions of our manual. Bound
copies will be available at $25 each. A package of figures to
glue into the DVI version will be provided free of charge.
A tape containing ISIS will be provided upon payment of a charge to
cover our costs in making the tape. Our resources are limited and
we do not wish to do much of this.
--- Copyright, restrictions ---
V2.1 of ISIS is subject to a restrictive copyright; basically, you can
use it without changing it in any way you like, but are not permitted
to develop "derivative versions" without discussing this with us.
V2.1 differs substantially from V1.3.1, which was released in the public
domain and remains available without any restrictions whatsoever.
On the other hand, whereas previous versions of ISIS required export
licenses to be sent to certain Eastern-bloc countries, the present
version seems not to be subject to this restriction. Contact the US
Dept. of Commerce for details if you plan to export ISIS to a country
that might be subject to restrictions. Any place in Europe, Japan, etc.
should be fine and no license is required.
--- Commercial support ---
We are working with a local company, ISIS Distributed Systems
Inc., to provide support services for ISIS. This company will
prepare distributions and work to fix bugs. Support contracts
are available for an annual fee; without a contract, we will do
our best to be helpful but make no promises. Other services that
IDS plans to provide will include consulting on fault-tolerant
distributed systems design, instruction on how to work with ISIS,
bug identification and fixes, and contractual joint software
development projects. The company is also prepared to port ISIS
to other systems or other programming languages. Contact
"birman@gvax.cs.cornell.edu" for more information.
--- If you want ISIS, but have questions, let us know ---
Send mail to isis@cs.cornell.edu, subject "I want ISIS",
with electronic and physical mailing details. We will send you a
form for acknowledging agreement with the conditions for release
of the software and will later contact you with details on how to
actually copy the system off our machine to yours.
--- You can read more about ISIS if you like ---
The following papers and documents are available from Cornell.
We don't distribute papers by e-mail. Requests for papers should
be transmitted to "isis@cs.cornell.edu".
1. Exploiting replication. K. Birman and T. Joseph. This is a
preprint of a chapter that will appear in: Arctic 88, An
advanced course on operating systems, Tromso, Norway (July
1988). 50pp.
2. Reliable broadcast protocols. T. Joseph and K. Birman.
This is a preprint of a chapter that will appear in: Arctic
88, An advanced course on operating systems, Tromso, Norway
(July 1988). 30pp.
3. ISIS: A distributed programming environment. User's guide
and reference manual. K. Birman, T. Joseph, F. Schmuck.
Cornell University, March 1988. 275pp.
4. Exploiting virtual synchrony in distributed systems. K.
Birman and T. Joseph. Proc. 11th ACM Symposium on Operating
Systems Principles (SOSP), Nov. 1987. 12pp.
5. Reliable communication in an unreliable environment. K.
Birman and T. Joseph. ACM Transactions on Computer Systems,
Feb. 1987. 29pp.
6. Low cost management of replicated data in fault-tolerant
distributed systems. T. Joseph and K. Birman. ACM Transac-
tions on Computer Systems, Feb. 1986. 15pp.
7. Fast causal multicast. K. Birman, A. Schiper, P. Stephenson.
Dept. of Computer Science TR, May 1990.
8. Distributed application management. K. Marzullo, M. Wood, R.
Cooper, K. Birman. Dept. of Computer Science TR, June 1990.
We will be happy to provide reprints of these papers. Unless we
get an overwhelming number of requests, we plan no fees except
for the manual. We also maintain a mailing list for individuals
who would like to receive publications generated by the project
on an ongoing basis. The last two papers can be copied using FTP
from cu-arpa.cs.cornell.edu.
If you want to learn about virtual synchrony as an approach
to distributed computing, the best place to start is with refer-
ence [1]. If you want to learn more about the ISIS system, how-
ever, start with the manual. It has been written in a tutorial
style and should be easily accessible to anyone familiar with the
C programming language. References [7] and [8] are typical of our
recent publications (there are others -- contact Maureen Robinson
for details).
----------------------------------------------------------------
----------------------------------------------------------------
/* THANKS TO ALL WHO CONTRIBUTED TO THIS SUMMARY */
Hi,
Before you read the summary of my request for references
on "Parallelizing applications on a multi-workstation
network" (I posted about a week ago) two more questions:
I got a very interesting hint about two projects dealing exactly
with the topic of my netnews request:
1. "The Multi-Satellite Star", by Michael Stumm at Stanford University
2. "Marionette: Support for Highly Parallel Distributed Programs in Unix"
by Mark Sullivan, at Univ. of Berkeley, CA
Unfortunately I couldn't get any more information about these projects:
no e-mail addresses, technical reports, or published papers.
Does anyone out there know anything about the two projects mentioned
above (e-mail addresses, published papers, etc.)? It should be easy for
you folks at Stanford and Berkeley to find out, right?
Please let me know.
Thanks in advance.
------------------------------------------------------------------
Now to the SUMMARY. I got more than 30 answers to my network request.
I am still receiving at least 3 responses a day. Anyway, I think
it is time to post my summary as promised.
Some entries are undoubtedly incomplete.
Corrections and additions are appreciated.
I also included the name of the contributor for
most of the entries. I hope that this does not
violate some aspect of netiquette that I am unaware of.
Please forgive the faux pas otherwise.
Thanks to all those who contributed!
Thomas Fahringer
Universitaet Wien
Institut fuer Statistik und Informatik
Rathausstrasse 19/II/3
1010 Wien
Austria
tf@eacpc1.tuwien.ac.at
thomasf@globe.edrc.cmu.edu
tf%eacpc1.tuwien.ac.at@VMXA.TUWIEN.AC.AT
--------------------------------------------------------------
1.
People here at Yale and elsewhere are looking at using C-LINDA
to effectively use a network of workstations (homogeneous or
heterogeneous). The tuple space is assumed to be distributed
throughout the local memories of the nodes and the compiler performs
some optimisations to reduce data movement during a process's
communication with tuple space. I myself am using C-Linda on a
Sun-Sparc network for various linear algebra algorithms and have gotten good
speedups. Contact Doug Gilmore at "gilmore@sca.com" for further details.
>From: David Kaminsky <kaminsky-david@CS.YALE.EDU>
I am working on a similar problem using TSnet. TSnet
is a network version of Linda.
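For readers new to the model: Linda processes cooperate by depositing
tuples into, and withdrawing them from, a shared tuple space. The C
sketch below is only a toy, in-process imitation of the out/in pair
(real C-Linda is a compiled language extension, with operators such as
out, in, rd and eval, and with blocking matching semantics).

    /* Toy tuple space: out() deposits a (tag, value) tuple,
     * in() withdraws a matching one. Illustration only. */
    #include <stdio.h>
    #include <string.h>

    #define SPACE 16
    struct tuple { const char *tag; int value; int used; };
    static struct tuple space[SPACE];

    static void out(const char *tag, int value)
    {
        for (int i = 0; i < SPACE; i++)
            if (!space[i].used) {
                space[i].tag = tag;
                space[i].value = value;
                space[i].used = 1;
                return;
            }
    }

    /* Real in() blocks until a match exists; this toy just reports. */
    static int in(const char *tag, int *value)
    {
        for (int i = 0; i < SPACE; i++)
            if (space[i].used && strcmp(space[i].tag, tag) == 0) {
                *value = space[i].value;
                space[i].used = 0;           /* tuple is consumed */
                return 1;
            }
        return 0;
    }

    int main(void)
    {
        int x;
        out("task", 42);                     /* producer deposits work */
        if (in("task", &x))                  /* any worker may take it */
            printf("worker got task %d\n", x);
        return 0;
    }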
>From: <bremner@cs.sfu.ca>
Carriero, N., Gelernter, D. [1986], "The S/Net's Linda Kernel", ACM
Transactions on Computer Systems,4,2, May 1986, pp. 110-129.
Carriero, N., Gelernter, D. [1988,1], "Linda in Context", Research
Report YALEU/DCS/RR-622, Yale University, Department of Computer
Science, April 1988.
Carriero, N., Gelernter, D. [1988,2], "How to Write Parallel Programs:
A Guide to the Perplexed", Research Report YALEU/DCS/RR-628, Yale
University, Department of Computer Science, April 1988.
Carriero, N., Gelernter, D. [1989] Technical Correspondence,
Communications of the ACM, 32,19, pp. 1256-1258
Davidson, C., [1989], ibid, pp. 1249-1251
Gelernter, D. [1984], "Dynamic global name spaces on network
computers", Proceedings of International Conference on Parallel
Processing, August, 1984, pp. 25-31.
Gelernter, D. [1985], "Generative Communication in Linda", ACM
Transactions on Programming Languages and Systems, 7, 1, pp. 80-112.
Leler, Wm [1990], "Linda Meets Unix", IEEE Computer, 23, 2, pp. 43-54.
-------------------------------------------------------------
2.
>From: anand@top.cis.syr.edu
We just had a discussion of this topic on the net a while back. I have
used ISIS for parallel processing of the type that you are interested
in.
>From: "Chang L. Lee" <clee@polyslo.CalPoly.EDU>
You might want to look at ISIS from Cornell. It's a distributed
system toolkit. The idea is to build applications served by process
groups, with the virtual synchrony model helping to make the actual
implementation less painful.
---------------------------------------------------------------
3.
TCGMSG Send/receive subroutines .. version 3.0 (9/27/90)
--------------------------------------------------------
Robert J. Harrison
tel: (708) 972-7197
E-mail: harrison@tcg.anl.gov, harrison@anlchm.bitnet
letter: Bldg. 200,
Theoretical Chemistry Group,
Argonne National Laboratory,
9700 S. Cass Avenue, Argonne, IL 60439.
These routines have been written with the objective of providing a
robust, portable, high performance message passing system for
distributed memory FORTRAN applications. The C interface is also
portable. The syntax is nearly identical to that of the iPSC
subroutines, but the functionality is restricted to improve efficiency
and speed implementation. On machines with vector hardware sustained
interprocess communication rates of 6.0Mb/s have been observed. This
toolkit (referred to as TCGMSG) only strives to provide the minimal
functionality needed for our applications. It is only a stopgap
until some better model becomes widely (and cheaply) available.
However, I believe that many (not all) chemistry and physics problems
are readily and efficiently coded with this simple functionality, and
that such effort will not be wasted when better tools are found.
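For those who have not used an iPSC-style library: the core of such a
system is a send/receive pair between cooperating processes. The
self-contained C sketch below uses ordinary UNIX pipes rather than the
TCGMSG calls (whose exact signatures are not reproduced here) to show
the master/worker exchange this kind of library supports.

    /* Master/worker send-receive pattern over plain pipes
     * (NOT the TCGMSG interface; illustration only). */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int to_worker[2], to_master[2];
        if (pipe(to_worker) < 0 || pipe(to_master) < 0)
            return 1;

        if (fork() == 0) {                       /* worker "node" */
            double x;
            read(to_worker[0], &x, sizeof x);    /* receive work  */
            x = x * x;                           /* compute       */
            write(to_master[1], &x, sizeof x);   /* send result   */
            _exit(0);
        }

        double work = 3.0, result;               /* master "node" */
        write(to_worker[1], &work, sizeof work);
        read(to_master[0], &result, sizeof result);
        wait(NULL);
        printf("master received %.1f\n", result);
        return 0;
    }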
----------------------------------------------------------------
4.
>From: Rochelle Grober <argosy!rocky@decwrl.dec.com>
Have you ever heard of what has been done on Apollo workstations (now owned
by HP)? Their animated movies are created by a "master" node grabbing any
other computer on the net and shipping off a frame and appropriate code to
process the frame using ray tracing algorithms. The master locates available
computers, coordinates shipping out and retrieving the frames and code, etc.
It can grow with the number of computers available. The way most of the
movies have been made is that the job is run during off hours and weekends.
I believe one of their claims was that something like 25,000 hours of compute
time to generate one of their shorts was accomplished over two weekends using
the company's then ~1000-node network.
I myself participated in a project that took our large data files and broke
them up into blocks, shipping the blocks and a copy of the necessary code to up
to five other computers on the net. It worked very well, and this was in
1986. The Apollo operating system and system support are designed to facilitate
this kind of network distribution of work, so the code to do this sort of
subtasking took a knowledgeable person a day to write, and perhaps a couple
more to ensure it worked properly.
-------------------------------------------------------------------------
5.
>From: Colin Brough <cmb@castle.edinburgh.ac.uk>
Have you heard of CSTools, from Meiko Scientific? It is a message
passing environment, originally designed for their Computing Surface
reconfigurable processor arrays. There is now a version for a Sun
cluster, as well as the Transputer and i860 versions.
-----------------------------------------------------------------------
6.
>From: "V.S.Sunderam" <vss@mathcs.emory.edu>
We have a system called PVM that does what you describe. A paper on
this has appeared in "Concurrency: Practice & Experience"
December 1990.
----------------------------------------------------------------------
7.
>From: "Veikko M. Keha" <keha@hydra.Helsinki.FI>
I am working on a methodology that tries to automate the process
of converting a serial program into objects that are executed in parallel
in a local area network. This "Implicit Parallelism Model" is
targeted at programmers who are used to writing serial programs and
don't want to know much about parallelism and its complexity.
The model is based on remote procedure calls. The source code is
modified by a precompiler so that remote
procedure calls are executed in parallel. The model is suitable for
use with any 3rd-generation language.
I have built some prototypes to demonstrate the model. The language
has been C++.
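The idea behind the precompiler can be suggested with threads standing
in for remote servers. In the C sketch below (all names invented, and
threads used only as stand-ins for RPCs), two calls with no mutual
dependency are issued asynchronously, and the caller waits only where
the results are actually used, which is exactly the parallelism the
model extracts from ordinary call syntax. Compile with -lpthread.

    /* Two independent "remote" calls run concurrently. */
    #include <stdio.h>
    #include <pthread.h>

    static void *slow_square(void *arg)  /* plays the remote procedure */
    {
        long n = (long)arg;
        return (void *)(n * n);
    }

    int main(void)
    {
        pthread_t a, b;
        void *ra, *rb;

        /* "Call" both procedures; neither blocks the other. */
        pthread_create(&a, NULL, slow_square, (void *)6L);
        pthread_create(&b, NULL, slow_square, (void *)7L);

        /* Wait only at the point where results are consumed. */
        pthread_join(a, &ra);
        pthread_join(b, &rb);
        printf("6^2 + 7^2 = %ld\n", (long)ra + (long)rb);
        return 0;
    }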
----------------------------------------------------------------------
8.
>From: rgb@mcc.com (Roger Bolick)
You posted a request for information concerning multi-workstation
programming. If I understand your search correctly then our
project may be of interest. It is a programming environment for
C++ with a runtime kernel running on either a Unix box or on the
native hardware which supports remote objects. This is used on
both multi-computer hardware as well as multi-workstations.
This means that your program runs on the available hardware as
described in that application's configuration file, assigning
objects to as many nodes as you have. Of course it's not that
simple and there are limitations, but that is why it's still in
research.
-------------------------------------------------------------------
9.
>From: Karsten Morisse <kamo@uni-paderborn.de>
I collected quite a few responses to my query on how
to connect an ensemble of Suns into a multiprocessing team.
Here is a summary of what I received.
(It took some time for the mail to percolate overseas
and back; that is the reason for the delay in my replying.)
I heard about 9 different projects:
1. ISIS (Cornell)
If you want ISIS, send mail to
"croft@gvax.cs.cornell.edu," subject "I want ISIS".
2. Cosmic Environment (Caltech)
You can obtain a programming guide by sending e-mail
to chuck@vlsi.caltech.edu, or postal mail to:
3. DOMINO (U. Maryland, College Park)
DOMINO is a message passing environment for parallel computation.
See the Computer Science Dept. (U. Maryland) tech report # TR-1648
(April, 1986) by D. P. O'Leary, G. W. Stewart, and R. A. van de Geijn.
4. DPUP (U. of Colorado)
DPUP stands for Distributed Processing Utilities Package.
What follows is an abstract from a technical report
written at the Computer Science Dept. at the University of Colorado
by T. J. Garner et al.
"DPUP is a library of utilities that support distributed concurrent
computing on a local area network of computers.
The library is built upon the interprocess communication
facilities in Berkeley Unix 4.2BSD."
5. TORiS (Toronto)
TORiS implements a shared memory communication model.
Contact Orran Krieger at the University of Toronto for more information:
UUCP: {decvax,ihnp4,linus,utzoo,uw-beaver}!utcsri!eecg!okrieg
ARPA: okrieg%eecg.toronto.edu@relay.cs.net
CSNET: okrieg@eecg.toronto.edu
CDNNET: okrieg@eecg.toronto.cdn
6. LINDA (Yale, Scientific Computing Associates)
Linda is a parallel programming language for shared memory
implementations. It is simple and has only six operators. C-Linda
has been implemented for a network of SUNs in the internet domain.
With LAN-LINDA (also called TSnet) you can write parallel or
distributed programs in C and run them on a network of workstations.
TSnet has been tested on Sun and IBM RT workstations.
Contact David Gelernter (project head) or Mauricio Arango at:
gelernter@cs.yale.edu
arango@cs.yale.edu
TSnet and other Linda systems are being distributed through
Scientific Computing Associates.
Contact
Dennis Philbin
Scientific Computing Associates
246 Church St., Suite 307
New Haven, CT 06510
203-777-7442
7. SR (U. Arizona)
"SR (Synchronizing Resources) is designed for writing distributed
programs. The main language constructs are resources and operations.
Resources encapsulate processes and variables they share;
operations provide the primary mechanism for process interaction.
SR provides a novel integration of the mechanisms for invoking
and servicing operations. Consequently, all of local and remote
procedure call, rendezvous, message passing, dynamic process
creation, multicast, and semaphores are supported. An overview of the
language and implementation appeared in the January, 1988, issue of
TOPLAS (ACM Transactions on Programming Languages and Systems 10,1, 51-86).
"SR is available by anonymous FTP from Arizona.EDU (128.196.128.118 or
192.12.69.1).
[Copy over the README file for an explanation.]
You may reach the members of the SR project electronically at:
uunet!Arizona!sr-project
or by surface mail at:
SR Project
Department of Computer Science
University of Arizona
Tucson, AZ 85721
(602) 621-2018
8. MAITRD (U.C. Berkeley/U. Wash)
"The maitr'd software is remote process server that is designed to
farm out cpu expensive jobs to less loaded machines. It has a small
amount of built-in intelligence, in that it attempts to send jobs to
the least loaded machine of the set which is accepting off-site jobs."
`Maitrd' is available via anonymous ftp from
june.cs.washington.edu (128.95.1.4) as ~ftp/pub/Maitrd.tar.Z.
There is also a heterogeneous systems rpc package `hrpc.tar.Z'.
Contact Brian Bershad at U. Washington (brian@june.cs.washington.edu)
for more information.
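The placement rule is simple enough to sketch in a few lines of C.
This is a guess at the spirit of the algorithm, not maitr'd's actual
code or data structures: among the machines currently accepting
off-site jobs, pick the least loaded one.

    /* Least-loaded host selection (invented data layout). */
    #include <stdio.h>

    struct host { const char *name; double load; int accepting; };

    static const struct host *least_loaded(const struct host *h, int n)
    {
        const struct host *best = NULL;
        for (int i = 0; i < n; i++)
            if (h[i].accepting && (!best || h[i].load < best->load))
                best = &h[i];
        return best;
    }

    int main(void)
    {
        struct host pool[] = {
            {"alpha", 1.9, 1}, {"beta", 0.3, 1}, {"gamma", 0.1, 0}
        };
        const struct host *t = least_loaded(pool, 3);
        if (t)
            printf("send job to %s (load %.1f)\n", t->name, t->load);
        return 0;
    }

Note that gamma, though least loaded, is not accepting off-site jobs,
so the job goes to beta.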
9. PARMACS (Argonne)
David Levine at Argonne National Laboratory tells us about a
"generic package to do send/recv message passing" with
"different versions (c, c++, fortran) [that] work on different machines."
For more information, send email to netlib@mcs.anl.gov, with subject
(or body) ``send index from parmacs.''
You can also contact David Levine directly at
levine@mcs.anl.gov or by uucp: {alliant,sequent,rogue}!anlams!levine.
------------------------------------------------------------------
10.
>From: Karen Tracey <kmt@pclsys48.pcl.nd.edu>
My implementation platform for this work is the ARCADE distributed
system. If you are not familiar with ARCADE I can also give you
references on it. The goal of the ARCADE project is to develop an
environment which allows programmers to easily build distributed
applications that consist of cooperating tasks which may run on
heterogeneous machines. ARCADE contains a number of facilities
(data transfer & sharing, inter-task control & synchronization)
that simplify development of cooperative distributed applications.
"ARCADE: A Platform for Heterogeneous Distributed
Operating Systems", by David L. Cohn, William P. Delaney,
and Karen M. Tracey, appeared in Proceedings of 1989
USENIX Workshop on Experiences with Distributed and
Multiprocessor Systems, Fort Lauderdale, FL, October 1989.
-----------------------------------------------------------------
11.
>From: David Taylor <ddt@ccwf.cc.utexas.edu>
I'm not sure about this, but I believe you may be interested in a system
called Amoeba (sp?). I can't remember whether it's a true parallel-processing
environment or simply an OS that distributes processes on a network, but it's
probably worth looking into.
-------------------------------------------------------------------
12.
>From : roman@CCSF.Caltech.EDU (Roman Salvador)
We (ParaSoft Corporation) sell a system (parallel libraries, profilers,
debugger, semi-automatic data-distributer, automatic parallelizer, ...)
to do parallel processing on networks of workstations and most other
parallel computers.
-------------------------------------------------------------------
13.
>From: Joe Hummel <jhummel@ICS.UCI.EDU>
The Univ. of Washington has such a system, Amber. It runs a single
application in parallel on a network of shared-memory workstations (e.g. the
DEC Firefly). See the 12th ACM Symposium on Operating Systems Principles,
Dec. '89, pp. 147-158.
Also, take a look at Munin, a system from Rice Univ. They have it running
on a network of Suns. See the '90 ACM SIGPLAN PPoPP conference.
------------------------------------------------------------------
14.
We did some work at UCLA using PCs on an ethernet
Felderman, R.E., Schooler, E.M., Kleinrock L.,
"The Benevolent Bandit Laboratory: A Testbed for Distributed Algorithms",
IEEE Journal on Selected Areas in Communications, Vol. 7, No. 2, February 1989.
----------------------------------------------------------------------
15.
"J. Eric Townsend" <JET@uh.edu>
I would suggest you look at CalTech's "Cosmic Environment",
which lets you write iPSC/2 code and run it on a network of workstations.
chuck@vlsi.caltech.edu is who you need to talk to.
---------------------------------------------------------------------
16.
>From: Andy Newman <andy@research.canon.oz.au>
Get in touch with AT&T and get some information on their
Concurrent C product. They have a version that runs on
multiple machines in the network and others that run on
actual multi-processors. It's basically C with Hoare's
CSP constructs added to it (a sort of super-Occam).
-----------------------------------------------------------------------
17.
>From: paddy@daimi.aau.dk
About 2 years ago I was involved in a project which, given an
Ada program with (logical) site annotations, converted it into
a set of Ada programs which could be compiled using existing
compilers and the code run on separate machines
connected by an ethernet. There was a set of utilities which
invoked our convertor, followed by the Ada compiler/linker
etc.
-----------------------------------------------------------------------
18.
>From: Paulo V Rocha <P.Rocha@cs.ucl.ac.uk>
The PYGMALION Programming Environment, an ESPRIT II project, uses a
multi-workstation environment (up to 3 workstations if I am not wrong)
to run neural network applications. It uses remote procedure calls to
communicate.
----------------------------------------------------------------------
19.
>From: adamb@cs.utk.edu
We're currently working on a system called DagNet which will allow
the programmer to specify subroutines and their dependencies and then
have the subroutines scheduled around the internet. In DagNet the
data distribution is precisely associated with the dependencies. We
currently have a prototype working but the system is still going
through changes.
I also developed a system called Phred which allows the visual
specification and analysis of parallel programs. Currently I'm working
on designing an execution system which will actually execute Phred
programs over a network of machines. Unfortunately the execution system is
still on paper at this point. Phred does have interesting properties
for sharing data among parallel processes which might interest you.
------------------------------------------------------------------
20.
>From: eric@castle.edinburgh.ac.uk
Meiko (from Bristol UK -- don't have actual address but somebody
on the net must) sell a product called CS-Tools which will run
jobs over a network of Suns (SPARCstations and SLC etc) and
their own boxes. Don't know how well it works. (I use it every
day but I only run on Meiko's Computing Surface so can't verify
its behaviour on the Suns.)
--------------------------------------------------------------------
21.
>From: Ozalp Babaoglu <ozalp@dm.unibo.it>
Paralex: An Environment for Parallel Programming
in Distributed Systems
One of the many advantages of distributed systems is their ability to
execute several computations on behalf of a single application in
parallel, thus improving performance. In fact, at a certain level of
abstraction, there is little difference between a distributed system
and a loosely-coupled multiprocessor computer. We cannot, however, treat
distributed systems as if they were uniform multiprocessor parallel
machines due to the following characteristics:
o High latency, low bandwidth communication
o Presence of heterogeneous processor architectures
o Communication link and processor failures
o Multiple independent administrative domains.
Thus, if we can address these issues, a distributed computing resource
such as a collection of workstations could be viewed and used as if it
were a poor man's ``super computer.'' To make a distributed system
suitable for long-running parallel computations, support must be
provided for fault tolerance. Many hours of computation can be wasted
not only if there are hardware failures, but also if one of the
processors is turned off, rebooted or disconnected from the network.
Given that the components of the system (workstations) may be under
the control of several independent administrative domains (typically a
single individual who ``owns'' the workstation), these events are much
more plausible and frequent than real hardware failures.
----------------------------------------------------------------------
22.
>From: eisen@cc.gatech.edu (Greg Eisenhauer)
I'm not sure if it's exactly what you're looking for, but you might look at my
old work on Distributed Ada. The idea was that you took a single application
(program) and gave it to a compiler along with a specification of how you
wanted parts of it to be distributed across a distributed memory architecture.
We did most of the development here on a network of Suns and eventually moved
to an Intel iPSC/2 Hypercube.
-----------------------------------------------------------------------
23.
>From: Ralph Noack <mdaeng!rwn@utacfd.uta.edu>
We've bought a package called Express by ParaSoft.
It runs on Sun workstation networks and PCs or Macs with transputer
cards. It has two different modes of programming for parallelizing a task.
1) cubix: a single executable with appropriate calls to library
routines to find its id number, exchange data between nodes, etc.
I've written a small program which solves a simple PDE on multiple
nodes, so it works. I could not get the main application running due to
problems with their current preprocessor (it translates Fortran write
statements to subroutine calls so that all I/O passes through node 0). They say
a new version of the preprocessor will be released soon.
2) host + node executables: a single host task communicates with and controls
multiple tasks running a different executable.
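The cubix structure is worth seeing in outline: one executable runs
on every node and branches on its own id, with node 0 handling I/O.
The C sketch below uses invented stubs (get_my_id and get_num_nodes
are stand-ins, not Express library calls) purely to show that shape.

    /* Shape of a "cubix"-style program: every node runs this same
     * executable and branches on its own id. Stubs, not Express. */
    #include <stdio.h>

    static int get_my_id(void)     { return 0; } /* pretend node 0  */
    static int get_num_nodes(void) { return 4; } /* pretend 4 nodes */

    int main(void)
    {
        int me = get_my_id(), n = get_num_nodes();
        if (me == 0)
            printf("node 0: funnel I/O for all %d nodes\n", n);
        else
            printf("node %d: work on my slice of the grid\n", me);
        return 0;
    }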
-----------------------------------------------------------------------
24.
from rwolski@lll-crg.llnl.gov (Richard Wolski)
About a week ago I posted a request for reference information on
partitioning and scheduling for heterogeneous systems. I am pleased to say
that the response has been almost overwhelming. While I haven't completely
digested everything I have received, here is a summary of what I think
is pertinent.
Dr. Shahid Bokhari at ICASE suggests:
-------------------------------------
Author = "Shahid H. Bokhari",
Year = "July 1979",
Journal = "IEEE Transactions on Software Engineering",
Number = "5",
Pages = "341-349",
Title = "Dual processor scheduling with dynamic reassignment",
Volume = "SE-5",
Author = "Shahid H. Bokhari",
Year = "November 1981",
Journal = "IEEE Transactions on Software Engineering",
Number = "6",
Pages = "583-589",
Title = "A shortest tree algorithm for optimal assignments across space and time in
a distributed processor system",
Volume = "SE-7",
Author = "Shahid H. Bokhari", *** RECOMMENDED ***
Title = "Partitioning problems in parallel, pipelined and distributed computing",
Journal = "IEEE Transactions on Computers",
Year = "January, 1988",
Number ="1",
Volume="C-37",
Pages="48-57"
Author = "Shahid H. Bokhari",
Title = "Assignment problems in parallel and distributed computing",
Publisher = "Kluwer",
Address = "Boston",
Year = "1987"
Author = "Patricia J. Carstensen",
Title = "The Complexity of Some Problems in Parametric Linear and Combinatorial Programming",
Year = "1983",
Institution = "Department of Mathematics,
University of Michigan",
Author = "K. W. Doty",
Author = "P. L. McEntire",
Author = "J. G. O'Reilly",
Title = "Task allocation in a distributed
computer system",
Journal = "Proceedings of the IEEE Infocom 82",
Pages = "33-38",
Year = "1982",
Author = "Dan Gusfield",
Title = "Parametric combinatorial computing and a problem of program module distribution",
Journal = "Journal of the ACM",
Volume = "30",
Number = "3",
Pages = "551-563",
Year = "July 1983",
Author = "Robert E. Larson",
Author = "Paul E. McIntyre",
Author = "John G. O'Reilly",
Title = "Tutorial: Distributed Control",
Publisher = "IEEE Computer Society Press", Address = "Silver Spring, MD",
Year = "1982",
Author = "Virginia M. Lo", **** RECOMMENDED ****
Title = "Heuristic algorithms for task assignments in distributed systems",
Journal = "Proceedings of the 4th International Conference on Distributed Processing Systems",
Pages = "30-39",
Year = "May 1984",
Author = "Janet Michel",
Author = "Andries van Dam",
Title = "Experience with distributed processing on a host/satellite system",
Journal = "Computer Graphics (SIGGRAPH Newsletter)",
Volume = "10",
Number = "2",
Year = "1976",
Author = "Camille C. Price",
Author = "Udo W. Pooch",
Title = "Search Techniques for a nonlinear multiprocessor scheduling problem",
Journal = "Naval Research Logistics Quarterly",
Volume = "29",
Number = "2",
Pages = "213-233",
Year = "June 1982",
Author = "Gururaj S. Rao",
Author = "Harold S. Stone",
Author = "T. C. Hu",
Title = "Assignment of tasks in a distributed processor system with limited memory",
Journal = "IEEE TC",
Volume = "C-28",
Number = "4",
Pages = "291-299",
Year = "April 1979",
Author = "Harold S. Stone", **** RECOMMENDED ****
Title = "Multiprocessor scheduling with the aid of network flow algorithms",
Journal = "IEEE Transactions on Software Engineering",
Volume = "SE-3",
Number = "1",
Pages = "85-93",
Year = "January 1977",
Author = "Harold S. Stone",
Year = "1977",
Number = "ECE-CS-77-7",
Institution = "Department of Electrical & Computer Engineering, University of Massachusetts, Amherst",
Title = "Program assignment in three-processor systems and tricutset partitioning of graphs"
Author = "Harold S. Stone",
Title = "Critical load factors in two-processor distributed systems",
Journal = "IEEE Transactions on Software Engineering",
Volume = "SE-4",
Number = "3",
Pages = "254-258",
Year = "May 1978",
Author = "Donald F. Towsley",
Title = "Allocating programs containing branches and loops within a multiple processor system",
Journal = "IEEE Transactions on Software Engineering",
Volume = "SE-12",
Pages = "1018-1024",
Year = "October 1986",
Author = "Andries van Dam",
Author = "George M. Stabler",
Author = "Richard J. Harrington",
Title = "Intelligent satellites for interactive graphics",
Journal = "Proceedings of the IEEE",
Volume = "62",
Number = "4",
Pages = "483-492",
Year = "April 1974",
>From Alessandro Forin at CMU:
@article ( IEEECOMP,
key = "Agora" ,
author = "Bisiani, R. and Forin, A." ,
title = "Multilanguage Parallel
Programming on Heterogeneous Systems" ,
journal = "IEEE Transactions on Computers",
publisher= "IEEE-CS" ,
month = "August" ,
year = "1988" ,
)
@inproceedings ( BISI87G,
key = "bisi87g" ,
author = "Bisiani,R. and Lecouat,F." ,
title = "A Planner for the Automatization of Programming
Environment Tasks" ,
booktitle= "21st Hawaii International Conference on System
Sciences" ,
publisher= "IEEE" ,
month = "January" ,
year = "1988" ,
bibdate = "Fri Aug 28 09:44:54 1987" ,
)
@inproceedings ( DBGWKSHP,
key = "Agora" ,
author = "Forin, Alessandro" ,
title = "Debugging of Heterogeneous Parallel Systems" ,
booktitle= "Intl. Workshop on Parallel and Distributed Debugging",
publisher= "SIGPLAN Notices, V24-1 Jan. 1989",
address = "Madison, WI",
month = "May" ,
year = "1988" ,
pages = "130-141",
)
@techreport ( ASMREPORT,
key = "Agora" ,
author = "R. Bisiani, F. Alleva, F. Correrini, A. Forin, F. Lecouat, R. L
erner",
title = "Heterogeneous Parallel Processing, The Agora
Shared Memory" ,
institution= "Carnegie-Mellon University" ,
address = "Comp. Science Dept." ,
type = "Tech. Report" ,
number = "CMU-CS-87-112" ,
month = "March" ,
year = "1987" ,
)
Dr Michael Coffin at the University of Waterloo suggests:
------------------------------------------------------
AUTHOR = "Michael H. Coffin",
TITLE = "Par: {A}n Approach to Architecture-Independent Parallel
Programming",
SCHOOL = "Department of Computer Science, The University of
Arizona",
MONTH = aug,
YEAR = "1990",
ADDRESS = "Tucson, Arizona"
}
Dr. David Skillicorn at Queen's University suggests:
---------------------------------------------------
@TECHREPORT{bib:001,
TITLE = {The Purdue Dual {MACE} Operating System},
INSTITUTION = {Purdue University},
KEYWORDS = {Abell1},
YEAR = {1978},
MONTH = {NOV},
}
@ARTICLE{bib:002,
AUTHOR = {Guy T. Almes and Andrew P. Black and Edward D.
Lazowska and Jerre D. Noe},
TITLE = {The Eden System: A Technical Review},
JOURNAL = {IEEE Transactions on Software Engineering},
PAGES = {43--59},
KEYWORDS = {Almes1},
YEAR = {1985},
MONTH = {JAN},
}
@INPROCEEDINGS{bib:003,
AUTHOR = {D.E. Bailey and J.E. Cuny},
TITLE = {An Approach to Programming Process Interconnection
Structures: Aggregate Rewriting Graph Grammars},
BOOKTITLE = {Proceedings of PARLE '87 Parallel Architectures
and Languages Europe, Volume II},
PAGES = {112--123},
ORGANIZATION = {Springer-Verlag, Lecture Notes in Computer
Science},
ADDRESS = {Eindhoven, The Netherlands},
YEAR = {1987},
MONTH = {June},
}
@ARTICLE{bib:004,
AUTHOR = {A. Barak and A. Litman},
TITLE = {{MOS}: a Multicomputer Distributed Operating System},
JOURNAL = {Software: Practice and Experience},
KEYWORDS = {Barak1},
LENGTH = {725},
YEAR = {1985},
MONTH = {AUG},
}
@ARTICLE{bib:005,
AUTHOR = {A. Barak and A. Shiloh},
TITLE = {A Distributed Load Balancing Policy for a
Multicomputer},
JOURNAL = {Software: Practice and Experience},
KEYWORDS = {Barak2},
LENGTH = {901},
YEAR = {1985},
MONTH = {SEP},
}
@ARTICLE{bib:006,
AUTHOR = {? Bartlett and others},
TITLE = {A NonStop Kernel},
JOURNAL = {PROC of the 8th SOSP},
KEYWORDS = {Bartle1},
YEAR = {1981},
MONTH = {OCT},
}
@ARTICLE{bib:007,
AUTHOR = {M.J. Berger and S.H. Bokhari},
TITLE = {A Partitioning Strategy for Nonuniform Problems on
Multiprocessors},
JOURNAL = {IEEE Transactions on Computers},
VOLUME = {C-36, No.5},
PAGES = {570--580},
KEYWORDS = {rectangular partition with uniform workload},
YEAR = {1987},
MONTH = {May},
}
@INPROCEEDINGS{bib:008,
AUTHOR = {Andrew P. Black},
TITLE = {Supporting Distributed Applications: Experience with
Eden},
BOOKTITLE = {PROC of the 10th SOSP},
KEYWORDS = {Black1},
YEAR = {1985},
MONTH = {DEC},
}
@ARTICLE{bib:011, ***** RECOMMENDED *****
AUTHOR = {Shahid H. Bokhari},
TITLE = {On the Mapping Problem},
JOURNAL = {IEEE Transactions on Computers},
VOLUME = {C-30},
NUMBER = {3},
PAGES = {207--214},
KEYWORDS = {recommended},
YEAR = {1981},
MONTH = {March},
ABSTRACT = {This paper is important because it points out that
the mapping problem is akin to graph traversal and is at
least P-complete. Also see ICPP79. Reproduced in the
1984 tutorial: Interconnection Networks for
parallel and distributed processing by Wu and
Feng.},
}
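For a feel for why this one is recommended: Bokhari's mapping problem asks
for a placement of task-graph modules onto processors that puts as many
communicating module pairs as possible on directly connected processors.
A minimal brute-force sketch in Python (the function name and toy graphs
are mine, purely illustrative, and only feasible for toy sizes):

    # Try every placement of n modules onto n processors and count how
    # many task-graph edges land on processor-graph links (Bokhari's
    # cardinality measure). Exponential, which is rather the point.
    from itertools import permutations

    def best_mapping(task_edges, proc_edges, n):
        proc_links = {frozenset(e) for e in proc_edges}
        best, best_map = -1, None
        for perm in permutations(range(n)):      # module i -> processor perm[i]
            score = sum(1 for a, b in task_edges
                        if frozenset((perm[a], perm[b])) in proc_links)
            if score > best:
                best, best_map = score, perm
        return best, best_map

    # A 4-module chain maps perfectly onto a 4-processor ring:
    print(best_mapping([(0, 1), (1, 2), (2, 3)],
                       [(0, 1), (1, 2), (2, 3), (3, 0)], 4))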
@ARTICLE{bib:015,
AUTHOR = {W.W. Chu and L.J. Holloway and M.T. Lan and K. Efe},
TITLE = {Task Allocation in Distributed Data Processing},
JOURNAL = {Computer},
PAGES = {57--69},
YEAR = {1980},
MONTH = {November},
}
@INPROCEEDINGS{bib:018,
AUTHOR = {J.G. Donnett and M. Starkey and D.B. Skillicorn},
TITLE = {Effective Algorithms for Partitioning Distributed
Programs},
BOOKTITLE = {Proceedings of the Seventh Annual International
Phoenix Conference on Computers and Communications},
PAGES = {363--369},
YEAR = {1988},
MONTH = {March 16--18},
}
@PHDTHESIS{bib:025, **** RECOMMENDED ****
AUTHOR = {D.A. Hornig},
TITLE = {Automatic Partitioning and Scheduling on a Network of
Personal Computers},
SCHOOL = {Carnegie Mellon University, Department of
Computer Science},
YEAR = {1984},
MONTH = {November},
ABSTRACT = {This Ph.D thesis describes the development of a
language Stardust in which indications are given of the
running time of each function. The run-time environment
then schedules the functions based on the costs of
message passing and load balancing. There is some
discussion of granularity. The language contains no
explicit partitioning.},
}
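To make the Stardust idea concrete, here is a toy cost-based placement
loop in Python. It is only in the spirit of the abstract above (declared
running times, message-passing cost, load balancing); every name and
number is invented, not taken from the thesis:

    # Place each function on the machine with the cheapest estimated
    # completion time, charging msg_cost when its caller sits elsewhere.
    def assign(tasks, n_machines, msg_cost=1.0):
        # tasks: list of (name, est_time, index of calling task or None)
        load = [0.0] * n_machines
        placement = []                           # machine chosen per task
        for name, est, caller in tasks:
            def finish(m):
                remote = caller is not None and placement[caller] != m
                return load[m] + est + (msg_cost if remote else 0.0)
            m = min(range(n_machines), key=finish)
            load[m] = finish(m)
            placement.append(m)
        return placement, load

    tasks = [("main", 2.0, None), ("f", 5.0, 0), ("g", 5.0, 0), ("h", 1.0, 1)]
    print(assign(tasks, 2))                      # -> ([0, 1, 0, 1], [7.0, 7.0])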
@ARTICLE{bib:027,
AUTHOR = {P. Hudak and B. Goldberg},
TITLE = {Distributed Execution of Functional Programs Using
Serial Combinators},
JOURNAL = {IEEE Transactions on Computers},
VOLUME = {C-34, No.10},
PAGES = {881--891},
YEAR = {1985},
MONTH = {October},
}
@ARTICLE{bib:031, **** RECOMMENDED ****
AUTHOR = {F.C.H. Lin and R.M. Keller},
TITLE = {The Gradient Model Load Balancing Method},
JOURNAL = {IEEE Transactions on Software Engineering},
VOLUME = {SE-13, No.1},
PAGES = {32--38},
YEAR = {1987},
MONTH = {January},
}
@INPROCEEDINGS{bib:037,
AUTHOR = {L.J. Miller},
TITLE = {A Heterogeneous Multiprocessor Design and the
Distributed Scheduling of its Task Group
Workload},
BOOKTITLE = {Proceedings of 9th Annual Symposium on Computer
Architecture},
PAGES = {283--290},
YEAR = {1982},
MONTH = {April},
}
@ARTICLE{bib:042,
AUTHOR = {D.A. Padua and M.J. Wolfe},
TITLE = {Advanced Compiler Optimizations for Supercomputers},
JOURNAL = {Communications of the ACM},
VOLUME = {29, No.12},
PAGES = {1184--1201},
YEAR = {1986},
MONTH = {December},
}
@ARTICLE{bib:043,
AUTHOR = {Michael L. Powell and Barton P. Miller},
TITLE = {Process Migration in DEMOS/MP},
JOURNAL = {PROC of the 9th SOSP},
KEYWORDS = {Powell1},
LENGTH = {110},
YEAR = {1983},
MONTH = {DEC},
}
@ARTICLE{bib:044,
AUTHOR = {G.S. Rao and H.S. Stone and T.C. Hu},
TITLE = {Assignment of Tasks in a Distributed Processor System
with Limited Memory},
JOURNAL = {IEEE Transactions on Computers},
VOLUME = {C-28, No.4},
PAGES = {291--299},
YEAR = {1979},
MONTH = {April},
}
@ARTICLE{bib:046, **** RECOMMENDED ****
AUTHOR = {C.-C. Shen and W.-H. Tsai},
TITLE = {A Graph Matching Approach to Optimal Task Assignment
in Distributed Computing Systems Using a Minimax
Criterion},
JOURNAL = {IEEE Transactions on Computers},
VOLUME = {C-34, No.3},
PAGES = {197--203},
YEAR = {1985},
MONTH = {March},
}
@ARTICLE{bib:054,
AUTHOR = {H. Widjaja},
TITLE = {An Effective Structured Approach to Finding Optimal
Partitions},
JOURNAL = {Computing},
VOLUME = {29, No.3},
PAGES = {241--262},
YEAR = {1982},
}
@INPROCEEDINGS{bib:055,
AUTHOR = {E. Williams},
TITLE = {Assigning Processes to Processors in Distributed
Systems},
BOOKTITLE = {Proceedings of the 1983 International Conference on
Parallel Processing},
PAGES = {404--406},
YEAR = {1983},
MONTH = {August},
}
@INPROCEEDINGS{bib:056,
AUTHOR = {F. Ercal and J. Ramanujam and P. Sadayappan},
TITLE = {Task Allocation onto a Hypercube by Recursive Mincut},
BOOKTITLE = {Hypercube Conference},
YEAR = {1988},
}
@article{GaPiCa88,
author = {J.-L. Gaudiot and J.I. Pi and M.L. Campbell},
title = {Program Graph Allocation in Distributed Multicomputers},
journal = {Parallel Computing},
volume = {7},
year = {1988},
pages = {227 -- 247},
}
David Hudak at the University of Michigan writes:
-------------------------------------------------
"Performance Evaluation and Prediction for Parallel Algorithms on the
BBN GP1000", F. Bodin, D. Windheiser, W. Jalby, etc., ACM International
Conference on Supercomputing, 1990, pp. 401 - 413.
"The Impact of Synchronization and Granularity on Parallel Systems",
Ding-Kai Chen, Hong-Men Su, and Pen-Chung Yew, International Symposium on
Computer Architecture, 1990, pp. 239 - 248.
Also: for interesting work on dynamic partitioning, check Polychronopoulos'
article (IEEE Computer, '86 I think) on Guided Self-Scheduling; its
chunking rule is sketched at the end of this note.
Really, the guys you want to read about are: Jalby, Polychronopoulos,
Dennis Gannon, Sameh, Windheiser, and, of course, me. (Oh, Reed
had an IEEE paper '87 on stencils and program partitioning, and
Vrsalovic had a good tech report from CMU.)
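The guided self-scheduling rule itself is compact enough to show. A
minimal Python sketch, assuming P identical processors and a loop of
independent iterations (function name mine; see the Polychronopoulos &
Kuck article for the real scheme and its analysis):

    # An idle processor grabs ceil(remaining / P) iterations, so chunks
    # start large and shrink, trading load balance against overhead.
    from math import ceil

    def gss_chunks(n_iters, n_procs):
        chunks, remaining = [], n_iters
        while remaining > 0:
            c = ceil(remaining / n_procs)
            chunks.append(c)
            remaining -= c
        return chunks

    print(gss_chunks(100, 4))
    # -> [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]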
Bill Schilit at Columbia suggests:
----------------------------------
Parallel Processing: The Cm* Experience, Edward F. Gehringer
et al., Digital Press.
Dr. David Finkel at Worcester Polytechnic Institute writes:
-----------------------------------------------------------
"Evaluating Dynamic Load Sharing in Distributed Computer
Systems", Computer Systems: Science and Engineering
5 (1990), 89 - 94.
"Load Indices for Load Sharing in Heterogeneous Distributed Computing Systems",
with David Hatch, Proceedings of the 1990 UKSC Conference on
Computer Simulation, Brighton, 1990, 202 - 206.
Zbigniew Chamski (Zbigniew.Chamski@irisa.fr) suggests:
------------------------------------------------------
@string{IEEES = "IEEE Software"} **** RECOMMENDED ****
@string{ECEDOSU = "Electrical and Computer Engineering Department, Oregon State
University"}
@article{
KrLe88,
author = "Kruatrachue, B. and Lewis, T.",
title = "Grain Size Determination for Parallel Processing",
journal = IEEES,
year = 1988,
volume = 5,
number = 1,
pages = "23--32",
month = jan}
@phdthesis{ **** RECOMMENDED ****
Krua87,
author = "Kruatrachue, B.",
title = "Static Task Scheduling and Grain Packing in
Parallel Processing Systems",
school = ECEDOSU,
year = 1987,
address = "{Corvallis, OR, USA}"}
@PhdThesis{ElRe89,
author = "El-Rewini, H.",
title = "Architecture-Independent Task Partitioning and
Scheduling on Arbitrary Parallel Processing Systems",
school = "Department of Computer Science, Oregon State Universi
ty",
year = "1989",
address = "{Corvallis, OR, USA}",
month = nov}
I would also add the following recommendations:
McCreary, C., and Gill, H., "Automatic Determination of Grain Size for
Efficient Parallel Processing", CACM, September 1989, pp. 1073-1078.
Van Tilborg, A., Wittie, L., "Wave Scheduling -- Decentralized Scheduling
of Task Forces in Multicomputers", IEEE Transactions on Computers, 33:835-844,
September 1984.
Berman, F., "Why is Mapping Hard for Parallel Computers?", Proceedings of
the IEEE Parallel/Distributed Computing Networks Seminar, Jan. 31, 1990.
Sarkar, V., "Partitioning and Scheduling for Execution on Multiprocessors",
Ph.D. dissertation, Stanford Tech. Report No. CSL-TR-87-328, April 1987.
--
--------------------------------------------------------------------
Christian.Collberg@dna.lth.se
--
=========================== MODERATOR ==============================
Steve Stevenson {steve,fpst}@hubcap.clemson.edu
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell