[comp.parallel] SUMMARY: Parallelizing applications on a multi-workstation net.

thomasf@ius3.ius.cs.cmu.edu (Thomas Fahringer) (01/30/91)

/* THANKS TO ALL WHO CONTRIBUTED TO THIS SUMMARY                          */


HI,


Before you read the summary of my request for references
on "Parallelizing applications on a multi-workstation
network" (posted about a week ago), two more questions:

I got a very interesting hint about two projects dealing exactly
with the topic of my netnews request:

1. "The Multi-Satellite Star", by Michael Stumm at Stanford University

2. "Marionette: Support for Highly Parallel Distributed Programs in Unix"
   by Mark Sullivan, at Univ. of Berkeley, CA

Unfortunately I couldn't get any more information about these projects:
no e-mail addresses, technical reports, or published papers.

Does anyone out there know anything about the two projects mentioned
above (e-mail addresses, published papers, etc.)?  It should be easy for
you guys at Stanford and Berkeley to find out, right?
Please let me know.

Thanks in advance.

------------------------------------------------------------------


Now to the SUMMARY. I got more than 30 answers to my network request,
and I am still receiving at least 3 responses a day. Anyway, I think
it is time to post my summary as promised.

Some entries are undoubtedly incomplete.
Corrections and additions are appreciated.
I also included the name of the contributor for
most of the entries. I hope that this does not
violate some aspect of netiquette that I am unaware of;
if so, please forgive the faux pas.
Thanks to all those who contributed!

    Thomas Fahringer
    Universitaet Wien
    Institut fuer Statistik und Informatik
    Rathausstrasse 19/II/3
    1010  Wien
    Austria

    tf@eacpc1.tuwien.ac.at
    thomasf@globe.edrc.cmu.edu
    tf%eacpc1.tuwien.ac.at@VMXA.TUWIEN.AC.AT


--------------------------------------------------------------
1. 

People here at Yale and elsewhere are looking at using C-LINDA
to make effective use of a network of workstations (homogeneous or
heterogeneous). The tuple space is assumed to be distributed
throughout the local memories of the nodes, and the compiler performs
some optimisations to reduce data movement during a process's
communication with tuple space. I myself am using C-Linda on a
Sun SPARC network for various linear algebra algorithms and have gotten good
speedups. Contact Doug Gilmore at gilmore@sca.com for further details.
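
To make the tuple-space model concrete, here is a minimal master/worker
sketch in C-Linda. The operations (out, in, eval) are Linda primitives;
the task structure and names are made up for illustration:

    /* Minimal C-Linda master/worker sketch; illustrative names only. */
    /* real_main is the conventional C-Linda entry point.             */
    #define NWORKERS 4
    #define NTASKS   100

    int worker(void)
    {
        int id, result;
        for (;;) {
            in("task", ?id);             /* withdraw a task tuple         */
            if (id < 0) return 0;        /* poison pill: no more work     */
            result = id * id;            /* stand-in for real work        */
            out("result", id, result);   /* deposit result in tuple space */
        }
    }

    int real_main(int argc, char *argv[])
    {
        int i, id, result;

        for (i = 0; i < NWORKERS; i++)
            eval("worker", worker());    /* spawn workers as live tuples  */
        for (i = 0; i < NTASKS; i++)
            out("task", i);              /* fill tuple space with tasks   */
        for (i = 0; i < NTASKS; i++)
            in("result", ?id, ?result);  /* collect results in any order  */
        for (i = 0; i < NWORKERS; i++)
            out("task", -1);             /* shut the workers down         */
        return 0;
    }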


From: David Kaminsky <kaminsky-david@CS.YALE.EDU>
I am working on a similar problem using TSnet.  TSnet
is a network version of Linda.  


From: <bremner@cs.sfu.ca>
Carriero, N., Gelernter, D. [1986], "The S/Net's Linda Kernel", ACM
Transactions on Computer Systems, 4, 2, May 1986, pp. 110-129.

Carriero, N., Gelernter, D. [1988,1], "Linda in Context", Research
Report YALEU/DCS/RR-622, Yale University, Department of Computer
Science, April 1988.

Carriero, N., Gelernter, D. [1988,2], "How to Write Parallel Programs:
A Guide to the Perplexed", Research Report YALEU/DCS/RR-628, Yale
University, Department of Computer Science, April 1988.

Carriero, N., Gelernter, D. [1989], Technical Correspondence,
Communications of the ACM, 32, 19, pp. 1256-1258.

Davidson, C., [1989], ibid, pp. 1249-1251.

Gelernter, D. [1984], "Dynamic global name spaces on network
computers", Proceedings of the International Conference on Parallel
Processing, August 1984, pp. 25-31.

Gelernter, D. [1985], "Generative Communication in Linda", ACM
Transactions on Programming Languages and Systems, 7, 1, pp. 80-112.

Leler, Wm [1990], "Linda Meets Unix", IEEE Computer, 23, 2, pp. 43-54.

-------------------------------------------------------------

2. 

From:     anand@top.cis.syr.edu
We just had a discussion of this topic on the net a while back. I have
used ISIS for parallel processing of the type that you are interested
in.


From: "Chang L. Lee" <clee@polyslo.CalPoly.EDU>
You might want to look at ISIS from Cornell.  It's a distributed
system toolkit.  The idea is to build applications served by process
groups, with the virtual synchrony model helping to make the actual
implementation less painful.
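
To give a feel for the process-group style ISIS encourages, here is a
hedged sketch in C. The group_join/group_bcast/group_recv calls below
are invented stand-ins, not the real ISIS interface; consult the
toolkit's documentation for the actual calls:

    #include <stdio.h>

    /* Hypothetical process-group primitives, NOT the ISIS names. */
    extern int  group_join(const char *group_name);
    extern void group_bcast(int group, const void *msg, int len);
    extern int  group_recv(int group, void *buf, int maxlen);

    int main(void)
    {
        int    g;
        double partial, incoming;

        g = group_join("solvers");       /* join a named process group */
        partial = 42.0;                  /* stand-in for local work    */

        /* Virtual synchrony orders broadcasts consistently with group
           membership changes, so replicated state stays consistent
           even when members fail or join mid-computation.             */
        group_bcast(g, &partial, sizeof partial);

        while (group_recv(g, &incoming, sizeof incoming) > 0)
            printf("received %g\n", incoming);
        return 0;
    }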

---------------------------------------------------------------

3. 

TCGMSG Send/receive subroutines .. version 3.0 (9/27/90)
--------------------------------------------------------

Robert J. Harrison

tel:    (708) 972-7197
E-mail: harrison@tcg.anl.gov, harrison@anlchm.bitnet
letter: Bldg. 200,
        Theoretical Chemistry Group,
        Argonne National Laboratory,
        9700 S. Cass Avenue, Argonne, IL 60439.


These routines have been written with the objective of providing a
robust, portable, high-performance message passing system for
distributed memory FORTRAN applications. The C interface is also
portable. The syntax is nearly identical to that of the iPSC
subroutines, but the functionality is restricted to improve efficiency
and to speed implementation. On machines with vector hardware, sustained
interprocess communication rates of 6.0 Mb/s have been observed. This
toolkit (referred to as TCGMSG) strives to provide only the minimal
functionality needed for our applications. It is only a stopgap
until some better model becomes widely (and cheaply) available.
However, I believe that many (not all) chemistry and physics problems
are readily and efficiently coded with this simple functionality, and
that such effort will not be wasted when better tools are found.
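
To show the flavor of the iPSC-style model, here is a small sketch in C.
The names (my_node, num_nodes, msg_send, msg_recv) are stand-ins rather
than the actual TCGMSG subroutine names, and sends are assumed to be
buffered so the ring does not deadlock; see the distribution for the
real bindings:

    #include <string.h>

    /* Hypothetical iPSC-flavored primitives, NOT the TCGMSG names. */
    extern long my_node(void);
    extern long num_nodes(void);
    extern void msg_send(long type, void *buf, long nbytes, long dest);
    extern void msg_recv(long type, void *buf, long nbytes, long *from);

    #define TYPE_RING 7L       /* user-chosen message type tag */
    #define MAXN      1024     /* callers must keep n <= MAXN  */

    /* Shift n doubles one node to the right around a ring of nodes. */
    void ring_shift(double *x, long n)
    {
        double tmp[MAXN];
        long   from;
        long   right = (my_node() + 1) % num_nodes();

        msg_send(TYPE_RING, x, n * (long)sizeof(double), right);
        msg_recv(TYPE_RING, tmp, n * (long)sizeof(double), &from);
        memcpy(x, tmp, (size_t)n * sizeof(double));
    }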

----------------------------------------------------------------

4.

From: Rochelle Grober <argosy!rocky@decwrl.dec.com>

Have you ever heard of what has been done on Apollo workstations (now owned
by HP)?  Their animated movies are created by a "master" node grabbing any
other computer on the net and shipping off a frame and the appropriate code to
process the frame using ray tracing algorithms.  The master locates available
computers, coordinates shipping out and retrieving the frames and code, etc.
It can grow with the number of computers available.  The way most of the
movies have been made is that the job is run during off hours and weekends.
I believe one of their claims was that something like 25,000 hours of compute
time to generate one of their shorts was accomplished over two weekends, using
the company's (at that time) ~1000 node network.

I myself participated in a project that took our large data files, broke
them up into blocks, and shipped the blocks and a copy of the necessary code
to up to five other computers on the net.  It worked very well, and this was in
1986.  The Apollo operating system and system support are designed to facilitate
this kind of network distribution of work, so the code to do this sort of
subtasking took a knowledgeable person a day to write, and perhaps a couple
more to ensure it worked properly.
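
Stripped of the Apollo specifics, the farming logic described above
boils down to a loop like the following sketch; the three helper
functions are hypothetical, standing in for whatever remote-execution
facility the operating system provides:

    #define NBLOCKS 64

    /* Hypothetical helpers for locating hosts and moving work. */
    extern int  find_idle_host(void);             /* <0 if none free   */
    extern void ship_block(int host, int block);  /* send code + data  */
    extern void collect_block(void);              /* retrieve a result */

    void farm(void)
    {
        int next = 0, outstanding = 0, host;

        while (next < NBLOCKS || outstanding > 0) {
            if (next < NBLOCKS && (host = find_idle_host()) >= 0) {
                ship_block(host, next++);   /* grab any available node   */
                outstanding++;
            } else if (outstanding > 0) {
                collect_block();            /* wait for a finished block */
                outstanding--;
            }
            /* (A real implementation would block rather than spin
               when no host is free and nothing is outstanding.)       */
        }
    }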


-------------------------------------------------------------------------

5. 

From: Colin Brough <cmb@castle.edinburgh.ac.uk>


Have you heard of CSTools, from Meiko Scientific?  It is a message
passing environment, originally designed for their Computing Surface
reconfigurable processor arrays.  There is now a version for a Sun
cluster, as well as the Transputer and i860 versions.

-----------------------------------------------------------------------

6.

From: "V.S.Sunderam" <vss@mathcs.emory.edu>


We have a system called PVM that does what you describe. A paper on
this has appeared in "Concurrency: Practice & Experience"
December 1990.

----------------------------------------------------------------------

7.

From: "Veikko M. Keha" <keha@hydra.Helsinki.FI>


I am working on a methodology that tries to automate the process
of converting a serial program into objects that are executed in parallel
on a local area network.  This "Implicit Parallelism Model" is
targeted at programmers who are used to writing serial programs and
don't want to know much about parallelism and its complexity.

The model is based on remote procedure calls.  The source code is
modified by a precompiler so that remote
procedure calls are executed in parallel.  The model is suitable for
use with any third-generation language.

I have built some prototypes to demonstrate the model.  The language
used has been C++.
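
As an illustration of the transformation such a precompiler might
perform, consider the serial fragment a = f(x); b = g(y); c = a + b;
where f and g are independent.  The generated code could issue both
calls as non-blocking remote procedure calls and block only where a
result is used.  The rpc_spawn/rpc_claim_int primitives below are
hypothetical, not taken from the author's system:

    typedef int handle_t;          /* ticket for a pending remote call */

    /* Hypothetical asynchronous RPC primitives. */
    extern handle_t rpc_spawn(const char *fn, void *arg, int nbytes);
    extern int      rpc_claim_int(handle_t h);

    int example(int x, int y)
    {
        handle_t ha, hb;
        int a, b;

        ha = rpc_spawn("f", &x, sizeof x);  /* both calls now proceed  */
        hb = rpc_spawn("g", &y, sizeof y);  /* concurrently on the LAN */
        a = rpc_claim_int(ha);              /* block for f's result    */
        b = rpc_claim_int(hb);              /* block for g's result    */
        return a + b;                       /* same value as serial c  */
    }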

----------------------------------------------------------------------

8.

From: rgb@mcc.com (Roger Bolick)


You posted a request for information concerning multi-workstation
programming.  If I understand your search correctly, then our
project may be of interest.  It is a programming environment for
C++ with a runtime kernel, running on either a Unix box or the
native hardware, which supports remote objects.  This is used on
both multi-computer hardware and multi-workstations.

This means that your program runs on the available hardware as
described in that application's configuration file, assigning
objects to as many nodes as you have.  Of course it's not that
simple and there are limitations, but that is why it's still in
research.


-------------------------------------------------------------------


9.

From: Karsten Morisse <kamo@uni-paderborn.de>

        I collected quite a few responses to my query on how
        to connect an ensemble of Suns into a multiprocessing team.
        Here is a summary of what I received.
        (It took some time for the mail to percolate overseas
        and back; that is the reason for the delay in my replying.)

        I heard about 9 different projects:

        1.  ISIS (Cornell)
        If you want ISIS, send mail to
        croft@gvax.cs.cornell.edu with subject "I want ISIS".

        2.  Cosmic Environment (Caltech)
        You can obtain a programming guide by sending e-mail
        to chuck@vlsi.caltech.edu (a postal address is also available).

        3.  DOMINO (U. Maryland, College Park)
        DOMINO is a message passing environment for parallel computation.
        See the Computer Science Dept. (U. Maryland) tech report # TR-1648
        (April, 1986) by D. P. O'Leary, G. W. Stewart, and R. A. van de Geijn.

        4.  DPUP (U. of Colorado)
        DPUP stands for Distributed Processing Utilities Package.
        What follows is an abstract from a technical report
        written at the Computer Science Dept. at the University of Colorado
        by T. J. Garner et al.:

        "DPUP is a library of utilities that support distributed concurrent
        computing on a local area network of computers.
        The library is built upon the interprocess communication
        facilities in Berkeley Unix 4.2BSD."

        5.  TORiS (Toronto)
        TORiS implements a shared memory communication model.
        Contact Orran Krieger at the University of Toronto for more information:

        UUCP:   {decvax,ihnp4,linus,utzoo,uw-beaver}!utcsri!eecg!okrieg
        ARPA:   okrieg%eecg.toronto.edu@relay.cs.net
        CSNET:  okrieg@eecg.toronto.edu
        CDNNET: okrieg@eecg.toronto.cdn

        6.  LINDA (Yale, Scientific Computing Associates)
        Linda is a parallel programming language for shared memory
        implementations.  It is simple and has only six operators.  C-Linda
        has been implemented for a network of SUNs in the Internet domain.

        With LAN-LINDA (also called TSnet) you can write parallel or
        distributed programs in C and run them on a network of workstations.
        TSnet has been tested on Sun and IBM RT workstations.

        Contact David Gelernter (project head) or Mauricio Arango at:
        gelernter@cs.yale.edu
        arango@cs.yale.edu
        TSnet and other Linda systems are being distributed through
        Scientific Computing Associates.

        Contact

          Dennis Philbin
          Scientific Computing Associates
          246 Church St., Suite 307
          New Haven, CT 06510
          203-777-7442



        7.  SR (U. Arizona)
        "SR (Synchronizing Resources) is designed for writing distributed
        programs.  The main language constructs are resources and operations.
        Resources encapsulate processes and variables they share;
        operations provide the primary mechanism for process interaction.
        SR provides a novel integration of the mechanisms for invoking
        and servicing operations.  Consequently, all of local and remote
        procedure call, rendezvous, message passing, dynamic process
        creation, multicast, and semaphores are supported.  An overview of the
        language and implementation appeared in the January, 1988, issue of
        TOPLAS (ACM Transactions on Programming Languages and Systems 10, 1, 51-86)."


        "SR is available by anonymous FTP from Arizona.EDU (128.196.128.118 or
        192.12.69.1).
        [Copy over the README file for an explanation.]

        You may reach the members of the SR project electronically at:
                uunet!Arizona!sr-project

        or by surface mail at:

        SR Project
        Department of Computer Science
        University of Arizona
        Tucson, AZ  85721
        (602) 621-2018


        8.  MAITRD (U.C. Berkeley/U. Wash)
        "The maitr'd software is remote process server that is designed to
        farm out cpu expensive jobs to less loaded machines.  It has a small
        amount of built-in intelligence, in that it attempts to send jobs to
        the least loaded machine of the set which is accepting off-site jobs."

        `Maitrd' is available via anonymous ftp from
        june.cs.washington.edu (128.95.1.4) as ~ftp/pub/Maitrd.tar.Z.
        There is also a heterogeneous systems rpc package `hrpc.tar.Z'.

        Contact Brian Bershad at U. Washington (brian@june.cs.washington.edu)
        for more information.


        9.  PARMACS (Argonne)
        David Levine at Argonne National Laboratory tells us about a
        "generic package to do send/recv message passing" with
        "different versions (c, c++, fortran) [that] work on different machines."
        For more information, send email to netlib@mcs.anl.gov with subject
        (or body) ``send index from parmacs''.
        You can also reach David Levine directly at
        levine@mcs.anl.gov or by uucp: {alliant,sequent,rogue}!anlams!levine.

------------------------------------------------------------------


10. 

From: Karen Tracey <kmt@pclsys48.pcl.nd.edu>

My implementation platform for this work is the ARCADE distributed
system.  If you are not familiar with ARCADE I can also give you
references on it.  The goal of the ARCADE project is to develop an
environment which allows programmers to easily build distributed
applications that consist of cooperating tasks which may run on
heterogeneous machines.  ARCADE contains a number of facilities
(data transfer & sharing, inter-task control & synchronization)
that simplify development of cooperative distributed applications.

 "ARCADE: A Platform for Heterogeneous Distributed
              Operating Systems", by David L. Cohn, William P. Delaney,
              and Karen M. Tracey, appeared in Proceedings of 1989
              USENIX Workship on Experiences with Distributed and
              Multiprocessor Systems, Fort Lauderdale, FL, October 1989.


-----------------------------------------------------------------

11.

From: David Taylor <ddt@ccwf.cc.utexas.edu>

I'm not sure about this, but I believe you may be interested in a system
called Amoeba.  I can't remember whether it's a true parallel-processing
environment or simply an OS that distributes processes on a network, but it's
probably worth looking into.

-------------------------------------------------------------------

12.

From: roman@CCSF.Caltech.EDU (Roman Salvador)

We (ParaSoft Corporation) sell a system (parallel libraries, profilers,
debugger, semi-automatic data-distributor, automatic parallelizer, ...)
to do parallel processing on networks of workstations and most other
parallel computers.

-------------------------------------------------------------------

13. 

From: Joe Hummel <jhummel@ICS.UCI.EDU>

The Univ. of Washington has such a system, Amber.  It runs a single
application in parallel on a network of shared-memory workstations (e.g.
the DEC Firefly).  See the 12th ACM Symposium on Operating Systems
Principles, Dec. '89, pp. 147-158.

Also, take a look at Munin, a system from Rice Univ.  They have it running
on a network of Suns.  See the '90 ACM SIGPLAN PPoPP conference.

------------------------------------------------------------------


14.

We did some work at UCLA using PCs on an ethernet

Felderman, R.E., Schooler, E.M., Kleinrock L.,
"The Benevolent Bandit Laboratory: A Testbed for Distributed Algorithms",
IEEE Journal on Selected Areas in Communications, Vol. 7, No. 2, February 1989.


----------------------------------------------------------------------


15.

"J. Eric Townsend" <JET@uh.edu>

I would suggest you look at Caltech's "Cosmic Environment",
which lets you write iPSC/2 code and run it on a network of workstations.
chuck@vlsi.caltech.edu is who you need to talk to.

---------------------------------------------------------------------

16.
From: Andy Newman <andy@research.canon.oz.au>

Get in touch with AT&T and get some information on their
Concurrent C product. They have a version that runs on
multiple machines in a network and others that run on
actual multi-processors. It's basically C with Hoare's
CSP constructs added to it (a sort of super-Occam).
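
The CSP idea in a nutshell, sketched in plain C with hypothetical
channel primitives (Concurrent C itself extends the language with
process and transaction declarations rather than function calls like
these):

    typedef struct chan chan;        /* opaque channel type */

    /* Hypothetical rendezvous primitives: a send blocks until the
       matching receive is ready, and vice versa, as in Occam.     */
    extern void chan_send(chan *c, int v);
    extern int  chan_recv(chan *c);

    void producer(chan *out)
    {
        int i;
        for (i = 0; i < 10; i++)
            chan_send(out, i * i);   /* each send synchronizes with */
    }                                /* exactly one receive         */

    void consumer(chan *in, int *sum)
    {
        int i;
        *sum = 0;
        for (i = 0; i < 10; i++)
            *sum += chan_recv(in);
    }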

-----------------------------------------------------------------------

17.

From: paddy@daimi.aau.dk

About 2 years ago I was involved in a project which, given an
Ada program with (logical) site annotations, converted it into
a set of Ada programs which could be compiled using existing
compilers, with the code run on separate machines
connected by an ethernet. There was a set of utilities which
invoked our converter, followed by the Ada compiler/linker,
etc.

-----------------------------------------------------------------------

18.

From: Paulo V Rocha <P.Rocha@cs.ucl.ac.uk>

The PYGMALION Programming Environment, an ESPRIT II project, uses a
multi-workstation environment (up to 3 workstations, if I am not wrong)
to run neural network applications. It uses remote procedure calls to
communicate.

----------------------------------------------------------------------

19.

From: adamb@cs.utk.edu

We're currently working on a system called DagNet which will allow
the programmer to specify subroutines and their dependencies and then
have the subroutines scheduled around the internet.  In DagNet the
data distribution is precisely associated with the dependencies.  We
currently have a prototype working, but the system is still going
through changes.
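
A dependency specification for such a system might look like the
following sketch; the dag_* interface is hypothetical, reconstructed
from the description above rather than taken from DagNet itself:

    /* Hypothetical DAG-construction interface. */
    extern int  dag_task(const char *name, void (*fn)(void));
    extern void dag_depends(int after, int before);  /* 'after' waits */
    extern void dag_run(void);                       /* for 'before'  */

    extern void factor(void), solve1(void), solve2(void), merge(void);

    void build_and_run(void)
    {
        int f  = dag_task("factor", factor);
        int s1 = dag_task("solve1", solve1);
        int s2 = dag_task("solve2", solve2);
        int m  = dag_task("merge",  merge);

        dag_depends(s1, f);    /* both solves need the factorization  */
        dag_depends(s2, f);    /* and may then run on different hosts */
        dag_depends(m,  s1);   /* data moves along exactly these      */
        dag_depends(m,  s2);   /* dependency edges                    */

        dag_run();             /* scheduler places tasks on machines  */
    }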

I also developed a system called Phred which allows the visual
specification and analysis of parallel programs.  Currently I'm working
on designing an execution system which will actually execute Phred
programs over a network of machines.  Unfortunately the executioner is
still on paper at this point.  Phred does have interesting properties
for sharing data among parallel processes which might interest you.

------------------------------------------------------------------


20. 

From: eric@castle.edinburgh.ac.uk

Meiko (from Bristol, UK -- I don't have the actual address, but somebody
on the net must) sell a product called CS-Tools which will run
jobs over a network of Suns (SPARCstations, SLCs, etc.) and
their own boxes.  I don't know how well it works.  (I use it every
day, but I only run on Meiko's Computing Surface, so I can't verify
its behaviour on the Suns.)

--------------------------------------------------------------------

21. 


From: Ozalp Babaoglu <ozalp@dm.unibo.it>


                Paralex:  An Environment for Parallel Programming
                        in Distributed Systems


One of the many advantages of distributed systems is their ability to
execute several computations on behalf of a single application in
parallel, thus improving performance.  In fact, at a certain level of
abstraction, there is little difference between a distributed system
and a loosely-coupled multiprocessor computer.  We cannot, however, treat
distributed systems as if they were uniform multiprocessor parallel
machines, due to the following characteristics:

        o  High latency, low bandwidth communication

        o  Presence of heterogeneous processor architectures

        o  Communication link and processor failures

        o  Multiple independent administrative domains.

Thus, if we can address these issues, a distributed computing resource
such as a collection of workstations could be viewed and used as if it
were a poor man's ``super computer.''  To make a distributed system
suitable for long-running parallel computations, support must be
provided for fault tolerance.  Many hours of computation can be wasted
not only if there are hardware failures, but also if one of the
processors is turned off, rebooted or disconnected from the network.
Given that the components of the system (workstations) may be under
the control of several independent administrative domains (typically a
single individual who ``owns'' the workstation), these events are much
more plausible and frequent than real hardware failures.

----------------------------------------------------------------------


22.

From: eisen@cc.gatech.edu (Greg Eisenhauer)

I'm not sure if it's exactly what you're looking for, but you might look at my
old work on Distributed Ada.  The idea was that you took a single application
(program) and gave it to a compiler along with a specification of how you
wanted parts of it to be distributed across a distributed memory architecture.
We did most of the development here on a network of Suns and eventually moved
to an Intel iPSC/2 Hypercube.



-----------------------------------------------------------------------


23. 

From: Ralph Noack <mdaeng!rwn@utacfd.uta.edu>

We've bought a package called Express by ParaSoft.

It runs on Sun workstation networks, and on PCs or Macs with transputer
cards.  It has two different modes of programming for parallelizing a task:
1) cubix: a single executable with appropriate calls to library
routines to find its id number, exchange data between nodes, etc.
I've written a small program which solves a simple PDE on multiple
nodes, so it works.  I could not get the main application running due to
problems with their current preprocessor (it translates Fortran write
statements to subroutine calls so that all I/O passes through node 0).
They say a new version of the preprocessor will be released soon.
2) host + node executables: a single host task communicates with and
controls multiple tasks running a different executable.
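
The cubix style amounts to one program that every node runs, branching
on its own id.  The sketch below conveys the flavor; the ex_* names are
stand-ins, not the actual Express calls:

    #include <stdio.h>

    /* Hypothetical stand-ins for the node-id and exchange calls. */
    extern int  ex_nodeid(void);      /* this node's id, 0..N-1 */
    extern int  ex_numnodes(void);
    extern void ex_exchange(void *send, void *recv, int nbytes, int partner);

    int main(void)
    {
        int    me = ex_nodeid();
        int    np = ex_numnodes();
        double mine = (double) me, theirs;

        /* Swap a boundary value with a partner node (even N assumed),
           as one would for the interior boundaries of a 1-D PDE grid. */
        ex_exchange(&mine, &theirs, sizeof mine, me ^ 1);

        if (me == 0)                      /* as described above, all   */
            printf("%d nodes up\n", np);  /* I/O passes through node 0 */
        return 0;
    }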

-----------------------------------------------------------------------

24.

From: rwolski@lll-crg.llnl.gov (Richard Wolski)

About a week ago I posted a request for reference information on
partitioning and scheduling for heterogeneous systems.  I am pleased to say
that the response has been almost overwhelming.  While I haven't completely
digested everything I have received, here is a summary of what I think
is pertinent.

Dr. Shahid Bokhari at ICASE suggests:
-------------------------------------


    Author = "Shahid H. Bokhari",
    Year = "July 1979",
     Journal  = "IEEE Transactions on Software Engineering",
    Number = "5",
    Pages = "341-349",
    Title = "Dual processor scheduling with dynamic reassignment",
    Volume = "SE-5",
    Author = "Shahid H. Bokhari",
    Year = "November 1981",
     Journal  = "IEEE Transactions on Software Engineering",
    Number = "6",
    Pages = "583-589",
    Title = "A shortest tree algorithm for optimal assignments across space and time in
a distributed processor system",
    Volume = "SE-7",


    Author = "Shahid H. Bokhari",   *** RECOMMENDED ***
    Title = "Partitioning problems in parallel, pipelined and distributed computing",
     Journal  = "IEEE Transactions on Computers",
    Year = "January, 1988",
    Number ="1",
    Volume="C-37",
    Pages="48-57"
    Author = "Shahid H. Bokhari",
    Title = "Assignment problems in parallel and distributed computing",
     Publisher  = "Kluwer",
     Address = "Boston",
    Year = "1987"
    Author = "Patricia J. Carstensen",
    Title = "The Complexity of Some Problems in Parametric Linear and Combinatorial Programming",
    Year = "1983",
    Institution = "Department of Mathematics,
University of Michigan",
    Author = "K. W. Doty",
    Author = "P. L. McEntire",
    Author = "J. G. O'Reilly",
    Title = "Task allocation in a distributed
computer system",
     Journal  = "Proceedings of the IEEE Infocom 82",
    Pages = "33-38",
    Year = "1982",
    Author = "Dan Gusfield",
    Title = "Parametric combinatorial computing and a problem of program module distribution",
     Journal  = "Journal of the ACM",
    Volume = "30",
    Number = "3",
    Pages = "551-563",
    Year = "July 1983",

    Author = "Robert E. Larson",
    Author = "Paul E. McIntyre",
    Author = "John G. O'Reilly",
    Title = "Tutorial: Distributed Control",
    Publisher = "IEEE Computer Society Press",    Address = "Silver Spring, MD",
    Year = "1982",

    Author = "Virginia M. Lo",    **** RECOMMENDED ****
    Title = "Heuristic algorithms for task assignments in distributed systems",
     Journal  = "Proceedings of the 4th International Conference on Distributed Processing Systems",
    Pages = "30-39",
    Year = "May 1984",

    Author = "Janet Michel",
    Author = "Andries van Dam",
    Title = "Experience with distributed processing on a host/satellite system",
     Journal  = "Computer Graphics (SIGGRAPH Newsletter)",
    Volume = "10",
    Number = "2",
    Year = "1976",

    Author = "Camille C. Price",
    Author = "Udo W. Pooch",
    Title = "Search Techniques for a nonlinear multiprocessor scheduling problem",
     Journal  = "Naval Research Logistics Quarterly",
    Volume = "29",
    Number = "2",
    Pages = "213-233",
    Year = "June 1982",

    Author = "Gururaj S. Rao",
    Author = "Harold S. Stone",
    Author = "T. C. Hu",
    Title = "Assignment of tasks in a distributed processor system with limited memory",
     Journal  = "IEEE TC",
    Volume = "C-28",
    Number = "4",
    Pages = "291-299",
    Year = "April 1979",
 
    Author = "Harold S. Stone",    **** RECOMMENDED ****
    Title = "Multiprocessor scheduling with the aid of network flow algorithms",
     Journal  = "IEEE Transactions on Software Engineering",
    Volume = "SE-3",
    Number = "1",
    Pages = "85-93",
    Year = "January 1977",
    Author = "Harold S. Stone",
    Year = "1977",
   Number = "ECE-CS-77-7",
   Institution = "Department of Electrical & Computer Engineering, University of Massachusetts, Amherst",
    Title = "Program assignment in three-processor systems and tricutset partitioning of graphs"
    Author = "Harold S. Stone",
    Title = "Critical load factors in two-processor distributed systems",
     Journal  = "IEEE Transactions on Software Engineering",
    Volume = "SE-4",
    Number = "3",
    Pages = "254-258",
    Year = "May 1978",
 
    Author = "Donald F. Towsley",
    Title = "Allocating programs containing branches and loops within a multiple processor system",
     Journal  = "IEEE Transactions on Software Engineering",
    Volume = "SE-12",
    Pages = "1018-1024",
    Year = "October 1986",

    Author = "Andries van Dam",
    Author = "George M. Stabler",
    Author = "Richard J. Harrington",
    Title = "Intelligent satellites for interactive graphics",
     Journal  = "Proceedings of the IEEE",
    Volume = "62",
    Number = "4",
    Pages = "483-492",
    Year = "April 1974",
 

From Alessandro Forin at CMU:
-----------------------------

@article        ( IEEECOMP,                    
key     =       "Agora" ,
author  =       "Bisiani, R. and Forin, A." ,
title   =       "Multilanguage Parallel
Programming on Heterogeneous Systems" ,
journal =       "IEEE Transactions on Computers",
publisher=      "IEEE-CS" ,
month   =       "August" ,
year    =       "1988" ,
)

@inproceedings  ( BISI87G,
key     =       "bisi87g" ,
author  =       "Bisiani,R. and Lecouat,F." ,
title   =       "A Planner for the Automatization of Programming
Environment Tasks" ,
booktitle=      "21st Hawaii International Conference on System
Sciences" ,
publisher=      "IEEE" ,
month   =       "January" ,
year    =       "1988" ,
bibdate =       "Fri Aug 28 09:44:54 1987" ,
)

@inproceedings  ( DBGWKSHP,
key     =       "Agora" ,
author  =       "Forin, Alessandro" ,
title   =       "Debugging of Heterogeneous Parallel Systems" ,
booktitle=      "Intl. Workshop on Parallel and Distributed Debugging",
publisher=      "SIGPLAN Notices, V24-1 Jan. 1989",
address =       "Madison, WI",
month   =       "May" ,
year    =       "1988" ,
pages   =       "130-141",
)

@techreport     ( ASMREPORT,
key     =       "Agora" ,
author  =       "R. Bisiani, F. Alleva, F. Correrini, A. Forin, F. Lecouat, R. Lerner",
title   =       "Heterogeneous Parallel Processing, The Agora
Shared Memory" ,
institution=    "Carnegie-Mellon University" ,
address =       "Comp. Science Dept." ,
type    =       "Tech. Report" ,
number  =       "CMU-CS-87-112" ,
month   =       "March" ,
year    =       "1987" ,
)


Dr. Michael Coffin at the University of Waterloo suggests:
------------------------------------------------------

@PHDTHESIS{Coffin90,
  AUTHOR =      "Michael H. Coffin",
  TITLE =       "Par: {A}n Approach to Architecture-Independent Parallel
                 Programming",
  SCHOOL =      "Department of Computer Science, The University of
                 Arizona",
  MONTH =       aug,
  YEAR =        "1990",
  ADDRESS =     "Tucson, Arizona"
}

Dr. David Skillicorn at Queen's University suggests:
---------------------------------------------------

@TECHREPORT{bib:001,
        TITLE = {The Purdue Dual {MACE} Operating System},
        INSTITUTION = {Purdue University},
        KEYWORDS = {Abell1},
        YEAR = {1978},
        MONTH = {NOV},
}

@ARTICLE{bib:002,
        AUTHOR = {Guy T. Almes and Andrew P. Black and Edward D.
                Lazowska and Jerre D. Noe},
        TITLE = {The Eden System: A Technical Review},
        JOURNAL = {IEEE Transactions on Software Engineering},
        PAGES = {43--59},
        KEYWORDS = {Almes1},
        YEAR = {1985},
        MONTH = {JAN},
}

@INPROCEEDINGS{bib:003,
        AUTHOR = {D.E. Bailey and J.E. Cuny},
        TITLE = {An Approach to Programming Process Interconnection
                Structures: Aggregate Rewriting Graph Grammars},
        BOOKTITLE = {Proceedings of PARLE '87 Parallel Architectures
                and Languages Europe, Volume II},
        PAGES = {112--123},
        ORGANIZATION = {Springer-Verlag, Lecture Notes in Computer
                Science},
        ADDRESS = {Eindhoven, The Netherlands},
        YEAR = {1987},
        MONTH = {June},
}

@ARTICLE{bib:004,
        AUTHOR = {A. Barak and A. Litman},
        TITLE = {{MOS}: a Multicomputer Distributed Operating System},
        JOURNAL = {Software: Practice and Experience},
        KEYWORDS = {Barak1},
        LENGTH = {725},
        YEAR = {1985},
        MONTH = {AUG},
}

@ARTICLE{bib:005,
        AUTHOR = {A. Barak and A. Shiloh},
        TITLE = {A Distributed Load Balancing Policy for a
                Multicomputer},
        JOURNAL = {Software: Practice and Experience},
        KEYWORDS = {Barak2},
        LENGTH = {901},
        YEAR = {1985},
        MONTH = {SEP},
}

@ARTICLE{bib:006,
        AUTHOR = {? Bartlett and others},
        TITLE = {A NonStop Kernel},
        JOURNAL = {PROC of the 8th SOSP},
        KEYWORDS = {Bartle1},
        YEAR = {1981},
        MONTH = {OCT},
}

@ARTICLE{bib:007,
        AUTHOR = {M.J. Berger and S.H. Bokhari},
        TITLE = {A Partitioning Strategy for Nonuniform Problems on
                Multiprocessors},
        JOURNAL = {IEEE Transactions on Computers},
        VOLUME = {C-36, No.5},
        PAGES = {570--580},
        KEYWORDS = {rectangular partition with uniform workload},
        YEAR = {1987},
        MONTH = {May},
}

@INPROCEEDINGS{bib:008,
        AUTHOR = {Andrew P. Black},
        TITLE = {Supporting Distributed Applications: Experience with
                Eden},
        JOURNAL = {PROC of the 10th SOSP},
        KEYWORDS = {Black1},
        YEAR = {1985},
        MONTH = {DEC},
}

@ARTICLE{bib:011,			***** RECOMMENDED *****
        AUTHOR = {Shahid H. Bokhari},
        TITLE = {On the Mapping Problem},
        JOURNAL = {IEEE Transactions on Computers},
        VOLUME = {C-30},
        NUMBER = {3},
        PAGES = {207--214},
        KEYWORDS = {grecommended,},
        YEAR = {1981},
        MONTH = {March},
        ABSTRACT = {This paper is important because it points out that
                the mapping problem is akin to graph traversal and is at
                least P-complete. Also see ICPP79. Reproduced in the
                1984 tutorial: Interconnection Networks for
                parallel and distributed processing by Wu and
                Feng.},
}

@ARTICLE{bib:015,
        AUTHOR = {W.W. Chu and L.J. Holloway and M.T. Lan and K. Efe},
        TITLE = {Task Allocation in Distributed Data Processing},
        JOURNAL = {Computer},
        PAGES = {57--69},
        YEAR = {1980},
        MONTH = {November},
}

@INPROCEEDINGS{bib:018,
        AUTHOR = {J.G. Donnett and M. Starkey and D.B. Skillicorn},
        TITLE = {Effective Algorithms for Partitioning Distributed
                Programs},
        BOOKTITLE = {Proceedings of the Seventh Annual International
                Phoenix Conference on Computers and Communications},
        PAGES = {363--369},
        YEAR = {1988},
        MONTH = {March 16--18},
}

@MISC{bib:025,				**** RECOMMENDED ****
        AUTHOR = {D.A. Hornig},
        TITLE = {Automatic Partitioning and Scheduling on a Network of
                Personal Computers},
        INSTITUTION = {Carnegie Mellon University, Department of
                Computer Science,},
        YEAR = {1984},
        MONTH = {November},
        ABSTRACT = {This Ph.D thesis describes the development of a
                language Stardust in which indications are given of the
                running time of each function. The run-time environment
                then schedules the functions based on the costs of
                message passing and load balancing. There is some
                discussion of granularity. The language contains no
                explicit partitioning.},
}

@ARTICLE{bib:027,
        AUTHOR = {P. Hudak and B. Goldberg},
        TITLE = {Distributed Execution of Functional Programs Using
                Serial Combinators},
        JOURNAL = {IEEE Transactions on Computers},
        VOLUME = {C34, No.10},
        PAGES = {881--891},
        YEAR = {1985},
        MONTH = {October},
}

@ARTICLE{bib:031,			**** RECOMMENDED ****
        AUTHOR = {F.C.H. Lin and R.M. Keller},
        TITLE = {The Gradient Model Load Balancing Method},
        JOURNAL = {IEEE Transactions on Software Engineering},
        VOLUME = {SE-13, No.1},
        PAGES = {32--38},
        YEAR = {1987},
        MONTH = {January},
}

@INPROCEEDINGS{bib:037,
        AUTHOR = {L.J. Miller},
        TITLE = {A Heterogeneous Multiprocessor Design and the
                Distributed Scheduling of its Task Group
                Workload},
        BOOKTITLE = {Proceedings of 9th Annual Symposium on Computer
                Architecture},
        PAGES = {283--290},
        YEAR = {1982},
        MONTH = {April},
}

@ARTICLE{bib:042,
        AUTHOR = {D.A. Padua and M.J. Wolfe},
        TITLE = {Advanced Compiler Optimizations for Supercomputers},
        JOURNAL = {Communications of the ACM},
        VOLUME = {29, No.12},
        PAGES = {1184--1201},
        YEAR = {1986},
        MONTH = {December},
}

@ARTICLE{bib:043,
        AUTHOR = {Michael L. Powell and Barton P. Miller},
        TITLE = {Process Migration in DEMOS/MP},
        JOURNAL = {PROC of the 9th SOSP},
        KEYWORDS = {Powell1},
        LENGTH = {110},
        YEAR = {1983},
        MONTH = {DEC},
}

@ARTICLE{bib:044,
        AUTHOR = {G.S. Rao and H.S. Stone and T.C. Hu},
        TITLE = {Assignment of Tasks in a Distributed Processor System
                with Limited Memory},
        JOURNAL = {IEEE Transactions on Computers},
        VOLUME = {C-28, No.4},
        PAGES = {291--299},
        YEAR = {1979},
        MONTH = {April},
}

@ARTICLE{bib:046,			**** RECOMMENDED ****
        AUTHOR = {C.-C Shen and W.-H. Tsai},
        TITLE = {A Graph Matching Approach to Optimal Task Assignment
                in Distributed Computing Systems Using a Minimax
                Criterion},
        JOURNAL = {IEEE Transactions on Computers},
        VOLUME = {C-34, No.3},
        PAGES = {197--203},
        YEAR = {1985},
        MONTH = {March},
}

@ARTICLE{bib:054,
        AUTHOR = {H. Widjaja},
        TITLE = {An Effective Structured Approach to Finding Optimal
                Partitions},
        JOURNAL = {Computing},
        VOLUME = {29, No.3},
        PAGES = {241--262},
        YEAR = {1982},
}

@INPROCEEDINGS{bib:055,
        AUTHOR = {E. Williams},
        TITLE = {Assigning Processes to Processors in Distributed
                Systems},
        BOOKTITLE = {Proceedings of the 1983 International Conference on
                Parallel Processing},
        PAGES = {404--406},
        YEAR = {1983},
        MONTH = {August},
}

@INPROCEEDINGS{bib:056,
        AUTHOR = {F. Ercal and J. Ramanujam and P. Sadayappan},
        TITLE = {Task Allocation onto a Hypercube by Recursive Mincut},
        BOOKTITLE = {Hypercube Conference},
        YEAR = {1988},
}

@article{bib:057,
        author = {J.-L. Gaudiot and J.I. Pi and M.L. Campbell},
        title = {Program Graph Allocation in Distributed Multicomputers},
        journal = {Parallel Computing},
        volume = {7},
        year = {1988},
        pages = {227 -- 247},
}



David Hudak at the University of Michigan writes:
-------------------------------------------------

"Performance Evaluation and Prediction for Parallel Algorithms on the
BBN GP1000", F. Bodin, D. Windheiser, W. Jalby, etc., ACM International
Conference on Supercomputing, 1990, pp. 401 - 413.

        "The Impact of Synchronization and Granularity on Parallel Systems",
Ding-Kai Chen, Hong-Men Su, and Pen-Chung Yew, International Symposium on
Computer Architecture, 1990, p. 239 - 248

        Also:  for interesting work on dynamic partitioning, check Polychronopou
los'
article, (IEEE Computer, '86 I think) on Guided Self-Scheduling

        Really, the guys you want to read about are:  Jalby, Polychronopoulos,
        Dennis Gannon, Sameh, Windheiser, and, of course, me.  (Oh, Reed
        had an IEEE paper '87 on stencils and program partitioning, and
        Vrsalovic had a good tech report from CMU.)
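
For reference, the guided self-scheduling rule itself is tiny: each
idle processor claims ceil(R/P) of the R remaining iterations, so
chunks shrink geometrically and the tail of the loop balances the
load.  A minimal sketch in C, with the locking around the shared
counter elided:

    static int next = 0;       /* first unclaimed loop iteration */

    /* Claim the half-open iteration range [*lo, *hi) for one idle
       processor; returns 0 when the loop is exhausted.  Access to
       'next' must be atomic in a real implementation.              */
    int gss_chunk(int n_iters, int n_procs, int *lo, int *hi)
    {
        int remaining = n_iters - next;
        int take;

        if (remaining <= 0)
            return 0;
        take = (remaining + n_procs - 1) / n_procs;   /* ceil(R/P) */
        *lo  = next;
        *hi  = next + take;
        next += take;
        return 1;
    }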

Bill Schilit at Columbia suggests:
----------------------------------

Parallel Processing: The Cm* Experience, Edward F. Gehringer
        et al., Digital Press.


Dr. David Finkel at Worcester Polytechnic Institute writes:
-----------------------------------------------------------

"Evaluating Dynamic Load Sharing in Distributed Computer
Systems", Computer Systems: Science and Engineering
5 (1990), 89 - 94.

"Load Indices for Load Sharing in Heterogeneous Distributed Computing Systems",
with David Hatch,Proceedings of the 1990 UKSC Conference on
Computer Simulation, Brighton, 1990, 202 - 206.

Zbigniew Chamski (Zbigniew.Chamski@irisa.fr) suggests:
------------------------------------------------------

@string{IEEES   = "IEEE Software"}      **** RECOMMENDED ****
@string{ECEDOSU = "Electrical and Computer Engineering Department, Oregon State University"}

@article{
        KrLe88,
        author          = "Kruatrachue, B. and Lewis, T.",
        title           = "Grain Size Determination for Parallel Processing",
        journal         = IEEES,
        year            = 1988,
        volume          = 5,
        number          = 1,
        pages           = "23--32",
        month           = jan}


@phdthesis{				**** RECOMMENDED ****
        Krua87,
        author          = "Kruatrachue, B.",
        title           = "Static Task Scheduling and Grain Packing in
Parallel Processing Systems",
        school          = ECEDOSU,
        year            = 1987,
        address         = "{Corvallis, OR, USA}"}


@PhdThesis{ElRe89,
        author          = "El-Rewini, H.",
        title           = "Architecture-Independent Task Partitioning and
     Scheduling on Arbitrary Parallel Processing Systems",
        school          = "Department of Computer Science, Oregon State University",
        year            = "1989",
        address         = "{Corvallis, OR, USA}",
        month           = nov}


I would also add the following recommendations:

McCreary, C., and Gill, H., "Automatic Determination of Grain Size for
Efficient Parallel Processing", CACM, September 1989, pp. 1073-1078.

Van Tilborg, A., Wittie, L., "Wave Scheduling -- Decentralized Scheduling
of Task Forces in Multicomputers", IEEE Transactions on Computers, 33:835-844,
September 1984.

Berman, F., "Why is Mapping Hard for Parallel Computers?", Proceedings of
the IEEE Parallel/Distributed Computing Networks Seminar, Jan. 31, 1990.

Sarkar, V., "Partitioning and Scheduling for Execution on Multiprocessors",
Ph.D. dissertation, Stanford Tech. Report No. CSL-TR-87-328, April 1987.