[comp.parallel] compile time load-balancing/task-assignment

overeind@uva.UUCP (B. J. Overeinder (I84)) (05/03/89)

I am searching for articles which handle about compile time load-balancing.
We are designing an object oriented language, with a static number of objects,
i.e. the number of objects is known at compile time. The problem is, how to
distribute these objects on a grid of processors, with an optimum in parallel
execution, but the communication between objects have to been considered also.
It's an option to place heavy communicating objects on the same node (processor)
in order to reduce network delay, etc.

I'm already searching a month or so, for articles about this subject, but sofar
nothing found. So EVERY reference wich touches or deals with this subject is 
welcome to me. PLEASE, PLEASE e-mail me with your contributions, and I will
post a compilation of the references in the news groups eventualy.


				Thanks in advance,

					Benno.

nusbaum@ucsd.edu (R. James Nusbaum) (05/05/89)

Specifically this is an answer to <5392@hubcap.clemson.edu> but it
also applies to mapping/scheduling in general.

>I am searching for articles which handle about compile time
>load-balancing.  We are designing an object oriented language, with a
>static number of objects, i.e. the number of objects is known at
>compile time. The problem is, how to distribute these objects on a
>grid of processors, with an optimum in parallel execution, but the
>communication between objects have to been considered also.  It's an
>option to place heavy communicating objects on the same node
>(processor) in order to reduce network delay, etc.   

>I'm already
>searching a month or so, for articles about this subject, but sofar
>nothing found. So EVERY reference which touches or deals with this


If you haven't found anything yet you must be looking in the wrong
place.  Process/Task/Object allocation to multiprocessor networks is
basically a graph-theoretic problem.  It involves dividing a graph
such that some weighting function is minimized. It can usually be
classified as a graph separator problem or a graph embedding problem.
Graph separator algorithms are usually good enough if you can assume
equal size computational units, a fully connected (or a reasonable
simulation thereof) multiprocessor network and can ignore the
communication delays of the network.  Examples of this might be a BBN
Butterfly, which simulates a fully connected network with a global
address space.  In fact the Butterfly is perfect for your application.
When using a graph separator algorithm for this kind of task you
usually ignore the hardware platform architecture except for the
number of processors.

A graph separator algorithm divides the graph into a certain number of
clusters of a certain size by cutting as few vertices as possible of
the graph  (these are also called graph clustering algorithms).  This
is in general a NP-Hard problem.  If the graph has certain limiting
characteristics then there are polynomial time algorithms that will do
the job.  If the graph is truly general then heuristic algorithms are
best. 

A graph embedding algorithm lets you consider the processor
architecture more in dividing the graph.  A graph embedding takes one
graph and embeds it in another by placing vertices from the first
graph into the second such that the arcs of the first graph lay along
the arcs of the second.  The trick is finding the embedding that
distributes that minimizes your weighting function.  An obvious
degenerate case is to embed all of the first graph into one vertex of
the second graph.  Of course this problem is NP-Hard also.

Anyway, so whose been trying to solve these problems?  Well
multi/parallel processing people like you and me.  Also just pure
theory math people, vlsi layout tool designers and distributed
database people.  The vlsi people often have to layout a circuit,
represented by a graph, onto multiple chips.  A good separator
algorithm can help divide the circuit up correctly.  The database
people need to know how to distribute data such that the average time
to access any data item is minimized.

The good thing is that you have a rather restricted graph, known at
compile time.  Lets say though that you wanted to have a general
algorithm that would map any program graph unto a heterogeneous
processor network.  You need a graph embedding algorithm (heuristic)
that would minimize the computational load on all processors and minimize
the communication delay between any two communicating computational
units.  This is a complex weighting function to calculate and the best
heuristic algorithms are still O(ne) where n is the number of vertices
and e is the number of arcs(edges).

As you can see this is a hot topic for me.  Need any collaborators?  I
might be willing to write the mapping software for you.  I can give
you more specific pointers, but check out 'Graph algorithms and
NP-Completeness' by Kurt Melhorn, volume 2 of EATCS Monographs on
Theoretical Computer Science.  Also 'An Efficient Heuristic Cluster
Algorithm for Tearing Large-Scale Networks' by A.
Sangiovanni-Vincentelli, L. Chen and L. Chua in IEEE Transactions on
Circuits and Systems.  

One thing that looking into this topic will show you is how much graph
theory underlies so many problems in CS.


Jim Nusbaum
LLUMC Radiation Research Laboratory
(714) 824-4077
proton!nusbaum@ucrmath.ucr.edu