overeind@uva.UUCP (B. J. Overeinder (I84)) (05/03/89)
I am searching for articles which handle about compile time load-balancing. We are designing an object oriented language, with a static number of objects, i.e. the number of objects is known at compile time. The problem is, how to distribute these objects on a grid of processors, with an optimum in parallel execution, but the communication between objects have to been considered also. It's an option to place heavy communicating objects on the same node (processor) in order to reduce network delay, etc. I'm already searching a month or so, for articles about this subject, but sofar nothing found. So EVERY reference wich touches or deals with this subject is welcome to me. PLEASE, PLEASE e-mail me with your contributions, and I will post a compilation of the references in the news groups eventualy. Thanks in advance, Benno.
nusbaum@ucsd.edu (R. James Nusbaum) (05/05/89)
Specifically this is an answer to <5392@hubcap.clemson.edu> but it also applies to mapping/scheduling in general. >I am searching for articles which handle about compile time >load-balancing. We are designing an object oriented language, with a >static number of objects, i.e. the number of objects is known at >compile time. The problem is, how to distribute these objects on a >grid of processors, with an optimum in parallel execution, but the >communication between objects have to been considered also. It's an >option to place heavy communicating objects on the same node >(processor) in order to reduce network delay, etc. >I'm already >searching a month or so, for articles about this subject, but sofar >nothing found. So EVERY reference which touches or deals with this If you haven't found anything yet you must be looking in the wrong place. Process/Task/Object allocation to multiprocessor networks is basically a graph-theoretic problem. It involves dividing a graph such that some weighting function is minimized. It can usually be classified as a graph separator problem or a graph embedding problem. Graph separator algorithms are usually good enough if you can assume equal size computational units, a fully connected (or a reasonable simulation thereof) multiprocessor network and can ignore the communication delays of the network. Examples of this might be a BBN Butterfly, which simulates a fully connected network with a global address space. In fact the Butterfly is perfect for your application. When using a graph separator algorithm for this kind of task you usually ignore the hardware platform architecture except for the number of processors. A graph separator algorithm divides the graph into a certain number of clusters of a certain size by cutting as few vertices as possible of the graph (these are also called graph clustering algorithms). This is in general a NP-Hard problem. If the graph has certain limiting characteristics then there are polynomial time algorithms that will do the job. If the graph is truly general then heuristic algorithms are best. A graph embedding algorithm lets you consider the processor architecture more in dividing the graph. A graph embedding takes one graph and embeds it in another by placing vertices from the first graph into the second such that the arcs of the first graph lay along the arcs of the second. The trick is finding the embedding that distributes that minimizes your weighting function. An obvious degenerate case is to embed all of the first graph into one vertex of the second graph. Of course this problem is NP-Hard also. Anyway, so whose been trying to solve these problems? Well multi/parallel processing people like you and me. Also just pure theory math people, vlsi layout tool designers and distributed database people. The vlsi people often have to layout a circuit, represented by a graph, onto multiple chips. A good separator algorithm can help divide the circuit up correctly. The database people need to know how to distribute data such that the average time to access any data item is minimized. The good thing is that you have a rather restricted graph, known at compile time. Lets say though that you wanted to have a general algorithm that would map any program graph unto a heterogeneous processor network. You need a graph embedding algorithm (heuristic) that would minimize the computational load on all processors and minimize the communication delay between any two communicating computational units. This is a complex weighting function to calculate and the best heuristic algorithms are still O(ne) where n is the number of vertices and e is the number of arcs(edges). As you can see this is a hot topic for me. Need any collaborators? I might be willing to write the mapping software for you. I can give you more specific pointers, but check out 'Graph algorithms and NP-Completeness' by Kurt Melhorn, volume 2 of EATCS Monographs on Theoretical Computer Science. Also 'An Efficient Heuristic Cluster Algorithm for Tearing Large-Scale Networks' by A. Sangiovanni-Vincentelli, L. Chen and L. Chua in IEEE Transactions on Circuits and Systems. One thing that looking into this topic will show you is how much graph theory underlies so many problems in CS. Jim Nusbaum LLUMC Radiation Research Laboratory (714) 824-4077 proton!nusbaum@ucrmath.ucr.edu