[comp.parallel] Automatic Data Distribution

cchase@uunet.UU.NET (Craig Chase) (01/30/91)

I am interested in the problem of automatically identifying a distribution 
of array data in programs for NUMA machines.  I've not seen a whole lot of 
work on this subject and I'd appreciate any information or pointers I can get.

I've seen papers by Li and Chen on aligning the distribution of arrays 
to minimize the communication, by Hudak and Abraham discussing "stencil 
sets" and how to best partition array data for a specific class of programs,
and by Balasundaram, Fox, Kennedy and Kremer describing a tool that
predicts the impact of a given data distribution on performance and allows
the programmer to make informed decisions about what distribution should be
used.

In addition to pointers to other work that's been done to address this 
problem, I'd particularly like to know:

1) Exactly what is the "prize" here?  For some regular,
highly-parallel algorithms, it just doesn't seem to matter how you
distribute the data.  What sort of problems are most sensitive to
data distribution, and what kind of difference results between a
naive distribution and an optimal one? (20%? 200%? 2000% improvement?)

2) Has any work been done with data distributions that are not partitionings
of the array?  For example, two processors' portions of the array
might overlap, resulting in a number of elements being shared.  I'm
most interested in this type of arrangement where the communication is
explicitly generated by the compiler (or programmer) (i.e., NOT Shared
Virtual Memory; an elegant approach, but not what I'm interested in
looking at right now).

3) What about dynamic repartitionings of the data?  For example, at
procedure call boundaries, or even inside the body of a procedure
(I view this as analogous to the "live-range splitting" used for
register allocation in modern compilers, where for one portion of the
code a particular allocation of variables to fast memory (registers)
is appropriate, but at other places in the same code a different
binding is useful).  Is anyone working with code that does this sort
of thing?  I can imagine some matrix operations where the array is
traversed first by rows and then by columns, where it might be appropriate
to transpose the entire matrix in between.

I'd be more than happy to summarize any responses and repost if there
is sufficient interest.

Thanks

Craig Chase

-- 
-------------------------------------------------------------------------
| Programming a parallel computer is like trying to ride a camel with   |
| 128 humps.  You've got to be careful how you distribute your load, or |
| you'll wind up getting stuck in the end.                              |