cchase@uunet.UU.NET (Craig Chase) (01/30/91)
I am interested in the problem of automatically identifying a distribution of array data in programs for NUMA machines. I've not seen a whole lot of work on this subject and I'd appreciate any information or pointers I can get. I've seen papers by Li and Chen on aligning the distribution of arrays to minimize the communication, by Hudak and Abraham discussing "stencil sets" and how to best partition array data for a specific class of programs, and by Balasundaram, Fox, Kennedy and Kremer that describes a tool that predicts the impact of a given data distribution on performance and allows the programmer to make informed decisions about what distribution should be used. In addition to pointers to other work that's been done to address this problem, I'd particularly like to know: 1) Exactly what is the "prize" here. For some regular, highly-parallel algorithms, it just doesn't seem to matter how you distribute the data. What sort of problems are most sensative to data distribution, and what kind of difference results between a naive distribution and an optimal one? (20%? 200%? 2000% improvement?) 2) Has any work been done with data distributions that are not partitionings of the array. For example, two processors' components of the array might overlap resulting in a number of elements being shared. I'm most interested in this type of arrangement where the communication is explicitly generated by the compiler (or programmer) (i.e. NOT Shared Virtual Memory, an elegant approach, but not what I'm interested in looking at right not.) 3) What about dynamic repartitionings of the data. For example, at procedure call boundaries, or even inside the body of a procedure (I view this as being analogous to the "live-range splitting" for register allocation in modern compilers where for one portion of the code a particular allocation of variables to fast memory (registers) is appropriate, but at other places in the same code, a different binding is useful). Is anyone working with code that does this sort of thing? I can imagine some matrix operations where the array is traversed first by rows and then by columns where it might be appropriate to transpose the entire matrix in between. I'd be more than happy to summarize any responses and repost if there is sufficient interest. Thanks Craig Chase -- ------------------------------------------------------------------------- | Programming a parallel computer is like trying to ride a camel with | | 128 humps. You've got to be careful how you distribute your load, or | | you'll wind up getting stuck in the end. |