carl@lemon.lcs.mit.edu (Carl Waldspurger) (07/03/90)
Here is a summary of the responses to my request for difficult
scheduling problems.  Many thanks to all those who responded.

						- Carl

My original post:

> Hi.  I'm interested in techniques for managing computational
> resources in parallel systems, such as load-balancing policies
> and user-level scheduling mechanisms.
>
> I would like to start amassing a collection of parallel
> programs that are difficult to schedule; i.e., programs that
> perform poorly using conventional scheduling techniques, and
> programs that require sophisticated partitioning and scheduling
> mechanisms in order to perform well.
>
> If you have any parallel programs that meet this description,
> or can provide me with pointers into relevant literature, I would
> like to hear from you.
>
> Please send e-mail to carl@lemon.lcs.mit.edu; I will post a
> follow-up summary of any information that I receive.

*** Response #1

From: Gary L Dare <gld@cunixd.cc.columbia.edu>
Subject: Re: Difficult Scheduling Examples Wanted

Check out any literature on parallel circuit simulators.  Highly
irregular, and quite dependent on the topology of a specific design.
A real nasty for any regular, parallel machine structure or scheduler.

gld
~~~~~~~~~~~~~~~~~~~~~~~ Je me souviens ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gary L. Dare			"An independent Quebec
> gld@cunixD.cc.columbia.EDU	 in a strong Canada!"
> gld@cunixc.BITNET		 - Yvon Deschamps, comedian

*** Response #2

From: reiher@onyx.jpl.nasa.gov (Peter Reiher)
Subject: Re: Difficult Scheduling Examples Wanted

Are you familiar with the concepts of virtual time and Time Warp?
These terms refer to an unusual form of synchronization based entirely
on rollback, with no blocking.  So far, the method has been used with
great success in parallelizing discrete event simulations.  Other
synchronization methods have not been very successful in dealing with
this sort of program.  If you need them, I can get you many references
on the Time Warp Operating System, developed at JPL.
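[Editor's sketch: the rollback mechanism Reiher describes can be illustrated with a toy, single-process Python model.  All names here (`LogicalProcess`, `recv`) are invented for the example, and a real Time Warp system would additionally cancel already-sent messages with anti-messages and reclaim state older than global virtual time; this is not TWOS code.]

```python
class LogicalProcess:
    """Toy logical process for Time Warp-style optimistic execution:
    events are executed speculatively in arrival order, and a late
    ("straggler") event forces a rollback and re-execution -- no
    blocking-based synchronization is used."""

    def __init__(self):
        self.lvt = 0       # local virtual time
        self.state = 0     # simulated state: a running sum of event deltas
        self.history = []  # (timestamp, delta, state_before, lvt_before)

    def recv(self, ts, delta):
        """Accept an event carrying `delta` at virtual time `ts`."""
        redo = [(ts, delta)]
        # Straggler case: undo every event executed at or after `ts`,
        # remembering each one so it can be replayed after the late event.
        while self.history and self.history[-1][0] >= ts:
            old_ts, old_delta, saved_state, saved_lvt = self.history.pop()
            redo.append((old_ts, old_delta))
            self.state, self.lvt = saved_state, saved_lvt
        for t, d in sorted(redo):           # replay in timestamp order
            self.history.append((t, d, self.state, self.lvt))
            self.state += d
            self.lvt = t
```

Receiving events at virtual times 10 and 30 and then a straggler at 20 rolls the process back past time 30 and replays, leaving the state identical to strict in-order execution.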
Peter Reiher
reiher@onyx.jpl.nasa.gov
. . . cit-vax!elroy!jato!jade!reiher

*** Response #3

From: Richard A. Huff <rahuff@hplabsz.hpl.hp.com>
Subject: Scheduling

I'm not sure if this is the type of problem you're looking for (since
"scheduling" is a little ambiguous), but try this:

Given a simple triply nested matrix multiply loop nest, how can we get
the compiler to determine an appropriate blocking strategy to minimize
the cache<-->memory traffic?  The amount of blocking should depend on
the floating-point add/multiply latencies and the number of available
fp registers.

This might not appear to be a scheduling problem, but do note that

  a) block matrix methods are REQUIRED for decent performance on
     CPUs with caches
  b) the blocking and scheduling algorithms need to be tightly
     coupled

In the end, we'll probably just write some runtime libraries in
hand-coded assembly, but it would be nice to get the compiler to do
some of this automatically.

Enjoy,
Richard Huff
rahuff@hplabsz.hpl.hp.com

*** Response #4

From: schwan%wayward@gatech.edu (Karsten Schwan)
Subject: difficult parallel programs

Here are a couple of references of my own, basically from the domain
of optimization methods; what these programs have in common is
dynamically decomposed domains.

	Karsten

@inproceedings{Sch88b,
  Key          = "Schwan and Blake",
  Author       = "Karsten Schwan and John Gawkowski and Ben Blake",
  Title        = "Process and Workload Migration for a Parallel
                  Branch-And-Bound Algorithm on a Hypercube Multicomputer",
  Booktitle    = "Third Conference on Hypercube Concurrent Computers
                  and Applications, Pasadena, CA",
  Organization = "ACM, Jet Propulsion Laboratories",
  Month        = "Jan.",
  Year         = 1988,
  Pages        = "1520-1530"}

@article{Sch89a,
  Key     = "Schwan et al.",
  Author  = "Karsten Schwan and Ben Blake and Win Bo and John Gawkowski",
  Title   = "Global Data and Control in Multicomputers: Operating System
             Primitives and Experimentation with a Parallel
             Branch-and-Bound Algorithm",
  Journal = "Concurrency: Practice and Experience, Wiley and Sons",
  Month   = "Dec.",
  Year    = 1989,
  Pages   = "191-218"}

*** Response #5

From: Lynn Sutherland <lynn@noah.arc.ab.ca>
Subject: Unbalanced Programs

I am working on a Master's thesis that compares a number of load
balancing strategies.  Instead of using real programs, I decided to
generate sample programs that exhibit varying amounts of load
imbalance.  The programs that I generate are basically trees with
different average branching factors, skewness, and density.  I am not
sure how closely these trees model real programs, but I can argue
empirically that they are similar to typical chess or production
system programs.

If you get a list of "typical unbalanced programs" I would appreciate
a copy, just so I know what kinds of programs my system is missing.

Lynn Sutherland
Alberta Research Council
lynn@noah.arc.ab.ca

*** Response #6

From: george%avocet@cs.utah.edu (Lal George)
Subject: Re: Difficult Scheduling Examples Wanted

I have submitted a paper to ICPP related to the management of
computational resources.  I enclose the abstract in case it should be
of interest to you.

\begin{center} \bf Abstract \end{center}

{\sl An efficient scheduling strategy for shared-memory
multiprocessors is described.  The rapid dissemination of tasks to
available processors and ready queues is crucial to the performance of
any parallel system; such overheads determine the attainable speedup,
and poor techniques for addressing them can lead to severe
degradation in performance, particularly at high processor counts.
This work has been conducted in the context of a parallel functional
language, CoF, where the parallelism is usually fine grained and the
efficient assignment of tasks to processors is even more important.
In such systems, observing strict queue semantics (i.e., FIFO) is not
essential, which allows for very efficient algorithms such as the one
described here.
On the BBN GP1000, our technique was superior in performance to the
centralized queue, and it has the potential to perform well on a fully
configured GP1000.}

*** End of Responses
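[Editor's sketch: the blocked matrix multiply Huff asks about in Response #3 looks like the following toy Python version.  The block size `bs` is an invented parameter standing in for the value a compiler would derive from cache capacity, FP latencies, and register count; this is an illustration of the blocking transformation, not Huff's code.]

```python
def matmul_blocked(A, B, n, bs):
    """Multiply two n x n matrices (lists of lists) with loop blocking.

    The three outer loops walk bs x bs blocks; the three inner loops do
    the work.  Each (ii, kk, jj) step touches sub-blocks of A, B, and C
    small enough to remain resident in cache, cutting cache<-->memory
    traffic relative to the naive triply nested loop."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]           # reused across the j loop
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a * B[k][j]
    return C
```

The result is identical to the naive loop nest for any block size; only the memory-access order changes, which is exactly why the blocking choice is a scheduling decision rather than a correctness one.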
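[Editor's sketch: Response #6 observes that once strict FIFO semantics are dropped, task dissemination can use cheap per-processor queues rather than a single centralized queue.  The following single-threaded Python simulation illustrates the generic idea (per-processor LIFO queues plus stealing from a random victim); it is an invented example, not the CoF scheduler from the paper.]

```python
from collections import deque
import random

def run(tasks, nprocs, seed=0):
    """Simulate nprocs processors draining per-processor ready queues.

    Each processor pops from its own deque (LIFO, no global lock, no
    FIFO guarantee) and steals the oldest task from a random victim
    when its own queue is empty -- the kind of relaxed-order scheme
    that a mandatory centralized FIFO queue would rule out."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(nprocs)]
    for i, t in enumerate(tasks):            # initial round-robin placement
        queues[i % nprocs].append(t)
    done = []
    while any(queues):
        for p in range(nprocs):
            if queues[p]:
                done.append(queues[p].pop())          # cheap local pop
            else:
                victim = rng.randrange(nprocs)
                if queues[victim]:
                    # Steal from the victim's opposite end, reducing
                    # contention with the victim's own local pops.
                    queues[p].append(queues[victim].popleft())
    return done
```

Every task runs exactly once, but in no globally predictable order -- which is fine whenever, as the abstract notes, strict queue semantics are not essential.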