kh@cs.glasgow.ac.uk (Kevin Hammond) (08/25/90)
Many thanks to those who responded to my question about throttling strategies for control of fine-grained parallelism. Here's a summary of the replies I received. The two principal sources mentioned are the MulT project at Yale, and the Flagship/EDS project at Manchester. Several people seem pessimistic about the utility of throttling/work stealing...

Kevin

--------------------------------
From: Randy Osborne <ran@crl.dec.com>
Subject: Re: Throttling Strategies for Fine-Grained Parallelism

You might want to look at the paper on Lazy Task Creation in MulT presented at the latest Lisp and Functional Programming Conference by Mohr, Kranz, and Halstead. Joe Weening has done some relevant analytical work; it is mentioned and referenced in the paper by Mohr et al.

--------------------------------
From: Rick Mohr <mohr-eric@cs.yale.edu>
Subject: runtime control of task generation

You might be interested in my paper with Halstead and Kranz in the latest Lisp & FP Conference Proceedings, "Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs". It is aimed at addressing exactly the issue you raise, i.e. runtime control of task generation. If you don't have access to the proceedings yet, you can get the paper (compressed PostScript) by anonymous ftp to nebula.cs.yale.edu (128.36.13.1), file pub/ltc_lfp90.ps.Z.

--------------------------------
From: L. V. Kale <kale@a.cs.uiuc.edu>
Subject: Throttling strategies

We have experimented with different strategies in connection with our parallel Prolog system and in our parallel state-space search work. These are provided in the Chare Kernel parallel programming system. Even the simple strategies tend to be quite adaptive and good at controlling the number of tasks and memory usage, without much effect on the parallelism. (By "simple strategy" I mean a LIFO strategy for local "new-task" queues.) Some literature from the UK (Alice?)
has indicated that LIFO is not right for iterative constructs, but I think that is wrong: you just have to make sure the tasks are pushed in the right order. Better still, one can use bit-vector priorities to ensure the right order is followed. I can send you the relevant papers, if you want. The software is also available by anonymous ftp (see the recent note in comp.parallel).

--------------------------------
From: Ian Foster <itf@antares.mcs.anl.gov>
Subject: Throttling

Send me your address and I will mail you a relevant paper.

--------------------------------
From: David Skillicorn <skill@qucis.queensu.ca>
Subject: Throttling etc.

As well as the throttling work at Manchester, the group at the Royal Melbourne Institute of Technology has looked at this problem -- the people running it are Greg Egan and David Abramson. They have really good data illustrating the problem and some data on solutions. But I'm inclined to think that throttling is the wrong sort of solution. The problem that comes up is that instructions on the critical path of a computation are needlessly held up by the other enabled instructions hanging around, because of the greedy scheduling that call-by-value seems to create. If one can get a handle at compile time on where the critical path is, and can collect single instructions into clusters, then it seems possible to put some work into explicit scheduling, or some approximation to it. This kind of idea is behind Nikhil's P-RISC (16th ISCA, 1989) and the Monsoon machine, both from MIT.

--------------------------------
From: Arch Robison <robison@shell.com>
Subject: Throttles

My Ph.D. thesis looks at throttling functional programs on MIMD machines in order to make efficient use of limited space. If you want a copy of the thesis, the best thing to do would be to contact the U. of Illinois C.S. Department's secretary in charge of tech reports. Her e-mail address is erna@cs.uiuc.edu.

Optimistic Regulation of Concurrency
Arch D.
Robison
University of Illinois at Urbana-Champaign, April 1990
Report UIUCDCS-R-90-1580

Abstract: Unregulated concurrency in functional programs may lead to space demands that exceed available space, causing deadlock. This thesis proposes regulating concurrency optimistically with rollbacks: excessive concurrency is viewed as a fault from which to recover. An optimistic regulator has two parts: a recovery-point manager and a process scheduler. The thesis presents a formal model that characterizes data flow and control flow within concurrent functional programs; the model guides the design of the recovery-point manager and process scheduler. The proposed regulator guarantees that concurrent execution of a program does not deadlock if the program is given space sufficient for sequential execution. The advantage of optimistic regulation is that concurrency is tailored to fit available space instead of vice versa. Furthermore, the optimistic approach errs on the side of too much concurrency rather than too little, and may be able to obtain better speedup than pessimistic regulation.

--------------------------------
From: Mark Greenberg <markg@m2.csc.ti.com>
Subject: Re: Throttling Strategies for Fine-Grained Parallelism

Here are some references to the load-balancing and throttling strategies that were developed for the Flagship machine at Manchester University:

\item Sargeant, J. ``Load Balancing, Locality and Parallelism Control in Fine-Grain Parallel Machines''. Internal Report UMCS-86-11-5, Department of Computer Science, University of Manchester, January 1987.

\item Watson, I., Woods, V., Watson, P., Banach, R., Greenberg, M. and Sargeant, J. ``Flagship: A Parallel Architecture for Declarative Programming''. Proceedings of the 15th Annual International Symposium on Computer Architecture, Honolulu, Hawaii, May/June 1988, pp. 124-129.

\item Greenberg, M. ``An Investigation Into Architectures For A Parallel Reduction Machine''. Doctor of Philosophy Thesis,
Department of Computer Science, University of Manchester, January 1989.

\item Greenberg, M., Woods, V. and Sargeant, J. ``Work Distribution And Scheduling In The Flagship Parallel Reduction Machine''. Proceedings of the Workshop on Architectural Support for Declarative Programming, Eilat, Israel, May 1989.

\item Greenberg, M. and Woods, V. ``FLAGSHIP -- A Parallel Machine For Declarative Programming''. IEE Computing & Control Engineering Journal, Vol. 1, No. 2, March 1990.

I experimented with work stealing for large-grain rewriting (LAGER at MU, under the EDS project) just before coming out to TI, and wrote up the results in the form of a group report. Work stealing has no hope of performing well in a system which relies on fine-grain parallelism: the overheads of sending out messages to request work, interrupting the destination processor to "steal" work, and waiting for the work to arrive will probably be greater than the time to perform the piece of work itself. You could do a better job, though, if there were a threshold at which you decided to send out messages to request work, rather than waiting until the processor runs out and only then sending out requests.

--
This Signature Intentionally Left Blank. E-mail: kh@cs.glasgow.ac.uk
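To make the lazy-task-creation idea mentioned in several replies above concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not MulT's implementation (MulT is a parallel Lisp); all names are hypothetical. The point is that a potentially-parallel call costs only a push onto a local LIFO queue unless another processor actually steals it.

```python
from collections import deque

class Worker:
    """Toy model of lazy task creation (after Mohr, Kranz and Halstead).

    Each potentially-parallel call is pushed as a cheap thunk onto a
    local LIFO "new-task" queue instead of spawning a full task.  If
    nobody steals it, the owner pops it and runs it inline, so task
    overhead is only paid when the parallelism is actually exploited."""

    def __init__(self):
        self.tasks = deque()            # local new-task queue

    def spawn(self, thunk):
        self.tasks.append(thunk)        # push on the LIFO (newest) end

    def run_local(self):
        results = []
        while self.tasks:
            thunk = self.tasks.pop()    # LIFO pop: depth-first, like a call
            results.append(thunk())
        return results

def steal(victim):
    """An idle processor steals from the opposite (oldest) end, which
    tends to hold the largest pending piece of work."""
    return victim.tasks.popleft() if victim.tasks else None

# Usage: the owner pushes three deferred computations; a thief takes
# the oldest one, and the owner inlines the rest in LIFO order.
w = Worker()
for n in (1, 2, 3):
    w.spawn(lambda n=n: n * n)
stolen = steal(w)
print(stolen())        # 1  (oldest task)
print(w.run_local())   # [9, 4]  (newest tasks first)
```

Note that the owner's LIFO order gives the depth-first, space-bounded behaviour Kale argues for, while thieves taking from the oldest end keeps stolen grains large.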
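Greenberg's closing suggestion -- request work at a threshold rather than after the processor runs dry -- can be sketched as follows. This is a toy single-threaded model under assumed names (`LOW_WATER`, `request_work` standing in for a request message), not the Flagship or EDS mechanism; it only shows how the request is overlapped with remaining local work.

```python
from collections import deque

LOW_WATER = 2   # hypothetical threshold: request work *before* running dry

class ThrottledWorker:
    """Toy model of threshold-based work requests: when the local queue
    drops to LOW_WATER tasks, send a work-request message immediately,
    so the round-trip latency overlaps the work that is still queued."""

    def __init__(self, thunks, request_work):
        self.queue = deque(thunks)
        self.request_work = request_work   # stands in for a request message
        self.requested = False

    def step(self):
        """Run one task, issuing an early work request if we are low."""
        if len(self.queue) <= LOW_WATER and not self.requested:
            self.requested = True
            self.request_work(self)        # fired while work remains
        return self.queue.popleft()() if self.queue else None

# Usage: the request fires while two tasks are still queued, not after
# the worker has already gone idle.
log = []
w = ThrottledWorker([lambda v=v: v for v in (10, 20, 30, 40)],
                    request_work=lambda worker: log.append(len(worker.queue)))
out = [w.step() for _ in range(4)]
print(out)   # [10, 20, 30, 40]
print(log)   # [2]  -- request sent with 2 tasks still left
```

With the threshold at zero this degenerates into the scheme Greenberg criticizes: the processor sits idle for the full message round trip.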