[comp.parallel] Throttling Strategies -- Summary of Replies

kh@cs.glasgow.ac.uk (Kevin Hammond) (08/25/90)

Many thanks to those who responded to my question about throttling
strategies for control of fine-grained parallelism.  Here's a summary
of the replies I received.

The two principal sources mentioned are the MulT project at Yale and
the Flagship/EDS project at Manchester.  Several people seem
pessimistic about the utility of throttling/work stealing...

Kevin

--------------------------------

From: Randy Osborne <ran@crl.dec.com>
Subject: Re: Throttling Strategies for Fine-Grained Parallelism

You might want to look at the paper on Lazy Task Creation in MulT presented
at the latest Lisp and Functional Programming Conference by Mohr, Kranz, and
Halstead.  Joe Weening has done some relevant analytical work; his work is
mentioned and referenced in the paper by Mohr et al.
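
Briefly, the trick is that a potentially parallel call is executed
inline by the parent, which first makes its own continuation visible so
that an idle processor can steal it; the cost of creating a real task
is paid only when a steal actually happens.  A toy sketch in Python
follows (the names and interface are invented for illustration and are
not MulT's):

    import threading
    from collections import deque

    class Worker:
        # Toy model of lazy task creation (after Mohr, Kranz and
        # Halstead).  With no thief, pcall costs little more than an
        # ordinary procedure call.
        def __init__(self):
            self.conts = deque()        # stealable parent continuations
            self.lock = threading.Lock()

        def pcall(self, child, cont):
            # Expose the parent's continuation, then run the child
            # inline.  A task comes into being only if someone steals.
            with self.lock:
                self.conts.append(cont)
            value = child()
            with self.lock:
                stolen = not (self.conts and self.conts[-1] is cont)
                if not stolen:
                    self.conts.pop()    # not stolen: stay sequential
            if stolen:
                # A real system would hand the value to the thief via
                # a placeholder (future); omitted in this sketch.
                return value
            return cont(value)

        def steal(self):
            # A thief takes the *oldest* continuation, retroactively
            # turning one inlined call into a genuine task.
            with self.lock:
                return self.conts.popleft() if self.conts else None

    # Example: fib runs at near procedure-call cost until stolen.
    w = Worker()
    def fib(n):
        if n < 2:
            return n
        return w.pcall(lambda: fib(n - 1), lambda x: x + fib(n - 2))
    print(fib(10))                      # prints 55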

--------------------------------

From: Rick Mohr <mohr-eric@cs.yale.edu>
Subject: runtime control of task generation

You might be interested in my paper with Halstead and Kranz in the
latest Lisp & FP Conference Proceedings, "Lazy Task Creation: A
Technique for Increasing the Granularity of Parallel Programs".  It is
aimed at addressing exactly the issue you raise, i.e.  runtime control
of task generation.  If you don't have access to the proceedings yet,
you can get it (compressed postscript) by anonymous ftp to
nebula.cs.yale.edu (128.36.13.1), file pub/ltc_lfp90.ps.Z.

--------------------------------

From: L. V. Kale <kale@a.cs.uiuc.edu>
Subject: Throttling strategies

We have experimented with different strategies in connection
with our parallel Prolog system and in our parallel state-space
search work.  These are provided in the Chare Kernel parallel
programming system.  Even the simple strategies tend to be quite
adaptive and good at controlling the number of tasks and memory
usage, without affecting the parallelism much.  (By a simple
strategy I mean a LIFO strategy for the local "new-task" queues,
as sketched below.)
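
To illustrate why a LIFO new-task queue controls memory, here is a
minimal single-processor sketch in Python (my own illustration, not
code from the Chare Kernel): popping the newest task first keeps
execution close to depth-first, so the queue grows with the depth of
the task tree rather than its breadth.

    class LocalScheduler:
        # LIFO "new-task" queue: the newest task runs first.
        def __init__(self):
            self.stack = []             # the local new-task queue
            self.high_water = 0

        def spawn(self, task):
            self.stack.append(task)
            self.high_water = max(self.high_water, len(self.stack))

        def run(self):
            while self.stack:
                self.stack.pop()(self)  # depth-first-like order

    # A binary task tree of depth 10.  LIFO keeps the queue near the
    # tree depth; FIFO would let it grow to about 1024 entries.
    def node(depth):
        def task(sched):
            if depth < 10:
                sched.spawn(node(depth + 1))
                sched.spawn(node(depth + 1))
        return task

    s = LocalScheduler()
    s.spawn(node(0))
    s.run()
    print("high-water mark:", s.high_water)   # 11 with LIFO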

Some literature from the UK (Alice?) has suggested that LIFO is not
right for iterative constructs, but I think that is wrong: you just
have to make sure the tasks are pushed in the right order.  Better
still, one can use bit-vector priorities to ensure the right order
is followed.
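
The flavour of bit-vector priorities, as I understand the suggestion:
each task carries a vector compared lexicographically, and a child
refines its parent's priority by appending elements, so new tasks can
be ordered between existing ones without renumbering anything.  A
rough Python sketch (integers stand in for groups of bits; this is an
illustration, not the Chare Kernel's interface):

    import heapq

    class PriorityScheduler:
        def __init__(self):
            self.heap, self.n = [], 0

        def spawn(self, prio, task):
            # prio is a tuple; Python compares tuples lexicographically.
            # self.n breaks ties so equal priorities stay FIFO.
            heapq.heappush(self.heap, (prio, self.n, task))
            self.n += 1

        def run(self):
            while self.heap:
                prio, _, task = heapq.heappop(self.heap)
                task(self, prio)

    # Iteration i gets priority parent + (i,), so iterations run in
    # the right order no matter how they were pushed, and anything an
    # iteration spawns inherits its place in the order.
    def body(i):
        def task(sched, prio):
            print("iteration", i, "at priority", prio)
        return task

    s = PriorityScheduler()
    for i in (2, 0, 1):                 # pushed out of order
        s.spawn((0, i), body(i))
    s.run()                             # runs iterations 0, 1, 2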

I can send you the relevant papers, if you want. The software is also
available by anonymous ftp. (See recent note in comp.parallel).

--------------------------------

From: Ian Foster <itf@antares.mcs.anl.gov>
Subject: Throttling

Send me your address and I will mail you a relevant paper.

--------------------------------

From: David Skillicorn <skill@qucis.queensu.ca>
Subject: Throttling etc.

As well as the throttling work at Manchester, the group at the Royal
Melbourne Institute of Technology has looked at this problem -- it is
run by Greg Egan and David Abramson.  They have really good data
illustrating the problem, and some data on solutions.

But I'm inclined to think that throttling is the wrong sort of
solution.  The problem is that instructions on the critical path of
a computation are needlessly held up by the other enabled
instructions that are hanging around, because of the greedy
scheduling that call-by-value seems to create.  If one can get a
handle at compile time on where the critical path is, and can
collect single instructions into clusters, then it seems possible to
put some work into explicit scheduling, or some approximation to it.
This kind of idea is behind Nikhil's P-RISC (16th ISCA, 1989) and
the Monsoon machine, both from MIT.
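
The compile-time half of that idea reduces to the classic
list-scheduling heuristic: estimate each node's longest path to the
end of the task graph and use it as a priority, so enabled
instructions off the critical path cannot crowd out those on it.  A
small sketch in Python (a generic illustration, not P-RISC or
Monsoon):

    def critical_path_priorities(succs, cost):
        # Longest path from each node to the exit of a task DAG,
        # computed bottom-up with memoisation.
        # succs: node -> successor list; cost: node -> work estimate.
        memo = {}
        def longest(n):
            if n not in memo:
                memo[n] = cost[n] + max((longest(s) for s in succs[n]),
                                        default=0)
            return memo[n]
        return {n: longest(n) for n in succs}

    # Example DAG: a -> b -> d and a -> c -> d, with b the heavy node.
    succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    cost = {"a": 1, "b": 5, "c": 1, "d": 1}
    print(critical_path_priorities(succs, cost))
    # {'a': 7, 'b': 6, 'c': 2, 'd': 1} -- among enabled nodes,
    # the scheduler should prefer b over c.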

--------------------------------

From: Arch Robison <robison@shell.com>
Subject: Throttles

My Ph.D. thesis looks at throttling functional programs on MIMD
machines in order to make efficient use of limited space.  If you want
a copy of the thesis, the best thing to do would be to contact the U.
of Illinois C.S. Department's secretary in charge of tech. reports.
Her e-mail address is 'erna@cs.uiuc.edu'.


			Optimistic Regulation of Concurrency
				 Arch D. Robison
		     University of Illinois at Urbana-Champaign, 
				   April 1990
			    Report UIUCDCS-R-90-1580

				    Abstract

	Unregulated concurrency in functional programs may lead to space 
	demands that exceed available space, causing deadlock.  This thesis 
	proposes regulating concurrency optimistically with rollbacks.  
	Excessive concurrency is viewed as a fault from which to recover.  
	An optimistic regulator has two parts: a recovery-point manager and 
	a process scheduler.  This thesis presents a formal model that 
	characterizes data flow and control flow within concurrent functional 
	programs.  The model guides the design of the recovery-point manager 
	and process scheduler.  The proposed regulator guarantees that 
	concurrent execution of a program does not deadlock if the program is 
	given space sufficient for sequential execution.
	
	The advantage of optimistic regulation is that concurrency is
	tailored to fit available space instead of vice versa.  Furthermore,
	the optimistic approach errs on the side of too much concurrency
	rather than too little, and may be able to obtain better speedup
	than pessimistic regulation.
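
Very roughly, the flavour of the scheme in Python (my paraphrase for
illustration only; the thesis's actual recovery-point manager and
scheduler are more subtle than this):

    class Task:
        def __init__(self, space, fn):
            self.space, self.fn = space, fn

    class OptimisticRegulator:
        # Run tasks concurrently and treat running out of space as a
        # fault: on a fault, roll back, shed concurrency, and retry.
        # Width 1 is sequential execution, which by assumption fits
        # in the available space, so the retry loop always terminates.
        def __init__(self, space_limit):
            self.space_limit = space_limit

        def run(self, tasks):
            width = len(tasks)          # optimistic: fully concurrent
            while True:
                try:
                    return self.attempt(tasks, width)
                except MemoryError:     # excessive concurrency = fault
                    width = max(1, width // 2)   # roll back and narrow

        def attempt(self, tasks, width):
            results = []
            for i in range(0, len(tasks), width):
                batch = tasks[i:i + width]
                if sum(t.space for t in batch) > self.space_limit:
                    raise MemoryError   # the "fault" to recover from
                results += [t.fn() for t in batch]
            return results

    reg = OptimisticRegulator(space_limit=10)
    tasks = [Task(space=4, fn=lambda i=i: i * i) for i in range(6)]
    print(reg.run(tasks))               # narrows 6 -> 3 -> 1, succeeds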
	
--------------------------------

From: Mark Greenberg <markg@m2.csc.ti.com>
Subject: Re: Throttling Strategies for Fine-Grained Parallelism

Here are some references to the load balancing and throttling strategies
that were developed for the Flagship machine at Manchester University:


\item Sargeant, J.
``Load Balancing, Locality and Parallelism Control in Fine-Grain Parallel
Machines''.  Internal Report UMCS-86-11-5, Department of Computer Science,
University of Manchester, January 1987.

\item Watson, I., Woods, V., Watson, P., Banach, R., Greenberg, M. and
Sargeant, J.
``Flagship: A Parallel Architecture for Declarative Programming''.
Proceedings of the 15th Annual International Symposium on Computer
Architecture, Honolulu, Hawaii, May/June 1988, pp. 124-129.

\item Greenberg, M.
``An Investigation Into Architectures For A Parallel Reduction Machine''.
Ph.D. Thesis, Department of Computer Science, University of Manchester,
January 1989.

\item Greenberg, M., Woods, V. and Sargeant, J.
``Work Distribution And Scheduling In The Flagship Parallel Reduction
Machine''.  Proceedings of the Workshop on Architectural Support for
Declarative Programming, Eilat, Israel, May 1989.

\item Greenberg, M. and Woods, V.
``FLAGSHIP -- A Parallel Machine For Declarative Programming''.  IEE
Computing & Control Engineering Journal, Vol. 1, No. 2, March 1990.


I experimented with work stealing for large-grain rewriting (LAGER at MU,
under the EDS project) just before coming out to TI.  I wrote up the
results of the experiments in the form of a group report.

Work stealing has no hope of performing well in a system which relies on
fine-grain parallelism.  The overheads of sending out messages to request
work, interrupting the destination processor to "steal" work, and waiting
for the work to arrive will probably be greater than the time to perform
the piece of work itself.  You could do a better job, though, if there
were a threshold at which you decided to send out messages to request
work, rather than waiting until the processor runs out of work and only
then sending requests.
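
The threshold idea in Python, for concreteness (the names and the
low-water value are invented for illustration):

    from collections import deque

    LOW_WATER = 2       # request more work while this much remains

    class WorkerQueue:
        # Ask for work *before* running dry, so the round trip of the
        # request overlaps with the work still in hand instead of
        # leaving the processor idle.
        def __init__(self, send_request):
            self.tasks = deque()
            self.send_request = send_request   # e.g. message a neighbour
            self.requested = False

        def next_task(self):
            if len(self.tasks) <= LOW_WATER and not self.requested:
                self.send_request()            # prefetch while still busy
                self.requested = True
            return self.tasks.popleft() if self.tasks else None

        def receive_work(self, new_tasks):
            self.tasks.extend(new_tasks)
            self.requested = False

    # Example: the request goes out while two tasks still remain.
    q = WorkerQueue(send_request=lambda: print("request sent early"))
    q.tasks.extend(["t1", "t2", "t3"])
    while (task := q.next_task()) is not None:
        print("running", task)
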
-- 
This Signature Intentionally Left Blank.	E-mail: kh@cs.glasgow.ac.uk