[comp.os.research] Responses to LoadBalancing/Migration request

kenchew@cs.utexas.edu (Khien Mien Kennedy Chew) (09/28/90)
	Many moons ago, I asked for information about load balancing
and process migration.  I received a couple of responses but due to
my oversight, I failed to post the responses for interested parties.
I apologise for this late posting.  Anyway, here are the responses:

	Enjoy,
	Kennedy Chew

-------------------------------------------------------------------------
Responses to call for information on load balancing and migration
-------------------------------------------------------------------------

We're doing some work at JPL on load balancing for the Time Warp Operating
System.  Chances are that you wouldn't want to use the same ideas, as they
are rather specific to some unusual things that system does, but it might be
worth looking at.  I had a paper on it published in the proceedings of the
1990 conference on Distributed Simulation.  (It was just last week.)

>[ Check into Locus from UCLA and Sprite from Berkeley.  --DL	]
>[ 								]
>[ @INPROCEEDINGS(Wa:sosp:83,					]
>[ AUTHOR  	= "Bruce Walker and others",			]
>[ TITLE   	= "The {LOCUS} Distributed Operating System",	]
>[ BOOKTITLE     = "Proceedings of the $9^{\rm th}$ " # SOSP,	]
>[ MONTH   	= NOV,						]
>[ YEAR    	= 1983,						]
>[ PAGES   	= "49-70"					]
>[ )								]

I can tell you a little about Locus' migration.  It worked, it was a full
implementation, it allowed forks and execs across network boundaries, and
it didn't limit what a migratable process could do.  (It was allowed full
UNIX semantics.)  We never did a load management system using it, though, so
it was totally under user control.

I'd appreciate it if you would summarize your responses for the net.
-- 
			Peter Reiher
			reiher@onyx.jpl.nasa.gov
			. . . cit-vax!elroy!jato!jade!reiher

-------------------------------------------------------------------------

see also "Eager/Lazowska/Zahorjan" in the 1988 sigmetrics
see also "Powell" in the 1983 sigops
see also "Theimer" in the 1985 sigops
see also "Zayas" in the 1987 sigops
see also "Cheriton" in the march, 1988 CACM
see also "Smith" in OSR July, 1988

>[ Check into Locus from UCLA and Sprite from Berkeley.  --DL	]
>[ 								]
>[ @ARTICLE(Ou:computer:83,					]
>[ AUTHOR	= "John Ousterhout and others",			]
>[ TITLE		= "The {S}prite Network Operating System",	]
>[ JOURNAL	= "IEEE Computer",				]
>[ YEAR		= 1988,						]
>[ VOLUME	= 21,						]
>[ NUMBER	= 2,						]
>[ PAGES		= "23-36",					]
>[ MONTH		= FEB						]
>[ )								]
>
charlie shub  cdash@boulder.Colorado.EDU  -or-  ..!{ncar|nbires}!boulder!cdash
  or even     cdash@colospgs (BITNET)     -or-  (719) 593-3492
-------------------------------------------------------------------------
>From vev2j@polka-dot.cs.virginia.edu Tue Jan 23 14:40:41 1990

	I'm currently writing my thesis: "A Distributed Scheduler for MENTAT".
In my research, I compiled some literature that I think will help you in the
policies aspect. I basically implemented the scheduler for MENTAT with the
following characteristics:

	Load-sharing
	Adaptive
	Sender-Initiated
	parameterizable
	static assignment
	stable.

	The scheduling decision was divided in two parts:

		can be executed locally? and
		if not, where to send the objects for execution?

	Beginning on March 1990, we'll start doing the analysis of the
algorithm vs. the sequential time. At this moment, we don't have literarture
about our scheduler.

	Here are the references:

[Baum89]  	K. M. Baumgartner and B. W. Wah, ``GAMMON: A Load Balancing
Strategy for Local Computer Systems with Multiaccess Networks'', \fI IEEE
Transactions on Computers\fR, vol. 38, pp. 1098-1109, August 1989.

[Boel89]  	R. K. Boel and J. H. Van Schuppen, ``Distributed Routing for
Load Balancing'', \fI Proceedings of the IEEE\fR, vol. 77, pp. 210-221, January
1989.

[Brya81]	R. M. Bryant and R. A. Finkel, ``A Stable Distributed
Scheduling Algorithm'', \fI The 2nd International Conference on Distributed
Computing Systems\fR, Paris, France, April 1981.

[Casa88]	T. L. Casavant and J. G. Kuhl, ``A Taxonomy of Scheduling in
General-Purpose Distributed Computing Systems'', \fI IEEE Transactions on
Software Engineering\fR, vol. 14, pp. 141-154, February 1988.

[Diks89]	P. Dikshit, S. K. Tripathi and P. Jalote, ``SAHAYOG: A Test Bed
for Evaluating Dynamic Load-Sharing Policies'', \fI Software-Practice and
Experience\fR, vol. 19, pp. 411-435, May 1989.

[Eage86a]  	D. L. Eager, E. D. Lazowska and J. Zahorjan, ``A Comparison of
Receiver-Initiated and Sender-Initiated Adaptive Load Sharing'', \fI Performance
Evaluation\fR, vol. 6, pp. 53-68, March 1986.

[Eage86b]  	D. L. Eager, E. D. Lazowska and J. Zahorjan, ``Adaptive Load
Sharing in Homogeneous Distributed Systems'', \fI IEEE Transactions on Software
Engineering\fR, vol. 12, pp. 662-675, May 1986.

[Eage88]  	D. L. Eager, E. D. Lazowska and J. Zahorjan, ``The Limited
Performance Benefits of Migrating Active Processes for Load Sharing'', \fI 
Performance Evaluation Review, ACM\fR, vol. 16, pp. 63-72, May 1988.

[Eage89]  	D. L. Eager, E. D. Lazowska and J. Zahorjan, ``Speedup Versus
Efficiency in Parallel Systems'', \fI IEEE Transactions on Computers\fR, vol.
38, pp. 408-423, March 1989.

[Grim87]	A. S. Grimshaw and J. W. Liu, ``MENTAT: An Object-Oriented Data-
Flow System'', \fI Proceedings of the 1987 Object-Oriented Programming Systems,
Languages and Applications Conference, ACM\fR, pp. 35-47, October 1987.

[Grim88a]	A. S. Grimshaw, ``MENTAT: An Object-Oriented Macro Dataflow
System'', report no. UIUCDCS-R-88-1440, University of Illinois at
Urbana-Champaign, June 1988.

[Grim88b]	A. S. Grimshaw and J. W. Liu,``The Mentat Programming Language
and Architecture'', \fIProceedings Workshop on Future Trend of Distributed
Computing Systems\fR, Hong Kong, September 1988.

[Hac89]		A. Hac, ``Load Balancing in Distributed Systems: A Summary'', \fI Performance Evaluation Review, ACM\fR, vol. 16, pp. 17-25, February 1989.

[Hsu86]		C. H. Hsu and J. W. Liu, ``Dynamic Load Balancing in
Distributed Systems'',  \fI The 6th International Conference on Distributed
Computing Systems\fR, Cambridge, MA, May 1986.

[Liu86a]	J. W. Liu and A. S. Grimshaw, ``A Distributed System
Architecture Based on Macro Data Flow Model'', \fI Proceedings Workshop on
Future Directions in Architecture and Software\fR, South Carolina, May 1986.

[Liu86b]	J. W. Liu and A. S. Grimshaw, ``An Object-Oriented Macro Data
Flow Architecture'', \fI Proceedings of the 1986 National Communications
Forum\fR, September 1986.

[Lo84]		V. M. Lo, ``Heuristic Algorithms for Task Assignment in
Distributed Systems '', \fI The 4th International Conference on Distributed
Computing Systems\fR, San Fransisco, California, May 1984.

[Shen88]  	S. Shen, ``Cooperative Distributed Dynamic Load Balancing'', \fI Acta Informatica\fR, vol. 25, pp. 663-676, June 1988.

[Shin89]  	K. G. Shin and Y.-C. Chang, ``Load Sharing in Distributed
Real-Time Systems with State-Change Broadcasts'', \fI IEEE Transactions on
Computers\fR, vol. 38, pp. 1124-1142, August 1989.

[Shu89]		W. Shu and L. V. Kale, ``Dynamic Scheduling of Medium-Grained
Processes on Multicomputers'', \fI Technical Report\fR, report no.
UIUCDCS-R-89-1528, University of Illinois at Urbana-Champaign, July 1989.

[Stan84]	J. A. Stankovic and I. S. Sidhu, ``An Adaptive Bidding
Algorithm for Processes, Cluster and Distributed Groups'', \fI The 4th
International Conference on Distributed Computing Systems\fR, San Fransisco,
California, May 1984.

[Stan85] 	J. A. Stankovic, K. Ramamritham and W. H. Kohler, ``A Review of
Current Research and Critical Issues in Distributed System Software'', \fI IEEE
Distributed Processing Technical Committee NEWSLETER\fR, vol. 7, pp. 14-47,
March 1985.

[Tane85]	A. S. Tanenbaum and R. Van Renesse, ``Distributed Operating
Systems'', \fI Computing Surveys\fR, vol. 17, pp. 419-470, December 1985.

[Tant85]	A. N. Tantawi and D. Towsley, ``Optimal Static Load Balancing
in Distributed Computer Systems'', \fI Journal of the ACM\fR, vol. 32,
pp. 445-465, April 1985.


			Virgil.

-------------------------------------------------------------------------
>From beers@tcgould.TN.CORNELL.EDU Wed Jan 24 06:29:08 1990

Marc Willebeck-Lemair presented a paper on this at a receint SIAM 
supercomputer conference in Chicago, in Dec, 89.  I have an address
and phone number for Marc, but not an e-mail address.  His address is
401 Phillip
Cornell University
Ithaca, NY  14853

and his phone there is 607-255-7558.

The work he is doing involves balancing jobs on a loosely coupled system,
and migrating these jobs, when there is an imbalance.


Jim Beers					beers@tcgould.tn.cornell.edu
------------------------------------------------------------------------

>From douglis@sprite.Berkeley.EDU Wed Jan 24 11:43:54 1990

btw, i also included another paper that i wrote around the same time
and which appeared in a workshop last fall:

@ARTICLE{douglis:mig-experience,
	AUTHOR = {F. Douglis},
	TITLE = {Experience with Process Migration in {S}prite},
	JOURNAL = {Workshop on Experience with Building Distributed (and Multiprocessor) Systems},
	YEAR = {1989},
	MONTH = OCT,
}
------------------------------------------------------------------------

>From ravi@iag.hp.com Wed Jan 24 13:37:00 1990

You may be interested in my work on process migration - particularly the
mechanics issues related to it. The particular problem that I have worked
on is how to route messages to moving processes in a large distributed
system.

Please check:

UCLA Computer Science Dept. Technical Report
August 1989
CSD-890049

ravi
------------------------------------------------------------------------
Sylvain R.Y. Louboutin

Department of Computer Science         Telephone: (+353-1) 69-32-44 ext. 24-88
University College Dublin              E-mail: S_LOUBOUTIN@CS.UCD.IE
Belfield, Dublin 4                          or LOUBOUTN@CCVAX.UCD.IE
Ireland                                     or SLOUBH91@IRLEARN.BITNET

(*) "A Survey of Process Migration Mechanisms"
    Jonathan M. Smith
    Computer Science Department, Columbia University, New York, NY 10027
    in ACM Operating System Review, Vol 22 No 3, July 1988

------------------------------------------------------------------------
I am working on a project which may be related to what you want to do
with migrating work on a local area network.  Here is a copy of
the announcement I have posted in several newsgroups earlier.

-- mike


Subject: Condor - Run UNIX jobs on idle workstations (available via ftp/uucp)

Brief Description:
	Condor is a facility for executing UNIX jobs on a pool of
	cooperating workstations.  Jobs are queued and executed
	remotely on workstations at times when those workstations would
	otherwise be idle.  A transparent checkpointing mechanism is
	provided, and jobs migrate from workstation to workstation
	without user intervention.  When the jobs complete, users are
	notified by mail.  No source code changes are required for use
	of Condor, but executables must be specially linked.  Condor is
	especially suited for execution of long running, compute bound
	jobs.

Limitations:
	There are several restrictions on the type of program which can
	be executed by the condor facility.  Only single process jobs
	are supported, and signals and IPC calls are not implemented.

Copyright:
	Condor is copyrighted, but available without any charge or
	license agreement.  The copyright is not very restricitve, but
	requires reproduction of the copyright, and disclaims the
	University of Wisconsin from any responsibility for problems
	connected with condor.

Currently supported architectures and UNIX variants:
	VAX                     BSD 4.3
	VAX                     Ultrix 3.0 and above
	SPARC                   SunOS4.0 and above
	I386                    Dynix
	MC68020                 SunOS3.2 (and probably 4.0 and above)
	DECstation 3100         Ultrix 3.0 and above

How to get it:
	Ftp to "shorty.cs.wisc.edu" and login as "ftp".  Give anything
	except the null string as a password.  Fetch the "README"
	file.  Switch your ftp user program to "binary" mode.  Fetch
	"Condor_4.0.0.tar.Z".  The README file tells what do to from
	there.

	The software is also available via anonymous ftp from uunet.uu.net
	in "/networking", and via anonymous uucp from osu-cis in directory
	"/pub".

Documentation:
	Source for all the documentation is included in the
	distribution.  The README file explains how to extract just the
	source, which you should do first, so you can plan where to
	extract everything else.  Some of the documents contain
	"gremlin" pictures, but the ditroff ready versions are also
	included for those who don't have gremlin.  If you can't print
	the docs, send me your address, and I'll mail you a set.


Write or call if you have any questions or problems.
	email: mike@cs.wisc.edu or ucbvax!uwvax!mike
	phone: (608) 262-6122

Mike Litzkow
------------------------------------------------------------------------

@INPROCEEDINGS{usenix:86,
	TITLE = "MOS - Scaling Up UNIX",
	AUTHOR = "A. Barak and O. G. Paradise",
	BOOKTITLE = "Proc. Usenix Technical Conference",
	YEAR = {1986},
	MONTH = "June",
	ADDRESS = "Atlanta, GA (USA)",
	PAGES = {414-418}				}

@INPROCEEDINGS{euug:86a,
	TITLE = "MOS - A Load Balancing UNIX",
	AUTHOR = "A. Barak and O. G. Paradise",
	BOOKTITLE = "Proc. EUUG conference on Dist. Systems",
	YEAR = {1986},
	MONTH = "Sept.",
	ADDRESS = "Manchester (UK)",
	PAGES = {273-280}				}

@INPROCEEDINGS{barwhl:mos,
	TITLE = "MOSIX: An Integrated Multiprocessor UNIX",
	AUTHOR = "A. Barak and R. Wheeler",
	BOOKTITLE ="Proc. of the Winter 1989 USENIX Conference",
	PAGES = {101-112},
	ADDRESS = {San Diego, CA.},
	MONTH = "February",
	YEAR = {1989} 					}

------------------------------------------------------------------------
From: On G. Paradise <oracle!erseq.oracle.com!oparadis@uunet.UU.NET>

We (at the Hebrew University of Jerusalem) used dynamic load balancing
by process migration on the MOS distributed system. You may wish to
look at Proc. Usenix Tec. Conf. Summer 1986 (Atlanta) and also Porc.
EUUG workshop for distributed systems (Summer 1986, Manchester, UK)
for some papers by A. Barak and myself. There are also some papers in
Software -- Practice and Experience and some papers by Ami Litman in
SIGOPS of 1988.

The system is pretty much a production system -- National
Semiconducturs ported it to their 32532 chip, and it is used there and
by a coupe of universities as a general-purpose Unix engine.

You may wish to contact Prof. Amnon Barak (amnon@humus.huji.ac.il)
for more recent results.

On
-----------------------------------------------------------------------