kenchew@cs.utexas.edu (Khien Mien Kennedy Chew) (09/28/90)
Many moons ago, I asked for information about load balancing and process migration. I received a couple of responses but due to my oversight, I failed to post the responses for interested parties. I apologise for this late posting. Anyway, here are the responses: Enjoy, Kennedy Chew ------------------------------------------------------------------------- Responses to call for information on load balancing and migration ------------------------------------------------------------------------- We're doing some work at JPL on load balancing for the Time Warp Operating System. Chances are that you wouldn't want to use the same ideas, as they are rather specific to some unusual things that system does, but it might be worth looking at. I had a paper on it published in the proceedings of the 1990 conference on Distributed Simulation. (It was just last week.) >[ Check into Locus from UCLA and Sprite from Berkeley. --DL ] >[ ] >[ @INPROCEEDINGS(Wa:sosp:83, ] >[ AUTHOR = "Bruce Walker and others", ] >[ TITLE = "The {LOCUS} Distributed Operating System", ] >[ BOOKTITLE = "Proceedings of the $9^{\rm th}$ " # SOSP, ] >[ MONTH = NOV, ] >[ YEAR = 1983, ] >[ PAGES = "49-70" ] >[ ) ] I can tell you a little about Locus' migration. It worked, it was a full implementation, it allowed forks and execs across network boundaries, and it didn't limit what a migratable process could do. (It was allowed full UNIX semantics.) We never did a load management system using it, though, so it was totally under user control. I'd appreciate it if you would summarize your responses for the net. -- Peter Reiher reiher@onyx.jpl.nasa.gov . . . cit-vax!elroy!jato!jade!reiher ------------------------------------------------------------------------- see also "Eager/Lazowska/Zahorjan" in the 1988 sigmetrics see also "Powell" in the 1983 sigops see also "Theimer" in the 1985 sigops see also "Zayas" in the 1987 sigops see also "Cheriton" in the march, 1988 CACM see also "Smith" in OSR July, 1988 >[ Check into Locus from UCLA and Sprite from Berkeley. --DL ] >[ ] >[ @ARTICLE(Ou:computer:83, ] >[ AUTHOR = "John Ousterhout and others", ] >[ TITLE = "The {S}prite Network Operating System", ] >[ JOURNAL = "IEEE Computer", ] >[ YEAR = 1988, ] >[ VOLUME = 21, ] >[ NUMBER = 2, ] >[ PAGES = "23-36", ] >[ MONTH = FEB ] >[ ) ] > charlie shub cdash@boulder.Colorado.EDU -or- ..!{ncar|nbires}!boulder!cdash or even cdash@colospgs (BITNET) -or- (719) 593-3492 ------------------------------------------------------------------------- >From vev2j@polka-dot.cs.virginia.edu Tue Jan 23 14:40:41 1990 I'm currently writing my thesis: "A Distributed Scheduler for MENTAT". In my research, I compiled some literature that I think will help you in the policies aspect. I basically implemented the scheduler for MENTAT with the following characteristics: Load-sharing Adaptive Sender-Initiated parameterizable static assignment stable. The scheduling decision was divided in two parts: can be executed locally? and if not, where to send the objects for execution? Beginning on March 1990, we'll start doing the analysis of the algorithm vs. the sequential time. At this moment, we don't have literarture about our scheduler. Here are the references: [Baum89] K. M. Baumgartner and B. W. Wah, ``GAMMON: A Load Balancing Strategy for Local Computer Systems with Multiaccess Networks'', \fI IEEE Transactions on Computers\fR, vol. 38, pp. 1098-1109, August 1989. [Boel89] R. K. Boel and J. H. Van Schuppen, ``Distributed Routing for Load Balancing'', \fI Proceedings of the IEEE\fR, vol. 77, pp. 210-221, January 1989. [Brya81] R. M. Bryant and R. A. Finkel, ``A Stable Distributed Scheduling Algorithm'', \fI The 2nd International Conference on Distributed Computing Systems\fR, Paris, France, April 1981. [Casa88] T. L. Casavant and J. G. Kuhl, ``A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems'', \fI IEEE Transactions on Software Engineering\fR, vol. 14, pp. 141-154, February 1988. [Diks89] P. Dikshit, S. K. Tripathi and P. Jalote, ``SAHAYOG: A Test Bed for Evaluating Dynamic Load-Sharing Policies'', \fI Software-Practice and Experience\fR, vol. 19, pp. 411-435, May 1989. [Eage86a] D. L. Eager, E. D. Lazowska and J. Zahorjan, ``A Comparison of Receiver-Initiated and Sender-Initiated Adaptive Load Sharing'', \fI Performance Evaluation\fR, vol. 6, pp. 53-68, March 1986. [Eage86b] D. L. Eager, E. D. Lazowska and J. Zahorjan, ``Adaptive Load Sharing in Homogeneous Distributed Systems'', \fI IEEE Transactions on Software Engineering\fR, vol. 12, pp. 662-675, May 1986. [Eage88] D. L. Eager, E. D. Lazowska and J. Zahorjan, ``The Limited Performance Benefits of Migrating Active Processes for Load Sharing'', \fI Performance Evaluation Review, ACM\fR, vol. 16, pp. 63-72, May 1988. [Eage89] D. L. Eager, E. D. Lazowska and J. Zahorjan, ``Speedup Versus Efficiency in Parallel Systems'', \fI IEEE Transactions on Computers\fR, vol. 38, pp. 408-423, March 1989. [Grim87] A. S. Grimshaw and J. W. Liu, ``MENTAT: An Object-Oriented Data- Flow System'', \fI Proceedings of the 1987 Object-Oriented Programming Systems, Languages and Applications Conference, ACM\fR, pp. 35-47, October 1987. [Grim88a] A. S. Grimshaw, ``MENTAT: An Object-Oriented Macro Dataflow System'', report no. UIUCDCS-R-88-1440, University of Illinois at Urbana-Champaign, June 1988. [Grim88b] A. S. Grimshaw and J. W. Liu,``The Mentat Programming Language and Architecture'', \fIProceedings Workshop on Future Trend of Distributed Computing Systems\fR, Hong Kong, September 1988. [Hac89] A. Hac, ``Load Balancing in Distributed Systems: A Summary'', \fI Performance Evaluation Review, ACM\fR, vol. 16, pp. 17-25, February 1989. [Hsu86] C. H. Hsu and J. W. Liu, ``Dynamic Load Balancing in Distributed Systems'', \fI The 6th International Conference on Distributed Computing Systems\fR, Cambridge, MA, May 1986. [Liu86a] J. W. Liu and A. S. Grimshaw, ``A Distributed System Architecture Based on Macro Data Flow Model'', \fI Proceedings Workshop on Future Directions in Architecture and Software\fR, South Carolina, May 1986. [Liu86b] J. W. Liu and A. S. Grimshaw, ``An Object-Oriented Macro Data Flow Architecture'', \fI Proceedings of the 1986 National Communications Forum\fR, September 1986. [Lo84] V. M. Lo, ``Heuristic Algorithms for Task Assignment in Distributed Systems '', \fI The 4th International Conference on Distributed Computing Systems\fR, San Fransisco, California, May 1984. [Shen88] S. Shen, ``Cooperative Distributed Dynamic Load Balancing'', \fI Acta Informatica\fR, vol. 25, pp. 663-676, June 1988. [Shin89] K. G. Shin and Y.-C. Chang, ``Load Sharing in Distributed Real-Time Systems with State-Change Broadcasts'', \fI IEEE Transactions on Computers\fR, vol. 38, pp. 1124-1142, August 1989. [Shu89] W. Shu and L. V. Kale, ``Dynamic Scheduling of Medium-Grained Processes on Multicomputers'', \fI Technical Report\fR, report no. UIUCDCS-R-89-1528, University of Illinois at Urbana-Champaign, July 1989. [Stan84] J. A. Stankovic and I. S. Sidhu, ``An Adaptive Bidding Algorithm for Processes, Cluster and Distributed Groups'', \fI The 4th International Conference on Distributed Computing Systems\fR, San Fransisco, California, May 1984. [Stan85] J. A. Stankovic, K. Ramamritham and W. H. Kohler, ``A Review of Current Research and Critical Issues in Distributed System Software'', \fI IEEE Distributed Processing Technical Committee NEWSLETER\fR, vol. 7, pp. 14-47, March 1985. [Tane85] A. S. Tanenbaum and R. Van Renesse, ``Distributed Operating Systems'', \fI Computing Surveys\fR, vol. 17, pp. 419-470, December 1985. [Tant85] A. N. Tantawi and D. Towsley, ``Optimal Static Load Balancing in Distributed Computer Systems'', \fI Journal of the ACM\fR, vol. 32, pp. 445-465, April 1985. Virgil. ------------------------------------------------------------------------- >From beers@tcgould.TN.CORNELL.EDU Wed Jan 24 06:29:08 1990 Marc Willebeck-Lemair presented a paper on this at a receint SIAM supercomputer conference in Chicago, in Dec, 89. I have an address and phone number for Marc, but not an e-mail address. His address is 401 Phillip Cornell University Ithaca, NY 14853 and his phone there is 607-255-7558. The work he is doing involves balancing jobs on a loosely coupled system, and migrating these jobs, when there is an imbalance. Jim Beers beers@tcgould.tn.cornell.edu ------------------------------------------------------------------------ >From douglis@sprite.Berkeley.EDU Wed Jan 24 11:43:54 1990 btw, i also included another paper that i wrote around the same time and which appeared in a workshop last fall: @ARTICLE{douglis:mig-experience, AUTHOR = {F. Douglis}, TITLE = {Experience with Process Migration in {S}prite}, JOURNAL = {Workshop on Experience with Building Distributed (and Multiprocessor) Systems}, YEAR = {1989}, MONTH = OCT, } ------------------------------------------------------------------------ >From ravi@iag.hp.com Wed Jan 24 13:37:00 1990 You may be interested in my work on process migration - particularly the mechanics issues related to it. The particular problem that I have worked on is how to route messages to moving processes in a large distributed system. Please check: UCLA Computer Science Dept. Technical Report August 1989 CSD-890049 ravi ------------------------------------------------------------------------ Sylvain R.Y. Louboutin Department of Computer Science Telephone: (+353-1) 69-32-44 ext. 24-88 University College Dublin E-mail: S_LOUBOUTIN@CS.UCD.IE Belfield, Dublin 4 or LOUBOUTN@CCVAX.UCD.IE Ireland or SLOUBH91@IRLEARN.BITNET (*) "A Survey of Process Migration Mechanisms" Jonathan M. Smith Computer Science Department, Columbia University, New York, NY 10027 in ACM Operating System Review, Vol 22 No 3, July 1988 ------------------------------------------------------------------------ I am working on a project which may be related to what you want to do with migrating work on a local area network. Here is a copy of the announcement I have posted in several newsgroups earlier. -- mike Subject: Condor - Run UNIX jobs on idle workstations (available via ftp/uucp) Brief Description: Condor is a facility for executing UNIX jobs on a pool of cooperating workstations. Jobs are queued and executed remotely on workstations at times when those workstations would otherwise be idle. A transparent checkpointing mechanism is provided, and jobs migrate from workstation to workstation without user intervention. When the jobs complete, users are notified by mail. No source code changes are required for use of Condor, but executables must be specially linked. Condor is especially suited for execution of long running, compute bound jobs. Limitations: There are several restrictions on the type of program which can be executed by the condor facility. Only single process jobs are supported, and signals and IPC calls are not implemented. Copyright: Condor is copyrighted, but available without any charge or license agreement. The copyright is not very restricitve, but requires reproduction of the copyright, and disclaims the University of Wisconsin from any responsibility for problems connected with condor. Currently supported architectures and UNIX variants: VAX BSD 4.3 VAX Ultrix 3.0 and above SPARC SunOS4.0 and above I386 Dynix MC68020 SunOS3.2 (and probably 4.0 and above) DECstation 3100 Ultrix 3.0 and above How to get it: Ftp to "shorty.cs.wisc.edu" and login as "ftp". Give anything except the null string as a password. Fetch the "README" file. Switch your ftp user program to "binary" mode. Fetch "Condor_4.0.0.tar.Z". The README file tells what do to from there. The software is also available via anonymous ftp from uunet.uu.net in "/networking", and via anonymous uucp from osu-cis in directory "/pub". Documentation: Source for all the documentation is included in the distribution. The README file explains how to extract just the source, which you should do first, so you can plan where to extract everything else. Some of the documents contain "gremlin" pictures, but the ditroff ready versions are also included for those who don't have gremlin. If you can't print the docs, send me your address, and I'll mail you a set. Write or call if you have any questions or problems. email: mike@cs.wisc.edu or ucbvax!uwvax!mike phone: (608) 262-6122 Mike Litzkow ------------------------------------------------------------------------ @INPROCEEDINGS{usenix:86, TITLE = "MOS - Scaling Up UNIX", AUTHOR = "A. Barak and O. G. Paradise", BOOKTITLE = "Proc. Usenix Technical Conference", YEAR = {1986}, MONTH = "June", ADDRESS = "Atlanta, GA (USA)", PAGES = {414-418} } @INPROCEEDINGS{euug:86a, TITLE = "MOS - A Load Balancing UNIX", AUTHOR = "A. Barak and O. G. Paradise", BOOKTITLE = "Proc. EUUG conference on Dist. Systems", YEAR = {1986}, MONTH = "Sept.", ADDRESS = "Manchester (UK)", PAGES = {273-280} } @INPROCEEDINGS{barwhl:mos, TITLE = "MOSIX: An Integrated Multiprocessor UNIX", AUTHOR = "A. Barak and R. Wheeler", BOOKTITLE ="Proc. of the Winter 1989 USENIX Conference", PAGES = {101-112}, ADDRESS = {San Diego, CA.}, MONTH = "February", YEAR = {1989} } ------------------------------------------------------------------------ From: On G. Paradise <oracle!erseq.oracle.com!oparadis@uunet.UU.NET> We (at the Hebrew University of Jerusalem) used dynamic load balancing by process migration on the MOS distributed system. You may wish to look at Proc. Usenix Tec. Conf. Summer 1986 (Atlanta) and also Porc. EUUG workshop for distributed systems (Summer 1986, Manchester, UK) for some papers by A. Barak and myself. There are also some papers in Software -- Practice and Experience and some papers by Ami Litman in SIGOPS of 1988. The system is pretty much a production system -- National Semiconducturs ported it to their 32532 chip, and it is used there and by a coupe of universities as a general-purpose Unix engine. You may wish to contact Prof. Amnon Barak (amnon@humus.huji.ac.il) for more recent results. On -----------------------------------------------------------------------