[comp.os.misc] process migration - status and availability

mike@sachiko.acc.stolaf.edu (Mike Haertel) (04/15/91)

In article <10422@pitt.UUCP> jonathan@cs.pitt.edu (Jonathan Eunice) writes:
>Why isn't process migration common?

Two reasons I can think of right off the top of my head are:

1.  It is probably difficult to retrofit to existing operating systems.
My guess (as someone who is attempting to design an OS for fun) is that
some early design decisions (such as whether to have lots of implicit
per-process state, like Unix) can make or break the feasibility of migration.

2.  The benefits may not be all that great.  For example, is it worthwhile
to migrate a process that is paged to a local disc to another machine from
which it will have to be paged over the network?  Similar questions apply
regarding other kinds of I/O bound jobs.
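
The trade-off in point 2 can be put as a back-of-the-envelope comparison.
The sketch below is only an illustration: the function names, the CPU-share
model, and the single extra_io_s penalty for paging or I/O over the network
are all assumptions made for the example, not measurements from any system.

```python
def finish_time_if_stay(remaining_cpu_s, local_share):
    """Seconds to finish if the process stays where it is and receives
    `local_share` (0 < share <= 1) of the local CPU."""
    return remaining_cpu_s / local_share


def finish_time_if_move(remaining_cpu_s, remote_share, transfer_s, extra_io_s):
    """Seconds to finish if the process is moved first.

    transfer_s  -- hypothetical one-off cost of shipping the process state
    extra_io_s  -- hypothetical total penalty for paging/I/O over the network
    """
    return transfer_s + remaining_cpu_s / remote_share + extra_io_s


def worth_migrating(remaining_cpu_s, local_share, remote_share,
                    transfer_s, extra_io_s):
    """True only if moving is estimated to finish strictly sooner."""
    stay = finish_time_if_stay(remaining_cpu_s, local_share)
    move = finish_time_if_move(remaining_cpu_s, remote_share,
                               transfer_s, extra_io_s)
    return move < stay
```

With 100 s of CPU work left, half of a local CPU and an idle remote CPU, a
2 s transfer plus 10 s of extra remote I/O still wins (112 s against 200 s);
raise the I/O penalty to 150 s and it no longer does.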

One OS I know of that could easily support migration is Amoeba; it's
interesting to note that the current version of Amoeba supports load
balancing to control processor allocation at process creation time,
but does not have any more general migration facilities.  I suspect
this is a case of getting 90% of the benefits at 10% of the cost.

Personally, I tend to think of process migration as being a good test
of the flexibility of an OS, but not something you really want to use
in practice.

gdtltr@chopin.udel.edu (root@research.bdi.com (Systems Research Supervisor)) (04/17/91)

In article <10422@pitt.UUCP> jonathan@cs.pitt.edu (Jonathan Eunice) writes:
=>
=>Why isn't process migration common?
=>
   Two reasons: 1) It is hard to do in general. There is more to a
process's state than most people realize, including the kernel state and
relationships with other processes. There is also the potential overhead
of copying the entire code and data across a network. There are a couple
papers on the subject that confirm this, but I don't have my bibliographical
stuff with me.
2) Depending on the system model, you may get comparable performance simply
by using load balancing when new processes are created and leaving them
where they go. The only case where I would want process migration is when
more than one long-running CPU-bound process is assigned (erroneously)
to a single node. The most prominent use of process migration I know of is
in Sprite, which primarily uses it as a policy tool: foreign processes
running on a user's idle workstation are evicted to their "home" node when
the user returns.
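
   A rough sketch of the two policies mentioned above: placement at
creation time, and eviction of foreign processes back to their home node.
Everything here (Proc, place_new_process, evict_foreign, load counted as
the number of runnable processes) is a made-up illustration and not the
interface of Amoeba or Sprite.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    home: str      # node the process was started from
    current: str   # node it is running on now


def place_new_process(loads):
    """Pick the least-loaded node for a new process (ties broken by name)
    and count the new process against it. `loads` maps node -> run count."""
    node = min(loads, key=lambda n: (loads[n], n))
    loads[node] += 1
    return node


def evict_foreign(procs, node):
    """When the owner of `node` returns, send every process that is merely
    visiting back to its home node. Returns the pids that were moved."""
    evicted = []
    for p in procs:
        if p.current == node and p.home != node:
            p.current = p.home
            evicted.append(p.pid)
    return evicted
```

   Note that place_new_process never moves anything once it is running,
which is exactly the "leave them where they go" model; only evict_foreign
needs a real migration mechanism underneath it.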

                                        Gary Duzan
                                        Time  Lord
                                    Third Regeneration



-- 
                            gdtltr@brahms.udel.edu
   _o_                      ----------------------                        _o_
 [|o o|]   Two CPU's are better than one; N CPU's would be real nice.   [|o o|]
  |_o_|           Disclaimer: I AM Brain Dead Innovations, Inc.          |_o_|

gdtltr@chopin.udel.edu (root@research.bdi.com (Systems Research Supervisor)) (04/17/91)

In article <1991Apr14.215738.8745@news.stolaf.edu> mike@sachiko.acc.stolaf.edu (Mike Haertel) writes:
=>
=>One OS I know of that could easily support migration is Amoeba; it's
=>interesting to note that the current version of Amoeba supports load
=>balancing to control processor allocation at process creation time,
=>but does not have any more general migration facilities.  I suspect
=>this is a case of getting 90% of the benefits at 10% of the cost.
=>
   I believe that Amoeba supports a process checkpointing function (stun?)
for dumping process state to a file. Theoretically, this could be restarted
on another compatible processor. Amoeba avoids the problem of redirecting I/O
by making capabilities for files location-independent. Note, however, that
if a related process or file is killed/deleted while the process is being
migrated, the restarted process would most likely fail at some point.
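
   As a toy illustration of checkpoint-and-restart (not Amoeba's actual
mechanism, whose details I am not sure of), the sketch below dumps the
state of a resumable computation to a file and picks it up again. The
names run, checkpoint and restart, and the use of a JSON file, are
assumptions for the example; a real process has far more state than a
small dictionary.

```python
import json
import os
import tempfile


def run(state, until):
    """Advance a trivial computation (a running sum) up to step `until`."""
    while state["i"] < until:
        state["i"] += 1
        state["total"] += state["i"]
    return state


def checkpoint(state, path):
    """Dump the whole process state to a file."""
    with open(path, "w") as f:
        json.dump(state, f)


def restart(path):
    """Rebuild the state from a checkpoint file, possibly on another node
    that can read the same file."""
    with open(path) as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
checkpoint(run({"i": 0, "total": 0}, 5), path)   # stop after step 5
resumed = run(restart(path), 10)                 # continue elsewhere
```

   The restarted run ends in the same state as an uninterrupted one only
because nothing outside the dictionary changed in the meantime, which is
the caveat about related processes and files above.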

                                        Gary Duzan
                                        Time  Lord
                                    Third Regeneration

douglis@cs.vu.nl (Fred Douglis) (04/17/91)

mike@sachiko.acc.stolaf.edu (Mike Haertel) writes:

>One OS I know of that could easily support migration is Amoeba; it's
>interesting to note that the current version of Amoeba supports load
>balancing to control processor allocation at process creation time,
>but does not have any more general migration facilities.  I suspect
>this is a case of getting 90% of the benefits at 10% of the cost.

As the designer of process migration in Sprite, and the person who plans
to implement full process migration in Amoeba, I might as well throw
in my two cents:

	- Process migration is not that hard, once you have transparent
	  remote execution.  As Amoeba already supports the latter, adding
	  migration shouldn't be the other 90% of the cost.  Rather, I'd
	  guess (and I'm hoping) that 90% of the cost has already been
	  paid, and I can put in full migration for the other 10%.

	- You're absolutely right about process creation time being
	  the important point.  I think the general consensus about
	  migration is that it's more useful for other things, like
	  machine autonomy (as in Sprite, with personal workstations)
	  or to move from a machine that's being shut down (TCF/AIX has
	  been used for this purpose). 
--
=============================================================================
     Fred Douglis, Vrije Universiteit, douglis@cs.vu.nl +31 20 548-5777
=============================================================================

timk@cs.qmw.ac.uk (Tim Kindberg) (04/17/91)

In <16932@chopin.udel.edu> gdtltr@chopin.udel.edu (root@research.bdi.com 
(Systems Research Supervisor)) writes:

>In article <10422@pitt.UUCP> jonathan@cs.pitt.edu (Jonathan Eunice) writes:
>=>
>=>Why isn't process migration common?
>=>
>   Two reasons: 1) It is hard to do in general. There is more to a
>process's state than most people realize, including the kernel state and
>relationships with other processes. There is also the potential overhead
>of copying the entire code and data across a network.

To minimise copying, standard VM techniques can be employed.  Sprite, for 
example, writes dirty data/stack pages to disc and page-faults its address 
space components back in again as necessary at the new site.  Incidentally, 
Sprite exhibits another migration issue related to your first point.  A 
migrated process in Sprite still depends for some facilities on its original 
site, making it vulnerable to that site's failure; and also meaning that it 
continues to impose a certain amount of load there.
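
The flush-then-fault-in idea can be sketched as follows.  This is a
simplified model of the general technique, not Sprite's code: AddressSpace,
migrate and the dictionary standing in for the shared disc are all
hypothetical names chosen for the example.

```python
class AddressSpace:
    """Pages resident at one site, backed by a store both sites can reach."""

    def __init__(self, store):
        self.store = store      # page number -> contents (the shared disc)
        self.resident = {}      # pages currently in memory at this site
        self.dirty = set()      # resident pages newer than the store copy
        self.faults = 0

    def write(self, page_no, data):
        self.resident[page_no] = data
        self.dirty.add(page_no)

    def read(self, page_no):
        if page_no not in self.resident:       # page fault: fetch on demand
            self.faults += 1
            self.resident[page_no] = self.store[page_no]
        return self.resident[page_no]

    def flush_dirty(self):
        for page_no in self.dirty:
            self.store[page_no] = self.resident[page_no]
        self.dirty.clear()


def migrate(src):
    """Write dirty pages back, then start at the new site with nothing
    resident; pages arrive only when they are first touched."""
    src.flush_dirty()
    return AddressSpace(src.store)


disc = {}
old = AddressSpace(disc)
old.write(0, "stack")
old.write(1, "data")
new = migrate(old)
value = new.read(1)      # faults page 1 in; page 0 is never transferred
```

Only the pages the process actually touches after the move cross the
network, at the price of a fault on first reference.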

Migration is a better prospect on a distributed memory multiprocessor with a 
high interconnection bandwidth (my own kernel, Equus, shows this: 60 
milliseconds to migrate all of a 100K process over a VME bus-based network; 
420 milliseconds for a 1M process).
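
For a feel of what those two figures imply, one can fit a straight line
through them.  This is only arithmetic on the two numbers quoted above,
under the assumptions that cost is linear in process size and that "100K"
and "1M" mean 100 and 1024 kilobytes; two points cannot confirm either
assumption.  It comes out at roughly 21 ms of fixed cost, 0.39 ms per K,
and about 2.5 megabytes per second of effective transfer rate.

```python
# The two measurements quoted for Equus (size in KB, time in ms).
small_kb, small_ms = 100, 60.0
large_kb, large_ms = 1024, 420.0   # assumption: 1M = 1024K

# Straight-line model: time = fixed_ms + per_kb_ms * size_kb
per_kb_ms = (large_ms - small_ms) / (large_kb - small_kb)
fixed_ms = small_ms - per_kb_ms * small_kb

# Effective transfer rate implied by the slope, in MB per second.
bandwidth_mb_s = (1000.0 / per_kb_ms) / 1024


def predicted_ms(size_kb):
    """Migration time the linear model predicts for a process of size_kb."""
    return fixed_ms + per_kb_ms * size_kb
```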
 
>There are a couple
>papers on the subject that confirm this, but I don't have my bibliographical
>stuff with me.
See Y. Artsy, R. Finkel, 'Designing a process migration facility - the 
Charlotte experience', IEEE Computer, vol 22, no 9, Sep 89, pp 47-56, for info 
and references on a number of designs.

>2) Depending on the system model, you may get comparable performance simply
>by using load balancing when new processes are created and leaving them
>where they go. The only case where I would want process migration is when
>more than one long-running CPU-bound process is assigned (erroneously)
Erroneously? Who/what knew they were going to be CPU bound?  Or, suppose the 
system put them there because it made sense given the load on the other 
machines at the time?  Having a process migration facility means that, when 
the cross-machine load profile becomes unbalanced due to processes dying or 
entering new phases with different load-related behaviours, you can do 
something about it.
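
A minimal sketch of "doing something about it": look at the load profile 
from time to time and move one process when the gap gets too large.  The 
function name, the use of run-queue length as the load measure and the 
threshold of 2 are assumptions for illustration, not a policy taken from 
any of the systems mentioned.

```python
def rebalance_once(loads, threshold=2):
    """Move one process from the busiest node to the idlest one if their
    loads differ by at least `threshold`.

    `loads` maps node name -> number of runnable processes and is updated
    in place. Returns (source, destination), or None if nothing was moved.
    """
    busiest = max(loads, key=lambda n: (loads[n], n))
    idlest = min(loads, key=lambda n: (loads[n], n))
    if loads[busiest] - loads[idlest] < threshold:
        return None
    loads[busiest] -= 1
    loads[idlest] += 1
    return (busiest, idlest)
```

Placement at creation time alone cannot make this correction, because the 
imbalance appears after the processes have already been placed.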

Other reasons for having process migration (apart from withdrawing when 
someone logs on to a previously idle workstation): 1) if two processes start a 
lengthy interaction involving only synchronous communication, migrate one of 
them to the other's site to save network overhead.  2) I don't have virtual 
memory in Equus; if a process attempts to increase its data size and fails for 
lack of memory, it can migrate to another site where there is sufficient 
memory.
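
The second case might look something like the sketch below.  It is not 
Equus code: grow_or_migrate, the free_mem table and the choice of the site 
with the most free memory are hypothetical, and sizes are in arbitrary 
units.

```python
def grow_or_migrate(proc_size, grow_by, here, free_mem):
    """Try to grow a process by `grow_by` at site `here`; if that fails for
    lack of memory, move the whole process to a site that can hold it.

    `free_mem` maps site -> free memory and is updated in place.
    Returns the site the process ends up on, or None if no site fits.
    """
    if free_mem[here] >= grow_by:
        free_mem[here] -= grow_by          # enough room locally: no move
        return here

    needed = proc_size + grow_by           # the new site must hold it all
    candidates = [s for s in free_mem if s != here and free_mem[s] >= needed]
    if not candidates:
        return None

    target = max(candidates, key=lambda s: (free_mem[s], s))
    free_mem[here] += proc_size            # old site gets its memory back
    free_mem[target] -= needed
    return target
```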

I'm interested to hear about other implementations.  In particular, I'm not 
familiar with the AIX one: can anyone give me a reference for that?


--

Tim Kindberg

UUCP:      timk@qmw-cs.uucp                      | Computer Science Dept
ARPA:      timk%cs.qmw.ac.uk@nsfnet-relay.ac.uk  | QMW, University of London
JANET:     timk@uk.ac.qmw.cs                     | Mile End Road
Voice:     +44 71 975 5236 (Direct Dial)         | London E1 4NS

gd@geovision.gvc.com (Gord Deinstadt) (04/18/91)

In article <10422@pitt.UUCP> jonathan@cs.pitt.edu (Jonathan Eunice) writes:
>Why isn't process migration common?

Several replies suggested that it may not be all that useful.  I can
think of a use: fault-tolerant systems.  Though you need more than
just process migration, it would be an interesting way to start.
--
Gord Deinstadt  gdeinstadt@geovision.UUCP

gdtltr@brahms.udel.edu (root@research.bdi.com (Systems Research Supervisor)) (04/19/91)

In article <3057@redstar.cs.qmw.ac.uk> timk@cs.qmw.ac.uk (Tim Kindberg) writes:
=>In <16932@chopin.udel.edu> gdtltr@chopin.udel.edu (root@research.bdi.com 
=>(Systems Research Supervisor)) writes:
=>
=>>where they go. The only case where I would want process migration is when
=>>more than one long-running CPU-bound process is assigned (erroneously)
=>Erroneously? Who/what knew they were going to be CPU bound?

   I suppose my above not-terribly-well-thought-out statement reflects a
subconscious desire for more intelligent process behavior prediction. In
any case, I am most familiar with the Amoeba processor pool model, in which
the normal case would have the processes running on different processors
anyway. Of course, if Dr. Douglis is correct in his estimate of the complexity
of adding process migration to Amoeba, I certainly have no objections. It
can certainly be used to support a number of scheduling policies. If migration
can be done without a significant reduction in system performance or major
increase in kernel complexity, there is little reason not to put it in.

                                        Gary Duzan
                                        Time  Lord
                                    Third Regeneration

gdtltr@brahms.udel.edu (root@research.bdi.com (Systems Research Supervisor)) (04/20/91)

In article <1505@geovision.gvc.com> gd@geovision.gvc.com (Gord Deinstadt) writes:
=>In article <10422@pitt.UUCP> jonathan@cs.pitt.edu (Jonathan Eunice) writes:
=>>Why isn't process migration common?
=>
=>Several replies suggested that it may not be all that useful.  I can
=>think of a use: fault-tolerant systems.  Though you need more than
=>just process migration, it would be an interesting way to start.

   How do you propose to migrate a process off a node that has failed? I
suppose if you had a fairly recent checkpoint for the process you could
restart it, but you have to deal with any I/O the process has performed
between the checkpoint and the failure. In any case, I don't think this
is process migration, per se, but it is related.
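
   To make the I/O problem concrete, here is a toy example in which a
process writes one record per step, is checkpointed after step 3, runs on
to step 5, and is then naively restarted from the checkpoint. The list
called output stands in for a file or device that survives the failure;
all names are invented for the illustration.

```python
def run_steps(state, until, output):
    """Run the process up to step `until`, writing one record per step."""
    while state["step"] < until:
        state["step"] += 1
        output.append(f"record {state['step']}")
    return state


output = []                                  # external effects, not in state
state = run_steps({"step": 0}, 3, output)
saved = dict(state)                          # checkpoint taken after step 3
run_steps(state, 5, output)                  # keeps running, then node fails
run_steps(dict(saved), 5, output)            # naive restart from checkpoint

# Records written twice because the restart repeats steps 4 and 5.
duplicates = sorted(r for r in set(output) if output.count(r) > 1)
```

   The process state comes back intact, but the outside world has seen
steps 4 and 5 twice, so the restart is not transparent unless that I/O is
somehow undone or suppressed.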

                                        Gary Duzan
                                        Time  Lord
                                    Third Regeneration