[mod.os] Process Migration in the Sprite Operating System

darrell@sdcsvax.UUCP (04/17/87)
The Sprite OS has been mentioned in an earlier submission to mod.os.
I have implemented a process migration facility for Sprite, which I
shall describe briefly here.  (A Berkeley tech report is available; to
request it, send me your US mail address.)

Process migration is a means to transfer processes between
workstations during execution.  In contrast to a remote execution
facility, in which processes are started on different workstations and
execute there for their entire lifetime, migration allows processes to
be moved at any time and continued on a new host.  This ability is
particularly useful for long-running jobs (e.g., simulations) and
parallel jobs (e.g., a "make" in which compilations are done in
parallel on multiple machines).

We envision an environment in which there are many workstations in a
domain, each of which "belongs" to a particular user.  When someone is
actively using his or her workstation, we don't want "foreign"
processes to degrade the performance of the local user.  Therefore,
a process may migrate only to a workstation on which no one has been
active for several minutes, and if the local user becomes active again,
the foreign process is migrated away.
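
To make the policy concrete, here is a rough sketch in Python.  It is
not Sprite code: the names (Workstation, choose_target,
on_user_activity) are invented for illustration, and the five-minute
threshold is only an assumed stand-in for "several minutes".

```python
from dataclasses import dataclass, field

# Assumption: "several minutes" is taken to be five minutes here.
IDLE_THRESHOLD_SECONDS = 5 * 60


@dataclass
class Workstation:
    """Hypothetical record of one workstation's state."""
    name: str
    last_user_activity: float              # timestamp, in seconds
    foreign_pids: list = field(default_factory=list)

    def is_idle(self, now):
        """True if no local user has been active for the threshold period."""
        return now - self.last_user_activity >= IDLE_THRESHOLD_SECONDS


def choose_target(stations, now):
    """Return the first idle workstation, or None if every one is in use."""
    for ws in stations:
        if ws.is_idle(now):
            return ws
    return None


def on_user_activity(ws, now):
    """Record local user activity and return the foreign processes that
    should now be migrated away from this workstation."""
    ws.last_user_activity = now
    evicted, ws.foreign_pids = ws.foreign_pids, []
    return evicted
```

A caller would ask choose_target for a host before migrating, append
the process id to that host's foreign_pids, and migrate away whatever
on_user_activity returns when the owner comes back.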

Our implementation of process migration involves cooperation between
the node to which a process "belongs" and the one to which it has
been migrated, known as the "home node" and "remote node",
respectively.  The home node is responsible for handling such things
as signals and shared environments, which are always relative to the
home node of a process.  The remote node handles file I/O (including
IPC, since processes communicate using named pipes) by bypassing the
home node and communicating directly with file servers.  Both the home
and the remote nodes must support system calls such as fork and exit,
since the home node must keep track of all processes that it "owns"
even if they are executing remotely.
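
The division of labor described above can be sketched as a small
routing table.  This is only an illustration in Python, not the Sprite
kernel interface: the call names and the nodes_involved function are
hypothetical, and the table lists just the kinds of calls mentioned in
this note.

```python
from enum import Enum


class Route(Enum):
    HOME = "home"        # handled by the home node
    REMOTE = "remote"    # handled where the process is running
    BOTH = "both"        # both nodes must take part


# Hypothetical call names, grouped as described in the text.
ROUTES = {
    "signal": Route.HOME,        # signals are relative to the home node
    "environment": Route.HOME,   # shared environments likewise
    "open": Route.REMOTE,        # file I/O goes straight to file servers
    "read": Route.REMOTE,
    "write": Route.REMOTE,       # includes IPC over named pipes
    "fork": Route.BOTH,          # home node tracks processes it "owns"
    "exit": Route.BOTH,
}


def nodes_involved(call, migrated):
    """Return the set of node roles that take part in a call.

    A process that has not been migrated runs on its home node, so only
    that node is involved.  Calls not in ROUTES raise KeyError, since
    this sketch does not say how they would be handled.
    """
    if not migrated:
        return {"home"}
    route = ROUTES[call]
    if route is Route.BOTH:
        return {"home", "remote"}
    return {route.value}
```

For example, nodes_involved("read", True) gives {"remote"}, while
nodes_involved("fork", True) gives {"home", "remote"}.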

Our main concern with migration was the issue of home-node support.
We didn't want the cost of sending the fork/exit/signal/... system
calls home to make remote execution on an idle machine slower than
local execution on a loaded machine.  Fortunately, the amount of
computation and I/O done by such things as "cc" outweighs the cost of
process management, and there is only about a 5% penalty for executing
something remotely rather than locally, if both the home and remote
nodes are otherwise idle.  Since migration should be used when the
remote node is idle but the home node is heavily utilized, the extra
computing power on the remote node more than makes up for the cost of
communicating with the home node.  Some simple benchmarks showed a
40-60% improvement by spreading two or more simultaneous compilations
across workstations rather than running them all locally.
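
The trade-off can be illustrated with a deliberately idealized model in
Python.  It assumes purely CPU-bound jobs that time-share one machine
evenly, one idle workstation per job, and a flat 5% remote penalty; it
ignores I/O overlap, file server load and network contention, so it is
not a reproduction of the benchmarks above.  The function names and the
job length are made up for the example.

```python
def local_time(n_jobs, job_seconds):
    """Elapsed time when all jobs share one machine (idealized)."""
    return n_jobs * job_seconds


def spread_time(job_seconds, penalty=0.05):
    """Elapsed time when each job runs alone on its own idle machine,
    paying the remote-execution penalty."""
    return job_seconds * (1 + penalty)


def improvement(n_jobs, job_seconds, penalty=0.05):
    """Fraction of elapsed time saved by spreading the jobs out."""
    local = local_time(n_jobs, job_seconds)
    return (local - spread_time(job_seconds, penalty)) / local
```

Under these assumptions a single job loses 5% by running remotely,
while two simultaneous jobs finish in roughly half the time when
spread across machines.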

I would be interested in hearing about other implementations of
process migration.  I am certainly aware of process migration in V and
LOCUS, for example, but I imagine there are other new implementations
that have not yet been widely publicized.  (I guess descriptions of
such systems could be posted directly to comp.os.research.)  Please
send any comments or questions to:

	douglis@ucbarpa.Berkeley.EDU
	ucbvax!douglis

- Fred -