darrell@sdcsvax.UUCP (04/17/87)
The Sprite OS has been mentioned in an earlier submission to mod.os. I have implemented a process migration facility for Sprite, which I shall describe briefly here. (A Berkeley tech report is available; to request it, send me your US mail address.)

Process migration is a means to transfer processes between workstations during execution. In contrast to a remote execution facility, in which processes are started on different workstations and execute there for their entire lifetime, migration allows processes to be moved at any time and continued on a new host. This ability is particularly useful for long-running jobs (e.g., simulations) and parallel jobs (e.g., a "make" in which compilations are done in parallel on multiple machines).

We envision an environment in which there are many workstations in a domain, each of which "belongs" to a particular user. When someone is actively using his or her workstation, we don't want "foreign" processes to degrade the performance of the local user. Therefore, one may only migrate to a workstation in which no one has been active for several minutes, and if the local user becomes active again, the foreign process is migrated away.

Our implementation of process migration involves cooperation between the node to which a process "belongs" and the one to which it has been migrated, known as the "home node" and "remote node", respectively. The home node is responsible for handling such things as signals and shared environments, which are always relative to the home node of a process. The remote node handles file I/O (including IPC, since processes communicate using named pipes) by bypassing the home node and communicating directly with file servers. Both the home and the remote nodes must support system calls such as fork and exit, since the home node must keep track of all processes that it "owns" even if they are executing remotely.

Our main concern with migration was the issue of home-node support.
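
To make the idle-workstation policy and the home/remote division of labor concrete, here is a minimal sketch in Python. It is not Sprite code. The five-minute threshold is an assumption (the description says only "several minutes"), and every name in it (Workstation, Process, CALL_HANDLERS, pick_target, must_evict) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

# Assumption: the post says only "several minutes"; 5 minutes is a placeholder.
IDLE_THRESHOLD_SECONDS = 5 * 60


class Handler(Enum):
    HOME = "home"      # forwarded to the node that owns the process
    REMOTE = "remote"  # served where the process runs, talking to file servers
    BOTH = "both"      # both nodes take part


# Hypothetical call categories, grouped the way the post describes them.
CALL_HANDLERS = {
    "signal": Handler.HOME,
    "environment": Handler.HOME,
    "file_io": Handler.REMOTE,
    "named_pipe": Handler.REMOTE,  # IPC goes through named pipes, so it is file I/O
    "fork": Handler.BOTH,          # home node tracks every process it owns
    "exit": Handler.BOTH,
}


@dataclass
class Workstation:
    name: str
    seconds_since_user_input: float

    def accepts_foreign_processes(self) -> bool:
        # A host is a migration target only while its user has been inactive.
        return self.seconds_since_user_input >= IDLE_THRESHOLD_SECONDS


@dataclass
class Process:
    pid: int
    home: str     # name of the home node
    current: str  # name of the node it is executing on

    @property
    def is_foreign(self) -> bool:
        return self.current != self.home


def pick_target(candidates: List[Workstation]) -> Optional[Workstation]:
    """Return the first idle workstation, or None if every host is in use."""
    for host in candidates:
        if host.accepts_foreign_processes():
            return host
    return None


def must_evict(process: Process, host: Workstation) -> bool:
    """True if a foreign process is running on a host whose user has returned."""
    return (
        process.current == host.name
        and process.is_foreign
        and not host.accepts_foreign_processes()
    )
```

With a table like this, only the HOME and BOTH categories ever generate traffic back to the home node, which is the overhead at issue in what follows.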
We didn't want the cost of sending the fork/exit/signal/... system calls home to make remote execution on an idle machine slower than local execution on a loaded machine. Fortunately, the amount of computation and I/O done by such things as "cc" outweighs the cost of process management, and there is only about a 5% penalty for executing something remotely rather than locally, if both the home and remote nodes are otherwise idle. Since migration should be used when the remote node is idle but the home node is heavily utilized, the extra computing power on the remote node more than makes up for the cost of communicating with the home node. Some simple benchmarks showed a 40-60% improvement by spreading two or more simultaneous compilations across workstations rather than running them all locally.

I would be interested in hearing about other implementations of process migration. I am certainly aware of process migration in V and LOCUS, for example, but I imagine there are other new implementations that have not yet been widely publicized. (I guess descriptions of such systems could be posted directly to comp.os.research.)

Please send any comments or questions to:

    douglis@ucbarpa.Berkeley.EDU
    ucbvax!douglis

- Fred -