jonathan@cs.pitt.edu (Jonathan Eunice) (04/17/91)
Process migration -- the ability to move a running process from one system to another without (more than very temporarily) disrupting execution -- seems just what the doctor ordered for highly networked/distributed computing environments. It allows easy network-wide load balancing (move tasks to powerful or lightly loaded systems) and greater availability/reliability (move processes off systems that must be shut down for maintenance). The support code for process migration also enables other desirable features, such as process checkpointing and synchronized parallel execution (for high-reliability environments).

Yet I know of only one commercial system that offers it: the Transparent Computing Facility of IBM's AIX/370 and AIX PS/2. Even IBM emphasizes TCF's file sharing, not its process migration (reasonable in a mainframe environment).

Why isn't process migration common? Are there systems I'm unaware of that can do process migration? Perhaps Stratus, Tandem, Sequoia, or other fault-tolerant players? What about DEC's VAXclusters -- do they implement this capability? (They didn't, last I knew, several years ago.)

Even if specialized vendors offer the capability, why isn't it mainstream? Aren't large workstation networks (recent Sun ad: "The network is the computer") the perfect opportunity? Isn't process migration exactly the technology needed to exploit large workstation networks transparently? Wouldn't this be a leapfrog over current rsh/rexec/HP Task Broker approaches? Isn't there a significant competitive advantage and user enabler here? Why aren't AT&T/UI and especially OSF -- given its penchant for "next generation" distributed software like DCE and DME -- doing process migration?

(This query is posted to a variety of tangential newsgroups. Let's restrict discussion and followups to comp.os.misc.)