Doug.Jensen@K.GP.CS.CMU.EDU (08/17/89)
E. Douglas Jensen (edj@cs.cmu.edu, edj@westford.ccur.com)

Alpha

E. Douglas Jensen
Concurrent Computer Corp.
One Technology Way
Westford, MA 01886
508-392-2999

The Alpha Real-Time Decentralized Operating System

Alpha is an operating system for the mission-critical integration and operation of large, complex, distributed, real-time systems. Only recently have such systems become more common in industrial factory and plant automation (e.g., automobile manufacturing), aerospace (e.g., space stations), and military (e.g., C3I) contexts. They differ substantially from the more widely known timesharing systems, numerically oriented supercomputers, and networks of personal workstations. More surprisingly, they also depart significantly from traditional real-time systems, which are intended for low-level sampled-data monitoring and control.

The most challenging technical requirements dictated by this application domain are in the areas of: satisfying real-time constraints despite the system's inherently stochastic and nondeterministic nature; distributed programming and system-wide (inter-node) resource management; robustness in the face of failures and even attacks; and adaptability to a wide range of ever-changing requirements over decades of use. Satisfying these requirements entails unconventional design and implementation tradeoffs.

In Alpha's distributed programming model, activities correspond to threads, which execute concurrently in otherwise passive objects and cross object (and, transparently and reliably, node) boundaries by means of operation invocation; they carry with them attributes such as urgency, importance, and reliability specified by the application.
Alpha instances cooperate to manage the global resources of the entire system based on these attributes, using best-effort resource management algorithms to ensure that as many as possible of the most important aperiodic as well as periodic time constraints are met, permitting graceful degradation in response to the inevitable overloads.

To facilitate maintaining integrity of system and application distributed data and programs despite physical dispersal, asynchronous concurrency of execution, and hardware failures, Alpha includes exception handling facilities, thread repair, and kernel-level mechanisms for real-time atomic transactions and object replication.

Alpha uses policy/mechanism separation to exploit application specificity in support of adaptability. Departing from common practice, Alpha's performance is optimized for the important high-stress exception cases, such as failure or attack, rather than for the normal, most frequent cases.

Alpha embodies results from nine years of research performed by the Archons Project at Carnegie Mellon University, where a prototype was built from 1984 to 1987; another copy has been successfully demonstrated with application software written at General Dynamics Corp. Alpha research is ongoing at CMU and other academic and industrial institutions, but is now led by Concurrent Computer Corp., where it continues to be sponsored in part by DoD. A series of next-generation designs and implementations will be delivered to various Government and industry labs for experimental applications beginning in early 1990.
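
As an illustration of the best-effort resource management described above, here is a minimal sketch in Python of how a scheduler might shed work under overload so that the most important time constraints are still met. It is not Alpha's actual algorithm. The names Activity, feasible, and best_effort_schedule are hypothetical, as are the single-processor setting and the heuristic itself (drop the activity with the lowest importance per unit of remaining execution time until the rest can all meet their deadlines in deadline order).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical stand-in for a thread carrying application attributes."""
    name: str
    deadline: float    # absolute time by which the work must finish
    cost: float        # remaining execution time, assumed > 0
    importance: float  # application-assigned value of meeting the deadline


def feasible(activities, now=0.0):
    """True if running the activities in deadline order meets every deadline."""
    t = now
    for a in sorted(activities, key=lambda a: a.deadline):
        t += a.cost
        if t > a.deadline:
            return False
    return True


def best_effort_schedule(activities, now=0.0):
    """Return (kept, shed): kept is in deadline order, shed lists dropped work.

    Assumed heuristic: while the set is overloaded, drop the activity with
    the lowest importance per unit of execution time.
    """
    kept = list(activities)
    shed = []
    while kept and not feasible(kept, now):
        victim = min(kept, key=lambda a: a.importance / a.cost)
        kept.remove(victim)
        shed.append(victim)
    kept.sort(key=lambda a: a.deadline)
    return kept, shed


if __name__ == "__main__":
    demo = [
        Activity("track", deadline=4.0, cost=3.0, importance=10.0),
        Activity("log", deadline=5.0, cost=3.0, importance=1.0),
        Activity("alarm", deadline=6.0, cost=2.0, importance=8.0),
    ]
    kept, shed = best_effort_schedule(demo)
    print("run: ", [a.name for a in kept])
    print("shed:", [a.name for a in shed])
```

In this example the three activities cannot all meet their deadlines, so the low-importance "log" activity is shed and "track" and "alarm" run in deadline order. Graceful degradation here means that an overload costs the least valuable work first rather than causing arbitrary deadline misses.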