[comp.doc.techreports] clouds

E1AR0002@SMUVM1.BITNET (Leff, Southern Methodist University) (05/03/87)

Listed below are entries (in the format used by refer(1)) for papers
written by members of the Clouds project at Georgia Tech, which has
been concerned with the design and implementation of a reliable,
decentralized operating system prototype since late 1981.  Copies of
most of these reports may be obtained by writing to the following
address:

    Technical Reports Librarian
    School of Information and Computer Science
    Georgia Institute of Technology
    Atlanta, GA  30332-0280

Please mention the technical report number.

----------

%A M. Ahamad
%A M. H. Ammar
%A J. Bernabeu
%A M. Y. A. Khalidi
%T A Multicast Scheme for Locating Objects in a Distributed Operating System
%R Technical Report GIT-ICS-87/01
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D January 1987
%K Clouds
%X Object-oriented distributed operating systems that provide
location-independent access to objects must locate and invoke an
object remotely when the object is not local and its location is not
known.  Commonly, a remote invocation is broadcast and each node
performs a search to determine if the invoked object exists locally.
When objects are not replicated, only a single node finds the
object while the search performed by the other nodes is wasted
computation.
%X In this paper, we present a scheme for distributing and
locating objects that reduces this wasteful computation.  We describe
a set of protocols for object creation, invocation, deletion and
migration.  These protocols exploit the multicast capability of the
underlying communication network and hence only a subset of the nodes
receives a remote invocation.  A mathematical model of the system
using the proposed scheme is presented and analyzed in order to
demonstrate how various parameters affect system performance.

%A M. Ahamad
%A P. Dasgupta
%A R. J. LeBlanc
%A C. T. Wilkes
%T Fault-Tolerant Computing in Object Based Distributed Operating Systems
%J Proceedings of the Sixth Symposium on Reliability in Distributed Software and
 Database Systems
%I IEEE Computer Society
%C Williamsburg, VA
%D March 1987
%P 115-125
%K Clouds replication naming PET
%X Replication of data has been used for enhancing its availability in the
presence of failures in distributed systems. Data can be replicated
with greater ease than generalized objects. We review some of the
techniques used to replicate objects for resilience in distributed
operating systems.  We discuss the problems associated with the
replication of objects and present a scheme of replicated actions and
replicated objects, using a paradigm we call PETs (parallel execution
threads).
%X The PET scheme not only exploits the high availability of
replicated objects but also tolerates site failures that happen while
an action is executing.  We show how this scheme can be implemented in
a distributed object based system, and use the Clouds operating
system as an example testbed.

%A M. Ahamad
%A P. Dasgupta
%T Parallel Execution Threads: An Approach to Fault-Tolerant Actions
%R Technical Report GIT-ICS-87/16
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D March 1987
%K Clouds replication naming
%X A distributed system can support fault tolerant atomic actions by
replicating data and computation at sites that have independent failure
modes.  We present a scheme called parallel execution threads (PET) that can
be used to implement fault tolerant actions in an object-based distributed
environment.  This scheme tolerates existing as well as transient failures.
The details of the PET scheme as well as the commit protocols used by it are
described.  We also consider the integration of the PET scheme in the
\fIClouds\fP distributed operating system.

%A J. E. Allchin
%A M. S. McKendry
%T Object-Based Synchronization and Recovery
%R Technical Report GIT-ICS-82/15
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D September 1982
%X Using abstract data types and nested actions as system structuring
tools can help create more robust systems.  In pursuing the goal of
creating an operating system using these tools, several interesting
principles have been encountered.  First, in this environment
synchronization and recovery should be associated with each object.  By
associating synchronization with each object and by using the semantics
of the object operations, it is possible to achieve higher
concurrency.  Binding recovery to objects permits efficient recovery
techniques which might not be possible without the specific
implementation knowledge available to the programmer of the object.
Second, it is important to distinguish between the abstract behavior of
an object and its implementation when analyzing concurrency.  Third,
using serializability for the abstract behavior of an object is
sometimes undesirable or unnecessary.  Whether an object provides
serializability as the abstract behavior depends on the semantics of
how the object is used.  Examples of object types which motivate the
principles are presented.

%A J. E. Allchin
%T A Suite of Algorithms for Maintaining Replicated Data Using Weak Correctness
Conditions
%R Technical Report GIT-ICS-82/18
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D December 1982
%X A suite of decentralized algorithms for maintaining distributed
replicated data is presented.  The algorithms do not necessarily
achieve serial consistency, but they are adequate for many simple data
storage problems in operating systems and realtime systems.
Applications which appear well-suited to the suite include mail
systems, naming servers, appointment calendars, certain types of file
dictionaries, operating system load tables (e.g., routing), and device
state in distributed process control systems.  The algorithms are
robust and are intuitively easy to understand.  The algorithms assume
an unreliable network and tolerate node failures, network partitions,
and lost, duplicate, and out-of-order messages.  Both goals for replicating
data--high availability and rapid response time--are met by the
algorithms.  The basic algorithms use resolution tables to state the
outcome of information conflicts caused by concurrent actions or
unreliable nodes and communication.  Each algorithm is oriented toward
different application requirements and provides a different degree of
message traffic overhead and availability.  The efficiency of the
algorithms depends on the acceptability of weak correctness conditions
in the applications.  The correctness condition for one of the
algorithms is formally defined and the algorithm is proved to be
correct (with other proofs following in a straightforward manner from
the framework presented).  This algorithm has also been implemented.

%A J. E. Allchin
%A M. S. McKendry
%T Facilities for Supporting Atomicity in Operating Systems
%R Technical Report GIT-ICS-83/01
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D January 1983
%X One of the problems fundamental to distributed computing is maintaining
the atomicity of a sequence of operations despite concurrent activity
or system/application failures.  \fIAtomic actions\fP have been used
for this purpose in database systems and recently in programming
languages.  This paper introduces support for atomicity in the kernel
of an operating system.  This support is not limited to managing just
one type of data (\fIe.g.,\fP files) and could be used to ensure that
any action (or task) be accomplished atomically on a set of user
definable objects.  The atomicity framework presented uses processes,
actions, and objects.  Requirements for atomicity are discussed and
system primitives are defined which include the ability to create and
terminate nested actions, control concurrency between actions, and
recover from action aborts.  The facilities presented provide system
designers and programmers with the ability to control consistency
requirements using whatever semantic knowledge is available.  The
atomicity thus attained is called \fIsemantic atomicity\fP.  Unlike
other work, we do not tightly bind processes to actions, thus allowing
the facilities presented to be applicable to a wide class of systems
(including applications where actions are supported by cooperating
processes).  One possible approach for integration of the facilities
into a programming language is discussed in relation to the Clouds
decentralized global operating system.  The desirability of semantic
atomicity is illustrated through a file directory system example.  Use
of the facilities to address the problem of actions supported by
cooperating processes is also illustrated through an example.

%A J. E. Allchin
%T How to Shadow a Shadow
%R Technical Report GIT-ICS-83/05
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D February 1983
%X Several file and database systems have used a shadowing technique for
recovery purposes on data files which are not concurrently accessed.
Essentially there are two versions:  the current version and a shadow
version.  Transactions manipulate only the current version.  When a
change is first made to a data page, a new page is allocated and the
current version page directory is updated with the new page location.
The usual implementation is exceptionally efficient for small to
medium-sized files because on transaction termination the only
processing required is to determine which version should become the
shadow; the other version is discarded.  This paper discusses an
efficient solution for using this approach with concurrent
transactions.  We present a technique for building not only
single-level concurrent transactions, but nested transactions which may
be concurrent as desired.

%A J. E. Allchin
%A M. S. McKendry
%T Support for Objects and Actions in Clouds: Status Report
%R Technical Report GIT-ICS-83/11
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D May 1983
%X This status report describes the current work of the \fBClouds\fP
project at Georgia Tech.  The Clouds project is studying techniques for
construction of reliable computing systems in environments of
distributed machines interconnected by local area networks.  This
report emphasises the functional requirements for architectural
support.  To support reliability, the architecture supports
\fIobjects\fP and \fIactions\fP.  Objects are instances of abstract data
types.  They provide a basis for building system components and for
controlling the behaviour of a system when failures occur.  Atomic
actions are a means of dynamically grouping invocations of operations
on objects into units of work that either complete in their entirety or
do not have any effect whatsoever.  Recovery mechanisms assist in
maintaining this abstraction and synchronization mechanisms control
interactions between actions.
%X The techniques described are oriented particularly toward highly
dynamic applications in which the payoff for reliability is high and
the loads placed on the system vary substantially.  The architectural
support may be tailored to particular applications, even within a
single system.  It is possible, for example, to use `hot spares' (an
on-line spare is maintained so that no time is lost upon failure), or
slower but cheaper recovery in which computations are restarted after
failures.  Mechanisms are provided to make it possible to bring failed
machines on-line and integrate them with the remainder of the system
without disruption.  To improve efficiency and limit the propagation of
the effects of failures, mechanisms are provided to construct
\fInested\fP actions, which function as components of larger actions
while failing independently of their containing actions.

%A J. E. Allchin
%A M. S. McKendry
%T Synchronization and Recovery of Actions
%J Proceedings of the Second Symposium on Principles of Distributed Computing
%C Montreal
%I ACM SIGACT/SIGOPS
%D August 1983
%K Clouds
%X We introduce an approach to robust computation in distributed systems.
This approach is the foundation for reliability in the \fBClouds\fP
decentralized operating system.  It is based on atomic actions
operating on instances of abstract data types (objects).  We present an
event-based model of computation in which scheduling of responses to
operation invocations is controlled by objects.  We discuss an
integrated strategy for synchronization \fIand\fP recovery which uses
relationships between the abstract states of objects to track
dependencies between actions.  Serializability is defined in terms of
the semantics of operations.  This permits high concurrency to be
obtained in non-serializable implementations without deviation from
serializable abstract behavior.  We define a class of schedulers that
allows objects to make autonomous scheduling decisions.  We present the
use of non-serializable operation semantics.  Finally, we discuss
implementation of the model, including action synchronization, object
operation ordering using action-based counting semaphores, and action
recovery.

%A J. E. Allchin
%T An Architecture for Reliable Decentralized Systems
%R Ph.D. Diss.
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%O Also released as technical report GIT-ICS-83/23
%K thesis Clouds
%D September 1983
%X Constructing reliable programs for distributed processing systems is a
very difficult task.  \fIActions (transactions),\fP indivisible units
of work, can simplify this process by providing uniform treatment of
failures and preventing interference.  These units of work can also be
\fInested,\fP further controlling the scope of concurrency and
failures.  The atomicity provided by actions is an important tool for
building reliable decentralized systems.
%X Actions manipulate pieces of data called \fIobjects\fP.  Objects are
usually treated as uninterpreted (bit strings).  However, treating all
objects in this fashion can result in unacceptable concurrency or high
recovery overhead.  In order to take advantage of actions in the widest
possible context, it is necessary to consider operations on generalized
objects (instances of abstract data types).
%X This report presents a general architectural model for reliable
decentralized systems constructed using actions and objects.  We include
one prototype design created from the model.  We also include practical
algorithms necessary to implement this design.
%X We present a nested action management algorithm that, to our knowledge,
is the first such algorithm to separate remote call semantics from
action units.  It also guarantees that \fIorphans\fP, computational
parts of actions that will be eventually aborted, view consistent
system states.  We describe a design for synchronization and recovery
that is oriented toward a programming-based view of objects (as well as
simple data).  We demonstrate the usefulness of our results through
typical reliable programming problems.
%X Availability is another important dimension of distributed systems.  We
describe a novel collection of simple, yet very robust, replication
algorithms which can increase data availability.  Algorithms from this
suite can be customized to balance particular tradeoffs required in
different application systems.  The efficiency of the algorithms
depends on the acceptability of weak consistency conditions in the
applications.  One member of this suite is formally modelled and proven
correct.  The others follow in a straightforward manner.

%A P. Dasgupta
%A R. LeBlanc
%A E. Spafford
%T The Clouds Project: Design and Implementation of a Fault-Tolerant Distributed
 Operating System
%R Technical Report GIT-ICS-85/29
%D 1985
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%X The \fIClouds\fP project at Georgia Tech was initiated to conduct
research into failure resistant, efficient distributed architectures
and operating systems.  The project used state of the art techniques to
design a distributed operating system kernel that can be supported on
conventional, unreliable hardware, and be more reliable than the
underlying electronics.  Several approaches to the problem were
considered, and after substantial research and construction effort, the
current design emerged.  This design unifies simplicity with efficiency
and advanced concepts.  The resulting system is quite versatile and can
be adapted easily to suit most requirements of reliable distributed
computing, in many different hardware configurations.  The design is
largely hardware independent and independent of system configuration.
%X This report describes the object and action based approach to building
operating systems as incorporated in Clouds.  We also describe in some
detail the salient features of the system and the research directions
that the project is expected to take.

%A P. Dasgupta
%A M. Morsi
%T An Object-Based Distributed Database System Supported on the Clouds Operating
 System
%R Technical Report GIT-ICS-86/07
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1986
%X Many database systems are built on top of conventional off-the-shelf
operating systems.  However, most operating systems lack the kind of
support necessary to measure up to the consistency and recovery needs
of database systems.  This entails the need for modifying and adding
some operating system services, and the creation of a database system
service layer on top of the existing system to provide the database
services.  These services generally comprise synchronization
routines (or concurrency control), crash recovery protocols,
transaction commit, and rollback protocols.
%X At Georgia Tech, the \fIClouds\fP project is actively involved in
building the \fIClouds\fP operating system that provides all of the
above mentioned services in a distributed integrated environment.  We
are interested in investigating techniques to implement reliable
distributed database systems on the \fIClouds\fP environment.
%X This paper presents a design for such a database system.  We discuss
the design of a distributed relational database system using the object
paradigm.  We discuss techniques for the storage and
handling of relations, relational operators and their implementations,
concurrency control, failure and recovery, and transaction commit.  We
show how our design exploits the \fIClouds\fP environment and fits in
with the services provided by \fIClouds\fP.  The design of the database
is conceptually quite simple, elegant and yet completely general,
effective and efficient.
%X We also deal with replication of data in the database system.
\fIClouds\fP does not effectively handle replication, as the location
independence of data in the \fIClouds\fP system nearly does away with
the need for replication in most applications.  However, in efficient
implementations of database systems, there is a need for providing
support for replicated data, and we present a scheme that provides
quicker data access through replication.

%A P. Dasgupta
%T A Probe-Based Monitoring Scheme for an Object-Oriented Distributed Operating
System
%J Proceedings of the Conference on Object Oriented Programming Systems,
Languages and Applications
%C Portland, OR
%D Sept. 1986
%I ACM SIGPLAN
%O Also available as Technical Report GIT-ICS-86/05
%P 57-66
%K OOPSLA Clouds
%X Research in the field of concurrency control for database systems has
given rise to many techniques of ensuring consistency in multiuser
database systems.  However, claims of superiority of proposed protocols
have mainly been supported by intuitive reasoning.  Simulation is one
of the methods that can be used to demonstrate efficiency and
practicality of the mechanisms when analytical methods are not easily
available.
%X A simulator provides a concrete, and often the only practical way of
judging the merits of different strategies under various conditions of
operation.  It can provide statistical insight into the different
factors that affect performance, and the correlation of these factors
with desirable features.  It can thus be also used for fine-tuning
existing protocols.  Finally, it can be used as a verification tool and
for enhancing our intuition about the issues involved.
%X We first describe very briefly some of the previous research dealing
with database performance evaluation; it serves mainly to lead the
interested reader to relevant literature.  We then proceed to the
description of our model of the distributed database operating system
and what needs to be accounted for in a useful simulator.  Later we go
into substantial detail about the simulation and implementation
techniques to provide the reader with information about the exact
simulation environment, sufficient to judge the validity and the
usefulness of the results obtained.  Finally, we describe the results
of simulating several concurrency protocols and present our
interpretations.

%A G. G. Kenley
%T An Action Management System for a Distributed Operating System
%R M.S. Thesis
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1986
%K thesis Clouds
%O Also released as technical report GIT-ICS-86/01
%X The goal of constructing reliable programs has led to the introduction
of transaction (action) software into programming environments.  The
further goal of constructing reliable programs in a distributed
environment has led to the extension of transaction systems to operate
in a more decentralized environment.
%X We present the design of a transaction manager that is integrated
within the kernel of a decentralized operating system: the
\fBClouds\fP kernel.  This decentralized action management system
supports nested actions, action-based locking, and efficient facilities
for supporting recovery.  The recovery facilities have been designed to
support a systems programming language which recognizes the concept of
an action.  We also present a search protocol to locate objects in this
distributed environment.
%X \fIOrphans\fP, disjoint parts of actions that have aborted, are
identified and eliminated using a time-driven orphan detection scheme
which requires a clock synchronization protocol; we present the
facilities necessary to generate a system-wide global clock to support
that protocol.
%X The design goal of this implementation has been to achieve the
performance necessary to support an experimental testbed which can
serve as the basis for further work in the area of decentralized
systems.

%A R. J. LeBlanc
%A C. T. Wilkes
%T Systems Programming with Objects and Actions
%J Proceedings of the Fifth International Conference on Distributed Computing
Systems
%C Denver
%D July 1985
%O Also released, in expanded form, as technical report GIT-ICS-85/03
%K 5ICDCS Aeolus Clouds
%X The goal of the Clouds project at Georgia Tech is the implementation of
a fault-tolerant distributed operating system based on the notions of
objects and actions, which will provide an environment for the
construction of reliable applications.  As part of the Clouds project,
we are designing and implementing a high-level language in which those
levels of the Clouds system above the kernel level will be
implemented.  The Aeolus language provides access to the
synchronization and recovery features of Clouds.  It also provides a
framework with which to study programming methodologies suitable for
action-object systems such as Clouds.
%X This paper provides a brief introduction to the features of the Clouds
system which provide support for programming of objects and actions,
and how these features are made available in the Aeolus language.  We
also present an example Aeolus object from our initial studies in
programming methodologies for Clouds which demonstrates the use of
these features for programming recoverable objects.

%A C. Lin
%T The Design of a Distributed Debugger for Action-Based Object-Oriented
Programs
%R Ph.D. Diss.
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1987
%K thesis Clouds
%O In progress

%A M. S. McKendry
%A J. E. Allchin
%A W. C. Thibault
%T Architecture for a Global Operating System
%J Proceedings IEEE Infocom
%C San Diego, CA
%D April 1983
%K Clouds
%X Global operating systems are suited to distributed, local-area network
environments.  A decentralized global operating system can manage all
resources globally, relying on functional requirements for resource
allocations, rather than the relative physical locations of the
resource allocation mechanism and the resource itself.  Among the
advantages of global operating systems are the ability to use idle
resources and to control the environment as a single cohesive entity.
This paper introduces an architectural approach to supporting
decentralized global operating systems.  The approach addresses the
problem of managing distributed data by incorporating specialized data
management facilities in the kernel.  This data management is
especially useful to the operating system itself.  A capability-based
access scheme provides flexible control of resources and autonomy.
The approach is being utilized in the \fBClouds\fP operating system
project at Georgia Tech.

%A M. S. McKendry
%T Clouds: A Fault-Tolerant Distributed Operating System
%J Distributed Processing Technical Committee Newsletter
%I IEEE
%D 1984
%O Also issued as Clouds Technical Memo #42

%A M. S. McKendry
%T Fault-Tolerant Scheduling Mechanisms
%R (Unpublished Technical Report)
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D May 1984
%O Draft only
%K FTJS Clouds

%A M. S. McKendry
%T Ordering Actions for Visibility
%J Transactions on Software Engineering
%I IEEE
%V 11
%N 6
%D June 1985
%O Also released as technical report GIT-ICS-84/05
%K Clouds
%X Several research projects are studying architectures for distributed
computing that are founded on the notion of \fIatomic actions\fP
operating on \fIobjects\fP (instances of abstract data types).  The
\fIClouds\fP project at Georgia Tech is evaluating this approach as the
foundation for constructing distributed operating systems.  Objects are
not new to operating systems.  They provide substantial benefits in
such dimensions as protection and synchronization, as well as their
inherent organizational characteristics.  This paper is concerned with
synchronization to control ordering.  Conventional approaches require
substantial extension for the action environment.  Typically, they are
based on (or equivalent to) general semaphores.  Semaphores take no
account of the visibility requirements of actions, however, and
consequently they can allow an action to progress beyond the point at
which its effects can be undone.  Also, they do not account for
failures.
%X This paper introduces examples to illustrate requirements for ordering
mechanisms.  A model of nested actions is then used as a basis for
categorizing visibility requirements.  These requirements go beyond
those typical of database systems, because often the entities managed
by operating systems cannot be recovered if an action fails.  Several
simplifications that apply to many operating system problems are
discussed.  Algorithms for controlling ordering are then presented,
with examples of their use.  We establish several expediencies that
result from ordering requirements.  In many situations, recovery for
nested actions can be implemented with a single backup copy of each
item, a single synchronization variable can be used to control
blocking, and generalized locking is not required.  These savings
appear to be fundamental to making the object-action approach viable
for operating system construction.

%A D. V. Pitts
%A E. H. Spafford
%T Notes on a Storage Manager for the Clouds Kernel
%R Technical Report GIT-ICS-85/02
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1985
%X The Clouds project is research directed towards producing a reliable
distributed computing system.  The initial goal of the project is to
produce a kernel which provides a reliable environment with which a
distributed operating system can be built.  The Clouds kernel consists
of a set of replicated sub-kernels, each of which runs on a machine in
the Clouds system.  Each sub-kernel is responsible for the management
of resources on its machine; the sub-kernel components communicate to
provide the cooperation necessary to meld the various machines into one
kernel.
%X This report documents a portion of that research, namely, the
implementation of a kernel-level storage manager that supports
reliability.  The storage manager is a part of each sub-kernel and
maintains the secondary storage residing at each machine in our
distributed system.  In addition to providing the usual data transfer
services, the storage manager ensures that data being stored survives
machine and system crashes, and that the secondary storage of a failed
machine is recovered (made consistent) automatically when the machine
is restarted.  Since the storage manager is a part of the Clouds
kernel, efficiency of operation is also a concern.  We wish to reduce
the overhead required to ensure the recoverability of secondary storage
as much as possible, while adhering to the design goals associated with
the storage manager.

%A D. V. Pitts
%T Storage Management for a Reliable Decentralized Operating System
%R Ph.D. Diss.
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1986
%K thesis Clouds
%O Also released as Technical Report GIT-ICS-86/21
%X Decentralization of computing systems has several attractions:
performance enhancements due to increased parallelism; resource
sharing; and the increased reliability and availability of data due to
redundant copies of the data.  Providing these characteristics in a
decentralized system requires proper organization of the system.  With
respect to increasing the reliability of a system, one model which has
proven successful is the object/action model, where tasks performed by
the system are organized as sequences of atomic operations.  The system
can determine which operations have been performed completely and so
maintain the system in a consistent state.
%X This dissertation describes the design and a prototype implementation
of a storage management system for an object-oriented, action-based
decentralized kernel.  The storage manager is responsible for providing
reliable secondary storage structures.  First, the dissertation shows
how the object model is supported at the lowest levels in the kernel by
the storage manager.  It also describes how storage management
facilities are integrated into the virtual memory management provided
by the kernel to support the mapping of objects into virtual memory.
All input and output to secondary storage is done via virtual memory
management.  This dissertation discusses the role of the storage
management system in locating objects, and a technique intended to
short circuit searches whenever possible by avoiding unnecessary
secondary storage queries at each site.  It also presents a series of
algorithms which support two-phase commit of atomic actions and then
argues that these algorithms do indeed provide consistent recovery of
object data.  These algorithms make use of virtual memory management
information to provide recovery, and relieve the action management
system of the maintenance of the stable storage.

%A D. V. Pitts
%T Object Memory and Storage Management in the Clouds Kernel
%R Technical Report GIT-ICS-87/15
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D March 1987
%X The Clouds kernel is a native layer distributed kernel supporting the
Clouds  operating system.  Clouds is a distributed, object-based system,
designed to support fault tolerance, location independence and an
action/object programming environment.
%X Some of the key issues in supporting  Clouds  are the availability of
Object Memory, Object Location and Object Recovery. Object Memory
provides a set of global, permanent, named address spaces for storing
objects. The address spaces resemble conventional segmentation schemes,
but are persistent and thus replace both the computational and storage
systems used in conventional schemes by a more powerful paradigm. The
Object Location system provides transparent object invocation
mechanisms throughout the distributed environment. The Object Recovery
system supports recoverable objects through shadowing and two-phase
commit techniques to allow atomicity of actions.
%X This paper describes, in brief, the key issues in the design and
implementation of the Object Memory and Storage Management system, which
provides all of the facilities mentioned above. The implementation is
operational and in use by the  Clouds  Project at Georgia Tech.

%A E. H. Spafford
%A M. S. McKendry
%T Kernel Structures for Clouds
%R Technical Report GIT-ICS-84/09
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1984
%X In the past few years, a great deal of research has been focused on the
potential benefits of distributed systems.  In particular, a
distributed system offers the potential of a fault-tolerant computing
environment.  A distributed system also suggests increased computing
power through the combination and application of resources.  The
presence of multiple machines, however, raises many questions relating
to communication, consistency, reliability, configuration, and user
interfaces, to name just a few.  These questions are difficult to
address, and that is perhaps the reason why so few attempts have been
made to construct actual distributed systems.  Interesting recent work
in this area includes the \fIEden\fP project at the University of
Washington, the \fIArgus\fP project at MIT, the \fIAccent\fP system at
CMU, and the \fIISIS\fP project at Cornell.
%X The \fIClouds\fP project is an approach to the construction and
application of a distributed system that is intended to address these
questions.  We support the ``room full of computers'' view of
distribution.  In this view, the user sees a single resource, despite
physical distinctions.  In our research approach, this is achieved by
constructing a highly-transparent multicomputer operating system with
low-level support for maintaining consistent data items.  A
\fImulticomputer\fP or \fIcomputer cluster\fP is a system of many
computers joined into one large system.  The system's distribution is
\fItransparent\fP to users and to most operating system components in
the sense that the user is not aware of the nature or number of
machines which compose the multicomputer.  The user's data and
processes may be distributed throughout the multicomputer system, or
they all may be located on one processor -- there is no observable
difference to the user, nor is there any need for the user to be aware
of the configuration.  We support this transparency during \fIupward
configuration\fP -- the addition of more machines, and during
\fIdownward reconfiguration\fP -- the removal or failure of machines.
%X \fIClouds\fP supports abstract data objects at a very low level.  These
objects are used to build the operating system and applications.  Some
of these objects may be made \fIrecoverable\fP (operations on those
objects may be undone or reversed in the event of failure or error).
%X \fIAtomic transactions\fP or \fIactions\fP are used by both the
operating system and user applications to maintain consistency and
recoverability of data and operations.  The design makes use of actions
and objects to provide reliable operating system services, such as job
schedulers, and thus provide a fault-tolerant system.
%X The principles and motivations behind the \fIClouds\fP project have
been described in more depth in several other documents.  The authors
assume that the reader is already acquainted with the \fIClouds\fP
project and is somewhat familiar with the goals outlined in those
documents.  This paper is intended to be an introduction to the
internal structures of the \fIClouds\fP kernel.  We will be
constructing an experimental \fIClouds\fP system during the next few
years using dedicated minicomputers and personal computers.  Further
description of the \fIClouds\fP kernel will be done as this
experimental system continues to be designed and constructed.

%A E. H. Spafford
%T Kernel Structures for a Distributed Operating System
%R Ph.D. Diss.
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1986
%K thesis Clouds
%O Also released as technical report GIT-ICS-86/16
%X In recent years there has been considerable interest in developing
distributed computing systems.  Distribution of computing resources
suggests many possible benefits including greater flexibility, enhanced
computing power through greater parallelism, and increased
reliability.
%X In practice, achieving any of these benefits has been difficult, since
a distributed system also presents potential problems in naming,
synchronization, and the effective use of resources.  Consistency
problems arise when dealing with operations and data structures that
may span machine and device boundaries; that is, should a
communications or machine failure occur at an inopportune time, the
data may be left in an unknown, incorrect, or inaccessible condition.
This type of problem is certainly undesirable in user programs, but
special problems arise when operating system data structures become
inconsistent.  Due to the larger number of components involved in a
distributed system, these problems are more likely to occur and more
damaging in their effects.
%X Since 1982, the  Clouds  project has been researching an approach to
the construction of a distributed computing environment intended to
address these concerns.  The  Clouds  operating system is intended to
reliably support effective use of distributed resources. Some of that
design is derived from the action/object model of computation developed
in Jim Allchin's dissertation.  That work suggested an architecture for
a distributed, reliable computing system built from abstract data
objects and atomic transactions.  The architecture, properly
implemented, can be used to address many of the problems presented by
distributed systems.  However, Allchin's work does not address the
structure or implementation of the kernel and operating system services
necessary for a functional distributed system.
%X This dissertation explores the requirements for services and structures
needed to support a distributed computing environment as suggested by
Allchin's work.  It contains the design of a distributed operating
system kernel which meets these requirements and which could flexibly
support various implementations of the  Clouds  reliable system as well
as other forms of object-oriented distributed systems.  This
dissertation also describes a prototype implementation, which was done
to help refine and validate the design and provide a testbed for
further research.

%A Eugene H. Spafford
%T Object Operation Invocation in Clouds
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%R Technical Report GIT-ICS-87/14
%D February 1987
%K RPC
%O Submitted to SOSP
%X Many distributed operating systems have been developed in recent years
based on the action/object paradigm.   The Clouds multicomputer system
provides a fault-tolerant distributed computing environment built from
passive data objects, fault-atomic transactions, and a global kernel
interface.  Large portions of the Clouds operating system and
supporting software, and all user-level software, are being built
from these constructs.
%X Important to the successful functioning of Clouds is the uniform
operation invocation mechanism.  The mechanism is flexible, powerful
and easily understood.  It allows plain processes or nested
transactions to access user and system objects in a transparent,
uniform manner, whether those objects are local to the current machine
or on some remote processor.  The same basic interface used to make
operation invocation requests on objects can be used to spawn processes
and actions, and to gain access to restricted kernel services.
%X This paper presents an abbreviated description of the Clouds philosophy
and some of its kernel features as they relate to object operation
invocation.  Included is a presentation of the structure and operation
of the invocation mechanism and its support for some of Clouds' design
goals.  Support for remote invocation, per-object access control, and
location-independent invocation is also presented.  This should give
the reader some understanding of the integrated nature of the three
basic Clouds primitives--objects, actions, and processes--as well as
insight into how they are supported.

%A H. Strickland
%T Networking Support for a Distributed Operating System
%R M.S. Thesis
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1987
%K Clouds
%O In progress

%A Peter Wan
%T A Disk Driver for an Action-Oriented Operating System
%R M.S. Thesis
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1987
%O In progress
%K Clouds

%A C. T. Wilkes
%T Preliminary Aeolus Reference Manual
%R Technical Report GIT-ICS-85/07
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%D 1985
%O Last Revision: 17 March 1986
%K Clouds
%X The goal of the Clouds project at Georgia Tech is the implementation of
a fault-tolerant distributed operating system based on the notions of
objects, actions, and processes, which will provide an environment for
the construction of reliable applications.  The Aeolus programming
language developed from the need for an implementation language for
those portions of the Clouds system above the kernel level.  Aeolus has
evolved with these purposes:  to provide the power needed for systems
programming without sacrificing readability or maintainability; to
provide abstractions of the Clouds notions of objects, actions, and
processes as features within the language; to provide access to the
recoverability and synchronization features of the Clouds system; and
to serve as a testbed for the study of programming methodologies for
action-object systems such as Clouds.
%X Thus, the main interest of Aeolus lies not in the language itself, but
in what may be done with the language.  We have avoided providing
high-level features for programming actions with the intention of
evolving designs for such features out of our experience with
programming in Aeolus.  These features will then be incorporated into
an applications language for the Clouds system.
%X This report is not intended to be a tutorial on the Aeolus language;
rather, it strives to be a concise definition of the syntax and
semantics of Aeolus, and thus should serve as a reference for
programmers and implementors.

%A C. T. Wilkes
%A R. J. LeBlanc
%T Rationale for the Design of Aeolus: A Systems Programming Language for an Action/Object System
%J Proceedings of the 1986 International Conference on Computer Languages
%I IEEE Computer Society
%C Miami, FL
%D October 1986
%P 107-122
%O Also available as Technical Report GIT-ICS-86/12
%K Clouds
%X The goal of the Clouds project at Georgia Tech is the implementation of
a fault-tolerant distributed operating system based on the notions of
objects, actions, and processes, to provide an environment for the
construction of reliable applications.  The Aeolus programming language
developed from the need for an implementation language for those
portions of the Clouds system above the kernel level.
%X Aeolus has evolved with these purposes:  to provide the power needed
for systems programming without sacrificing readability or
maintainability; to provide abstractions of the Clouds notions of
objects, actions, and processes as features within the language; to
provide access to the recoverability and synchronization features of
the Clouds system; and to serve as a testbed for the study of
programming methodologies for action-object systems such as Clouds.
%X In this paper, the features provided by the language for the support of
readability and maintainability in systems programming are described
briefly, as is the rationale underlying their design.  Considerably
more detail is devoted to features provided for support of object and
action programming.  Finally, an example making use of advanced
features for action programming is presented, and the current status of
the language and its use in the Clouds project is described.

%A C. T. Wilkes
%T Programming Methodologies for Resilience and Availability
%R Ph.D. Diss.
%I School of Information and Computer Science, Georgia Institute of Technology
%C Atlanta, GA
%K thesis Clouds
%D 1987
%O In progress
%X The goal of the Clouds project at Georgia Tech is the implementation of
a fault-tolerant distributed operating system based on the notions of
objects and actions, which will provide an environment for the
construction of reliable applications.  As part of the Clouds project,
we have designed and implemented a high-level language in which those
levels of the Clouds system above the kernel level are being
implemented.  The Aeolus language provides access to the
synchronization and recovery features of Clouds.  It also provides a
framework within which to study programming methodologies suitable for
action-object systems such as Clouds.  This dissertation describes
programming methodologies appropriate to the design of fault-tolerant
servers needed in the Clouds system.  Among the properties needed by
these objects are resilience and availability.
%X As part of this research, several case studies which will serve as
designs for actual Clouds servers have been developed in Aeolus.  Among
the issues examined using these case studies are:  the use of knowledge
about the semantics of an object, as opposed to automatic provisions,
in designing for resilience and availability; the tradeoffs between
consistency and availability for such objects; the support from the
Aeolus runtime system and from the Clouds kernel needed for providing
fault tolerance; and high-level language features for resilience and
availability which may be derived from experience with programming in
Aeolus.

----------
C.T. "Tom" Wilkes
School of Information & Computer Science, Georgia Tech, Atlanta GA 30332
CSNet:  wilkes @ gatech        ARPA:  wilkes @ ics.gatech.edu
uucp:  ...!{akgua,allegra,hplabs,ihnp4,seismo,ulysses}!gatech!stratus!wilke o