[comp.os.mach] Data sharing among lightweight tasks

cbs@swbatl.sbc.com (Brad Slaten 529-7636) (08/30/90)

    Our current application is using an object-oriented implementation
under the ISIS lightweight tasking library and Unix.  We have provided an
interface to the Inter-process Communication (IPC) mechanism such that
the application does not know when an object with which it is
communicating requires IPC.  The problem which occurs is that in
making the IPC transparent, the lightweight task context switch is
also transparent.  The problem results from the fact that certain
objects are shared among multiple tasks.  When a context switch occurs
from one task to another, any or all shared objects may be modified by
a different task without the initial task having knowledge of this.
    Any and all thoughts (regardless of how wild) on this problem, or
alternate approaches are welcome.  Please send via e-mail.

Cheers
Brad Slaten

P.S. We have thought about placing automatic checks on all shared
    objects upon return from the transparent IPC, but do not care to
    do recursive checks on objects for each IPC.

    We have considered making shared objects resources, but have
    not come up with a good solution to the deadlock problem.

-- 
cbs@swbatl:		Brad Slaten - 529-7636

ken@gvax.cs.cornell.edu (Ken Birman) (08/31/90)

In article <1990Aug30.164153.25008@swbatl.sbc.com> cbs@swbatl.sbc.com (Brad Slaten 529-7636) writes:
>
>    Our current application is using an object-oriented implementation
>under the ISIS lightweight tasking library and Unix.  We have provided an
>interface to the Inter-process Communication (IPC) mechanism such that
>the application does not know when an object with which it is
>communicating requires IPC.  The problem which occurs is that in
>making the IPC transparent, the lightweight task context switch is
>also transparent.  The problem results from the fact that certain
>objects are shared among multiple tasks.  When a context switch occurs
>from one task to another, any or all shared objects may be modified by
>a different task without the initial task having knowledge of this.

Because of the wide posting this received and the mention of ISIS, I
want to comment on how our group sees this issue.  (For those who
don't know about ISIS, we have a distributed computing toolkit
for building networking and fault-tolerant software.  Contact us
at isis@cs.cornell.edu and we can send information).

It strikes me that S.W. Bell is running into a "classical" problem
that has plagued the transactional community ever since it was first
suggested (by the ARGUS, TABS, CAMELOT and CLOUDS projects) that
transactions be used on abstract data types.  The transaction people
hit this from the other side: transactional serializability proves to
be so costly (in terms of concurrency lost, deadlock detection, orphan
handling, etc) that the database/transaction crowd ended up working with
all sorts of schemes to weaken serializability by using object semantics,
top-level transactions, etc.

S.W. Bell is running into this problem from the non-transactional side.
As they point out, some form of concurrency control is needed to protect
against unintended interactions between concurrent threads that call
the same set of objects.  This is particularly true when you go to 
a full-blown OO environment, since you generally don't know what 
operations will trigger IPC.  IPC in a distributed setting normally
blocks the sender, and almost all IPC/RPC systems will schedule other
threads at this time.  Thus, even without pre-emptive thread scheduling,
one sees what looks like thread pre-emption.
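
A minimal sketch of this effect, assuming Python generators as a stand-in
for lightweight tasks.  Nothing here is ISIS code: transparent_call, task_a,
task_b and run are hypothetical names, and the bare "yield" models a call
that blocks on IPC and lets the scheduler run someone else.

```python
# Hypothetical model: each task is a generator, and the scheduler only
# switches tasks at a yield, i.e. where a call blocks on IPC.

shared = {"balance": 100}   # object shared by both tasks
log = []

def transparent_call():
    """Stand-in for a method call that happens to need IPC."""
    yield  # the hidden context switch

def task_a():
    seen = shared["balance"]               # read shared state
    yield from transparent_call()          # looks like an ordinary call
    log.append((seen, shared["balance"]))  # state changed underneath us

def task_b():
    shared["balance"] -= 30                # runs while task_a is blocked
    yield from transparent_call()

def run(tasks):
    """Round-robin scheduler with no pre-emption at all."""
    queue = [t() for t in tasks]
    while queue:
        task = queue.pop(0)
        try:
            next(task)          # run until the task's next blocking call
            queue.append(task)
        except StopIteration:
            pass                # task finished

run([task_a, task_b])
print(log)
```

task_a records the value it read before the call next to the value it finds
afterwards; the two differ even though no task was ever pre-empted.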

Unfortunately, there really isn't any good general solution to this
problem.  Probably the best known work focuses on exploiting the semantics
of the object interface to build inexpensive "local" concurrency control
mechanisms.  Alfred Spector published on this years ago, and more recently
Maurice Herlihy (DEC CRL) and Jeannette Wing (CMU) implemented some mechanisms
along this line in a language called AVALON.  Interesting research, but I
don't think any commercial strength database group would claim that the
results are actually very promising.  The evidence is that the designer
of each object needs to do something specific to protect the state of that
object, and that while systems support builders (like my group) need to
provide good tools for solving this problem, nothing automatic has much
chance of working well.
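
As a rough illustration of an object whose designer protects its state
explicitly, here is a sketch of a monitor-style bounded queue.  It assumes
ordinary Python threads rather than ISIS tasks, and BoundedQueue is a
hypothetical class invented for this example.

```python
import threading

class BoundedQueue:
    """Queue that guards its own invariants, monitor style."""

    def __init__(self, capacity):
        self._items = []
        self._capacity = capacity
        self._cond = threading.Condition()  # one monitor lock per object

    def put(self, item):
        with self._cond:
            while len(self._items) >= self._capacity:
                self._cond.wait()           # wait until there is room
            self._items.append(item)
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._items:
                self._cond.wait()           # wait until there is an item
            item = self._items.pop(0)
            self._cond.notify_all()
            return item

# One producer (the main thread) and one consumer thread.
q = BoundedQueue(capacity=2)
received = []
consumer = threading.Thread(
    target=lambda: received.extend(q.get() for _ in range(5)))
consumer.start()
for i in range(5):
    q.put(i)
consumer.join()
print(received)
```

The locking policy lives inside the object, so callers never see it; the
cost is that each object's designer has to decide what "safe" means for it.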

Of course, I am biased on this.  In fact, ISIS used to use transactions;
we moved away from this about 5 years ago (although we still offer the
option of using transactions in our system).  Basically, the more we
worked with them, the less we liked them.  Even the simplest problems,
like building a queue of some other sort of object, lead into terrible
concurrency-control problems.  5 years later, I have still never seen
a large OO system that was actually built using transactions, although
the MIT group certainly has had success with some smaller ones.  (In
contrast, we have built some very large systems using ISIS and many
of these have a loose OO structure.)

ISIS V2.1 provides a token passing lock manager, and our papers include
discussion of how to do read and write locks on ISIS if one prefers these.
Our basic recommendation is that designers try and structure OO systems
in layers, with layer i permitted to call layer i+1 in unrestricted
ways, but with any calls from layer i to layer i or to layers i-k given
very careful scrutiny.  Because hierarchical locking is deadlock free,
any form of locking (ISIS favors a "monitor" style, or you can go with
semaphores) will work for the down-calls.  The granularity of the locks
should be application-specific (i.e. the designer decides what and when
to lock).  For calls within a layer or upcalls, there is a real risk of
deadlock and there is no choice except to implement a scheme that will
avoid this, using some clever mechanism (i.e. one would use some scheme
that is provably deadlock free).  Or, one can somehow detect deadlock
and "abort".  ISIS users would normally go for the former sort of mechanism,
which is much cheaper than supporting a generalized abort/rollback scheme.
ARGUS and CAMELOT (and AVALON) users would presumably view abort/rollback
as cheaper, since these systems use abort more casually, and would favor a
deadlock detection or timeout mechanism.
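
The down-call rule can be sketched as a lock that knows its layer number.
This is an illustration under stated assumptions, not the ISIS lock manager:
LayeredLock is a hypothetical class, it uses Python threads, and it simply
refuses same-layer calls and upcalls made while a lock is held, which is
stricter than the "careful scrutiny" suggested above.

```python
import threading

class LayeredLock:
    """Lock tagged with a layer; may only be taken on a down-call."""

    _held = threading.local()   # per-thread stack of layers currently held

    def __init__(self, layer):
        self.layer = layer
        self._lock = threading.Lock()

    def __enter__(self):
        stack = getattr(self._held, "stack", None)
        if stack is None:
            stack = self._held.stack = []
        # Layers are acquired in strictly increasing order, so the top of
        # the stack is the deepest layer this thread currently holds.
        if stack and stack[-1] >= self.layer:
            raise RuntimeError(
                f"layer {stack[-1]} may not lock layer {self.layer}")
        self._lock.acquire()
        stack.append(self.layer)
        return self

    def __exit__(self, *exc):
        self._held.stack.pop()
        self._lock.release()
        return False            # never swallow exceptions

directory = LayeredLock(layer=1)   # hypothetical upper-layer object
storage = LayeredLock(layer=2)     # hypothetical lower-layer object

with directory:
    with storage:                  # down-call: layer 1 -> layer 2
        pass

rejected = False
try:
    with storage:
        with directory:            # upcall while holding layer 2
            pass
except RuntimeError:
    rejected = True
print(rejected)
```

Since every thread takes locks in increasing layer order, no cycle of
waiting threads can form among these locks.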

I should probably point to other non-transactional work that uses this
approach ("rollback if in trouble").  The best known is Jefferson's virtual
time system ("Time Warp OS") and the Strom/Yemini system reported in
TOCS a few years back.  Neither approach really took off except in certain
specialized languages and applications, notably simulation.  Willy Zwaenepoel
(Rice) has recently done a version of this in his work on "Sender based
message logging", which is cheap, but doesn't support rollback and hence
isn't an appropriate tool for solving this particular problem (he is more
interested in a cheap transparent fault-tolerance mechanism, and anyhow,
he assumes a deterministic, non-threaded, execution).  Recently, I understand
that Rob Strom's group has done something similar under Mach, but again
the focus is no longer on arbitrary rollback but rather is on roll-forward
for fault-tolerance with determinism assumptions.  Multithreaded OO systems 
are usually not deterministic because of the thread scheduling issue.

To summarize, there seems to be no simple way out of this problem.

At the level of process group programming, which is what ISIS
focuses on, "virtual synchrony" buys us a lot of flexibility.  Databases
get similar mileage out of a related property, transactional
serializability.  But, databases (with their simple TM/DM
execution model) and ISIS (with a completely non-transactional model)
really don't extend well to a fully general OO scheme with multithreaded
concurrency control.  Those of us who have worked in the area (I have)
might point to encouraging work, like ARGUS and AVALON, but the bottom
line is that there seems to be nothing in the way of a major breakthrough
that answers all the questions in a clean, transparent way.

I've read one recent paper that does a very good job of sorting out
the options.  The authors are members of the ANSA (Advanced Networked
Systems Architecture) group at ISA (a Cambridge, England - based project
funded by Esprit to look at OO issues in network settings).  The
reference I have is:
	Access-specific concurrency control in distributed object systems.
	John Warne and David Oliver, ANSA/ISA Report APM/RC.084.04
	(August 1990)

	Copies available on request to: 
		apm@ansa.co.uk
		ANSA/ISA Project
		Poseidon House
		Castle Park
		Cambridge CB3 0RD, UK
		+44 223-323010
ANSA/ISA is a general OO programming "framework" within which things
like ISIS co-exist with transactional subsystems and other mechanisms.
I've followed this project closely for years and if you are working on
OO systems, I recommend that you have a close look at what they have done.

			Ken Birman
			ken@cs.cornell.edu or isis@cs.cornell.edu
			607-255-9199