[comp.sw.components] Concurrent Search Structure Algorithms

wtwolfe@hubcap.clemson.edu (Bill Wolfe) (08/25/89)
The construction of reuseable software which implements the concept of an
abstract data type (ADT) is a primary method of leveraging one's investment
in software.  In advanced programming languages such as Ada, it is possible
to construct secure ADTs whose instances can be shared among multiple tasks
such that an instance can process multiple service requests concurrently. 
This article reviews the paper on ADT design entitled "Concurrent Search 
Structure Algorithms", by Dennis Shasha and Nathan Goodman, which considers 
the design and verification of highly concurrent algorithms on the ADT 
known as the dictionary, which consists of the actions Member, Insert, 
and Delete over some arbitrary key space.  

We restrict our attention to unbounded dictionary implementations; such 
implementations typically involve data structures which contain nodes and
arcs, i.e., (possibly multirooted) tree structures.  Certain designated
sequences of instructions, typically involving accesses to nodes and/or arcs, 
are collectively referred to as operations; the execution of an action consists
of a series of such operations.  Operations are considered atomic if no 
instruction not belonging to the operation under consideration can modify
any data accessed by any instruction in the operation while the operation
executes; we generally restrict our attention to operations which are atomic.

Most approaches to concurrency control define serializability using a 
syntactic criterion whereby a concurrent computation of some set of ADT 
actions is serializable if 1) the operations are the same as the operations 
in some serial execution of the actions, and 2) the serial and concurrent 
computations order conflicting operations in the same way, where operations 
are deemed to conflict if they both operate on the same item and one or both 
operations modifies that item. For example, consider a Library ADT which 
includes the actions Borrow and Reshelve; suppose that the process handling 
a Borrow request descends a tree structure in search of the node containing 
a particular book, but upon arrival finds that a Reshelve process has moved 
that book to another node.  Fortunately, the Reshelve process left a pointer
to the book's "new shelf", so our Borrow process follows the pointer to the 
desired book, which is then Borrowed.  

This violates the syntactically defined serializability criterion, because 
in any serial execution of the Borrow and Reshelve actions, the Borrow process
would only visit one "shelf"; however, in this concurrent execution, it 
actually visited two shelves.  Thus, we have a situation in which the 
syntactic criterion is too restrictive, and this costs us a certain amount
of parallelism which could otherwise be achieved.  Therefore, this paper 
considers a semantically based definition of serializability: a concurrent 
computation of ADT actions starting in any representation of a state S is 
serializable if there is some serial sequence of those ADT actions such that 
the value returned by each serially executed ADT action, if any, is the same
as the value returned by the corresponding concurrent execution of that ADT 
action, and the final state of the ADT after the serial execution is 
equivalent to the final state of the ADT after the concurrent execution.

In order to exploit the increased parallelism provided by this semantically
based approach to serializability, it is useful to distinguish between
decisive and non-decisive operations.  Assume that more than one state of
the implementing data structure can represent a single state of the 
dictionary; then a non-decisive operation is one which may modify the form 
of the data structure but cannot modify the state of the dictionary, and
does not influence the return value of the action of which it is a part;
a decisive operation may modify the state of the dictionary, and it must
carry the semantics of the enclosing action.  Only decisive operations can
conflict; non-decisive operations, although they may modify the *form* of
the data structure, may interleave in arbitrary ways because they do not
change the data structure's *contents*.  The syntactic definition is again
too restrictive in that it would force non-decisive operations to conflict
simply because nodes or arcs are being modified, without regard to the
potential gain in concurrency available due to the non-decisive semantics.  

Three locking protocols are considered: lock-coupling, in which a process 
locks the child of the node to which it is about to descend before releasing
its lock on the current node, and two other protocols in which a process 
releases its lock on the current node before obtaining a lock on the node
to which it wishes to travel: the link technique, in which an item relocation
will result in the addition of a pointer from the "old" node to the "new" node 
and a process which finds such a pointer simply engages in pointer-chasing
until the proper node is arrived at, and the give-up technique, in which each 
node has a field indicating the range of items for which that node or its 
descendants are currently "responsible", and a process which arrives at a 
given node must check to see whether the item being searched for is currently 
within this range of items.  If not, the process gives up, retreats to some 
ancestor of the current node, and resumes the search; the process eventually 
does arrive at the node which is currently responsible for the item being 
searched for, after the ancestors of the desired node have been completely 
updated with the appropriate range information.  Simulation studies of these 
techniques are cited which suggest that the increased concurrency provided by 
the link and give-up techniques does in fact result in better performance than 
the lock-coupling technique, and those techniques are therefore emphasized.

Another design option considered is the question of when to perform
restructuring.  Dictionary actions are considered in terms of two phases:
a locate phase, in which non-decisive operations are used to locate the 
key in question, and a decisive phase containing a single decisive operation,
in which the contents of the dictionary may be modified.  It is noted that 
some of the algorithms presented by other authors have chosen to perform 
restructuring during the locate phase, and that locate-phase restructuring 
algorithms must either acquire unnecessary write-locks or attempt to promote 
read-locks to write-locks when a node must be split; the first approach 
restricts concurrency and therefore carries a performance penalty, and 
the second introduces the possibility of deadlock.  Therefore, it is 
suggested that restructuring be performed either asynchronously, by a 
separate process which is dedicated to maintenance activities, or by a 
cleanup phase which occurs after the decisive phase, or both.  

The paper describes a mathematical model of dictionary actions and search
structures, defines computations over that model, and derives conditions 
which, if satisfied by a computation, will guarantee serializability.  These
conditions are then used to prove the validity of three generic algorithms,
one for each of the three locking protocols, which can be used to implement
the three dictionary actions (Member, Insert, and Delete) on an arbitrary
search structure in such a way that the implementing algorithms are
guaranteed to preserve semantic serializability.  Examples are given for 
a B+ tree, a hashing table, and a fixed binary+ forest, and the specific
algorithms derived from the generic templates are mathematically proven
to be serializability-preserving algorithms.  Finally, there is a brief
consideration of recoverability issues, in which it is suggested that
either all restructuring be performed by asynchronous processes so as to
enable the use of standard recovery mechanisms, or a nested transaction
model be used in order to bound the recovery requirements.

Assuming the use of atomic operations, certain guidelines are presented
for the use of lock-coupling, link, and give-up protocols.  These guidelines
apply to the specific procedures which are used to instantiate the generic
templates;  by adhering to these guidelines, one can be certain that semantic
serializability will be preserved.  Three guidelines are common to all three
techniques, and the fourth guideline takes on a different form (but achieves
the same objective) for each technique.  First, a few definitions:  a good
state of a dictionary data structure is one in which each node is responsible
for a distinct portion of the key space (that portion is referred to as the
node's keyset), the separate regions of the key space controlled by individual 
nodes partition (or cover, if the data structure allows duplicates) the entire 
key space, and the keys contained in each node are a subset (not necessarily 
proper) of the portion of the key space controlled by that node.  The global 
contents of the data structure are simply the combined contents of each of the 
individual nodes.  The first guideline requires that every non-decisive
operation map a good state to another good state.  The second guideline 
requires that decisive operations carry the semantics of the enclosing action,
and that decisive operations also map a good state to another good state.
The third guideline requires that non-decisive operations refrain from
modifying the global contents.  The fourth guideline serves to ensure for
each specific technique that when one action holds a lock on some node,
no other concurrently executing action will change some important global
property of that node such as the set of keys for which that node and its
descendants are currently responsible.  Since the lock coupling technique
has fared poorly in simulations, we now concentrate our attention on the
link and give-up techniques.  For the link technique, the specific requirement
of the fourth guideline is that if an operation removes a key from the set
of keys for which a given node or its descendants are currently responsible,
including the descendants which arise as a consequence of the pointers left
behind due to physical key relocations, then no locate-phase operation of
any other action which is searching for that key will access that node
after the completion of that operation.  For the give-up technique, the
fourth guideline specifically requires that the range field of a node 
cannot include any key for which neither that node nor any of its descendants 
is currently responsible, and that if some operation removes a node 
from the data structure, then no other action can access that node after 
the completion of that operation.  With these guidelines in mind, the 
generic templates are now presented in the form of generic Ada source code 
which can be instantiated to meet the requirements of any dictionary search 
structure.  We close with comments regarding the modifications which would 
be necessary if the data structure were required to handle set-valued queries. 

   type NODE_TYPE;     -- this type definition must be completed...

   type KEY_TYPE;      -- and this one too...

   type RESPONSE_TYPE (SEARCH_COMPLETED : BOOLEAN) is
      record
         case SEARCH_COMPLETED is
            when True  => KEY_LOCATED : BOOLEAN;
            when False => NEXT_STOP   : NODE_TYPE;
         end case;
      end record;

   generic   -- instantiate this to form function SEARCH, which then 
             --   serves as an instantiation parameter for PERFORM_ACTION.

      with function NODE_RESPONSIBLE_FOR_KEY
                       (SOME_NODE : NODE_TYPE;
                        SOME_KEY  : KEY_TYPE  ) return BOOLEAN;
  
      with function NODE_CONTAINS_THE_KEY 
                       (SOME_NODE : NODE_TYPE;
                        SOME_KEY  : KEY_TYPE  ) return BOOLEAN;

      with function DETERMINE_PATH_OF_DESCENT
                       (SOME_NODE : NODE_TYPE;
                        SOME_KEY  : KEY_TYPE  ) return NODE_TYPE;

   -- with function SOME_ANCESTOR                                    -- give-up 
   --                  (OF_THE_NODE : NODE_TYPE) return NODE_TYPE;   --   only

   -- with function WITHIN_THIS_SUBTREE                              -- give-up 
   --                  (SUBTREE_ROOT : NODE_TYPE;                    --   only 
   --                   SOME_KEY     : KEY_TYPE  ) return BOOLEAN;

   function SEARCH_DATA_STRUCTURE 
                       (SOME_NODE : NODE_TYPE;
                        SOME_KEY  : KEY_TYPE  ) return RESPONSE_TYPE is

   begin  -- function SEARCH_DATA_STRUCTURE


      if NODE_RESPONSIBLE_FOR_KEY (SOME_NODE, SOME_KEY) then


         RESPONSE := (SEARCH_COMPLETED => True,
                      KEY_LOCATED      => NODE_CONTAINS_THE_KEY 
                                             (SOME_NODE, SOME_KEY) ); 

   -- For the give-up technique:
   --
   -- elsif WITHIN_THIS_SUBTREE (SOME_NODE, SOME_KEY) then
   --
   --    RESPONSE := (SEARCH_COMPLETED => False,
   --                 NEXT_STOP        => DETERMINE_PATH_OF_DESCENT
   --                                        (SOME_NODE, SOME_KEY) );
   -- else
   --    RESPONSE := (SEARCH_COMPLETED => False,
   --                 NEXT_STOP        => SOME_ANCESTOR (SOME_NODE) );
   -- end if;


   -- For the link technique:
   --
   -- else 
   --    RESPONSE := (SEARCH_COMPLETED => False,
   --                 NEXT_STOP        => DETERMINE_PATH_OF_DESCENT
   --                                        (SOME_NODE, SOME_KEY) );
   -- end if;


   end SEARCH_DATA_STRUCTURE;

   ---------------------------------------------------------------------

   generic

      with function SEARCH (SOME_NODE : NODE_TYPE;
                            SOME_KEY  : KEY_TYPE  ) return RESPONSE_TYPE;

      with procedure LOCK (SOME_NODE : NODE_TYPE);

         -- The type of lock to set will depend on your data structure
         --   and the action involved.  For a Member action, whose semantics
         --   do not include any modification of the global contents, the
         --   lock could always be a read-lock.  If the action's semantics
         --   do include a modification of the global contents, the type of
         --   lock could still depend upon your data structure; if a node
         --   consists of a variant record which is either an interior node
         --   or a leaf, and only the leaves have "real" contents, then a
         --   read-lock could be set for interior nodes and a write-lock for
         --   leaves.  Otherwise, a write-lock is probably always required.

      with procedure UNLOCK (SOME_NODE : NODE_TYPE);

      with procedure DECISIVE_ACTION (SOME_NODE : NODE_TYPE;
                                      SOME_KEY  : KEY_TYPE  );

         -- For a Member, this will decide whether the key is in the node;
         --   for an Insert, this will insert the key into the node; for a
         --   Delete, this will remove the key from the node, if present. 

   procedure PERFORM_ACTION (SOME_ROOT  : NODE_TYPE;
                             SEARCH_KEY : KEY_TYPE   ) is

      KEY      : KEY_TYPE;
      NODE     : NODE_TYPE;
      RESPONSE : RESPONSE_TYPE;

   begin  -- procedure PERFORM_ACTION

      NODE := SOME_ROOT;

      loop
         LOCK (NODE);
         RESPONSE := SEARCH (NODE, SEARCH_KEY);
         exit when RESPONSE.SEARCH_COMPLETED = True;
         UNLOCK (NODE);
         NODE := RESPONSE.NEXT_STOP;
      end loop;

      -- At this point, NODE is locked and is known to be responsible for
      --  storing this particular SEARCH_KEY, if it is in the dictionary.

      DECISIVE_ACTION (NODE, SEARCH_KEY);

      UNLOCK (NODE);

   end PERFORM_ACTION;

If the data structure is required to handle potentially set-valued queries,
then it is necessary to generalize the decisive phase such that it is possible
to have multiple decisive operations per action.  This requires a new condition 
which must be satisfied by concurrent executions; since decisive operations 
can truly conflict, and since there may be more than one decisive operation 
per action (thus opening up the possibility of interleaved decisive operations
between different concurrently executing actions), concurrent executions must 
ensure that the decisive operation conflict graph is acyclic.  Essentially, 
given that only decisive operations conflict, the standard requirement that 
the serialization graph of the execution be acyclic must be satisfied with 
respect to those decisive operations; this was required in the single-valued 
query case also, but this condition was being automatically satisfied in that 
case because each action contained only one potentially conflict-generating 
operation.

Thus, aside from the obvious need to update the semantics of the actions 
whenever set-valued queries are involved such that the value returned by 
an action is the union of the return values of its decisive operations, 
it is necessary to ensure that no problems are caused by the need to perform 
decisive operations on more than one node in the course of a single action.  
One option is to construct a scheduler which will guarantee that two actions 
do not execute concurrently if their arguments have a non-null intersection 
and one of them carries semantics which involve modifying the global contents.
Another option is to have each action execute a breadth-first search, locking 
the nodes which are responsible for some key in the desired set; when all such
nodes have been locked, the action issues its decisive operations; this latter
option would amount to using the "Conservative 2-Phase Locking" protocol.

Given the continuously increasing demand for reliable, reuseable, high-
performance software, techniques for the construction of concurrency-
maximizing ADTs are of increasing economic importance.  The generation
and utilization of such techniques is an important means by which the
productivity of software systems can be continuously maximized, thus
further leveraging one's software investment.

"Concurrent Search Structure Algorithms" appeared in the March 1988 issue
of ACM Transactions on Database Systems, Vol. 13, No. 1, pages 53 through 90.
The paper is copyright 1988 ACM 0362-5915/88/03-0053 $01.50.  The use of
portions of this paper, which I have done freely, is by permission of the 
ACM; as a condition of that permission, this article on the contents
of that paper must not be distributed for direct commercial advantage.
To do so requires a fee and/or specific permission from the ACM.


   Bill Wolfe, wtwolfe@hubcap.clemson.edu