[comp.object] Object-Oriented Databases and Memory Mapped Files

ricks@shambhala.Berkeley.EDU (Rick L Spickelmier) (08/13/90)

One quick method of implementing a persistent store for a language 
is to layout the persistent store just like main memory (including
pointers) and use memory mapped files or shared memory to
transparently map the data from disk to the appropriate location in
memory.  A few months back, someone in comp.object or comp.databases
mentioned a very early use of this technique (perhaps even in the
Persistent Algol work).  I lost the posting....  So is the original 
poster still around or does someone else have references to early
uses of this technique?

			Thanks,
				Rick

--
	    Rick L. Spickelmier
	    Electronics Research Laborartory
	    University of California at Berkeley

	    ricks@berkeley.edu, ...!ucbvax!ricks

noren@dinl.uucp (Charles Noren) (08/14/90)

In article <1990Aug13.095316@shambhala.Berkeley.EDU> ricks@berkeley.edu writes:
>One quick method of implementing a persistent store for a language 
>is to layout the persistent store just like main memory (including
>pointers) and use memory mapped files or shared memory to
>transparently map the data from disk to the appropriate location in
>memory...

This will provide a persistent store for a single process.  Other things
are needed to make it a database, notably concurrancy protection (read
and write locks) to prevent bad things from happening when more than
one process access the same object in memory.

At least one Object-Oriented Database (OODB) company does this,
Object Design Inc. (Eastern US phone: (617) 270-9797, Western U.S.
phone: (213) 412-8411).  Their product, ObjectStore (which is in
Beta right now, currently works only on Sun 3's and 4's with more to
come soon) puts database fetches into physical memory and maps the
process address space into it.  The C++ process referencing the data
uses C++ dereferencing mechanisms to access the data making for very
fast access for data in cache memory.  For data referenced outside of
cache, the ObjectStore runtime environment catches the segmentation
violation signal and determines what to bring in an remaps the
process address space into it.

We are in the process of purchasing ObjectStore from Object Design
because our evaluation of it (and particularly, the evaluation done
by another project in our company) has found this database to be very
fast.  The database is also extremely easy to use, basically you're
just writing C++ code to use the database.

...now if I can just coax Objective-C to use ObjectStore in the same way,
what a combination that would make!

-- 
Chuck Noren
NET:     dinl!noren@ncar.ucar.edu
US-MAIL: Martin Marietta I&CS, MS XL8058, P.O. Box 1260,
         Denver, CO 80201-1260
Phone:   (303) 971-7930

laffra@s8.cs.rul.nl (Chris Laffra) (08/17/90)

(long posting)

ricks@shambhala.Berkeley.EDU (Rick L Spickelmier) writes:

>> One quick method of implementing a persistent store for a language 
>> is to layout the persistent store just like main memory (including
>> pointers) and use memory mapped files or shared memory to
>> transparently map the data from disk to the appropriate location in
>> memory.  A few months back, someone in comp.object or comp.databases
>> mentioned a very early use of this technique (perhaps even in the
>> Persistent Algol work).  I lost the posting....  So is the original 
>> poster still around or does someone else have references to early
>> uses of this technique?

Here follows an article we wrote for TOOLS'90 held in Paris in June
of this year. It describes the implementation of persistent objects
in the object-oriented language Procol. We decided to use mapped memory
to make our object persistent. The paper described some design decisions
and a survey of existing work. 

	Chris Laffra
	Dept. of Computer Science,
	Leiden University,
	The Netherlands.

BTW, the figures mentioned in the paper are not included.

-----------------------------------------------------------------------

Persistent Graphical Objects

   Chris Laffra

   Department of Computer Science, University of Leiden P.O. Box 9512, 2500 
   RA Leiden, The Netherlands Email: laffra@cs.leidenuniv.nl

   Peter van Oosterom

   TNO Physics  and Electronics  Laboratory,  P.O.  Box 96864, 2509 JG  The  
   Hague, The Netherlands.  Email: oosterom@fel.tno.nl

Abstract

Persistent objects are objects whose contents  may  outlive  the  execution
time  of  the program. This paper describes the process of introducing per-
sistent objects in the object-oriented  programming  language  Procol.  The
strength  of  persistent objects in an object-oriented programming language
is the integration of a database system with a programming language.   Per-
sistent objects make the program development easier, because the programmer
does not have to implement the explicit loading and saving  of  data.   Be-
sides  the general functional aspects, special attention is paid to graphi-
cal applications in order to deal with their  specific  geometric  require-
ments.  For  example,  it must be possible to find, in an efficient manner,
all graphical objects that fall within a given region.  These issues,  per-
sistent  objects and their geometric requirements, have not yet got the at-
tention they deserve in the literature covering  object-oriented  graphical
systems where modeling and functional aspects dominate.

 1.  Introduction

The fact that the object-oriented approach is  so  successful  in  computer
graphics,  is  mainly  due  to  the  system  modeling capabilities that the
object-oriented paradigm offers. The specialization relationships which ex-
ist  between the graphical objects in a system, can be modeled with inheri-
tance or delegation which are present in many  object-oriented  development
environments.   Geometric  data types, such as points, vectors and matrices
may be implemented by abstract data types using object classes  [Blake  and
Cook,  1987,  Cox,  1986, Dietrich et al., 1989, Meyer, 1988].  However, in
practice this modeling power is not enough when implementing CAD systems or
Geographical  Information  Systems  (GISs) which deal with large data sets.
Integrated database facilities are required to support the graphical appli-
cation in an efficient manner.

In this paper we describe a solution that is based on the  introduction  of
persistent   objects  in  the  object-oriented  programming language called
Procol.   A short introduction to Procol is given in [Laffra  and  van  den
Bos,  1990],  also  contained  in  this proceedings.  A detailed functional
description of the C-based language Procol can be found in  [van  den  Bos,
1989]  and  some  implementation aspects in [van den Bos and Laffra, 1989].
The resulting system is also extendable with  new  (persistent)  types  and
operators.  In  itself  this  approach is not new and has been described by
several other authors [Atkinson and Buneman, 1987,  Richardson  and  Carey,
1989,  Straw,  et al., 1989], but we also tried to make the system suitable
for highly  interactive and graphical  applications. This goal is  achieved
by  putting  emphasis on both time-efficiency (offering techniques such as:
navigation, index structures,  and  parallelism)  and  the  recognition  of
multi-dimensional  data.  As an example, not only one dimensional, but also
multi-dimensional index structures are provided in Procol.  The emphasis in
this  document  is  on the inclusion of persistent graphical objects in the
syntax and semantics of Procol and on its implications  for  the  technical
realization;  e.g.,  how Procol deals with problems such as referential in-
tegrity and associative searching.

We will state the problem area in section 2 and define our general require-
ments  for  persistence  in  an object-oriented language in section 3.  Our
first attempt to introduce  persistence  in  Procol  and  comparisons  with
several other systems are described in section 4.  Building on this experi-
ence the following topics are discussed in  more  detail:  referential  in-
tegrity, (multi-dimensional) search problems, Procol extension alternatives
and object instances of different sizes in sections 5  through  8,  respec-
tively. In section 9 we present the solution as chosen in Procol. In nearly
all the sections the requirements of graphics play an important  role.  The
discussions  are  illustrated with examples based on graphical systems. Fi-
nally, section 10 indicates some topics where further research is required.

 2.  The Need For Persistence

The computer science research in the  area  of  programming  languages  em-
phasizes programming constructs and data structures.  One of the most popu-
lar paradigms is that of Abstract Data Types (ADTs).  Object-Oriented  Pro-
gramming  Languages  (OOPLs) encapsulate this paradigm in an elegant manner
using object types to describe the ADTs.  Access from outside to  the  data
inside  an  object  instance  is  only possible through the methods or pro-
cedures defined for that object type.  In this document  we  will  use  the
term   object type  to indicate a class of objects and the term  object  to
indicate one instance. In case more emphasis is needed we use the term  ob-
ject instance  explicitly.

The data stored in data structures (or objects) of a running program are in
general volatile, that is, as soon as the program stops, the data are lost.
However, in many applications the data itself are very important.  An obvi-
ous  solution  is  to save the data in a file by explicit write statements.
The next time the program is started it first reads the data from file into
the volatile data structures.  Persistent objects make the program develop-
ment more efficient, because the programmer does not have  to  worry  about
the  reading  and writing of data from and to disk.  Also, the structure of
the data file may become quite complex,  resulting  in  possibly  intricate
parsing.  Moreover, for large quantities of data this ''file'' solution be-
comes cumbersome during the execution of the program.  Consider as an exam-
ple  an  information system that registers bank accounts.  A characteristic
of this and many other information systems is that  the  objects  are  well
structured,  quite  passive and occur in large quantities.  Passive objects
are objects that hardly ever send messages to other objects (except for re-
plies);  they only react to messages from outside.  In the bank account ex-
ample, an account object replies its current amount when asked for, and up-
dates it when told so by a message from an authorized object.

Database Management Systems (DBMS) have been developed  to  deal  with  the
large amounts of data mentioned above. DBMSs concentrate on the information
representation and tackle related problems such  as,  integrity,  security,
redundancy,  consistency,  efficient  searching, query formulation and con-
currency control.  A major drawback is that the DBMSs of the  current  gen-
eration  are  not  extendable with new data types and operators. This makes
the use of these DBMSs inconvenient in non-standard applications that  need
support for other data types.

The database research community has recognized this deficiency and  is  now
trying  to  design   systems that are more open [Egenhofer and Frank, 1989,
Stonebraker and Rowe, 1986,  Wolf,  1989].  At  the  other  side  the  OOPL
research  community  has  recognized the need for persistent objects.  Now,
these two worlds meet. In some cases these encounters result  in  conflicts
because  of very different and incompatible principles. For example, expli-
cit (navigation) links amongst  instances are considered harmful by the da-
tabase  community,  because  they are hard to maintain. However, these same
links form the backbone in representing complex objects in OOPLs.   On  the
other  hand,  sometimes  the  combination  of  the two research communities
result in a nice symbiosis. An example of this is that the concurrency con-
trol  problem  in the database is solved by the model of an object that ac-
cepts messages one by one (as in Procol).

We are interested in interactive graphical applications, such as: CAD  sys-
tems,  VLSI Design, and GISs, see Figure 1. Common aspects in these systems
are: interactivity, graphics, and large amounts of data.  In [van  Oosterom
and van den Bos, 1988] we showed that the object-oriented approach offers a
good data modeling and design environment for GIS applications. In the same
paper  the  need  for persistent objects in our object-oriented programming
language Procol, was identified.

 3.  Our Wish List

This section shortly discusses the major requirements  for  persistent  ob-
jects  in  Procol. Some will be elaborated in one of the later sections, in
which case the section referred to will be mentioned here.  Note that these
requirements may neither be orthogonal nor independent of each other.

r1: Upward compatible.
    Persistent objects have to be introduced with a minimal change to
    Procol. This implies that existing Procol programs do not have to
    be changed in order to be compiled by the new version of the Pro-
    col compiler.

r2: Transparent persistent objects.
    Persistent objects are treated in the same manner as volatile ob-
    jects  by the application.  Perhaps except for incompatible data-
    base facilities; for example, associative searching, that is  ob-
    ject  searching  based on the contents or value of instances. At-
    kinson et al.[Atkinson et al., 1987] refine this  by  recognizing
    the following principles for persistent data (they assume several
    levels of persistence):

    1. Persistence independence: the persistence of an object is  in-
    dependent  of how the program manipulates that object. So, it has
    to be possible to call a procedure of which the actual parameters
    are  sometimes  persistent  objects  and other times volatile ob-
    jects.

    2. Persistence data type orthogonality: all objects  are  allowed
    the full range of persistence. This means that no matter how com-
    plicated the type is, its instances can still become persistent.

r3: Complex objects.
    This provides the system with  sufficient  modeling  power;  e.g.
    part-of hierarchies.   In Procol complex object types are defined
    by means of links (or references) to other object types that  to-
    gether  define the complex object. The complex objects are static
    in terms of the object type structure and dynamic in the sense of
    the object instances.

r4: Extendability with new ADTs.
    This wish might be a trivial one  in  the  context  of  OOPLs  or
    object-oriented  databases,  but  certainly not in the context of
    the traditional DBMSs.  The definitions  of  the  new  persistent
    ADTs  also  have to be stored somewhere, if we want  to  be  able
    to manipulate its object instances in a sensible manner.

r5: Efficient handling of large amounts of objects.
    Long-lived systems allow time for data to accumulate. This,  com-
    bined  with  the  fact that we aim at developing interactive sys-
    tems, justifies this efficiency requirement to be even  more  im-
    portant  than  in  other systems. Not only efficient retrieval by
    object id (which is very important in  OOPL  and  object-oriented
    databases, as used in navigation links) is required, but also ef-
    ficient associative searching has to be possible. This  is  real-
    ized, as usual, by indexing techniques such as B-trees [Bayer and
    McCreight, 1983, Comer, 1979] or hashing.

r6: Object instances of different sizes.
    A polyline or polygon has to be stored with a  minimum  of  over-
    head,  because of the required time (and space) efficiency in in-
    teractive systems.  This implies that different instances of  the
    same object type may have different sizes. To treat an object in-
    stance as a  unity  means that it is stored in a contiguous  part
    of memory. This may seem to be an implementation issue, but it is
    very important and by putting it in our wish  list  we  emphasize
    this. This topic is further discussed in section 8.

r7: Highly interactive and graphical applications.
    The  previous  two wishes actually are part of this more  general
    wish  to  make  Procol suitable for this kind of applications. It
    has to be kept in mind that  multi-dimensional   data   sometimes
    require other approaches than the data types encountered in trad-
    itional DBMSs.  Also,  the fact  that  Procol is  designed  as  a
    parallel programming language should be exploited.

r8: Exchangeable objects.
    It should be possible to exchange object instances  between  dif-
    ferent  systems.   Object instances created by one system must be
    directly applicable by other systems.

r9: Deal with referential integrity in a satisfactory manner.
    This is well-known problem in database and  programming  language
    research.  The topic will be discussed in depth in section 5.

 4.  The First Attempt

In this section we will give  an  example  of  a  Procol  object.  Then  we
describe  our  first  (and quite simple) attempt to provide a mechanism for
persistent objects, without trying to satisfy all the wishes of  our  list.
The  problems  we encountered give some insight in the complexity of intro-
ducing persistent objects. In the last subsection we  briefly  compare  our
first attempt with the approaches taken in some other systems.

 4.1  A Procol Object

A characteristic Procol feature is the ''protocol''.   The  protocol  regu-
lates  access to an object by ordering the incoming messages which activate
the actions and by checking the identity of the senders.   In  our  example
the  protocol happens to be simple.  An example is the object type  MUNICI-
PALITY  described in Procol:

OBJ      MUNICIPALITY (string name,
                       polygon geom,
                       PROVINCE parent);

Declare  ... local declarations ...

Protocol ANY -> Display        +
         ANY -> RetrieveData   +
         ANY -> ChangeData

Init     parent.AddMUN;

Actions  Display      = ......
         RetrieveData = ......
         ChangeData   = ......

EndOBJ   MUNICIPALITY.

In Procol, as usual, an object consists of a data-part  and  an  operation-
part.  In a Procol object the data-part consists of: the attributes (in OBJ
section) and the local data (in Declare section). The state of  the  object
is determined by the data-part and the current status of the protocol.  The
operation-part contains: the local functions (in Declare section)  and  the
Init,  Protocol,  Actions  and  Cleanup  sections.   In  order to have per-
sistence, the operation-part has to be stored once for each object type and
the state has to be stored once for each object individual instance.

 4.2  Introducing Persistence

Normally, object instances are only present when the program is  executing.
Data will have to be loaded from a file or a database system into the (new)
objects when the program is initialized.  Just before  the  program  stops,
the  data  have to be saved again.  This is, as argued in section 2, an in-
convenient method, especially in the case of applications with huge amounts
of  data  that are not entirely needed in each session.  An object-oriented
step in the right direction is to store the state  of  objects  themselves.
This can be compared with making a  core dump  of a single object.  When an
object is saved, a ''snapshot'' of the object  instance  is  made.  Changes
made after the save operation are not propagated to this snapshot.

The suggestion to store the objects themselves is  not  as  simple  as  one
might  expect.   This is because objects usually contain references (in at-
tributes or local variables) to other objects. A reference to an object  is
an  id  (identification of the proper type), assigned to that object by the
operating system when it was created with the Procol primitive   new.    In
some situations it is useless to save an object without also saving the re-
lated objects.

 4.3  Comparison with Other Systems

The ''snapshot'' method is used in several other OOPLs. In systems offering
multiple inheritance the object type that also has to be persistent, inher-
its this property from a general object type with methods to save and  load
the  object.  Egenhofer  and Frank [Egenhofer and Frank, 1989, Frank, 1988]
suggest the object type  db_persistent  with methods   store,  delete,  re-
trieve  and  modify.   ET++ [Weinand, Gamma, and Marty, 1989] has an object
hierarchy with the object type  object  in the top of this  hierarchy.  The
object type object

has methods called  PrintOn  and ReadFrom which enable transfer to and from
disk.   These  solutions  work  fine as long as the object types contain no
references to other objects but only simple attributes, such as for example
an array of coordinates describing a polygon.

The persistent data in PS-algol [Atkinson et al., 1987,  Morrison  et  al.,
1986]  are  organized into one or more databases. Each database has its own
root and may contain values of different (complex) data types. The data are
''imported''  into  a  program with the  open.database  procedure which re-
turns a pointer to the root. The root has the form of a name-value table in
which the value is usually a pointer to another data structure.  The actual
data are accessed by following these pointers and it is  assumed  that  the
programmer  has  to  know the structure of the database (though this is no-
where stated in the PS-algol papers).  Once imported, the data can be mani-
pulated  in  the  same manner as volatile data. The procedure  commit  pro-
pagates the changes made so far to the database, if it was open  for  writ-
ing. Everything that is accessible from the root is stored. This means that
values may change and data (structures) be added or removed.

In the OOPL Eiffel [Meyer, 1988] an object type that inherits from the  ob-
ject type  STORABLE  gets this kind of behavior by means of the methods
 store  and  retrieve.  If the method  store is invoked in object  instance
x   ,  the  whole  object structure starting at  x  is dumped (in a special
format) to a file, even if the referenced object types in  x  do not inher-
it  from  STORABLE.  Depending on  x and the object structure of the appli-
cation, it is possible to store the whole object structure, or just a  part
of  it.  Basically, this solution has two drawbacks. First, the application
programmer has to indicate when to save or  load  the  objects  explicitly.
So,  if  the  program is stopped before the save, the latest data are lost.
Second, updating one object in an object structure can become  very  expen-
sive  if  all  related  objects have to be saved also, even if they did not
change.

 5.  Referential Integrity

What happens if an object is deleted by its creator while other objects are
still referring to this deleted object? A dangling reference is not a prob-
lem specific to persistent objects, it is a problem in the case of volatile
objects too, but it manifests itself in a severe manner in combination with
persistent objects.  Assume, a persistent object contains a reference to  a
volatile  object  and the program is stopped.  The next time the program is
started, the reference to the volatile object is not valid any more (though
it  has  not been deleted).  By the way, dangling references can also occur
in non-OOPL.  For example, in C it is possible to have pointers to  deleted
data structures, which may be the cause of some severe errors in a program.
Some systems guarantee  referential integrity.   An associated  problem  is
that  of  an  ''unreachable''  object,  that is an object to which the last
reference is lost.  There are a number of possible approaches towards these
problems:

- If we want to guarantee referential integrity, we at  least  have
  to  be  able  to  detect  whether the integrity is damaged by the
  deletion of an object. This can  be  achieved  by  associating  a
  reference  count}  with  each  object  instance. If the reference
  count has a value greater than  zero,  the  object  will  not  be
  deleted  and  the creator is notified of this fact. The reference
  count mechanism introduces overhead, because the counters have to
  be updated in each assignment to an object variable. Problems are
  introduced by cyclic data structures.

- A slightly different approach, but  also  based  on  a  reference
  count,  is followed in  O_2 [Bancilhon et al., 1988].  The object
  deletion is not refused but postponed  until  the  value  of  the
  reference count is zero. The creator does not have to worry about
  trying to delete the object another time.

- Dangling references can not occur if we prohibit the deletion  of
  objects.  This  approach is taken in for example GemStone [Penney
  and Stein, 1987].  In order to avoid congestion  of  the  system,
  garbage  collection   has to be performed. Two well known methods
  for this are:

  - Using a reference count: when the count  becomes   zero,  the
    associated  object instance is automatically delet- ed by the
    system.

  - Performing a            sweep through  the  ob- ject space (a
    directed  graph)  in order            to detect which objects
    are unreachable. A  disadvantage  is   that   the  sweep   is
    performed  periodically  and during this operation the system
    can not be used by the applications.  This  can  be   avoided
    by   using               an  incremental version of the sweep
    algorithm.

- The maintenance of the reference count introduces overhead;  both
  memory usage and execution time increase. Clearly, it is more ef-
  ficient to omit a reference count and directly delete the  object
  at  request.  However,  in this case the system is not allowed to
  reuse the  id's  of deleted objects for new  objects.  So,  if  a
  message  is being sent to a deleted object, the system can detect
  this, and the sender will be notified. This strategy implies that
  the  address of an object can not be used as its id, because when
  the object is deleted we want to be able to reuse  that  part  of
  the memory space for new objects.

It may be clear by now that we are biased towards the latter approach.   In
the  context of Procol, dangling references are probably programming errors
and the detection of the illegal use of dangling references during run-time
is  an  adequate  solution.   Finally, it is interesting to note that PCTE+
[IEPG, 1988, IEPG, 1988] offers links both with and without referential in-
tegrity.  This is probably done for efficiency reasons. It is not stated in
the PCTE+ documents how the referential integrity is maintained.

 6.  Object Management

In order to solve the administrative problems associated with  the  use  of
object  id's,  there  is  a need of an Object Management System} (OMS) that
takes care of the (persistent) objects.  One of the responsibilities of the
OMS  to  keep  the  references in the object system consistent.  To be more
precise, an object system is consistent if [Khosafian and Copeland, 1986]:

-  No two distinct objects have the same identifier (unique identif-
   ier  assumption).  In  other  words,  the identifier functionally
   determines the type and the value of the object.

-  For each identifier present in the system there is an object with
   this identifier (no dangling identifier assumption).

 6.1  Object Identity

A uniform object identification mechanism has to be developed,  capable  of
dealing  with  objects  shared by multiple programs, multiple users or even
multiple computers (in a network).  There should be a mechanism to indicate
in  which  persistent  objects one is interested, so one is not bothered by
non-interesting objects of others. One possible method could be to organize
the object instances in ''datasets'' which are put in the normal hierarchi-
cal file system. This limits the scope and makes the task  of  finding  the
right communication partner easier for the OMS.

In relational databases [Codd, 1970] an identifier key is formed by one  or
more  user-supplied attributes. Value based matching is a transparent tech-
nique for expressing relationships. However, it provides no support for re-
ferential integrity at all. By contrast, OOPLs support the notion of object
identity which is independent of the  attribute  values  [Paton  and  Gray,
1988].   Khoshafian  and  Copeland  [Khosafian and Copeland, 1986] describe
several techniques for implementing object identity and they conclude  that
using  so called  surrogates  is the best technique. Surrogates are system-
generated, globally unique identifiers, completely independent of the  phy-
sical location and data contents of an object.

 6.2  Searching

The objects as presented so far  are  not  suited  for  associative  search
operations.  That  is, searching based on the contents of an object instead
of using the object id to find an object. This is especially useful  for  a
program  that  want  to  use objects created by other programs, because the
id's are unknown and have no semantic  meaning.  All  that  a  program(mer)
knows  is about: object types (the kind of data he wants to use) and attri-
bute values (restriction of instances).

Another use of associative searching is to solve the query: ''How many  in-
habitants  has  the  municipality  with the name attribute `Leiden'?''.  We
have to look at all the instances of the municipality object type until  we
have found the proper one.  This is an  O(n)-algorithm. However, this prob-
lem can be solved with an  O(log (n))-algorithm,  if  a  binary  search  is
used.   In  a  relational database, efficient search is implemented by a B-
tree [Bayer and McCreight, 1983, Comer, 1979] for attributes  on  which  an
index   is  put.  The  B-tree has many useful properties, such as: it stays
balanced under updates, it is adapted to paging (multiway branching instead
of binary) and has a high occupancy rate.

The B-tree solution in an object-oriented environment is established  by  a
set  of auxiliary (system) objects. These objects do not contain the appli-
cation data, but contain tree structures with  references  to  the  objects
with  the actual data. This B-tree has to be part of the OMS and, if possi-
ble, transparent to the ''application'' objects. Note that the  OMS  itself
can be implemented in Procol as a set of objects.

There is some friction between the concepts behind the ADTs and the idea of
associative  searching, because associative searching requires knowledge of
the internals of other objects. An object has to specify his query in terms
of  data-part  that are inside other objects.  To limit the damage, only so
called  visible  attributes may be used in the query. These visible  attri-
butes become part of the specification of an object type (together with the
actions of course), in contrast to the non-visible data-part  which  belong
to the implementation.  Note that an index may be put only on a visible at-
tribute of an object type.

 6.3  Multi-dimensional Data

The searching problem also applies to the graphic or geometric data.  If no
spatial  structure  is used, then queries such as ''Give all municipalities
within rectangle  X'' are hard to answer. A spatial data structure which is
especially  suited for the object-oriented environment is the R-tree [Gutt-
man, 1984].  This is because the R-tree already deals with objects; it only
adds a minimal bounding rectangle (MBR) and then it tries to group the MBRs
which lie close to each other; see Figure 2.  This grouping process is  re-
flected  in  a  tree  structure,  which  in turn may be used for searching.
Several test results [Faloutsos, et al., 1987, Greene, 1989] indicate  that
the R-tree is a very efficient spatial data structure.

Not all known spatial data structures [van Oosterom, 1988] are  suited  for
this  purpose.   For  example  kd-trees  [Bentley, 1975], quadtrees [Samet,
1984], R+-tree, bsp-trees [van Oosterom, 1989], cell-tree  [Gunther,  1988]
and  gridfiles,  are more difficult to integrate in the object-oriented en-
vironment because they cut the geographic objects  into  pieces.   This  is
against  the  spirit  of  the object-oriented approach, which tries to make
complete ''units,'' with meaning to the user.  The  Field-tree  [Frank  and
Barrera,  1989],  KD2B-tree and the Sphere-tree [van Oosterom and Claassen,
1990] are good candidates for integration in an object-oriented system, be-
cause they do not split the objects.  Each of the spatial search structures
has its own strengths and weaknesses, so if several  alternatives  are  of-
fered by the system, an application can use the structure that fulfills its
needs the best.

In Procol, trees can be implemented  in  two  different  ways.   The  first
method  stores  the  entire  tree  in  one single search object. The second
method stores each node of the tree in a separate instance  of  the  search
object.  The latter introduces overhead by creating a lot of search objects
(nodes).  However, it has the advantage of being suited for  parallel  pro-
cessing  in  Procol  because the search objects can run on parallel proces-
sors.  This is useful for range queries:  ''Give  all  municipalities  with
more  than 10.000 and less than 20.000 inhabitants.'' Appendix A contains a
part of the Procol implementation of the R-tree.   In  this  implementation
each  node is a separate object. In case of the R-tree this is a reasonable
choice, because there is a fair amount of work in each node. The same would
be valid for B-trees, but not for binary trees.  In any case, for practical
reasons, there has to be a separate search tree (index) for each  attribute
for which efficient searches are required. This has to be made clear to the
OMS before the queries are posed.

It is possible that the value of an attribute  changes,  after  the  search
tree  has  been created for that attribute. In that case, the tree may have
become incorrect or inconsistent. This can be solved by sending an  (impli-
cit) message to the search tree object(s) in the OMS, just after the attri-
bute has changed. Upon receipt of this message the search tree adjusts  it-
self.

 7.  The Procol Extension

This section presents some issues concerning the syntax  and  semantics  of
the constructs which might be added to Procol for the support of persistent
objects.  This is done here without worrying how this can  be  achieved  in
our implementation of Procol.  From the users point of view, this extension
should be as small and simple as possible. First, we have to decide how  to
indicate that an object is persistent. Some alternative possibilities are:

-   On the fly:
    Make a volatile object instance persistent by applying a new Pro-
    col  primitive  persistent.   Assume the variable  x holds the id
    of an object instance; then this instance is made persistent  by:
    persistent  x.    Note  that there is a difference with the  save
    operation of section 4, because the values  (states)  of  a  per-
    sistent object are always guaranteed to be up-to-date.

-   Per Object Instance:
    At the moment an object is created, it is decided whether it will
    be  persistent  or  not.  A  convention  can be made that objects
    created with the  new  primitive are volatile and the ones creat-
    ed with the  persistent  primitive are persistent.

-   Per Object Type:
    At the moment the object type is defined it is specified  whether
    all  instances  of  this  type are persistent or not.  A modified
    keyword  OBJ  could indicate this: PERSISTENT_OBJ.

A combination of these approaches is  also  possible.  In  the  language  E
[Richardson  and  Carey,  1989]  the  programmer  has  to indicate per type
(class) that instances are optionally persistent. The programmer has to de-
cide per instance if it is really a persistent object instance.

The advantage of persistence per object type is that only once, during  the
object  type definition, there is a difference for the application program-
mer between persistent and volatile objects. In the other solutions  it  is
required  to  indicate  that  the  object is persistent for each object in-
stance.  The major drawback of the latter  choice  is  that  two  different
types  have  to be defined if we want to use both the volatile and the per-
sistent variants of basically one object type.  In the case of strong  type
checking  this means that we can not freely interchange the use of volatile
and persistent objects as arguments in messages and procedure calls.

In order to get hold of persistent objects with unknown id, Procol will  be
extended  with  the  retrieve  primitive.  Perhaps it is better to take the
following approach towards the primitives new,  delete,}  and   retrieve  :
consider   them  as  messages  to  the  object  types  themselves  (''class
methods'').  These are system objects  (partly)  responsible  for  the  OMS
tasks.  These system objects have to maintain index structures if requested
by sending them an create_index message.

The  retrieve  primitive has some resemblance to the  new   primitive,  be-
cause it also assigns the id of an object to a variable of the proper type.
Unlike  new , retrieve will not execute the Init section, because that  al-
ready happened when this object was created for the first time.  The proto-
col (expression) of the object regulating access to the object  is  matched
starting at the current (saved) state.

A  discrimination condition  can be used, because the object type  informa-
tion  may not be specific enough.  Of course only visible attributes can be
specified in the condition.  A retrieve returns the id of the object of the
proper  type  for  which  the  discrimination condition evaluates True.  If
there is more than one object satisfying these criteria, only  one  is  re-
turned. If there is no object satisfying these criteria, a  NULL  object is
returned.

It is a small step from the  retrieve  primitive to the associative  search
operation.  In  fact,  it  could be considered as an iteration over the re-
trieve operation. If fast replies are required, then in case of  large  set
of  objects,  an index has to be used.  This index could be a spatial index
structure; e.g., needed for  efficiently  solving  a  ''rectangle''  query.
There are several options for returning the answer of a search:

 - Return one big set that contains the id's  of  all  objects  that
   satisfy  the  query. In case of large answers, a lot of temporary
   memory is required and it may take quite a while to generate  the
   complete answer.

 - Another strategy is first to state the query  and  then  retrieve
   the  answer  one  by one (or perhaps in buffers of a fixed size).
   The first part of the answer will probably be ready  sooner  than
   the  complete  answer would be.  This promotes parallelism and is
   also quite important in an interactive application,  because  the
   end-user can already see something on his screen then.

The problem with the second solution for returning a search result is  that
other  objects  might interfere with the set of objects that belongs to the
queried object type. could change values and add new  instances  or  delete
existing  ones.  We still have to investigate whether this can be solved by
applying the right protocols in the system (OMS) objects.  This has  to  be
solved before we decide on the syntax of a search query.

 8.  Object Instances of Different Sizes

In section 3 we saw that the wish to store a polyline  or  polygon  with  a
minimum  of  overhead,  implies that different instances of the same object
type may have different sizes.  So for example, the pure  relational  solu-
tion,  presented  by van Roessel [van Roessel, 1987] is not acceptable, be-
cause a polyline is scattered over several tuples in a table and first  has
to  be  aggregated  before it can be used again.  In [van Oosterom, et al.,
1989] a solution in the context of the relational data model is presented.

Different sizes have an (enormous) impact on  the  implementation  of  per-
sistent  objects. In  CO_2, the C implementation of  O_2 [Bancilhon et al.,
1988], it was decided to prohibit object instances of different sizes.   In
contrast  we  would even like to have persistent objects whose sizes change
dynamically. For example, to make it possible to remove points from or  add
points  to  a polyline. However, this would even further complicate the im-
plementation.  A decrease of the size of an object is not too hard, but  an
increase  of object size means that an object does not fit in his (contigu-
ous) part of memory and the memory after this object is  probably  occupied
by  another  instance.   The object will have to be moved to another larger
place, because we want to treat an object as a unity and  do  not  want  to
split it. This would have been impossible if the objects id is its address.
However, this was already disapproved of because of reasons discussed  ear-
lier.   In  any  case,  growing persistent objects could introduce a lot of
overhead; e.g.  moving of objects.

A more feasible situation is that after the Init section, the  size  of  an
object  may  not  vary any more.  It is still possible to deal with dynamic
problems. For example, use a pointer (id) to an object of type linked list.
This object type has an ''application'' data-part and a pointer to the next
list element.  Each instance represents one list element and they all  have
the same fixed size. We can extend this approach and simplify our implemen-
tation of Procol, if we only allow the following data types as  attributes:
Basic  types  (int, char, float, ...),  References  (or links) to other ob-
jects, and  Arrays  with fixed size after the Init.

 9.  Persistent Objects in Procol

The question how to implement persistent objects in Procol can  be  divided
into  two  sub-questions.  The first is how to adapt the languages features
(the external implementation). The second is how to implement this  on  the
underlying platform (the internal implementation).

 9.1  External Implementation

Our decisions according the external implementation of persistence in  Pro-
col included the introduction of the following new keywords:

1.   persistent <object-id>  in <dataset-key>

With this statement a volatile object instance  identified  by  <object-id>
can  be made persistent by coupling it to a dataset identified by <dataset-
key>.

2.    volatile <object-id>

With this statement an object instance (identified  by  <object-id>),  that
has  been  made persistent before, can be made volatile. The result of this
statement is that the persistent object instance is removed  from  its  da-
taset.

3.    retrieve <object-id>  from <dataset-key>
      where <discrimination-string>

and

      next <object-id>

With this statement we can  retrieve  an  instance  of  the  same  type  of
<object-id>  from the dataset with identifier <dataset-key>, that satisfies
the <discrimination-string>.  If the dataset in question contains more than
one  instance  of  the required type, only one will be returned in <object-
id>. Any successive execution of the  next  statement,  will  retrieve  the
next   instance of the required type until all required instances have been
retrieved from the dataset.

In general, <dataset-key> will be a string. This string can then be used to
compose  the  filenames  of the necessary dataset files.  An example use of
the provided persistence in the Declare section of an object type using the
object  DRAWING :

Declare

     object DRAWING drawing;

     allocate_drawing(char *name)
     {
         new drawing;
         persistent drawing in name;
     }
     read_old_drawing(char *name)
     {
         retrieve drawing from name;
     }
     erase_drawing(char *name)
     {
         retrieve drawing from name;
         volatile drawing;
     }
     move_drawing(char *name1, *name2)
     {
         retrieve drawing from name1;
         volatile drawing;
         persistent drawing in name2;
     }
     copy_drawing(char *name1, *name2)
     {
         retrieve drawing from name1;
         persistent drawing in name2;
     }

 9.2  Internal Implementation

We will now motivate our design decisions for the internal  implementation.
Procol  has been developed and implemented on a network of Sun workstations
running under SunOS Release 4. Five possible implementations of the  exten-
sion of Procol with persistent objects were considered, (compare with [Tsi-
chritzis and Nierstrasz, 1988]):

1.  Bare  implementation: The problem of persistent  objects  is  de-
    ferred  to  the kernel of Procol. Procol stores and manages these
    objects in (one or more) files, by using  the  Unix  OS-interface
    calls:  open, close, read and write. The advantage of this method
    is that it is very portable.  Disadvantages are: a lot  of  work,
    relatively  inefficient because of the many read/write calls that
    have to be made.  We also have to implement the memory management
    aspects  (within  each  file), such as: allocation, deallocation,
    gaps, fit strategies, and so on.

2.  Shared mapped memory  (virtual files) implementation: A  file  is
    mapped directly by the Unix OS on the address space of a process.
    A major advantage is that there  is  no  difference  between  the
    ''stored'' object instances and their ''running'' counterpart, at
    least not at the level of the Procol kernel. At OS level there is
    a  difference and this is the same as the difference between vir-
    tual memory pages that are in main-memory and the ones  that  are
    swapped  on  disk. We expect this implementation to be very effi-
    cient.  In order to gain some experience we  are  currently  con-
    verting  a  local  application  that uses explicit read and write
    statements, into  a  mapped  memory  implementation.  First  test
    results  indicate that the elapse times decrease with about 40 in
    applications with a lot of read and write statements.   A  disad-
    vantage of the mapped memory approach is that we still have to do
    the memory management ourselves.

3.  INGRES [Sun Microsystems, 1987] (or other relational DBMS)  based
    implementation:  The  memory  management  is now done by database
    system. Also, all kinds of other useful DBMS features are provid-
    ed.   Wiederhold  [Wiederhold,  1986]  suggest  the use of  view-
    object generators  to  reconstruct  objects  from  relations  and
    view-object  decomposers/archivers}  for  the  inverse operation.
    However, there are several disadvantages. Object  types  can  not
    (always)  be  mapped on a relation in which each tuple represents
    one object instance. For example,  objects  with  polylines  have
    changing instance sizes.  Another disadvantage is that there is a
    difference between the by the Procol kernel. Of course, not  from
    the point of view of the application program.

4.  POSTGRES  [Stonebraker and Rowe, 1986] (or other extendable DBMS)
    based implementation: New data types can be defined, so it should
    be possible to have a one to one relationship between tuples in a
    relation  and  the objects.  Still, the difference stored/running
    parts of objects remains.  Another disadvantage is  that  a  DBMS
    implementation  is probably less efficient that the mapped memory
    implementation.

5.  PCTE+  [IEPG, 1988, IEPG, 1988] based implementation:  The  Port-
    able  Common Tool Environment PCTE+ offers a powerful OMS derived
    from the  Entity-Relationship Model  of Chen [Chen,  1976].   The
    objects  in  PCTE+ are not (complete) encapsulations of ADTs.  An
    advantage is that the OMS of PCTE+ is already distributed amongst
    workstations  in  a  network.  It has to be investigated if it is
    possible to manipulate multi-dimensional  data  in  an  efficient
    manner.   A  disadvantage of a PCTE+ based implementation is that
    we have to switch completely to the PCTE+ concepts  and  will  be
    tied  to  these concepts.  Finally, the availability of PCTE+ may
    be a problem. It is not in the  public domain  and  therefore  we
    expect problems with the distribution of Procol.

We have decided to use the mapped memory approach for the prototype  imple-
mentation  of  persistent  objects in Procol. The main reasons were the ex-
pected efficiency of the approach and the provided synchrony between

An object instance is identified by a  surrogate  [Khosafian and  Copeland,
1986].   That  is, the object id is not the actual address in memory but we
need an indirection [Straw, et al., 1989] to locate the object.  See figure
3 for a graphical demonstration of the process.  Each surrogate contains an
indication whether the object instance is volatile, persistent or  deleted.
When, during the execution of a Procol program, an object is referenced, we
have to check the first part of the surrogate.  If the the object  instance
is  volatile, the surrogate contains a key than can be used to retrieve the
memory address of the instance variables. If the object  instance  is  per-
sistent,  the surrogate contains a dataset identifier, and a key. With this
dataset identifier and the key, the OMS is  able  to  retrieve  the  actual
memory  address  of  the the instance variables of the persistent object in
question.

The disadvantage of surrogates is that there is an extra indirection.  How-
ever, surrogates have several important advantages.  They enable objects to
be switched between volatile and persistent state, by modifying a  part  of
the  surrogate.  We decided, based on the advantages and disadvantages men-
tioned in section 5, that the control of referential integrity is  not  re-
quired  in  Procol.  However, the use of illegal references is signalled at
run-time.  This is done by inspecting the  surrogate  and  taking  the  ap-
propriate measures.

If a persistent object contains a reference  to  a  volatile  object,  this
volatile  object is not saved automatically. The proper way to program this
case is to make the other object also persistent, if the referenced  object
is still needed in the future.  The state of each object instance is stored
as unity, that is in contiguous memory.  Instances of different  sizes  are
no  problem.  The state also contains some additional data, for example the
creator.  The application programmer decides whether instances of the  same
object type are ''stored together'' without instances of other types in one
dataset or if a dataset contains instances of different types. The later is
advantageous  for  representing complex objects in an efficient manner, be-
cause the instances of the different types that define a complex object are
stored close together.  Note that complex objects are important in CAD sys-
tems, because this is one of the main modeling tools.

10.  Further Research

Besides an object-oriented programming language, Procol is also a  parallel
programming  language.  It  is possible that the objects run in parallel on
multiple processors. We have only used this in a  few  examples.   Clearly,
this  topic deserves more attention and more research is needed in the con-
text of highly interactive and graphical systems.

It is a small step from one single user Procol program with persistent  ob-
jects  to a system with multiple users. At least conceptually, because each
object has its own protocol which regulates the communication.   It  should
not  matter  from which program a message originates. However, we will have
to reconsider some of the concepts.

The provided query facilities are very limited; with  retrieve  it is  only
possible  to  get hold of one ''starting'' object id of a specified type or
to perform the selection of instances from one set (object type) at a time.
More  complex  queries  have  to be programmed into the objects (the Procol
program). Attention has to be  paid  to  avoid  object  types  becoming  to
specific.  That is in contradiction with one of the basic principles of da-
tabases of data being independent of applications.  More research  in  this
area is necessary.

References

Atkinson,   Malcolm P. and   O. Peter Buneman, "Types and per-
     sistence in database programming languages," ACM Computing
     Surveys, vol. 19, no. 2, pp. 105-190, June 1987.

Atkinson,   M.P.,   P.J. Bailey,   K.J. Chisholm,   P.W.
     Cockshott, and   R. Morrison, "An approach to persistent
     programming," The Computer Journal, vol. 26, no. 4, pp.
     360-365, 1983.

Bancilhon,   F.,   G. Barbedette,   V. Benzaken,   C. Delobel,
     S. Gamerman,   C. Lecluse,   P. Pfeffer,   P. Richard, and
     F. Velez, "The design and implementation of O2, an Object-
     Oriented database system," in Advances in Object-Oriented
     Database Systems, 2nd      International Workshop on
     Object-Oriented Database Systems, pp. 1-22, Bad Munster am
     Stein-Ebernburg, FRG, September 1988.

Bayer,   R. and   E. McCreight, "Organization and maintenance of
     large ordered indexes," Acta Informatica, vol. 1, pp. 173-
     189, 1973.

Bentley,   Jon Louis, "Multidimensional binary search trees used
     for associative searching," Communications of the ACM, vol.
     18, no. 9, pp. 509-517, September 1975.

Blake,   E.H. and   S. Cook, "On including part hierarchies in
     object-oriented languages, with an      implementation in
     smalltalk," in ECOOP '87, pp. 41-50, 1987.

Chen,   Peter Pin-Shan, "The entity-relationship model - toward a
     unified view of data," ACM Transactions on Database Systems,
     vol. 1, no. 1, pp. 9-36, March 1976.

Codd,   E.F., "A relational model of data for large shared data
     banks," Communications of the ACM, vol. 13, no. 6, pp. 377-
     387, June 1970.

Comer,   Douglas, "The ubiquitous B-Tree," ACM Computing Surveys,
     vol. 11, no. 2, pp. 121-137, June 1979.

Cox,   Brad J., Object-Oriented Programming - An Evolutionary Ap-
     proach, Addison-Wesley, Reading, Mass., 1986.

Jr.,   Walter C. Dietrich,   Lee R. Nackman,   Christine J. Sun-
     daresan, and   Franklin Gracer, "TGMS: An object-oriented
     system for programming geometry," Software - Practice and
     Experience, vol. 19, no. 10, pp. 979-1013, October 1989.

Egenhofer,   Max and   Andrew Frank, "Panda: An extensible DBMS
     supporting object-oriented software      techniques," in Da-
     tabase Systems in Office, Engineering, and Scientific
     Environment, Springer-Verlag, New York, March 1989.

Egenhofer,   Max J. and   Andrew U. Frank, "Object-oriented
     modeling in GIS: Inheritance and Propagation," in Auto-Carto
     9, pp. 588-598, Baltimore, April 1989.

Faloutsos,   Christos,   Timos Sellis, and   Nick Roussopoulos,
     "Analysis of object oriented spatial access methods," ACM
     SIGMOD, vol. 16, no. 3, pp. 426-439, December 1987.

Frank,   Andrew U., "Multiple inheritance and genericity for the
     integration of a database      management system in an
     Object-Oriented approach," in Advances in Object-Oriented
     Database Systems, 2nd      International Workshop on
     Object-Oriented Database Systems, pp. 268-273, Bad Munster
     am Stein-Ebernburg, FRG, September 1988.

Frank,   Andrew U. and   Renato Barrera, "The fieldtree: A data
     structure for geographic information system," in Symposium
     on the Design and Implementation of Large Spatial      Data-
     bases, Santa Barbara, California, July 1989.

Greene,   Diane, "An implementation and performance analysis of
     spatial data access      methods," in IEEE Data Engineering
     Conference, pp. 606-615, 1989.

Gunther,   Oliver, "Efficient Structures for Geometric Data
     Management," Number 337 in Lecture Notes in Computer Sci-
     ence, Springer-Verlag, Berlin, 1988.

Guttman,   Antonin, "R-trees: A dynamic index structure for spa-
     tial searching," ACM SIGMOD, vol. 13, pp. 47-57, 1984.

TA-13),   Independent European Programme Group - Technical Area
     13 (IEPG, PCTE+ C Functional Specification, Issue 2, July
     1988.

TA-13),   Independent European Programme Group - Technical Area
     13 (IEPG, Introducing PCTE+, 1989.

Khoshafian,   Setrag N. and   George P. Copeland, "Object identi-
     ty," in OOPSLA'86, pp. 406-416, September 1986.

Laffra,   Chris and   Jan van den Bos, "A layered object oriented
     model for interaction," in Proceedings of the Eurographics
     Workshop on Object Oriented Graphics, pp. ??-??,
     Konigswinter, BRD., 1990.

Meyer,   Bertrand, Object-oriented Software Construction, Pren-
     tice Hall, London, 1988.

Morrison,   R.,   A.L. Florianis,   A. Dearle, and   M.P. Atkin-
     son, "An integrated graphics programming environment," Com-
     puter Graphics Forum, vol. 5, pp. 147-157, 1986.

Paton,   Norman W. and   Peter M.D. Gray, "Identification of da-
     tabase objects by key," in Advances in Object-Oriented Data-
     base Systems, 2nd      International Workshop on Object-
     Oriented Database Systems, pp. 280-285, Bad Munster am
     Stein-Ebernburg, FRG, September 1988.

Penney,   D. Jason and   Jacob Stein, "Class modification in the
     GemStone Object-Oriented DBMS," in OOPSLA'87, pp. 111-117,
     September 1987.

Richardson,   Joel E. and   Michael J. Carey, "Persistence in the
     E language: Issues and implementation," Software - Practice
     and Experience, vol. 19, no. 12, pp. 1115-1150, December
     1989.

Samet,   Hanan, "The quadtree and related hierarchical data
     structures," Computing Surveys, vol. 16, no. 2, pp. 187-260,
     June 1984.

Stonebraker,   Michael and   Lawrence A. Rowe, "The design of
     POSTGRES," ACM SIGMOD, vol. 15, no. 2, pp. 340-355, 1986.

Straw,   Andrew,   Fred Mellender, and   Steve Riegel, "Object
     management in a persistent smalltalk system," Software -
     Practice and Experience, vol. 19, no. 8, pp. 719-737, August
     1989.

Inc.,   Sun Microsystems, SunINGRES Manual Set, January 1987.

Tsichritzis,   D.C. and   O.M. Nierstrasz, "Fitting round objects
     into square databases," in ECOOP '88, pp. 283-299, August
     1988.

Bos,   Jan van den, "PROCOL: A protocol-constrained concurrent
     object-oriented language," Information Processing Letters,
     vol. 32, pp. 221-227, September 1989.

Bos,   Jan van den and   Chris Laffra, "PROCOL - A parallel ob-
     ject language with protocols," in OOPSLA '89, pp. 95-102,
     New Orleans, October 1989.

Oosterom,   Peter van, "Spatial data structures in Geographic In-
     formation Systems," in NCGA's Mapping and Geographic Infor-
     mation Systems, pp. 104-118, Orlando, Florida, September
     1988.

Oosterom,   Peter van, "A Reactive Data Structure for Geographic
     Information Systems," in Auto-Carto 9, pp. 665-674, Bal-
     timore, April 1989.

Oosterom,   Peter van and   Eric Claassen, "Orientation insensi-
     tive indexing methods for geometric objects," in 4th Inter-
     national Symposium on Spatial Data Handling, Zurich,
     Switzerland, July 1990.

Oosterom,   Peter van and   Chris Laffra, "Persistent graphical
     objects," in Eurographics Workshop on Object Oriented Graph-
     ics, June 1990.

Oosterom,   Peter van and   Jan van den Bos, "An object-oriented
     approach to the design of Geographic Information      Sys-
     tems," Computers & Graphics, vol. 13, no. 4, pp. 409-418,
     1989.

Oosterom,   Peter van,   Marcel van Hekken, and   Marco Woesten-
     burg, "A geographic extension to the relational data model,"
     in Geo '89 Symposium, The Hague, October 1989.

Roessel,   J.W. van, "Design of a spatial data structure using
     the relational normal forms," International Journal of Geo-
     graphical Information Systems, vol. 1, no. 1, pp. 33-50,
     1987.

Weinand,   Andre,   Erich Gamma, and   Rudolf Marty, "Design and
     implementation of ET++, a      seamless object-oriented ap-
     plication framework," Structured Programming, vol. 10, no.
     2, pp. 63-87, 1989.

Wiederhold,   Gio, "Views, objects, and databases," IEEE Comput-
     er, pp. 37-44, December 1986.

Wolf,   Andreas, "The DASDBS GEO-Kernel, concepts, experiences
     and the second step," in Symposium on the Design and Imple-
     mentation of Large Spatial Databases, Santa Barbara, Cali-
     fornia, July 1989.

Appendix - The R-tree in Procol

This appendix contains the Procol code of the objects  R-TREE and  R-NODE ,
which  together  implement  a persistent multi-dimensional index structure;
see section 6. In case of a GIS with attributes such as points,  lines  and
regions  in  the  plane the dimension of the R-tree is 2. A CAD system with
solid objects, for example a polyhedron, will need a 3 dimensional  R-tree.
Higher  dimensional R-trees are also possible, because the in some applica-
tions it is beneficial to interpret a combination k  scalar  attributes  as
one k-dimensional point attribute on with range queries may be formulated.

Not all the code is given here. This is  indicated  by  three  dots  (...).
Especially, some tricky parts of the insert and delete algorithms are omit-
ted, but can be found in [Guttman, 1984]. The code is  non-trivial  because
the  tree has to be kept in balance under the insert and delete operations.
The purpose of this appendix is to show how the R-tree is implemented in an
object-oriented  manner.  We only showed one query type: the box select (in
2D that is, a rectangle select), which returns the id of  every  object  in
the  R-tree  that overlaps the search box  sbox. The result is sent back to
the original object  sid  by invoking the action  InRegion  with the id  of
the  found (graphical data) object.  This happens once for each object that
is found.

Note that we implemented a parallel or distributed version of the  tree  by
using  the  object   R-NODE.  During the search operation several nodes can
work in parallel on different processors.  There is quite a bit of work  in
each  node,  because  a typical value for  M (maximum number of entries) is
100. This solution would probably not be very efficient  for  binary  tree,
because the overhead introduced by sending messages to other processors may
require more time than the time that is gained by the  parallel  execution.
Besides  the  search operation, the delete and insert operations might also
benefit from parallel execution in case of node overflow and node underflow
respectively.  This is not shown in the code below.
#define DIM 2                                  /* or any other value > 1 */
#define LENGTH 256

typedef struct{
   ANY       id;                           /* R-NODE or graphical object */
   float     box[DIM][2];
} EntryType;

---------------------------------------------------------------------------

OBJ          R-TREE (int m, int M, char dataset[LENGTH]);

Description  In R-tree literature m and M are the minimum and
             maximum number of entries per node. The tree is suited
             for DIM dimensions.
             Assumed is that just after the creation of the R-TREE,
             it made persistent by its creator in 'dataset'.

Declare      R-NODE       root, node;
             float        box[DIM][2], sbox[DIM][2];

Protocol     /* Assumption: The Add and Delete actions are               */
             /* invoked by the (graphical data) objects themselves and   */
             /* they send their correct minimal bounding box in box.     */
             ANY (box)  -> Add      +
             ANY (box)  -> Delete   +
             ANY (sbox) -> Select

Init         root = NULL;

Actions      Add = {
                if (root==NULL) {
                   new root(m, M, true);
                   persistent root in dataset;
                   root.EntryAdd(sender, box);
                } else {
                   ...
                   new node(m, M, true);
                   persistent node in dataset;
                   ...
                }
             }
             Delete = {
                ...
                /* If a (persistent) node is deleted, then this also     */
                /* implies removal from the dataset                      */
                ...
             }
             Select = { root.Search(sender, box); }
EndOBJ       R-TREE.

---------------------------------------------------------------------------

---------------------------------------------------------------------------

OBJ          R-NODE (int m, int M, boolean leaf);

Description  m and M have same meaning as in R-TREE.
             If leaf has value true, then this node is a leaf,
             else this is an internal node.

Declare      float        box[DIM][2], sbox[DIM][2];
             ANY          id, sid;
             EntryType    entry[M];
             int          NrOfEntries, i;
             R-NODE       next;

Protocol     (NrOfEntries < M): R-TREE (id, box) -> EntryAdd    +
             (NrOfEntries > m): R-TREE (id, box) -> EntryDelete +
             R-TREE (sid, sbox)                  -> Search      +
             R-NODE (sid, sbox)                  -> Search      +
             R-TREE ()                           -> Full

Init         NrOfEntries = 0;

Actions      EntryAdd = {
                /* Assumption: R-NODE is not full */
                entry[NrOfEntries].box  = box; entry[NrOfEntries++].id = id;
             }
             EntryDelete = { ... }
             Full = { sender.(NrOfEntries==M); }
             Search = {
                for (i = 0; i < NrOfEntries; i++)
                   if (overlap(sbox, entry[i].box)
                      if (leaf)
                         /* Return found object */
                         sid.InRegion(entry[i].id);
                         else { /* Propagate search to lower level */
                               /* This part of the code causes the parallelism */
                         next = entry[i].id;
                         next.Search(sid, sbox);
                      }
             }
EndOBJ       R-NODE.

---------------------------------------------------------------------------