[comp.object] Schema/Type Evolution in Traditional and O-O DBMSs

clamen@CS.CMU.EDU (Stewart Clamen) (05/06/91)

I am doing research in the area of type evolution in databases
(object-oriented databases in particular) and wish to learn a bit more
about how existing systems deal with the problem.  I'd much appreciate
it if those readers with intimate knowledge of a particular DBMS or
OODBMS would drop me a note telling me if and how the system provides
support for the evolution of types.  

By "type evolution", I am referring to the process of redefining an
existing type (or schema in database parlance) in a DBMS and
provisions for the reformatting of the data representing the instances
of that type present in the database prior to the redefinition.

I am already familiar with the services provided by ORION and
GemStone. 

[Please don't post your replies to the 'net.  I will post a summary if
 people express an interest.]

--
Stewart M. Clamen			Internet:    clamen@cs.cmu.edu
School of Computer Science		UUCP: 	     uunet!"clamen@cs.cmu.edu"
Carnegie Mellon University		Phone: 	     +1 412 268 3620
Pittsburgh, PA 15213-3890, USA		Fax:	     +1 412 268 1793

clamen@CS.CMU.EDU (Stewart Clamen) (05/22/91)

In article <CLAMEN.91May5174756@BYRON.SP.CS.CMU.EDU> I made the
following request:

   I am doing research in the area of type evolution in databases
   (object-oriented databases in particular) and wish to learn a bit more
   about how existing systems deal with the problem.  I'd much appreciate
   it if those readers with intimate knowledge of a particular DBMS or
   OODBMS would drop me a note telling me if and how the system provides
   support for the evolution of types.  

   By "type evolution", I am referring to the process of redefining an
   existing type (or schema in database parlance) in a DBMS and
   provisions for the reformatting of the data representing the instances
   of that type present in the database prior to the redefinition.

   I am already familiar with the services provided by ORION and
   GemStone. 

I have so far received relevant information on the following
systems:

Research Systems		Commercial Systems
----------------		-----------------
ORION				Symbolics' Statice
IRIS				Versant
GemStone			Object Design
Encore				Objectivity
COOL/COCOON			ObjectStore
Exodus				Ontos (formerly VBase)
Machiavelli			NMP-CAD's Base/OPEN 
ConceptBase			PICK
AVANCE				Persistent Data Systems' IDL products


Before I consider my survey complete and post what I have collected
to the net (and mail it to those who so requested), I'd like to make
a directed request for information on schema evolution and database
conversion as it pertains to other particular systems I have read about.

As you might have noticed, most of these systems are OODBMS. I am
interested in learning about the more traditional DBMSs, as well as
other non-OO systems.  Notably:

ADAPLEX
CODASYL
CLASSIC
DAPLEX
POSTGRES


I would also like to hear something about Altair/O_2, an OODB
project in France.


Thank you again.
--
Stewart M. Clamen			Internet:    clamen@cs.cmu.edu
School of Computer Science		UUCP: 	     uunet!"clamen@cs.cmu.edu"
Carnegie Mellon University		Phone: 	     +1 412 268 3620
Pittsburgh, PA 15213-3890, USA		Fax:	     +1 412 268 1793

marcs@slc.com (Marc San Soucie) (05/23/91)

Stewart M. Clamen writes:
>
> In article <CLAMEN.91May5174756@BYRON.SP.CS.CMU.EDU> I made the
> following request:
>
>    I am doing research in the area of type evolution in databases
>    (object-oriented databases in particular) and wish to learn a bit more
>    about how existing systems deal with the problem.
>
> I have so far, received relevant information now on the following
> systems:
>
> Research Systems		Commercial Systems
> ----------------		-----------------
> ORION				Symbolics' Statice
> IRIS				Versant
> GemStone			Object Design
> Encore			Objectivity
> COOL/COCOON			ObjectStore
> Exodus			Ontos (formerly VBase)
> Machiavelli			NMP-CAD's Base/OPEN
> ConceptBase			PICK
> AVANCE			Persistent Data Systems' IDL products

				GemStone

Please be aware that GemStone is in fact a commercial product, not a research
product, and has been a commercial product since 1987.

    Marc San Soucie
    Servio Corporation
    Beaverton, Oregon
    marcs@slc.com

clamen@CS.CMU.EDU (Stewart Clamen) (05/30/91)

The following is the result of my public survey into the schema
evolution and database conversion support exhibited by known research
and commercial DBMSs and OODBMSs.  Further contributions are welcome.

	     --------------------*-*---------------------

Direct quotes are attributed by including the email address of the
poster directly following the information.  Prose written by the
poster, but with primary information provided by email, is so
identified.  Information gleaned from publications is so noted.  The
information included here is not intended to completely describe the
systems addressed, but rather, to describe what support, if any, is
provided by the system for the evolution of schemas and the conversion
of database objects (class instances) resulting from the schema
change.

							SMC

		       ----------*-*----------


		 <<< EXTENDED RELATIONAL DB MODEL >>>

			<< Research Systems >>


> POSTGRES (Berkeley)

You ask explicitly about type evolution.  We support schema
modification on all classes, including user classes.  This means that
you can add attributes (instance slots) and methods at any time.
Further, since postgres is a shared database system, such changes are
instantly visible to any other user of the class.

The language syntax supports attribute deletion, but the system won't
do it yet.  Since all data is persistent, removing attributes from a
class requires some work -- you need to either get rid of or ignore
all the values you've already stored.

[mao@postgress.berkeley.edu]
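
To make the add/delete asymmetry above concrete, here is a minimal
sketch in Python.  It is not POSTGRES code; the class name ClassStore
and its methods are invented for illustration.  Adding an attribute
touches no stored data, and deletion is modelled with the "ignore the
stored values" option mentioned above.

```python
# Hypothetical toy model; not the POSTGRES implementation or query language.
class ClassStore:
    def __init__(self, attributes):
        self.attributes = list(attributes)   # current schema
        self.rows = []                       # persistent instances, kept as dicts

    def insert(self, **values):
        self.rows.append(dict(values))

    def add_attribute(self, name):
        # No stored row is touched; old rows simply lack the new slot.
        self.attributes.append(name)

    def drop_attribute(self, name):
        # "Ignore" strategy: stored values remain but are hidden from readers.
        self.attributes.remove(name)

    def scan(self):
        # Project every stored row onto the current schema.
        return [{a: row.get(a) for a in self.attributes} for row in self.rows]


emp = ClassStore(["name", "salary"])
emp.insert(name="alice", salary=100)
emp.add_attribute("dept")
emp.insert(name="bob", salary=90, dept="db")
emp.drop_attribute("salary")
print(emp.scan())
```

The alternative ("get rid of" the values) would mean rewriting every
row in drop_attribute, which is the work the post alludes to.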



			<<< OO DATA MODEL >>>

			<< Research Systems >>

> COOL/COCOON (ETH Zurich)

No implementation as yet.  Project goals are:

- to develop a general formal framework for investigations of all
  kinds of schema changes in object-oriented database systems
  (including schema design, schema modification, schema tailoring, and
  schema integration);
- to find implementation techniques for evolving database schemas,
  such that changes on the logical level propagate automatically to
  adaptations of the physical level (without the need to modify all
  instances, if possible).

Contact Markus Tresch <tresch@inf.ethz.ch> for more information.


> Encore (Brown)

Objects are never converted; rather, classes are versioned, and the
user can specify filters to make old-style instances appear as new
instances to new applications (and vice versa).

REFS:
	Andrea H. Skarra and Stanley B. Zdonik. "Type
	Evolution in an Object-Oriented Database."  In the
	book, "Research Directions in Object-Oriented
	Programming", by Shriver and Wegner.  (An earlier
	version of the paper appears in the proceedings to
	OOPSLA86.) 

[clamen]
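
The filter idea can be sketched roughly as follows.  This is only an
illustration in Python, not Encore's interface; VersionedClass,
add_filter and view are assumed names.  The stored object is never
rewritten; a filter computes the view that a given class version
expects.

```python
# Hypothetical sketch of class versioning with filters; names are illustrative.
class VersionedClass:
    def __init__(self, name):
        self.name = name
        self.filters = {}   # (stored_version, wanted_version) -> function on attrs

    def add_filter(self, stored_version, wanted_version, fn):
        self.filters[(stored_version, wanted_version)] = fn

    def view(self, instance, wanted_version):
        """Return the attributes as seen by code written for wanted_version."""
        stored_version, attrs = instance
        if stored_version == wanted_version:
            return attrs
        # The stored object is left untouched; only the view is computed.
        return self.filters[(stored_version, wanted_version)](attrs)


person = VersionedClass("Person")
# Version 1 stores a single "name"; version 2 stores "first" and "last".
person.add_filter(1, 2, lambda a: {"first": a["name"].split()[0],
                                   "last": a["name"].split()[-1]})
person.add_filter(2, 1, lambda a: {"name": a["first"] + " " + a["last"]})

old = (1, {"name": "Ada Lovelace"})
new = (2, {"first": "Alan", "last": "Turing"})
print(person.view(old, 2))
print(person.view(new, 1))
```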


> ORION (MCC/Itasca System, Inc.)

ORION is a prototype OODBMS developed at MCC, an American consortium.
It is built on top of Common Lisp, and is intended to support
applications in the CAD/CAM, AI, and OIS domains.  Advanced
functions supported include [object] versions, change notification,
composite objects, dynamic schema evolution, and multimedia data.

For schema evolution, ORION identifies a list of database-consistency
constraints that must be preserved across any class evolution
operation.  The authors then list the types of evolution operations
you can perform, and how the relevant instances can be converted.
Conversion is performed as the instances are accessed.

I have found nearly a dozen papers published by the ORION folks.  The
most recent and general one is:

	W. Kim, N. Ballou, H-T. Chou, J.F. Garza, D. Woelk,
	and J. Banerjee. "Integrating an Object-Oriented
	Programming System with a Database System."
	Proceedings of OOPSLA88.  [Pointers to the previous
	papers documenting each of the advanced features
	listed above are cited therein.]

The paper most relevant to the issue of schema evolution is the
following:

	J. Banerjee, W. Kim, H.J. Kim, H.F. Korth.
	"Semantics and Implementation of Schema Evolution in
	Object-Oriented Databases." Proceedings of SIGMOD87.

[clamen]
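
Conversion "as the instances are accessed" can be sketched like this.
It is a generic lazy-conversion illustration in Python under assumed
names (LazyStore, evolve, get), not ORION's actual mechanism.

```python
# Hypothetical sketch of lazy (on-access) conversion; not ORION's mechanism.
class LazyStore:
    def __init__(self):
        self.schema_version = 1
        self.upgrades = {}      # version n -> function taking attrs from n to n + 1
        self.objects = {}       # oid -> [version, attrs]

    def put(self, oid, attrs):
        self.objects[oid] = [self.schema_version, dict(attrs)]

    def evolve(self, upgrade):
        # Register the change; no instance is rewritten yet.
        self.upgrades[self.schema_version] = upgrade
        self.schema_version += 1

    def get(self, oid):
        record = self.objects[oid]
        while record[0] < self.schema_version:   # convert only when touched
            record[1] = self.upgrades[record[0]](record[1])
            record[0] += 1
        return record[1]


store = LazyStore()
store.put("a", {"width": 2})
store.put("b", {"width": 5})
store.evolve(lambda attrs: {**attrs, "height": 0})   # add an attribute, default 0
store.get("a")
print(store.objects)   # "a" is now at version 2, "b" is still at version 1
```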


> Exodus (UWisc)

No solution for the problem of schema evolution is provided.
Emulation is rejected by the authors, who claim that the addition of a
layer between the EXODUS Storage Manager and the E program would
seriously reduce efficiency.  Automatic conversion, whether lazy or
eager, is also rejected, as it does not mesh well with the C++ data
layout.  To implement immediate references to other classes and
structures, C++ embeds class and structure instances within the
object that refers to them.  A change in the size of an embedded
instance therefore changes the size and layout of the containing
object, which might invalidate remote pointer references.

	Joel E. Richardson and Michael J. Carey.  "Persistence
	in the E language: Issues and Implementation."  Appeared
	in "Software -- Practice and Experience",
	19(12):1115-1150, December 1989.

[clamen]
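
The layout problem can be illustrated with Python's struct module.
This is only an analogy: the Inner/Outer structures are made up, and
the "=" prefix asks for standard sizes with no padding, whereas a
real C++ compiler chooses its own layout.

```python
import struct

# Illustrative only: "=" means standard sizes and no padding, so the numbers
# below are the same on every platform.  Real C++ layouts are compiler-specific.
INNER_V1 = "ii"      # struct Inner { int a; int b; }
INNER_V2 = "iid"     # Inner after a double field is added

def outer_layout(inner):
    # struct Outer { int id; Inner embedded; int tail; }
    offset_of_tail = struct.calcsize("=i" + inner)
    size = struct.calcsize("=i" + inner + "i")
    return offset_of_tail, size

print(outer_layout(INNER_V1))   # (12, 16)
print(outer_layout(INNER_V2))   # (20, 24)
```

Growing the embedded structure moves the trailing field and enlarges
the container, so any stored address that pointed at or past the
embedded instance would no longer be valid.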


> Machiavelli (UPenn)

Machiavelli is a statically-typed persistent programming language
project at the University of Pennsylvania.  It does not address type
evolution. 

[communication with limsoon@saul.cis.upenn.edu]


> ConceptBase

We have developed a deductive object-oriented database called
ConceptBase where everything (tokens, classes, meta-classes,
meta-meta-classes, attributes, instantiations, specializations) is
treated as an object. That means that you may update the "schema"
(classes) at any time just as any other ordinary object.

The system has (user-defined and built-in) integrity constraints that
prevent inconsistency (e.g., violation of referential integrity).  Integrity
constraints in ConceptBase are (as in most other systems) static,
i.e., they are conditions that each database "state" must satisfy.

The data model we use does not distinguish schema level information
(i.e. classes) from instance level information. If you change for
example some classes and this change violates some integrity
constraints, e.g.  some instances now don't have the right attribute
types anymore, then you have the choice either to reject the update or
to change the existing DB. Currently, ConceptBase simply rejects such
updates.  We are thinking of exploiting abduction (see the VLDB'90
article by Kakas & Mancarella) to react more cleverly, in the sense
of "reformatting" instances.

[Manfred Jeusfeld <jeusfeld@forwiss.uni-passau.de>]
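
The "reject the update" behaviour can be sketched as below.  This is
a plain Python illustration with invented names (ObjectBase,
update_class, Rejected); ConceptBase itself is a deductive system and
does not work this way internally.

```python
# Hypothetical sketch; ConceptBase is deductive and works differently inside.
class Rejected(Exception):
    pass

class ObjectBase:
    def __init__(self):
        self.classes = {}     # class name -> {attribute: required Python type}
        self.instances = []   # (class name, attribute dict)

    def consistent(self, classes):
        # Static constraint: every declared attribute of every instance
        # must hold a value of the declared type in this database state.
        return all(
            isinstance(attrs.get(attr), typ)
            for cls, attrs in self.instances
            for attr, typ in classes[cls].items()
        )

    def update_class(self, name, definition):
        candidate = {**self.classes, name: definition}
        if not self.consistent(candidate):
            raise Rejected(name)      # the update is refused; nothing changes
        self.classes = candidate


db = ObjectBase()
db.update_class("Paper", {"title": str, "year": int})
db.instances.append(("Paper", {"title": "Schema Evolution", "year": 1987}))
db.update_class("Author", {"name": str})                   # accepted
try:
    db.update_class("Paper", {"title": str, "year": str})  # breaks the instance
    outcome = "accepted"
except Rejected:
    outcome = "rejected"
print(outcome)
```

The other choice mentioned above (changing the existing DB) would
replace the raise with code that repairs the offending instances.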



> AVANCE (SYSLAB)

An object-oriented, distributed database programming language.  Its
most interesting feature is the presence of system-level version
control, which is used to support schema evolution, system-level
versioning (as a way of improving concurrency), and objects with their
own notion of history.  The system consists of a programming language
(PAL) and a distributed persistent object manager.

REFS: 
	Anders Bjornerstedt and Stefan Britts. "AVANCE: An
	Object Management System".  Proceedings of OOPSLA88.

[clamen]


> Altair/O_2 (INRIA)

Neither of the two articles I have (bibliographic information below)
addresses the issue of schema evolution or database conversion.

REFS:
	F. Bancilhon, G. Barbette, V. Benzaken, C. Delobel,
	S. Gamerman, C. Lecluse, P. Pfeffer, P. Richard,
	and F. Velez.  "The Design and Implementation of
	O2, an Object-Oriented Database System".
	Advances in Object-Oriented Database Systems,
	Springer Verlag. (Lecture Notes in Computer Science
	series, Number 334.)

	C. Lecluse, P. Richard, and F. Velez. "O2, an
	Object-Oriented Data Model".  Proceedings of
	SIGMOD88.  Also appears in Zdonik and Maier,
	"Readings in Object-Oriented Database Systems",
	Morgan Kaufmann, 1990.
	
[clamen]


> OTGen (CMU)

OTGen is a scheme for computer-assisted schema evolution.  A
wide variety of changes (wider than those supported by ORION or
GemStone) can be expressed in the evolution "mini-language", which
describes a procedure for transforming instances from their old to new
representations.  Objects are converted as databases (which in the
envisioned OTGen system are rather small) are opened.

REFS:

	Barbara Staudt Lerner and A. Nico Habermann. "Beyond
	Schema Evolution to Database Reorganization" in
	Proceedings of OOPSLA/ECOOP '90.

[clamen, blerner@cs.umass.edu]
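
Conversion at open time can be sketched as follows.  The rule format
here (a Python list of functions) is invented for illustration and is
not OTGen's mini-language; open_database is an assumed name.

```python
# Hypothetical sketch; the rule format is invented and is not OTGen's language.
def open_database(objects, stored_version, transformers):
    """Convert every object eagerly, once, when the database is opened.

    transformers[v] turns an object at version v into version v + 1, so the
    current schema version is len(transformers).
    """
    for step in transformers[stored_version:]:
        objects = [step(obj) for obj in objects]
    return objects, len(transformers)


transformers = [
    lambda o: {**o, "units": "cm"},                        # 0 -> 1: add a field
    lambda o: {"length": o["len"], "units": o["units"]},   # 1 -> 2: rename a field
]
objs, version = open_database([{"len": 3}, {"len": 7}], 0, transformers)
print(version, objs)
```

Converting everything at open time is cheap only because the
databases are assumed to be small, as noted above.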




		       << Commercial Systems >>

> CLOS

Not persistent, but implementations must support redefinition of
classes and the conversion (either lazy or eager) of existing
instances. [cf. CLtL II]  In spite of this freedom, implementations
seem to convert lazily.
[communication with gregor@parc.xerox.com, hornig@symbolics.com,
 dussud@lucid.com] 


> Statice (Symbolics)

I'm familiar with Statice, sold by Symbolics Inc.  The Statice command
"Update Database Schema" brings an existing database into conformance
with a modified schema.  Changes are classified as either compatible
(lossless, i.e., completely information-preserving) or incompatible
(i.e., potentially information-losing in the current implementation).
Basically, any change is compatible except for the following:

    -- If an attribute's type changes, all such attributes extant
    are re-initialized (nulled out).  Note that Statice permits
    an attribute to be of type T, the universal type.  Such an
    attribute can then take on any value without schema
    modification or information loss.

    -- If a type's inheritance (list of parents) changes, the
    type must be deleted and re-created, losing all extant
    instances of that type. This is Statice's most serious
    current limitation.  The simplest workaround is to employ a
    database dumper/loader (either the one supplied by Symbolics
    or a customized one) to save the information elements and
    then reload them into the modified schema.

[lgm@iexist.att.com]
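
The compatible/incompatible split can be sketched as a small checker.
This is an illustration in Python, not Statice code; the schema
representation and the function name classify are assumptions.  Since
the post lists only two incompatible cases, everything else
(including attribute removal) is treated as compatible here.

```python
# Hypothetical sketch of the classification described above; not Statice code.
def classify(old, new):
    """old/new map type name -> {"parents": [...], "attrs": {attr: attr_type}}.

    Returns the incompatible changes; an empty list means the update is lossless.
    """
    incompatible = []
    for name in old.keys() & new.keys():
        if old[name]["parents"] != new[name]["parents"]:
            incompatible.append((name, "parents changed: instances lost"))
        for attr, attr_type in old[name]["attrs"].items():
            new_type = new[name]["attrs"].get(attr, attr_type)
            if new_type != attr_type:
                incompatible.append((name, attr + " retyped: values nulled"))
    return sorted(incompatible)


old = {"Person": {"parents": [], "attrs": {"age": "integer", "name": "string"}}}
new = {"Person": {"parents": [],
                  "attrs": {"age": "string", "name": "string", "email": "string"}}}
print(classify(old, new))
```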


> Versant 

Versant provides schema evolution. But in the current release, only
leaf classes in the schema can be modified. Leaf classes can be
added, dropped, renamed and individual attributes and methods 
changed. The class instances are modified later as they are accessed.
There are no security mechanisms for preventing users from 
changing schema. Schema changes are done using a separate utility 
which compares files (with .sch extension) which contain new schema
definitions with those of a database and changes the database schema
so that there is no difference. In case of conflicting class names
or other situations, the user has control over resolving the conflict.
[h.subramanian@trl.OZ.AU]

I've been looking at the C++ database vendors. Versant has schema
evolution at the leaf class level only. They're trying to come up with
a good way to do it for the general case. They talk about using
versioning to mark class evolution. Then they want to test timestamps
when an object is retrieved to see whether its class has been changed.
If it has, they reformat the object to conform to the new definition
at that time.
[arc!chet@apple.com]
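
A schema-comparison utility of the kind described might look roughly
like this.  The sketch is in Python with an invented schema
representation; the real utility and the .sch file format are not
shown in this thread, and plan_changes is an assumed name.

```python
# Hypothetical sketch; the real utility and .sch format are not shown here.
def plan_changes(db_schema, file_schema):
    """Each schema maps class name -> (parent name or None, set of attributes).

    Returns the operations needed to make the database match the file,
    refusing to touch any class that has subclasses (non-leaf classes).
    """
    has_subclass = {parent for parent, _ in db_schema.values() if parent}
    plan = []
    for name in sorted(db_schema.keys() | file_schema.keys()):
        if name not in db_schema:
            plan.append(("add", name))
            continue
        if db_schema[name] == file_schema.get(name):
            continue                                  # no difference
        if name in has_subclass:
            raise ValueError(name + " is not a leaf class")
        plan.append(("drop" if name not in file_schema else "modify", name))
    return plan


db = {"Shape": (None, {"id"}), "Circle": ("Shape", {"r"})}
wanted = {"Shape": (None, {"id"}),
          "Circle": ("Shape", {"r", "color"}),
          "Square": ("Shape", {"side"})}
print(plan_changes(db, wanted))
```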


> Object Design

Object Design, to the best of my knowledge, do[es]n't support schema
evolution at this time.

[arc!chet@apple.com]


> Objectivity 

Objectivity, to the best of my knowledge, do[es]n't support schema
evolution at this time.

[arc!chet@apple.com] 


> ObjectStore

ObjectStore does not provide schema evolution as yet but it has
promised to provide schema evolution in the next release.
[h.subramanian@trl.OZ.AU]

	
> Ontos [formerly VBase] (Ontologic)

Ontos provides schema evolution. It allows any class to be modified.
The major drawback is that data does not migrate, i.e., instances are
not modified to adapt to the new class definition. So schema changes
can be done only on classes that do not contain instances and do not
have subclasses that contain instances.
[h.subramanian@trl.OZ.AU]

As a system for experiments, we are currently using ONTOS from
Ontologic Inc.  Unfortunately, there is no transparent concept of
schema evolution for populated databases. Thus, we are still
investigating how it works.
[Markus Tresch <tresch@inf.ethz.ch>]
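
The restriction reported above amounts to a check like the following.
This is a Python illustration only, not the ONTOS API; can_modify and
its arguments are assumed names.

```python
# Hypothetical sketch of the restriction described above; not the ONTOS API.
def can_modify(cls, parent_of, instance_count):
    """True if neither cls nor any of its descendants has stored instances."""
    def descends(name):
        # Walk up the parent chain; cls counts as its own descendant.
        while name is not None:
            if name == cls:
                return True
            name = parent_of.get(name)
        return False
    return all(count == 0
               for name, count in instance_count.items() if descends(name))


parent_of = {"Vehicle": None, "Car": "Vehicle", "Boat": "Vehicle"}
counts = {"Vehicle": 0, "Car": 3, "Boat": 0}
print(can_modify("Vehicle", parent_of, counts))   # False: Car has instances
print(can_modify("Boat", parent_of, counts))      # True
```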


> GemStone (Servio-Logic)

The authors reject the emulation scheme and the lazy conversion
approach as previously outlined.  Instead, they favor a mixed
strategy, which involves lazy conversion until the next garbage
collection, at which point all remaining old instances are upgraded.
(Their current implementation, however, does not yet support this
feature --- the conversion being done eagerly for the time being.)
They identify a list of constraints which must be preserved across
modification to type descriptions and to the inheritance hierarchy.
The authors then proceed to enumerate a number of categories of object
updates that are permitted, and what changes to the dependent
instances and subclasses must be performed in order to maintain the
integrity of the database (i.e., to preserve the above constraints).

REFS: 
	Robert Bretl, David Maier, Allan Otis, Jason Penney,
	Bruce Schuchardt, Jacob Stein, E. Harold Williams,
	Monty Williams. "The GemStone Data Management
	System."  Chapter 12 of "Object-Oriented Concepts,
	Databases and Applications", by Kim and Lochovsky.

[clamen]
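
The mixed strategy (lazy until the next garbage collection, then
finish the rest) can be sketched as below.  This is a Python
illustration under assumed names (MixedStore, read, collect_garbage),
not GemStone's implementation, which per the above still converts
eagerly.

```python
# Hypothetical sketch of the mixed strategy; not GemStone's implementation.
class MixedStore:
    def __init__(self):
        self.current = 1
        self.upgrade = {}          # version -> function converting to version + 1
        self.heap = {}             # oid -> [version, attrs]

    def evolve(self, fn):
        self.upgrade[self.current] = fn
        self.current += 1

    def _convert(self, record):
        while record[0] < self.current:
            record[1] = self.upgrade[record[0]](record[1])
            record[0] += 1

    def read(self, oid):
        self._convert(self.heap[oid])      # lazy: convert on access
        return self.heap[oid][1]

    def collect_garbage(self, live):
        # The collector visits every live object anyway, so it upgrades
        # whatever lazy conversion has not reached yet.
        self.heap = {oid: rec for oid, rec in self.heap.items() if oid in live}
        for record in self.heap.values():
            self._convert(record)
        self.upgrade.clear()               # no old-format instances remain


s = MixedStore()
s.heap = {"a": [1, {"x": 1}], "b": [1, {"x": 2}], "c": [1, {"x": 3}]}
s.evolve(lambda attrs: {**attrs, "y": 0})
s.read("a")
before_gc = {oid: rec[0] for oid, rec in s.heap.items()}
s.collect_garbage({"a", "b"})
print(before_gc, s.heap)
```

One attraction of this design is that the conversion functions need
only be kept until the next collection.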


> Base/OPEN (NMP-CAD)

A structurally object-oriented system (i.e., methods are not stored);
only schema extension is supported.  Instances of older type-versions
are never converted, but can coexist in the database with newer
objects. 

[communication with tomas@basf.nmpcad.se]



			 <<< OTHER MODELS >>>

		       << Commercial Systems >>

> Pick

With Pick and its variants you only have problems if you want to
redefine an existing field.  Because of the way the data are stored
and the separation of the data and the dictionary you can define
additional fields in the dictionary without having to do anything to
the data - a facility which we have found very useful in a number of
systems.

There is no general facility to redefine an existing field - you just
make whatever changes are required in the dictionary then write an
Info Basic program to change the data.  We have seldom needed to do
this, but it has not been complicated to do.

[Geoff Miller <ghm@ccadfa.cc.adfa.oz.au>]
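
The separation of data and dictionary can be sketched as follows.
This is a Python illustration only; real Pick dictionary items carry
more than a field position, and the record layout and the helper
field() are assumptions.

```python
# Hypothetical sketch; real Pick dictionaries hold more than a field position.
records = {
    "1001": ["Smith", "Canberra", "2600"],   # data: positional fields only
    "1002": ["Jones", "Sydney"],             # shorter record: field 3 never written
}
dictionary = {"NAME": 1, "CITY": 2}          # dictionary: field name -> position

def field(record_id, name):
    # Look the position up in the dictionary, then read it from the data.
    position = dictionary[name]
    values = records[record_id]
    return values[position - 1] if position <= len(values) else ""

dictionary["POSTCODE"] = 3                   # new definition; no data is rewritten
print(field("1001", "POSTCODE"), repr(field("1002", "POSTCODE")))
```

Redefining an existing field is the hard case because the stored
values at that position would then need a conversion program.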


> IDL (Persistent Data Systems)

IDL is a schema definition language. Schema modifications are defined
in IDL, requiring ad-hoc offline transformations of the database, in
general.  A simple class of transformations can be handled by
IDL->ASCII and ASCII->IDL translators (i.e., integer format changes,
list->array, attribute addition).

[conversation with Ellen Borison of Persistent Data Systems]


			<< Research Systems >>

> IRIS (HP Labs)

Objects in the Iris system may acquire or lose types dynamically.
Thus, if an object no longer matches a changed definition, the user
can choose to remove the type from the object instead of modifying the
object to match the type.  In general, Iris tends to restrict class
modifications so that object modifications are not necessary.  For
example, a class cannot be removed unless it has no instances and new
supertype-subtype relationships cannot be established.

REFS:
	D.H. Fishman, D. Beech, H.P. Cate, E.C. Chow, T.
	Connors, J.W. Davis, N. Derrett, C.G. Hoch, W. Kent,
	P. Lyngbaek, B. Mahbod, M.A. Neimat, T.A. Ryan, M.C.
	Shan. "Iris: An Object-Oriented Database Management
	System".  ACM Transactions on Office Information
	Systems 5(1):48-69, Jan 1987.

[clamen]
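
Dynamic acquisition and loss of types can be sketched as below.  This
is a Python illustration with invented names (IrisObject, add_type,
remove_type, conforms); Iris exposes the idea through its own
interface, which is not reproduced here.

```python
# Hypothetical sketch; Iris has its own interface, not reproduced here.
class IrisObject:
    def __init__(self):
        self.types = set()
        self.values = {}

type_attrs = {"Person": {"name"}, "Employee": {"name", "salary"}}

def add_type(obj, type_name, **values):
    obj.types.add(type_name)
    obj.values.update(values)

def remove_type(obj, type_name):
    obj.types.discard(type_name)
    # Keep only the values still required by some remaining type.
    still_needed = set().union(*(type_attrs[t] for t in obj.types))
    obj.values = {k: v for k, v in obj.values.items() if k in still_needed}

def conforms(obj, type_name):
    return type_attrs[type_name].issubset(obj.values)


o = IrisObject()
add_type(o, "Person", name="Kim")
add_type(o, "Employee", salary=10)
type_attrs["Employee"] = {"name", "salary", "badge"}   # the type is redefined
matches = conforms(o, "Employee")                      # object no longer matches
remove_type(o, "Employee")                             # drop the type instead
print(matches, o.types, o.values)
```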
--
Stewart M. Clamen			Internet:    clamen@cs.cmu.edu
School of Computer Science		UUCP: 	     uunet!"clamen@cs.cmu.edu"
Carnegie Mellon University		Phone: 	     +1 412 268 3620
Pittsburgh, PA 15213-3890, USA		Fax:	     +1 412 268 1793

jgk@osc.COM (Joe Keane) (06/04/91)

In article <CLAMEN.91May29225024@BYRON.SP.CS.CMU.EDU> clamen+@CS.CMU.EDU
compiles the following comments:
>Versant provides schema evolution. But in the current release, only
>leaf classes in the schema can be modified. Leaf classes can be
>added, dropped, renamed and individual attributes and methods 
>changed. The class instances are modified later as they are accessed.
>There are no security mechanisms for preventing users from 
>changing schema. Schema changes are done using a separate utility 
>which compares files (with .sch extension) which contain new schema
>definitions with those of a database and changes the database schema
>so that there is no difference. In case of conflicting class names
>or other situations user has control on resolving the conflict.
>[h.subramanian@trl.OZ.AU]

This is an accurate description.

>I've been looking at the C++ database vendors. Versant has schema
>evolution at the leaf class level only.  They're trying to come up with
>a good way to do it for the general case.  

It's true that we currently only support evolution of leaf classes.  However,
i'd like to point out that this is only an implementation restriction of the
current release, and there are no architectural or technical problems with
doing it.  As a practical matter, i haven't heard our customers complaining
about this restriction.  I'd guess that if you're changing the design of your
base classes, then you have more to worry about than evolving your current
databases.

>They talk about using
>versioning to mark class evolution. Then they want to test timestamps
>when an object is retrieved to see whether its class has been changed.
>If it has, they reformat the object to conform to the new definition
>at that time.
>[arc!chet@apple.com]

These are all things we do in our released product, with the restriction given
above.  A minor correction is that we use internal class version identifiers,
rather than comparing timestamps.

Disclaimer: I work for Versant.
--
Joe Keane, professional C programmer
jgk@osc.com (...!uunet!stratus!osc!jgk)