[comp.databases] OO DBMSs

jack@odi.com (Jack Orenstein) (08/21/89)

Here are replies to some recent questions that have come up in the OO
DBMS discussions. The answers are, for the most part, specific to the
OO DBMS being built at Object Design, but will often, I believe, apply
to competitors' products as well.

David Masterson writes:
   
   Based on Jack Orenstein's message, I have a couple of questions:
   
   1. In implementing an OODB on top of C++ using the notion of persistent and
   transient type objects, when you refer to information in the OODB, is it
   always by an object identifier?  How, therefore, would you find objects
   meeting some qualification if you don't know their identifiers?  Is this even a
   type of query you would ask in an OODB world?  (you ALWAYS know the identifier
   because even a qualification would be wrapped in an object which contains the
   identifier?)

It will often be the case that object ids are known because they are
stored in persistent variables. For example, a persistent variable of
type part* stores the id of a part.

In other cases, an id will not be known, but properties of the object
can be described as part of a query. Queries are expressed using
existing C++ syntax for control (i.e. boolean) expressions. For
example, given a set of parts (which may contain both transient and
persistent instances), queries can be written to ask for all parts
whose weight exceeds a given amount, all parts containing a given
sub-part, all parts contained in a given part, etc. Compound queries
can be expressed also, e.g. find all parts containing a frammis-joint
linkage that were manufactured by Acme.
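To make the query description concrete, here is a minimal C++ sketch of qualifications written as ordinary boolean expressions over a set of parts. The Part class, its fields, and the select_parts helper are hypothetical illustrations, not Object Design's query interface. The set holds plain pointers, and nothing in the code distinguishes persistent from transient instances.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical part class; nothing here marks an instance as persistent or
// transient.
struct Part {
    std::string name;
    std::string manufacturer;
    double weight;
    std::vector<Part*> subparts;

    // True if 'p' is one of this part's direct sub-parts.
    bool contains(const Part* p) const {
        return std::find(subparts.begin(), subparts.end(), p) != subparts.end();
    }
};

// Hypothetical query helper: the qualification is an ordinary C++ boolean
// expression supplied as a predicate; matching parts are returned in order.
template <typename Pred>
std::vector<Part*> select_parts(const std::vector<Part*>& parts, Pred pred) {
    std::vector<Part*> result;
    for (Part* p : parts)
        if (pred(*p)) result.push_back(p);
    return result;
}
```

Both a simple query (weight above a threshold) and a compound one (made by Acme and containing a given sub-part) are then just predicates, e.g. `select_parts(parts, [&](const Part& p) { return p.manufacturer == "Acme" && p.contains(&linkage); })`.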

   
   2.  Again using the architecture of persistent and transient objects,  is a
   persistent object ever in memory?  Or is it just a transient copy of a
   persistent object that is in memory?  Then, how are persistent objects
   created?

Yes, the persistent objects themselves are manipulated by
applications. Copying isn't good enough since a copy of an object has
a different identity. (This might not be true in other languages, but
the idea of equating an object's id with its address is fundamental to
C++. Of course, it is possible to define a base "object" class, define
it to have an "id" data member, redefine initialization, =, and == to
work off this id, and then use "object" to derive all other classes,
but the space and time overhead will be significant). One example of
the difficulties that arise is that pointers to an object do not point
to copies of the object.

Copies of objects can be made, as is usually the case in C++, and the
semantics of C++ are preserved. I.e., the copy is a distinct object.
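The identity point can be sketched in a few lines of C++ (C++17, for the inline static member). Part, same_object, Object, and IdPart are hypothetical names. Object plays the role of the id-carrying base "object" class described above, with copy construction, assignment, and == redefined to work off the id; it makes the overhead visible, since every instance carries an extra id field.

```cpp
#include <cstdint>

// Plain C++ class: its identity is its address.
struct Part {
    double weight;
};

// Two pointers denote the same object only if the addresses are equal.
bool same_object(const Part* a, const Part* b) { return a == b; }

// Hypothetical base class giving every object an explicit id, with
// copying and == redefined to work off that id instead of the address.
class Object {
public:
    Object() : id_(next_id_++) {}
    Object(const Object& other) : id_(other.id_) {}  // a "copy" keeps the id
    Object& operator=(const Object& other) {
        id_ = other.id_;
        return *this;
    }
    bool operator==(const Object& other) const { return id_ == other.id_; }
    std::uint64_t id() const { return id_; }

private:
    std::uint64_t id_;                         // extra space in every object
    static inline std::uint64_t next_id_ = 1;  // C++17 inline static member
};

// Every other class would then have to derive from Object.
struct IdPart : Object {
    double weight = 0;
};
```

With plain Part, a copy has the same contents but a different address, so a pointer to the original never denotes the copy. IdPart restores id-based "sameness" across copies, but only at the price of the id field and id comparisons in every class.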
   
   
   
From D. C. Martin:

   Dan Weinreb of ODI writes:
   
       There should not be any special declaration for
       "pointers to persistent" or "pointers to possibly persistent" data as
       distinct from ordinary pointers.
       
   It would be nice if no one ever had to consider if a pointer was persistent
   or non-persistent, but someone will have to build the access methods and
   other low-level interface routines to your storage manager in order to
   provide this type of "pointer swizzling" to the application developer.
   At UW - Madison the Exodus Project is developing a language called E, which
   is a persistent C++ language designed to allow an individual to write his or
   her own access methods, and to a certain extent pointers to resident objects
   are equivalent to pointers to persistent ones.  However, for this equivalence
   the pointer types must be DB pointers, i.e. dbchar* != char*, but a persistent dbchar*
   is equivalent to a non-persistent dbchar*.

We are very familiar with the Exodus project, and with the E language.
While the type system of E is far preferable to that of a typical
host-language/DBMS combination, it still has two distinct, but
"parallel", type systems, and programmers have to be careful about the
use of db types. In our product, there will be a single type system,
that of C++. There is no fundamental reason why persistent and
transient types have to be distinguished in the language used by the
application programmer.
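A rough C++ model of the "parallel type systems" cost, assuming a hypothetical struct dbchar standing in for a db type like E's (this is not E syntax, and it ignores how E actually declares persistence). Because dbchar* and char* are unrelated pointer types, a routine that should work on both ends up written once per type system.

```cpp
#include <type_traits>

// Hypothetical stand-in for a db type that parallels the built-in char.
struct dbchar {
    char value;
};

// Version for ordinary memory...
int count_nonzero(const char* p, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (p[i] != 0) ++count;
    return count;
}

// ...and a parallel version for the db type.  It accepts any dbchar*,
// whether the dbchar it points to is persistent or transient.
int count_nonzero(const dbchar* p, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (p[i].value != 0) ++count;
    return count;
}

// The two pointer types do not convert implicitly in either direction.
static_assert(!std::is_convertible<char*, const dbchar*>::value,
              "a char* is not a dbchar*");
static_assert(!std::is_convertible<dbchar*, const char*>::value,
              "a dbchar* is not a char*");
```

In a single type system, only the first function would be needed.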

Unfortunately, the details of how "swizzling" works are proprietary,
so I can't discuss the issue.

   
       In particular, de-referencing a
       pointer has exactly the same semantics and syntax regardless of
       whether the objects are persistent or transient.  In general, data
       manipulation (storing, fetching, testing, adding, printing, field
       extraction, function calling, casting) looks exactly as it does for
       normal C++.
   
   What about dereferencing a pointer to a 40mb image?  Does this mean
   bringing the entire image into core?  There must be some low-level routines
   to allow the application programmer to inform the language that certain
   special methods should be used to store, fetch, etc... for special
   datatypes.

I'll have to take the 5th again, but I will say that there is no need
to bring in the entire 40mb image just to retrieve one byte of it.
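Since the actual mechanism is proprietary, the following is only a generic illustration of the claim: a large object read through fixed-size pages that are fetched on first access, so touching one byte loads one page. PageSource, LargeObject, the page size used in the usage, and the synthetic page contents are all assumptions. Unlike the transparent dereferencing described above, access here goes through an explicit at() call.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a storage manager: produces one fixed-size
// page of a large object on request and counts how many pages it served.
class PageSource {
public:
    explicit PageSource(std::size_t page_size) : page_size_(page_size) {}
    std::vector<std::uint8_t> load(std::size_t page_no) {
        ++loads_;
        std::vector<std::uint8_t> page(page_size_);
        for (std::size_t i = 0; i < page_size_; ++i)  // synthetic contents
            page[i] = static_cast<std::uint8_t>((page_no * page_size_ + i) % 251);
        return page;
    }
    std::size_t page_size() const { return page_size_; }
    std::size_t loads() const { return loads_; }

private:
    std::size_t page_size_;
    std::size_t loads_ = 0;
};

// A large object whose bytes are brought in page by page, only when touched.
class LargeObject {
public:
    LargeObject(PageSource& src, std::size_t size) : src_(src), size_(size) {}
    std::uint8_t at(std::size_t offset) {
        if (offset >= size_) throw std::out_of_range("offset past end of object");
        std::size_t page_no = offset / src_.page_size();
        auto it = cache_.find(page_no);
        if (it == cache_.end())  // first touch: fetch just this page
            it = cache_.emplace(page_no, src_.load(page_no)).first;
        return it->second[offset % src_.page_size()];
    }
    std::size_t resident_pages() const { return cache_.size(); }

private:
    PageSource& src_;
    std::size_t size_;
    std::map<std::size_t, std::vector<std::uint8_t>> cache_;
};
```

Reading one byte of a 40 MB LargeObject with 4096-byte pages leaves exactly one page resident; a second read on the same page loads nothing new.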
   
   
Jack Orenstein
Object Design, Inc.

davidm@cimshop.UUCP (David Masterson) (08/23/89)

>Unfortunately, the details of how "swizzling" works are proprietary,
>so I can't discuss the issue.
>
Let me just ask then, is there anything inherent in your design that prevents
it from being put on top of a relational database system (you've expressed
requirements that prevent it [performance], but not design constraints)?  This
question has some implications in what I am currently doing.

David Masterson
uunet!cimshop!davidm
415-691-6311

ballou@nebula.ACA.MCC.COM (Nat Ballou) (08/25/89)

In article <1989Aug21.132525.3179@odi.com>, jack@odi.com (Jack Orenstein) writes:
> We are very familiar with the Exodus project, and with the E language.
> While the type system of E is far preferable to that of a typical
> host-language/DBMS combination, it still has two distinct, but
> "parallel", type systems, and programmers have to be careful about the
> use of db types. In our product, there will be a single type system,
> that of C++. There is no fundamental reason why persistent and
> transient types have to be distinguished in the language used by the
> application programmer.
> 

I am interested in how the Object Design people intend to do schema
evolution in their forthcoming system.  Suppose I make a persistent subclass
of a non-persistent class.  I then go on to populate the persistent class.
How does one add/drop attributes, superclasses, indices, etc.?  If a
non-persistent superclass of my persistent class changes (i.e., an
attribute/method is deleted/added, etc.), what happens to the instances in
the database?  Will all programs compiled against the schema have to be
recompiled after such a change?  I'm more interested in semantics and theory
than in implementation details.

> Jack Orenstein
> Object Design, Inc.

Nat Ballou
Orion Project
MCC

dlw@odi.com (Dan Weinreb) (08/29/89)

In article <459@cimshop.UUCP> davidm@cimshop.UUCP (David Masterson) writes:

   Let me just ask then, is there anything inherent in your design that prevents
   it from being put on top of a relational database system (you've expressed
   requirements that prevent it [performance], but not design constraints)?  This
   question has some implications in what I am currently doing.

Yes, there is; our design could not plausibly be implemented on top of
a relational database system.  (That is, if you did, you'd lose most
of its benefits.)  The architecture just doesn't fit together that
way.

Dan Weinreb		Object Design, Inc.		dlw@odi.com

craig@gpu.utcs.utoronto.ca (Craig Hubley) (09/04/89)

(John Orenstein requested analogies for 'adding a column' in O-O terms,
 sorry I haven't the article on hand)

I think this issue is important because it may help to define relational
operations as a proper subset of object-oriented operations, and let
OODBMS developers provide all standard relational capabilities in their
systems, which I think is a worthwhile goal.  I have also included some
discussion of Linda, which is a DBMS in Krueger's sense, and well worth
investigating as a unifying access metaphor.  

DATA-BASED ANALOGY

If you accept the analogy of C++ classes to relational table definitions,
which seems relatively sound to me, then you might also accept the analogy
of 'adding a column' to mean adding a data item to each instance of the class.
Of course, to be truly object-oriented, you are actually adding a set of
*legal operations* on each object of that class, including the ability to
alter and retrieve that data item. There may be other, more involved meanings,
but that is a sort of default.  Alternatively, you could think of inserting
these 'more involved meanings', which is addressed below.

Clearly, you may also be adding a set of more involved or advanced operations
or virtual capabilities, but if the data item or 'column' didn't exist before
then you aren't relying on it anywhere.  Presumably the implementation of 
other operations (methods) on the class might change due to the presence
(or absence) of this new item, but that doesn't change the availability of
these methods from outside, which in terms of the access algebra is the only
issue.

So if I add 'Country' to a relational table containing company addresses,
then by default I would add methods like 'SetCountry()' and 'GetCountry()',
but these are optional.  If the definition of 'SendMail()' changes because
of the new 'Country' field, that is not an issue at the interface.
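The "Country" example might look like this in C++. CompanyAddress and its members are hypothetical, and SendMail just returns the formatted envelope text so the effect is visible. SetCountry() and GetCountry() are the default operations that come with the new item, while SendMail() keeps its interface and changes only internally.

```cpp
#include <string>
#include <utility>

// Hypothetical class standing in for the "company addresses" table.
class CompanyAddress {
public:
    CompanyAddress(std::string name, std::string street, std::string city)
        : name_(std::move(name)), street_(std::move(street)), city_(std::move(city)) {}

    // Default operations added along with the new 'Country' item.
    void SetCountry(const std::string& country) { country_ = country; }
    const std::string& GetCountry() const { return country_; }

    // Existing operation: same signature as before the change; only its
    // body now consults the new item.
    std::string SendMail(const std::string& body) const {
        std::string label = name_ + "\n" + street_ + "\n" + city_;
        if (!country_.empty()) label += "\n" + country_;
        return label + "\n\n" + body;
    }

private:
    std::string name_, street_, city_;
    std::string country_;  // the added "column"; empty until set
};
```

Callers of SendMail() written before the change compile and behave as before for addresses with no country set.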

METHOD-BASED ANALOGY

Another interpretation of 'adding a column' is 'adding a method',
and this is probably a more supportable analogy.  Adding a new method
may or may not involve adding new data to all instances of the class.
You could think of 'inserting' this method in terms of defining a new
function in an incremental compiler for an object-oriented language.
In fact, it could be implemented this way.
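A small C++ illustration of the method-based reading, using hypothetical AddressV1 and AddressV2 types. The added HasPostalCode() operation is computed from data every instance already has, so no per-instance storage changes. In typical C++ implementations a non-virtual member function adds nothing to an object's size (a first virtual function would add a vtable pointer), so comparing sizes is a practical check, not a language guarantee.

```cpp
#include <string>

// Version 1 of a hypothetical class.
struct AddressV1 {
    std::string city;
    std::string postal_code;
};

// Version 2 "adds a column" only as an operation: the new method is derived
// from existing data, so instances need no new stored fields.
struct AddressV2 {
    std::string city;
    std::string postal_code;
    bool HasPostalCode() const { return !postal_code.empty(); }
};
```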

LINDA

Finally, if one accepts the definition of a 'database' as 'shared access to
persistent objects', then everyone should have a look at Linda, the Yale
language extensions developed for parallel processing.  Linda defines a
'tuple space', something like a simple table of variable-length records, that
provides exactly that: 'shared access to persistent objects'.  It is assumed in Linda that
the persistence is of short duration, but this assumption is in no way built
into the interface.  It would be relatively straightforward to extend Linda
to full relational DBMS capabilities, and ultimately to OODBMS status, in the
bargain gaining Linda's unification of all IPC and RPC schemes!

Linda's interface is quite simple, basically:
	in(tuple pattern)	removes a tuple from the tuple space
	out(tuple)		places a tuple in the tuple space
	rd(tuple pattern)	reads a tuple from the space, without removing
	eval(active tuple)	(one or more fields are actually a function
				call, when finished the tuple becomes passive
				and remains in the tuple space).

ref: "Linda in Context", April 1989 Communications of the ACM.
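To show how small this interface is, here is a minimal in-process tuple space in C++17 offering out, in, and rd. It is a sketch under several assumptions: fields are limited to int and std::string; a pattern field is either an actual value or an untyped wildcard (std::nullopt), whereas real Linda formals are typed; and eval is omitted because it needs a process-creation model. As in Linda, in and rd block until a matching tuple appears.

```cpp
#include <condition_variable>
#include <cstddef>
#include <list>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using Field = std::variant<int, std::string>;        // one tuple field
using Tuple = std::vector<Field>;                    // a list of fields
using Pattern = std::vector<std::optional<Field>>;   // nullopt = wildcard

class TupleSpace {
public:
    // out: place a tuple in the space and wake any blocked readers.
    void out(Tuple t) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tuples_.push_back(std::move(t));
        }
        cv_.notify_all();
    }

    // in: remove and return a matching tuple, blocking until one exists.
    Tuple in(const Pattern& p) {
        std::unique_lock<std::mutex> lock(mu_);
        auto it = tuples_.end();
        cv_.wait(lock, [&] { return (it = find(p)) != tuples_.end(); });
        Tuple t = std::move(*it);
        tuples_.erase(it);
        return t;
    }

    // rd: return a copy of a matching tuple without removing it.
    Tuple rd(const Pattern& p) {
        std::unique_lock<std::mutex> lock(mu_);
        auto it = tuples_.end();
        cv_.wait(lock, [&] { return (it = find(p)) != tuples_.end(); });
        return *it;
    }

private:
    // Same arity, and every non-wildcard field equal.
    static bool matches(const Pattern& p, const Tuple& t) {
        if (p.size() != t.size()) return false;
        for (std::size_t i = 0; i < p.size(); ++i)
            if (p[i] && *p[i] != t[i]) return false;
        return true;
    }

    // Caller must hold mu_.
    std::list<Tuple>::iterator find(const Pattern& p) {
        for (auto it = tuples_.begin(); it != tuples_.end(); ++it)
            if (matches(p, *it)) return it;
        return tuples_.end();
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::list<Tuple> tuples_;
};
```

A producer thread calling out() wakes any consumer blocked in in() or rd() through the condition variable. This sketch picks the oldest matching tuple; Linda itself leaves the choice among several matches unspecified.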

Linda's present conception of 'tuple pattern' is limited, and it inherits its
datatypes from the language these primitives are added to (the one-language
unification of DBMS and other programming constructs).  However, the idea of
tuples is very general, and were the system extended to work set-at-a-time,
accepting tuple pattern definitions in relational algebra, and active
tuple definitions as programs in relational algebra, Linda could just as
easily extend SQL!

For an object-oriented system, each field in the tuple can be thought of
as an object, and the operations as retrieving or emitting a set of objects,
or a container object holding the set, which would probably be better.
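The set-at-a-time idea could start from something like the following hypothetical rd_all, which reads every matching tuple and returns them together in one container object instead of one tuple per call. The types are simplifying assumptions (int and string fields, std::nullopt as an untyped wildcard), and rd_all is not a Linda primitive.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

using Field = std::variant<int, std::string>;
using Tuple = std::vector<Field>;
using Pattern = std::vector<std::optional<Field>>;  // nullopt = wildcard

// Hypothetical set-at-a-time read: return a container holding every tuple
// in 'space' that matches 'p', leaving the space itself unchanged.
std::vector<Tuple> rd_all(const std::vector<Tuple>& space, const Pattern& p) {
    std::vector<Tuple> matches;
    for (const Tuple& t : space) {
        if (t.size() != p.size()) continue;  // arity must agree
        bool ok = true;
        for (std::size_t i = 0; i < p.size() && ok; ++i)
            ok = !p[i] || *p[i] == t[i];     // wildcard, or equal field
        if (ok) matches.push_back(t);
    }
    return matches;
}
```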

Linda is traditionally a preprocessor and implements its capabilities on
top of existing shared-memory, semaphore, or message port schemes, but
there are other approaches, including building 'tuple space' as a real DBMS. 
Contrary to its abstract appearance, Linda has been added efficiently to
C, C++, PostScript, and other languages - one scalable supercomputer, the
Cogent XTM, even uses Linda to do *all* its low-level IPC and RPC, even to
the level of mouse moves.  It is based on transputers and exploits their
1 microsecond context switches.  Their version, Kernel Linda, defines a
set of language-independent data types, though their primary interface is C++.


CONCLUSION

So it would seem not only that it is possible to define an access algebra
that is very useful in traditional computing approaches, but also that it can
be extended to include database needs (that is, if we accept 'shared
access to persistent objects' as a good working definition).  Linda tuples
can be effectively made very shared and very persistent, as evidenced by the
Cogent implementation.  Although 'tuples' seem like a relational concept,
they are in fact only lists of fields, and these fields could contain object
tags as easily as anything else.  In fact, since the Linda algebra is
presently defined to be one-at-a-time, not set-at-a-time, it would be
easier to have the primitives work with container objects than with sets.
It thus seems better suited to object-oriented approaches, all the more so since
developers would have two selling points:  the DBMS, and a unified and
explicit metaphor for dealing with transient objects too!  As the Cogent
system shows, optimizing for this single shared-access metaphor can be
made very efficient and totally scalable.  This is the logical extension
of the 'one-language' advantage that the Object Design people claim.

    Craig Hubley			-------------------------------------
    Craig Hubley & Associates		"Lead, follow, or get out of the way"
    craig@gpu.utcs.utoronto.ca		-------------------------------------
    craig@gpu.utcs.toronto.edu    mnetor!utgpu!craig@uunet.UU.NET
    {allegra,bnr-vpa,decvax,mnetor!utcsri}!utgpu!craig    craig@utorgpu.bitnet

render@m.cs.uiuc.edu (09/05/89)

Written  5:23 pm  Sep  3, 1989 by craig@gpu.utcs.utoronto.ca :
>(John Orenstein requested analogies for 'adding a column' in O-O terms,
>sorry I haven't the article on hand)
>
>I think this issue is important because it may help to define relational
>operations as a proper subset of object-oriented operations, and let
>OODBMS developers provide all standard relational capabilities in their
>systems, which I think is a worthwhile goal.  I have also included some
>discussion of Linda, which is a DBMS in Krueger's sense, and well worth
>investigating as a unifying access metaphor.  

There are other studies of the properties of changing OO schemas and their
effects on an active database:

    Qing Li and Dennis McLeod, "Object Flavor Evolution in an Object-Oriented 
    Database System." In _Proceedings of the Conference on Office Information 
    Systems_, edited by Robert B. Allen.  March 1988, pp. 265-275.

    Jay Banerjee, Won Kim, Hyoung-Joo Kim and Henry F. Korth,
    "Semantics and Implementation of Schema Evolution in Object-Oriented 
     Databases", ACM SIGMOD Record, 16:3 (December 1987), pp. 311-322.

    Gia-toan Nguyen and Dominique Rieu, "Schema Evolution in Object-Oriented 
    Database Systems."  Technical Report no. 947, Unite De Recherche, 
    INRIA-Rocquencourt, December 1988.

hal.

perez@csc.ti.com (edward perez) (09/07/89)

>In article ???, Dan Weinreb writes:
>>In article <459@cimshop.UUCP> davidm@cimshop.UUCP (David Masterson) writes:
>>
>>   Let me just ask then, is there anything inherent in your design that
>>   prevents it from being put on top of a relational database system?
>>
>Yes, there is; our design could not plausibly be implemented on top of
>a relational database system.  (That is, if you did, you'd lose most
>of its benefits.)  The architecture just doesn't fit together that
>way.

so, without getting into your proprietary details, what is it about your design
that prevented you from using an rdb?  is it because you would be splitting up
objects into many relations, or because you would lack control of the underlying
storage?  what benefits would you lose from being on top of an rdb?  has your
company written any papers describing the architecture to show why you can't use
an rdb?  will you have any info at oopsla '89 on this??

"inquiring minds want to know."

edward perez			arpanet: perez@csc.ti.com
texas instruments                 csnet: perez%ti-csl@relay.cs.net
dallas, tx.