[comp.object] Availability of class extension

rowley@bath.cs.ucla.edu (Michael T Rowley) (05/29/91)

On the topic of whether the extension of a class (a collection of all
its instances) should be available:

|> In article <4863@osc.COM>, jgk@osc.COM (Joe Keane) writes:
|> 
|> But how about `find all instances of a given class'?  What does it mean?  Find
|> all instances in memory?  In general only some of the objects you're working
|> with are actually in main memory at a given time, and the whole point of an
|> ODBMS is to hide whether objects are `in' or `out'.  Find all instances in all
|> databases you're connected to?  Again, which databases you're connected to
|> should not be such an important part of your state.  Find all instances in all
|> databases in existence?  This makes sense, but it has some obvious practical
|> problems.


This last approach is the approach taken by relational databases,
assuming you map 'relation' in relational databases to 'class' in
OODBs.  Relational database systems also manage to provide decent
concurrency control mechanisms in addition to this facility.  Is there
anything about OODBs which makes the problem of reaching all instances
of a class more difficult for them than it is for relational databases?

In general, a database which is only used for application programming
can get away without support for listing the extension of a class.
However, if there is to be an associative query language, it must be
possible for the user to say "give me all objects which meet these
conditions".  One of the conditions would be the object's class,
since this determines what other aspects of the objects may be
specified in the conditions (in relational DBs this means specifying
the relation).

When relational databases took over from network databases, one of
their main selling points was their support for an associative query
language.  If OODBs do not also support this capability, many people
will write them off, incorrectly, as souped-up network databases.  It
should be possible for OODBs to provide an associative query language,
as long as there is a facility for retrieving all instances of a
class.

Michael Rowley

dlw@odi.com (Dan Weinreb) (05/29/91)

In article <1991May28.232832.28284@cs.ucla.edu> rowley@bath.cs.ucla.edu (Michael T Rowley) writes:

   Is there anything about OODBs which makes the problem of reaching all
   instances of a class more difficult for them than it is for relational
   databases?

No, there isn't.  An OODB can work either way.  It can be designed so
that it always maintains an explicit, semantically-visible extent for
all objects of a certain class within a certain database, or it can be
designed so that it does not necessarily do so.

   However, if there is to be an associative query language, it must be
   possible for the user to say "give me all objects which meet these
   conditions".  

No, that's not true.  If you have an OODB that does not keep extents,
you can still have an associative query language.  Each query simply
needs to be handed a "collection" object.  So a typical query might be
"find all of the employees within this set of employees for which the
salary is greater than 42", just like the standard mathematical
notation {x element-of X | x.salary > 42} (for "element-of" read a
little epsilon).  The queries can be as complicated as you like; more
than one collection can be involved; and automatic optimization can be
performed.

So an OODB can have associative queries even if it does not
automatically maintain extents.  These are two separate, orthogonal
issues.
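
As a rough illustration of a query that is handed a collection rather
than a class extent, here is a minimal Python sketch.  The names
`Employee`, `select` and `night_shift` are hypothetical and do not come
from any particular OODB; the set simply stands in for whatever
collection the application already holds.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Employee:
    name: str
    salary: int


def select(collection: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Yield the members of `collection` that satisfy `predicate`."""
    return (x for x in collection if predicate(x))


# An ordinary set built by the application; no class-wide extent is involved.
night_shift = {Employee("Ann", 50), Employee("Bob", 30), Employee("Cy", 42)}

# {x element-of night_shift | x.salary > 42}
well_paid = {e.name for e in select(night_shift, lambda e: e.salary > 42)}
```

The query names only the collection it was given and the condition; it
never asks the system for "all employees".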

rowley@bath.cs.ucla.edu (Michael T Rowley) (05/30/91)

In article <1991May29.134314.6850@odi.com>, dlw@odi.com (Dan Weinreb) writes:
|> In article <1991May28.232832.28284@cs.ucla.edu> rowley@bath.cs.ucla.edu (Michael T Rowley) writes:
|> 
|>    However, if there is to be an associative query language, it must be
|>    possible for the user to say "give me all objects which meet these
|>    conditions".  
|> 
|> No, that's not true.  If you have an OODB that does not keep extents,
|> you can still have an associative query language.  Each query simply
|> needs to be handed a "collection" object.  So a typical query might be
|> "find all of the employees within this set of employees for which the
|> salary is greater than 42", just like the standard mathematical
|> notation {x element-of X | x.salary > 42} (for "element-of" read a
|> little epsilon).  The queries can be as complicated as you like; more
|> than one collection can be involved; and automatic optimization can be
|> performed.
|> 
|> So an OODB can have associative queries even if it does not
|> automatically maintain extents.  These are two separate, orthogonal
|> issues.

It may be true that associative queries can work with arbitrary
collections, in which case the term "associative query language" may
not be the best descriptor of the features I'm trying to describe.
Maybe a better description would be that the query language is
declarative, rather than imperative.  It is desirable to be able to
specify a query by stating the conditions which the result objects
must meet --- without specifying a navigational strategy for
retrieving the objects.

All queries must start with some known object.  This may be a globally
known object (relation or class) or it may be the result of a previous
query.  In network databases not very much could be retrieved by a
single query using only global objects, so the user would have to make
multiple queries, each building off the previously retrieved data.
In order to facilitate this, the query language looked very much like
an imperative programming language, complete with "cursors" to keep
track of past results.

Relational databases, on the other hand, easily reach all objects in
the database from queries starting with only globally known objects
(relations).  As a result, it is easy for these databases to provide a
declarative query language.
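
The contrast can be sketched in Python.  This is only an illustration
under assumed names (`Company`, `Department`, `Employee` and the
`employees` list are hypothetical); the nested loops play the role of
cursor-style navigation, and the final comprehension plays the role of a
declarative query over a globally known collection.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str
    salary: int


@dataclass
class Department:
    name: str
    staff: list = field(default_factory=list)


@dataclass
class Company:
    departments: list = field(default_factory=list)


ann, bob, cy = Employee("Ann", 50), Employee("Bob", 30), Employee("Cy", 90)
acme = Company([Department("R&D", [ann, bob]), Department("Sales", [cy])])

# Navigational style: the caller spells out the path through the network.
navigated = []
for dept in acme.departments:      # step 1: start from a known root object
    for emp in dept.staff:         # step 2: follow owner-to-member links
        if emp.salary > 42:
            navigated.append(emp.name)

# Declarative style: name a global collection and state only the condition.
employees = [ann, bob, cy]         # stands in for the relation or class extent
declared = [e.name for e in employees if e.salary > 42]
```

Both forms return the same employees here; the difference is that the
first one encodes the access path in the program.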

In the example you wrote above:

  "find all of the employees within this set of employees
   for which the salary is greater than 42"

the important words are "within this set of employees".  The
interesting set of employees may only be reachable through a multi-step
navigation through the network.  In which case, the user would be
reduced to writing imperative programs to retrieve his data.  I think
the user community will balk at this.

Hence, I think all objects in an OODB should be easily reachable from
globally known objects.  The most obvious candidates for such objects
in an OODB would be collections representing the extents of the
classes.
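
A minimal Python sketch of what such class extents could look like, to
make the idea concrete.  `Persistent`, `extent()` and the choice to list
subclass instances in the superclass extent are all assumptions made for
the example, not features of any particular OODB.

```python
from collections import defaultdict


class Persistent:
    """Hypothetical base class that records every instance, per class."""

    _extents = defaultdict(list)

    def __init__(self):
        # Register the new object under its own class and every superclass
        # (except `object`), so Employee.extent() also includes Managers.
        for cls in type(self).__mro__:
            if cls is not object:
                Persistent._extents[cls].append(self)

    @classmethod
    def extent(cls):
        """Return a copy of the list of all instances of `cls`."""
        return list(Persistent._extents[cls])


class Employee(Persistent):
    def __init__(self, name, salary):
        super().__init__()
        self.name = name
        self.salary = salary


class Manager(Employee):
    pass


Employee("Ann", 50)
Manager("Bob", 90)
Employee("Cy", 30)

# The class itself is the globally known starting point of the query.
rich = [e.name for e in Employee.extent() if e.salary > 42]
managers = [m.name for m in Manager.extent()]
```

Note that this toy registry holds strong references, so nothing in an
extent is ever reclaimed; a real system would have to decide how
deletion interacts with extents.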

Another, less desirable solution would be to provide a single
collection representing every object in the database.  The problem
with this is that it would be hard to write the conditions for the
result objects, since the questions that can be asked of objects
depend on their types.

Michael Rowley

dlw@odi.com (Dan Weinreb) (05/31/91)

In article <1991May30.003525.21161@cs.ucla.edu> rowley@bath.cs.ucla.edu (Michael T Rowley) writes:

   the important words are "within this set of employees".  The
   interesting set of employees may only be reachable through a multi-step
   navigation through the network.  In which case, the user would be
   reduced to writing imperative programs to retrieve his data.  I think
   the user community will balk at this.

(1) May or may not.  If you are using a non-enforced-extent OODB, you
are perfectly free to maintain extents anyway.  Or there might be
several sets of employees, all of which are stored in global
variables.  A non-enforced-extent OODB gives you the freedom to build
up complex structures, but you don't have to use it if you don't want to.

(2) Which user community?  Different user communities have different
ideas about how they want to organize data.  Consider the existing
community of ECAD software developers.  They write in programming
languages such as Pascal or C++, and they generally don't store their
transistors and so on in database systems.  I think you'll find that
they don't all maintain a concept of "the set of all resistors" and so
on for every data type in the entire system.  Just because they want
to switch to using an OODB does not necessarily mean that they want to
change their whole idea of how to organize their data structures.  In
fact, it might even be a goal to *not* be forced into changing all of
the data structures.  Perhaps you would argue that they are all wrong;
that they really ought to change their data structures; that they have
been suffering for years with an inferior notion, and they are simply
unaware of their own pain.  I don't know how many of them you'd
persuade.  A key question is why databases should be so different from
programming languages in this regard.  Surely questions of
persistence, concurrency control, and recovery have nothing to do with
it.  Why doesn't Pascal have a way to say "iterate over all the records
of type foo", and why don't people think that this lack makes Pascal
unacceptable?

   Hence, I think all objects in an OODB should be easily reachable from
   globally known objects.  The most obvious candidates for such objects
   in an OODB would be collections representing the extents of the
   classes.

I certainly think any OODB should allow you to do this.  The only
question is whether it should force you to do this.
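
The "allow but not force" position might be sketched as an opt-in
facility.  The Python below is an illustration only: `keeps_extent` is a
hypothetical decorator invented for this example, and `Resistor` and
`Transistor` are stand-ins for the ECAD data types mentioned above.

```python
import functools


def keeps_extent(cls):
    """Hypothetical opt-in decorator: give `cls` an `extent` list."""
    cls.extent = []
    original_init = cls.__init__

    @functools.wraps(original_init)
    def init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        cls.extent.append(self)    # record the instance once it is built

    cls.__init__ = init
    return cls


@keeps_extent
class Resistor:                    # the application asked for an extent here
    def __init__(self, ohms):
        self.ohms = ohms


class Transistor:                  # no extent: reachable only via references
    def __init__(self, gain):
        self.gain = gain


r1, r2 = Resistor(100), Resistor(4700)
t = Transistor(200)

big = [r.ohms for r in Resistor.extent if r.ohms > 1000]
```

Classes that do not opt in pay nothing for extent maintenance and keep
whatever data structures they already had.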

bobm@server.Berkeley.EDU (Bob Muller) (06/01/91)

In article <1991May30.003525.21161@cs.ucla.edu>, rowley@bath.cs.ucla.edu (Michael T Rowley) writes:
|> In article <1991May29.134314.6850@odi.com>, dlw@odi.com (Dan Weinreb) writes:
|> |> In article <1991May28.232832.28284@cs.ucla.edu> rowley@bath.cs.ucla.edu (Michael T Rowley) writes:
|> |> 
|> |>    However, if there is to be an associative query language, it must be
|> |>    possible for the user to say "give me all objects which meet these
|> |>    conditions".  
|> |> 
|> |> No, that's not true.  If you have an OODB that does not keep extents,
|> |> you can still have an associative query language.  Each query simply
|> |> needs to be handed a "collection" object.  So a typical query might be
|> |> "find all of the employees within this set of employees for which the
|> |> salary is greater than 42", just like the standard mathematical
|> |> notation {x element-of X | x.salary > 42} (for "element-of" read a
|> |> little epsilon).  The queries can be as complicated as you like; more
|> |> than one collection can be involved; and automatic optimization can be
|> |> performed.
|> |> 
|> |> So an OODB can have associative queries even if it does not
|> |> automatically maintain extents.  These are two separate, orthogonal
|> |> issues.
|> 
|> It may be true that associative queries can work with arbitrary
|> collections, in which case the term "associative query language" may
|> not be the best descriptor of the features I'm trying to describe.
|> Maybe a better description would be that the query language is
|> declarative, rather than imperative.  It is desirable to be able to
|> specify a query by stating the conditions which the result objects
|> must meet --- without specifying a navigational strategy for
|> retrieving the objects.
|> 
|> All queries must start with some known object.  This may be a globally
|> known object (relation or class) or it may be the result of a previous
|> query.  In network databases not very much could be retrieved by a
|> single query using only global objects, so the user would have to make
|> multiple queries, each building off the previously retrieved data.
|> In order to facilitate this, the query language looked very much like
|> an imperative programming language, complete with "cursors" to keep
|> track of past results.
|> 
|> Relational databases, on the other hand, easily reach all objects in
|> the database from queries starting with only globally known objects
|> (relations).  As a result, it is easy for these databases to provide a
|> declarative query language.
|> 
|> In the example you wrote above:
|> 
|>   "find all of the employees within this set of employees
|>    for which the salary is greater than 42"
|> 
|> the important words are "within this set of employees".  The
|> interesting set of employees may only be reachable through a multi-step
|> navigation through the network.  In which case, the user would be
|> reduced to writing imperative programs to retrieve his data.  I think
|> the user community will balk at this.
|> 
|> Hence, I think all objects in an OODB should be easily reachable from
|> globally known objects.  The most obvious candidates for such objects
|> in an OODB would be collections representing the extents of the
|> classes.
|> 
|> Another, less desirable solution would be to provide a single
|> collection representing every object in the database.  The problem
|> with this is that it would be hard to write the conditions for the
|> result objects, since the questions that can be asked of objects
|> depend on their types.
|> 
|> Michael Rowley

Sorry for including all of the above, but my response won't make much sense
without it.

I'm with Mr. Weinreb for the most part; but Mr. Rowley's approach is not
really in opposition, it just makes some invalid assumptions.

The key to understanding declarative query languages is in understanding
scope.  Mr. Rowley is correct in supposing that there must be globally known
objects such as relations.  You've got to use some kind of global name in
the declaration of what you want.  Where he errs, however, is in thinking that
the only valid candidates for such names are classes or types.  In fact, an
OODB can make available various kinds of storage extents as global objects, and
this can be orthogonal to the type system.  In a type forest, it can be a separate
type hierarchy (storage hierarchy), the objects of which can be found by either
declarative specification (if they have names, for example) or by navigation.

The second bad assumption made by Mr. Rowley is that if you don't have global
names you must navigate.  This may be true in the old navigational (and relational)
databases, but not in modern OODBs, most of which implement encapsulation and
functional composition in some way.  This kind of nested (or networked) scoping can
allow for both full data independence and for declarative access through constructs
such as Mr. Weinreb suggests.  So "within this set of employees" could be fully
declarative, not navigational, if the query language provides declarative constructs
for specifying the set using storage object declarations in addition to type declarations.
You could say "IN DB 'OBJY'", for example, to look only at employees in the Objectivity
employee database named "OBJY".  This is hardly navigational; what it does is to
allow the query system to do the navigation for you.  Obviously this language could
include the standard logical operators to combine extents in whatever way you wish.
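
To make the storage-scoped query concrete, here is a small Python sketch.
Everything in it is assumed for the sake of the example: the `storage`
dictionary, the extent names "OBJY" and "ARCHIVE", and the `query`
function are hypothetical and are not the Objectivity API.  Python set
operators stand in for the logical operators that combine extents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    salary: int


@dataclass(frozen=True)
class Department:
    name: str


# Hypothetical named storage extents; the names are the global handles.
# An extent may hold objects of several types.
storage = {
    "OBJY": {Employee("Ann", 50), Employee("Bob", 30), Department("R&D")},
    "ARCHIVE": {Employee("Cy", 90), Employee("Ann", 50)},
}


def query(scope, of_type=object, where=lambda obj: True):
    """Return the objects in `scope` of the given type that satisfy `where`."""
    return {obj for obj in scope if isinstance(obj, of_type) and where(obj)}


# "employees IN DB 'OBJY' with salary > 42"
in_objy = query(storage["OBJY"], Employee, lambda e: e.salary > 42)

# Extents combined with a logical operator (set union) before querying.
either = query(storage["OBJY"] | storage["ARCHIVE"], Employee,
               lambda e: e.salary > 42)

# A storage-based query with no type restriction mixes object types.
everything_in_objy = query(storage["OBJY"])
```

The caller names a scope and a condition; how the objects inside that
scope are actually reached is left to `query`.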

But getting back to the original post (way back, I guess)--I think any DBMS must provide
a way to look at "all the objects of a certain class", regardless of logical storage
location or any other orthogonal attribute.  Clearly this is a useful kind of query.
Just as clearly, an OODBMS must not restrict you to this kind of query, as it may involve
too much overhead for the limited query you really want to do.  Certainly this is true
in engineering and CASE applications.  You also don't want queries to be limited to
"extents of classes", because you may want to mix objects of different types in a
certain extent, querying them all at once; this is usually called "clustering", but looks
very different from the clustering in a relational database because of the
orthogonality.

So my general response is that you should talk about the problem declaratively, not
navigationally, both for type-based and for storage-based queries.  Talking about
"reaching" objects from global objects is not the point; you should be able to
describe _any_ set of objects with a fully declarative language that takes advantage
of whatever global or local names are available, and in an OODBMS there are more such 
names than just the global type names.
-- 
The opinions expressed here are mine, not my employer's.

    -- Bob Muller
       Objectivity, Inc.
       bobm@objy.com