[comp.object] Queries and navigation in OODBs

plogan@apd.mentorg.com (Patrick Logan) (06/06/91)

In article <1991May31.175612.16655@objy.com> bobm@server.Berkeley.EDU (Bob Muller) writes:
   In article <1991May30.003525.21161@cs.ucla.edu>, rowley@bath.cs.ucla.edu (Michael T Rowley) writes:
   |> In article <1991May29.134314.6850@odi.com>, dlw@odi.com (Dan Weinreb) writes:
   |> |> Each query simply needs to be handed a "collection" object.
   |> |> So a typical query might be "find all of the employees within
   |> |> this set of employees for which the salary is greater than
   |> |> 42"
   |> 
   |> It may be true that associative queries can work with arbitrary
   |> collections, ... It is desirable to be able to specify a query
   |> by stating the conditions which the result objects must meet ---
   |> without specifying a navigational strategy for retrieving the
   |> objects.
   |> 
   |> All queries must start with some known object.  This may be a
   |> globally known object (relation or class) or it may be the
   |> result of a previous query.
   |> 
   |> Relational databases ... easily reach all objects in
   |> the database from queries starting with only globally known objects
   |> (relations).

Relations and _tables_. The tables are the globally known objects. The
relations relate them. In an OODB there could be a top level
collection of employees. This could be the set of employees mentioned
in the example query Dan Weinreb used above, e.g. "Find all employees
in the top level employee collection for which...".

   |> In the example you wrote above:
   |> 
   |>   "find all of the employees within this set of employees
   |>    for which the salary is greater than 42"
   |> 
   |> the important words are "within this set of employees".  The
   |> interesting set of employees may only be reachable through a
   |> multi-step navigation through the network.

This could be true. There could be a top level collection of
departments, each with a collection of employees. Then the query would
have to state which department's employees should be used, e.g. "Find
all employees in all departments for which..." or "Find all employees
in department A for which...". Or there could be both the top level
collection and each department collection. Each employee could still
be stored just once. Essentially, any graph you create in your
run-time structures can be stored in the database. The relational
model has all top level tables (collections) and relationships between
them, there is no opportunity to store nested tables. In _some_
applications, the top level tables would have to be restructured to
represent the more complex graph at run time. This is supposed to be
the advantage of OODBs, they represent the graph more efficiently. If
you are maintaining the furniture inventory in a building, a
relational DB is probably better. If you are designing buildings, an
OODB is probably better.

   |> Hence, I think all objects in an OODB should be easily reachable
   |> from globally known objects.  The most obvious candidates for
   |> such objects in an OODB would be collections representing the
   |> extents of the classes.

   Sorry for including all of the above, but my response won't make much sense
   without it.

Ditto, but I did more editing.

   The key to understanding declarative query languages is in understanding
   scope.  Mr. Rowley is correct in supposing that there must be globally known
   objects such as relations.  You've got to use some kind of global name in
   the declaration of what you want.  However, where he errs is in thinking that
   the only valid candidate for such names are classes or types.

That is, I think, because he failed to think of tables as instances
(objects) in a database. He was comparing tables to classes (types)
and so he was comparing apples to oranges.
-- 
Patrick Logan, Try: plogan@dad.mentor.com, plogan@ws.mentor.com,
plogan@mentorg.com or substitute patrick_logan for plogan and try
that. You can also try going through uunet!(mntgfx, mentorg, mentor.com)!plogan
[Can you tell things are changing around here?]

bobm@server.Berkeley.EDU (Bob Muller) (06/11/91)

In article <1991Jun6.143624.2485@apd.mentorg.com>, plogan@apd.mentorg.com (Patrick Logan) writes:
|> In article <1991May31.175612.16655@objy.com> bobm@server.Berkeley.EDU (Bob Muller) writes:
|>    In article <1991May30.003525.21161@cs.ucla.edu>, rowley@bath.cs.ucla.edu (Michael T Rowley) writes:
|>    |> In article <1991May29.134314.6850@odi.com>, dlw@odi.com (Dan Weinreb) writes:
	---- lots of stuff deleted -------
|> Relations and _tables_. The tables are the globally known objects. The
|> relations relate them. In an OODB there could be a top level
|> collection of employees. This could be the set of employees mentioned
|> in the example query Dan Weinreb used above, e.g. "Find all employees
|> in the top level employee collection for which...".
|> 
	-- More stuff deleted ----------
|> 
|> That is, I think, because he failed to think of tables as instances
|> (objects) in a database. He was comparing tables to classes (types)
|> and so he was comparing apples to oranges.

Well, I've gotten the included part down to half a page, anyway :).

This whole issue is pretty interesting, and I've heard lots of arguments for various
positions on it.

First, a bit of clarification:  a "table" is the same thing as a "relation" in a
relational database.  The term relation is the technical name, the term table is the
"marketing" name that SQL chose to use (as well as "column" for "attribute" and "row"
for "tuple").  I wonder why ?-).  The whole point of the relational calculus is that
relationships between tuples, rows, entities, objects, or whatever you want to call
them is in the data, not explicitly represented through relationship structures.  This
was the answer that Codd gave to the hierarchical and network DBMS people.  When you
want to relate two tables, you join them on the attributes--there is no other way to
relate them in a relational database.

Now, on to the messy part.  How does the relation correspond to the structures in an
OODBMS?  There are at least two ways to look at it, and they aren't really mutually
exclusive.  First, you can make the table/relation correspond to class, and the rows of
the table correspond to objects of the class.  This works until you consider inheritance,
which implies that a row in one table "is-a" row in another table as well, which is not
possible in a relational database.  Second, you can make the table correspond to an
object, with the rows as internal data in the object.  That is, a collection object
containing a bunch of other objects (presumably with a single tuple type).  This doesn't 
behave as you would expect a "relation" to behave, and implementing something like SQL on 
top of this is not possible.  Manipulating objects of unknown cardinality is quite
difficult mathematically.  This approach requires a pretty complete development of the
notion of a tuple type, and you would need to be able to create new tuple types dynamically
as a consequence of projection of the contents of the collection.

I've come down on the side of the first option--I would like to treat SQL tables as
classes or types (and hence be able to use them to declare columns, extending the SQL
type system through embedding objects) and SQL rows as instances of the class.  I
would also add inheritance and subclassing and object identity, so that a row in one
table may actually be the same thing as a row in another table (more or less like a
dynamic view, I guess).  The big benefit here is type safety--you don't have tuple types,
just object types (each "tuple" becomes an object typed with the "table" class).  But
you still need to be able to generate dynamic types because of projection--they're just
not tuple types, just classes with data members; no need to extend the mathematics of
the type system.  Interestingly, you can develop collection classes, but a "collection"
is not at the table level, just at the row or object level, with the collected objects
contained or nested within each collection object.

The original issue of how to specify queries using global names becomes an issue with
using the global type names and the abstract interface (data, function, and association
members) to declare what objects you want to retrieve.  If you specify a collection
class, you get back a collection; you could use the methods on the collection class to
project any of the internal objects into new classes.  Using the above example, you
don't ask for THE employee collection, but AN employee collection (there can be more
than one, possibly with overlapping contents).  If "employee" is a type, you ask for 
employees, not a collection object; this produces an iterator that lets you iterate over 
the employee objects.  You can see that this is saying the same thing in a slightly 
different way; the benefit is that you don't need to add any concepts to the OO model at 
the top level.

I'd really like to hear from anybody out there with opinions on this stuff.
-- 
    -- Bob Muller
       Objectivity, Inc.
       bobm@objy.com