[comp.object] Comment on the "Third-Generation Database System Manifesto"

donovan@julius.csl.sri.com (Donovan Hsieh) (09/22/90)

Recently, the paper "Third-Generation Database System Manifesto" was 
published in the 1990 ACM SIGMOD Conference Proceedings.  Its authors 
include some of the most influential researchers in the relational 
database community. The paper recommends a set of characteristics for
the next generation of database systems. In practice, this becomes a
discussion of whether a pure object-oriented model or an extended
relational model is the better approach. The paper was 
also partly written in response to an earlier published paper, 
"The Object-Oriented Database System Manifesto," which was authored by 
some of the leading researchers in the OODB community.

As a knowledgeable and impartial observer of this debate, I found that 
the arguments and criticisms in the paper "Third-Generation Database 
System Manifesto", which I will call "the manifesto" hereafter, 
contain both JUSTIFIED and MISLEADING opinions. Through this posting, 
I would like to comment objectively on the manifesto.  I would also like 
to see opinions from others on this issue. The following comments are my own 
opinions, and are not those of my current employer or other organizations.
 
The manifesto uses the term "third-generation database systems" to
distinguish itself from the approach taken by the OODB community. It states 
that next generation databases should evolve and be extended from current 
relational systems, rather than be a new approach built from the ground up 
like OODBs. The manifesto lists four groups of propositions that
describe the fundamental requirements for next generation DBMSs. I shall 
comment on those propositions that I feel require further clarification.
Less critical or generally agreeable propositions, such as "Inheritance is a 
good idea" (proposition 1.2), are not discussed in the following comments.

Proposition 1.1 of the manifesto says that "A third generation DBMS must 
have a rich type system." I agree that entirely new database systems 
such as OODBs are not needed to support abstract data types (ADTs). But I 
doubt that ALL types can be added to or extended from the current 
relational systems. Stretching existing relational systems beyond their 
inherent limits would most likely result in an inefficient implementation. 
I feel that a pure OODB approach is more suitable and will be better able 
to provide full ADT features that are compatible with the existing 
object-oriented programming languages, such as C++.

Proposition 1.3 of the manifesto says that "Functions, including database
procedures and methods, and encapsulation are a good idea." However, it 
makes the criticism that some OODBs require users to use only functions to 
access data elements (attributes) of a collection (object instance). In fact, 
there are some OODB systems that allow an object class to specify public 
and private attributes, where a public attribute can be directly accessed 
by database query languages, and a private attribute can only be accessed 
through pre-defined methods to protect object integrity. 
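
The public/private distinction can be sketched roughly as follows. This
is only an illustration in Python, not the interface of any particular
OODB: the Account class, its attributes, and the list comprehension that
stands in for a query language are all hypothetical, and Python enforces
the "private" attribute by naming convention only.

```python
class Account:
    """Hypothetical object class with one public and one private attribute."""

    def __init__(self, owner, balance):
        self.owner = owner          # "public": a query may read it directly
        self._balance = balance     # "private" (by Python convention only)

    def withdraw(self, amount):
        # The method guards the integrity rule: no overdrafts.
        if amount <= 0 or amount > self._balance:
            raise ValueError("invalid withdrawal")
        self._balance -= amount

    def balance(self):
        return self._balance


accounts = [Account("Ann", 100), Account("Bob", 20)]

# Stand-in for an ad hoc query over the public attribute.
anns = [a for a in accounts if a.owner == "Ann"]

# The private attribute changes only through a pre-defined method.
anns[0].withdraw(30)
```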

Proposition 1.4 of the manifesto says that "Unique identifiers (UIDs) for
records should be assigned by the DBMS only if a user-defined primary key is 
not available". It also argues that a human-readable, immutable primary key in 
relational systems is superior to the UID or OID used by OODBs. A UID in an 
OODB has different meanings and purposes than a primary key in a relational 
system. A UID guarantees that no two object instances will contain the 
same ID during the lifetime of the system. It also facilitates the internal 
referencing and representation of objects. Without a UID, the unique 
identification of an object must be properly defined and enforced in 
the schema. For example, although a social security number suffices to 
uniquely identify a person, it still must be defined as a unique primary 
key in the relational schema definition. It is also possible for a person 
to die, and then have his/her SS# mistakenly entered or reused without 
this error being caught by a relational system. 

Also, it is very costly when a primary key is CHANGED in a relational 
database, such as when someone's SS# is initially entered incorrectly. 
It must then be changed everywhere it was used as a foreign key. If a 
person is referenced by UID, the SS# only has to be changed in one place.
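
A rough sketch of the difference, in Python. All names and data here are
made up for illustration; the dictionaries stand in for relational tables,
and an ordinary Python object reference stands in for a UID.

```python
# Relational style: other records refer to a person by primary-key VALUE.
persons = {"123-45-6789": {"name": "Ann"}}            # SS# is the primary key
jobs = [{"ssn": "123-45-6789", "employer": "SRI"}]    # SS# copied as foreign key
cars = [{"ssn": "123-45-6789", "plate": "ABC123"}]


def correct_ssn_relational(old, new):
    """Fix a mistyped SS#: the key and every foreign-key copy must change."""
    persons[new] = persons.pop(old)
    touched = 1
    for table in (jobs, cars):
        for row in table:
            if row["ssn"] == old:
                row["ssn"] = new
                touched += 1
    return touched


# OODB style: other objects hold a reference to the object itself.
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn          # an ordinary attribute, not the identity


class Job:
    def __init__(self, holder, employer):
        self.holder = holder    # reference to the Person object
        self.employer = employer


ann = Person("Ann", "123-45-6789")
job = Job(ann, "SRI")

relational_updates = correct_ssn_relational("123-45-6789", "987-65-4321")
ann.ssn = "987-65-4321"         # one update; job.holder still refers to ann
```

With this toy data the relational correction touches three records, and
the count grows with every table that carries the key. The object version
changes one attribute no matter how many objects refer to the person.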

Proposition 1.5 of the manifesto says that "Rules (triggers, constraints) 
will become a major feature in future systems. They should not be associated 
with a specific function or collection." I agree that the use of methods 
in OODBs to define arbitrary constraints, rules, or triggers is an 
undesirable approach. A more declarative language should be used. I 
prefer PROLOG-like languages, which are more declarative, expressive, 
and powerful than SQL. However, their query optimizers are also more 
difficult to implement.

Proposition 2.1 of the manifesto says that "Essentially, all programmatic 
access to a database should be through a non-procedural, high-level access 
language".  It argues that the navigational approach used in OODBs is 
undesirable and inefficient compared with the use of non-procedural query 
languages in relational systems. I feel that this proposition is rather 
misleading. The manifesto claims that a well-written and well-tuned query 
optimizer can almost always produce a better execution method than a human.
A query optimizer could probably do a better job for repetitive and 
straightforward accesses. However, there are cases where human navigation 
is required and a query optimizer cannot foresee all patterns and usages. An 
example is computing the transitive closure of a given parent object. First 
of all, the standard relational algebra cannot express a query like "Find 
all children, at every level, belonging to a given parent" in a single 
query expression (although some extended relational systems allow queries 
to compute transitive closures). A common solution is to implement a "for 
loop" in the application 
code to compute the closure one record at a time. The result is that the 
"select" query must be optimized on each iteration (some smarter query
optimizers will detect the looped query and store its optimized query graph
and execution method in pre-compiled modules so that they can be reused).
Furthermore, the query optimizer cannot take advantage of buffered records
because the next query will use the previous child value as its current
parent search value, which is most likely not in the same buffer (or page).
On the other hand, an 
OODB user could use procedure calls to write the same loop without going 
through time-consuming optimizations and could dereference child pointers 
recursively. If the OODB schema defines a clustering based on this reference 
mode, users will be able to gain even more performance with fewer disk 
accesses because most child objects will have been cached into the buffer 
during the initial access.
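
The two styles can be sketched as follows. This is a toy illustration
under my own assumptions, not a benchmark: the part table, its data, and
the Part class are hypothetical, SQLite (through Python's sqlite3 module)
stands in for a relational system, and in-memory Python references stand
in for OODB object pointers. It shows nothing about clustering or
buffering; it only contrasts the shape of the two programs.

```python
import sqlite3

# Hypothetical parent/child data: (id, parent id).
EDGES = [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, parent INTEGER)")
db.executemany("INSERT INTO part VALUES (?, ?)", EDGES)


def descendants_by_query_loop(root):
    """Application-side loop: one SELECT is issued per visited record."""
    found, frontier, queries = [], [root], 0
    while frontier:
        parent = frontier.pop()
        rows = db.execute(
            "SELECT id FROM part WHERE parent = ?", (parent,)
        ).fetchall()
        queries += 1
        for (child,) in rows:
            found.append(child)
            frontier.append(child)
    return sorted(found), queries


class Part:
    def __init__(self, ident):
        self.ident = ident
        self.children = []      # direct references to child objects


def descendants_by_navigation(node):
    """Navigational style: dereference child pointers recursively."""
    found = []
    for child in node.children:
        found.append(child.ident)
        found.extend(descendants_by_navigation(child))
    return found


# Build the same hierarchy as linked objects.
parts = {ident: Part(ident) for ident, _ in EDGES}
for ident, parent in EDGES:
    if parent is not None:
        parts[parent].children.append(parts[ident])

closure, query_count = descendants_by_query_loop(1)
navigated = sorted(descendants_by_navigation(parts[1]))
```

Both functions return the same four descendants of part 1, but the query
loop issues one SELECT for each record it visits (five in this example),
each of which the database must parse and plan unless it caches the plan.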

Another example would be a traversal of objects that involves computation, 
like a CAD application where some optimization of the connections between 
objects involves computation in the application language. I would argue that 
(1) navigation is much more "natural" for these computations than using a 
mixture of queries and programming, and (2) the mixture is inefficient for 
the reasons suggested above.

As for the impact of schema evolution, I agree that the use of "views" in 
relational databases offers good insulation for applications from changes 
to the database schema definition. However, specifying the data elements 
with a declarative query language does not guarantee insulation if the 
primitive data element definition is changed. Also, some OODBs support 
"derived" objects, which provide a service like views. (Derived objects 
are defined procedurally or declaratively by a set of pre-conditions and 
post-conditions to instantiate or modify the objects.)

In the same proposition, the manifesto questioned the performance benefit 
for OODBs that use low-level calls to navigate individual objects. It also
criticized CAD programmers as being close-minded for not using the query 
optimizers provided by database systems. I feel that both arguments are 
rather misleading. First, OODB researchers have proposed various techniques 
to address the performance issue. For example, one OODB vendor has reported 
tremendous performance gains from a direct memory-mapping technique over 
other indexed or hash-based dereferencing techniques, such as those that 
were mentioned in the manifesto. Second, numerous 
published cases have also reported poor performance when using off-the-shelf 
relational databases to support object navigation, such as in the closure 
computation example described earlier.

Proposition 2.2 of the manifesto says that "There should be at least two 
ways to specify collections, one using enumeration of members and one using 
the query language to specify membership." I agree that defining objects 
"intensionally" with declarative expressions does offer more powerful 
abstractions than defining objects "extensionally".
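
A small sketch of the two ways to specify a collection, in Python. The
Employee type, the data, and the salary threshold are invented for
illustration; a set comprehension stands in for the query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    salary: int


staff = [Employee("Ann", 90), Employee("Bob", 40), Employee("Cy", 75)]

# Extensional: the collection is an explicit enumeration of its members.
well_paid_extensional = {staff[0], staff[2]}


# Intensional: the collection is whatever satisfies the membership rule.
def well_paid_intensional(population):
    return {e for e in population if e.salary > 70}


before = well_paid_intensional(staff)

# A new hire appears; only the intensional collection picks her up.
staff.append(Employee("Dee", 120))
after = well_paid_intensional(staff)
```

The enumerated set has to be maintained by hand as the data changes,
while the rule-defined set follows the data automatically.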

Proposition 2.4 of the manifesto says that "Performance indicators have 
almost nothing to do with data models and must not appear in them." I 
disagree with this claim. Although performance is heavily influenced by 
individual implementation techniques, there exist inherent limitations on 
the performance achievable for the underlying data models. For example, the 
relational model explicitly disallows the storing of ordered tuples. This 
makes it very inefficient to represent lists, and users are forced to sort 
on a sequence number implemented by their applications.
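
The workaround looks roughly like this. The step table and its contents
are hypothetical, and SQLite (through Python's sqlite3 module) stands in
for a relational system.

```python
import sqlite3

store = sqlite3.connect(":memory:")
# The order lives in an application-maintained column, not in the model.
store.execute("CREATE TABLE step (list_id INTEGER, seq INTEGER, label TEXT)")
store.executemany(
    "INSERT INTO step VALUES (?, ?, ?)",
    [(1, 1, "draw"), (1, 2, "route"), (1, 3, "verify")],
)


def insert_at(list_id, position, label):
    """Insert into the middle of a list: renumber every later row first."""
    store.execute(
        "UPDATE step SET seq = seq + 1 WHERE list_id = ? AND seq >= ?",
        (list_id, position),
    )
    store.execute("INSERT INTO step VALUES (?, ?, ?)", (list_id, position, label))


def read_list(list_id):
    """Reading the list back requires a sort on the sequence number."""
    rows = store.execute(
        "SELECT label FROM step WHERE list_id = ? ORDER BY seq", (list_id,)
    )
    return [label for (label,) in rows]


insert_at(1, 2, "place")
ordered = read_list(1)
```

Every insertion into the middle of a list rewrites the sequence numbers
of all later rows, and every read pays for a sort (or an index on seq).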

It is always possible to extend existing database models with new features
and constraints through arbitrary implementations. But the end result would
be undesirable if the extension exceeds the limitations of the model, or 
lacks the support of a formal mathematical representation. 

Propositions 3.1 and 3.2 of the manifesto say that "Third generation DBMSs 
must be accessible from multiple HLLs" and "Persistent X for a variety of Xs 
is a good idea. They will all be supported on top of a single DBMS by 
compiler extensions and a (more or less) complex run time system". In 
theory, I agree that next-generation databases (either third-generation or 
OODB) should be accessible from multiple HLLs (High Level Languages), and 
that the DBMS should provide multiple run-time type translations between 
declarative query languages and HLLs. However, it is impractical for DBMSs 
to support all HLLs. For example, many MIS programmers are interested in 
adopting new object-oriented technologies (that is, object-oriented design 
methodology and object-oriented programming languages) to implement new MIS 
applications, if given the opportunity, rather than revamping and 
retrofitting existing COBOL code. If they are given the opportunity to 
choose a DBMS to match their new object-oriented applications, they will 
most likely use a fully supported OODB product because it provides a better 
match. 

Many CAD/CAM and CASE software vendors have long abandoned their use of 
(extended) relational DBMSs because of poor performance and data modeling 
capabilities. Another important reason is that they have also adopted 
object-oriented programming languages as their language of choice. It is 
natural for these software vendors to use OODBs rather than stretching
existing (extended) relational systems to match their needs.
The advantage of OODBs and object-oriented programming languages is that
they carry far less of a backward-compatibility burden than third-generation
DBMSs do. Providing a gateway to access old DBMSs from OODBs is helpful,
but it is not critical. A similar situation occurred when the computer 
industry adopted the cheaper and faster RISC processors over the old CISC. 
History has proven that the migration was the correct choice. 

In relational databases, the type "impedance mismatch" between SQL and 
HLLs has long been criticized as being inefficient and unnatural. Even if 
third-generation DBMSs provide brilliant ways to bridge the gap between all 
HLLs and SQL, new object-oriented users will always opt for OODBs because 
they are a natural fit.

As for OODBs, although the lack of declarative query languages and a formal 
object algebra/calculus currently makes them less intuitive for end users,
many researchers have proposed different solutions and approaches to resolve 
this deficiency. We must allow more time for this new technology to be refined
and improved, just as it took more than a decade for relational databases to 
become mature and popular, and replace the old network and hierarchical 
databases.

In summary, I feel that there is room for both technologies to co-exist, 
and new database models will always be proposed to address existing 
deficiencies.  We will probably see some fusion of both database approaches 
in the near future that will benefit database users. In the long run, I 
foresee OODBs replacing (extended) relational DBMSs in selected market 
segments. I would also predict that the next wave after OODBs will be 
fully-integrated, intelligent database (or knowledge-based) systems that 
will combine both AI and database technologies.




Donovan Hsieh

Computer Science Lab
SRI International
Menlo Park, CA

rmy@beach.cis.ufl.edu (Rasthiyaadu Yakaa) (09/25/90)

Oh, how I agree ....

Having read both articles, i.e. The Object Oriented Database System 
Manifesto and The Third Generation Database System Manifesto, I was
pretty dismayed with the approach taken by the latter. Being in the
midst of a Ph.D. in databases, and being quite aware of the current
developments in the field (both research and applied), it seemed to me
that the latter article (the 3G manifesto) presented on many occasions
an anti-OODB slant, using (as the posting by Hsieh stated) quite
misleading arguments to make its case. As the posting indicated, the 3G
manifesto contained both
JUSTIFIABLE and MISLEADING arguments. The JUSTIFIABLE arguments MUST be 
presented -- after all this is a technical paper, and it should be 
expected. There are no reasons or excuses for presenting MISLEADING 
arguments. Clearly, the database community is the loser, if articles
can take such views. The OODB manifesto, on the other hand, made a
valid attempt to define what features an OODB must contain.

Clearly there is a place for extended relational systems, and the Third
Generation Database System Manifesto should have made a case accordingly,
presenting and painting an unbiased portrait. Taking a dig at OODBMSs
(in spite of whatever shortcomings they may have) is turning one's
back on emerging technology. If OODBMSs have any major shortcomings,
then either solutions to overcome these will appear, or OODBMSs will
reach a level where they are reasonably useful, or else OODBMSs
will fade away. From the plethora of researchers and organizations
currently active in OODBMSs, it seems that OODBMSs are clearly
alive and kicking. If extended relational systems were 
superior, I for one believe that the amount of research
and development carried out on OODBMSs would be far less. 

BTW, I am sometimes puzzled to see extended relational systems
being called OODBMSs. Recently, in this newsgroup, POSTGRES was
described as being an OODBMS by someone from the postgres group
at Berkeley. Is it possible that some extended relational systems
can be relational and also object oriented BUT not vice versa
(i.e. an object oriented system is object oriented but not relational)?
Or are we simply playing with terminology?

yaseen