donovan@julius.csl.sri.com (Donovan Hsieh) (09/22/90)
Recently, the paper "Third-Generation Database System Manifesto" was published in the 1990 ACM SIGMOD Conference Proceedings. Its authors include some of the most influential researchers in the relational database community. The paper's main theme is a recommended set of characteristics for the next generation of database systems, and the discussion often turns into a debate over whether a pure object-oriented model or an extended relational model is the better approach. The paper was also written partly in response to an earlier paper, "The Object-Oriented Database System Manifesto," whose authors include some of the leading researchers in the OODB community.

As a knowledgeable and impartial observer of this debate, I found that the arguments and criticisms in the paper "Third-Generation Database System Manifesto", which I will call "the manifesto" hereafter, contain both JUSTIFIED and MISLEADING opinions. In this posting, I would like to comment objectively on the manifesto, and I would also like to hear others' opinions on the issue. The following comments are my own opinions, not those of my current employer or any other organization.

The manifesto uses the term "third-generation database systems" to distinguish its approach from the one taken by the OODB community. It states that next-generation databases should evolve from, and be extensions of, current relational systems, rather than be a new approach built from the ground up like OODBs. The manifesto lists four groups of propositions describing the fundamental requirements for a next-generation DBMS. I shall comment on those propositions that I feel require further clarification. Less critical or generally agreeable propositions, such as "Inheritance is a good idea" (proposition 1.2), are not discussed below.

Proposition 1.1 of the manifesto says that "A third generation DBMS must have a rich type system."
I agree that entirely new database systems such as OODBs are not needed to support abstract data types (ADTs). But I doubt that ALL types can be added to, or extended from, current relational systems. Stretching existing relational systems beyond their inherent limits would most likely result in an inefficient implementation. I feel that a pure OODB approach is more suitable and better able to provide full ADT features that are compatible with existing object-oriented programming languages such as C++.

Proposition 1.3 of the manifesto says that "Functions, including database procedures and methods, and encapsulation are a good idea." However, it criticizes some OODBs for requiring users to access the data elements (attributes) of a collection (object instance) only through functions. In fact, some OODB systems allow an object class to declare public and private attributes: a public attribute can be accessed directly by the database query language, while a private attribute can be accessed only through predefined methods, which protects the integrity of the object.

Proposition 1.4 of the manifesto says that "Unique identifiers (UIDs) for records should be assigned by the DBMS only if a user-defined primary key is not available". It also argues that a human-readable, immutable primary key in relational systems is superior to the UID or OID used by OODBs. However, a UID in an OODB has a different meaning and purpose from a primary key in a relational system. A UID guarantees that no two object instances will ever share the same ID during the lifetime of the system, and it also facilitates the internal referencing and representation of objects. Without a UID, the unique identification of an object must be explicitly defined and enforced in the schema. For example, although a social security number suffices to identify a person uniquely, it must still be declared as a unique primary key in the relational schema definition.
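To make the Proposition 1.4 trade-off concrete, here is a minimal Python sketch that simulates both schemes with in-memory dictionaries and dataclasses rather than a real DBMS. The tables (`persons`, `accounts`, `policies`), the `Person`/`Account` classes, the helper `correct_ssn_relational`, and the SS# values are all hypothetical. It compares the cost of correcting a mistyped SS# when the SS# is the primary key copied into foreign keys with the cost when other objects hold a reference to a `Person` identified by a system-assigned UID.

```python
import itertools
from dataclasses import dataclass, field

# --- Relational style (hypothetical tables) ---
# The SS# is the primary key of `persons` and is copied, as a foreign key,
# into every table that refers to a person.
persons = {"123-45-6789": {"name": "Alice"}}
accounts = [
    {"acct": 1, "owner_ssn": "123-45-6789"},
    {"acct": 2, "owner_ssn": "123-45-6789"},
]
policies = [{"policy": "P9", "holder_ssn": "123-45-6789"}]


def correct_ssn_relational(old: str, new: str) -> int:
    """Fix a mistyped SS# key; return how many rows had to be rewritten."""
    if new in persons:
        # Stands in for a declared primary-key (uniqueness) constraint.
        raise ValueError("primary key would no longer be unique")
    persons[new] = persons.pop(old)
    rewritten = 1
    # Every foreign-key copy of the old value must be found and updated.
    for table, column in ((accounts, "owner_ssn"), (policies, "holder_ssn")):
        for row in table:
            if row[column] == old:
                row[column] = new
                rewritten += 1
    return rewritten


# --- OODB style ---
# The "system" assigns an immutable UID from a counter; other objects hold a
# reference to the Person, so the SS# is an ordinary attribute stored once.
_next_uid = itertools.count(1)


@dataclass
class Person:
    ssn: str  # ordinary attribute, stored in one place
    name: str
    uid: int = field(init=False, default_factory=lambda: next(_next_uid))


@dataclass
class Account:
    acct: int
    owner: Person  # a reference to the object, not a copy of its SS#


alice = Person("123-45-6789", "Alice")
oo_accounts = [Account(1, alice), Account(2, alice)]

rows_rewritten = correct_ssn_relational("123-45-6789", "123-45-6780")
alice.ssn = "123-45-6780"  # one assignment; alice.uid and all references are unchanged

print(rows_rewritten)            # 4
print(oo_accounts[1].owner.ssn)  # 123-45-6780
```

In the relational version the uniqueness check has to be stated somewhere (here the `ValueError` guard plays the role of a declared key constraint), whereas in this sketch the UID is unique by construction because it comes from a single counter.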
It is also possible for a person to die and for his/her SS# to be mistakenly re-entered or reused without a relational system catching the error. Changing a primary key is also very costly in a relational database, for example when someone's SS# was entered incorrectly in the first place: the key must then be changed everywhere it has been used as a foreign key. If a person is referenced by UID, the SS# has to be changed in only one place.

Proposition 1.5 of the manifesto says that "Rules (triggers, constraints) will become a major feature in future systems. They should not be associated with a specific function or collection." I agree that using methods in OODBs to define arbitrary constraints, rules, or triggers is undesirable, and that a more declarative language should be used. I prefer PROLOG-like languages, which are more declarative, expressive, and powerful than SQL, although their query optimizers are also more difficult to implement.

Proposition 2.1 of the manifesto says that "Essentially, all programmatic access to a database should be through a non-procedural, high-level access language". It argues that the navigational approach used in OODBs is undesirable and inefficient compared with the non-procedural query languages of relational systems. I feel that this proposition is rather misleading. The manifesto claims that a well-written, well-tuned query optimizer can almost always produce a better execution method than a human. A query optimizer probably does do a better job for repetitive and straightforward accesses. However, there are cases that call for navigation by the programmer, because a query optimizer cannot foresee all access patterns and usages. An example is computing the transitive closure of a given parent object.
First of all, standard relational algebra cannot express a query like "Find all descendants (children, grandchildren, and so on) of a given parent" as a single query expression (although some extended relational systems allow queries that compute transitive closures). A common solution is to write a "for loop" in the application code that computes the closure one record at a time. As a result, the "select" query must be optimized on every iteration (some smarter query optimizers detect the looped query and store its optimized query graph and execution method in a pre-compiled module so that it can be reused). Furthermore, the query optimizer cannot take advantage of buffered records, because each query uses the child found by the previous query as its new parent search value, and the matching records are most likely not in the same buffer (or page). An OODB user, on the other hand, can write the same loop with procedure calls, avoiding the time-consuming optimizations, and dereference child pointers recursively. If the OODB schema clusters objects along these references, users gain even more performance through fewer disk accesses, because most child objects will already have been brought into the buffer by the initial access.

Another example is a traversal of objects that involves computation, such as a CAD application in which optimizing the connections between objects requires computation in the application language. I would argue that (1) navigation is much more "natural" for these computations than a mixture of queries and programming, and (2) the mixture is inefficient for the reasons given above.

As for the impact of schema evolution, I agree that "views" in relational databases insulate applications well from changes to the schema definition. However, specifying data elements with a declarative query language does not guarantee insulation if the definition of a primitive data element itself changes.
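The closure argument under Proposition 2.1 can be sketched in Python as follows, assuming a hypothetical `PARENT_CHILD` table held as a list of `(parent, child)` rows and a hypothetical `select_children` helper standing in for one `SELECT` per call. The relational version loops in application code and issues one query per node, each using a previously found child as the new parent; the navigational version simply follows the child references held by each (hypothetical) `Part` object. The sketch only counts queries. It does not model optimization, buffering, or clustering, so it is not a performance measurement.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Relational style: a hypothetical PARENT_CHILD table of (parent, child) rows.
PARENT_CHILD = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("d", "f")]
query_count = 0


def select_children(parent: str) -> list[str]:
    """Stand-in for one query: SELECT child FROM parent_child WHERE parent = ?"""
    global query_count
    query_count += 1
    return [child for p, child in PARENT_CHILD if p == parent]


def descendants_relational(root: str) -> set[str]:
    """Application-side loop: one query per node, each reusing a child as the next parent."""
    found: set[str] = set()
    pending = [root]
    while pending:
        parent = pending.pop()
        for child in select_children(parent):
            if child not in found:
                found.add(child)
                pending.append(child)
    return found


# OODB style: each object holds direct references to its children.
@dataclass
class Part:
    name: str
    children: list[Part] = field(default_factory=list)


def descendants_navigational(obj: Part, found: set[str] | None = None) -> set[str]:
    """Dereference child references recursively; no query is issued per step."""
    if found is None:
        found = set()
    for child in obj.children:
        if child.name not in found:
            found.add(child.name)
            descendants_navigational(child, found)
    return found


# Build the same hierarchy as linked objects.
parts = {name: Part(name) for name in "abcdef"}
for p, c in PARENT_CHILD:
    parts[p].children.append(parts[c])

relational = descendants_relational("a")
navigational = descendants_navigational(parts["a"])
print(sorted(relational), query_count)  # ['b', 'c', 'd', 'e', 'f'] 6
print(sorted(navigational))             # ['b', 'c', 'd', 'e', 'f']
```

Both functions return the same closure; the difference the posting points to is that the relational loop goes back to the query processor once per node (six times here), while the navigational version just follows pointers.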
On the schema-evolution point, some OODBs also support "derived" objects, which provide a service similar to views. (Derived objects are defined, procedurally or declaratively, by a set of pre-conditions and post-conditions for instantiating or modifying the objects.)

In the same proposition, the manifesto questions the performance benefit of OODBs that use low-level calls to navigate individual objects. It also criticizes CAD programmers as close-minded for not using the query optimizers provided by database systems. I feel that both arguments are rather misleading. First, many OODB researchers have proposed techniques to address the performance issue. For example, one OODB vendor that currently uses a direct memory-mapping technique has reported tremendous performance gains over indexed or hash-based dereferencing techniques such as those mentioned in the manifesto. Numerous published cases have also reported poor performance when off-the-shelf relational databases are used to support object navigation, as in the closure computation example described earlier.

Proposition 2.2 of the manifesto says that "There should be at least two ways to specify collections, one using enumeration of members and one using the query language to specify membership." I agree that defining collections of objects "intensionally", with declarative expressions, offers more powerful abstractions than defining them "extensionally", by enumeration.

Proposition 2.4 of the manifesto says that "Performance indicators have almost nothing to do with data models and must not appear in them." I disagree with this claim. Although performance is heavily influenced by individual implementation techniques, every data model places inherent limits on the performance achievable with it. For example, the relational model explicitly disallows ordered tuples. This makes lists very inefficient to represent, and users are forced to sort on a sequence number maintained by their applications.
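The list-representation problem under Proposition 2.4 can be sketched as follows. The `rows` set (a Python set, chosen to mimic an unordered relation), its `(doc_id, seq, text)` layout, and the helpers `read_in_order` and `insert_relational` are hypothetical. The point is only that with an application-maintained sequence number every read must sort, and inserting into the middle of the list rewrites every later row, whereas an ordered collection keeps the order as part of the value.

```python
# Relational style: a hypothetical table of (doc_id, seq, text) rows.
# A Python set mimics a relation: its tuples carry no inherent order,
# so the application maintains `seq` itself.
rows = {("doc1", 1, "intro"), ("doc1", 2, "body"), ("doc1", 3, "summary")}


def read_in_order(table: set, doc: str) -> list:
    """Every read of the 'list' has to sort on the sequence number."""
    ordered = sorted((t for t in table if t[0] == doc), key=lambda t: t[1])
    return [text for _, _, text in ordered]


def insert_relational(table: set, doc: str, pos: int, text: str) -> int:
    """Insert `text` at 1-based position `pos`; return how many rows were renumbered."""
    shifted = {t for t in table if t[0] == doc and t[1] >= pos}
    table.difference_update(shifted)
    # Every later row gets a new sequence number (one update per row).
    table.update((d, seq + 1, txt) for d, seq, txt in shifted)
    table.add((doc, pos, text))
    return len(shifted)


# Ordered-collection style: order is part of the value itself.
sections = ["intro", "body", "summary"]

renumbered = insert_relational(rows, "doc1", 2, "background")
sections.insert(1, "background")  # no sequence numbers to maintain

print(renumbered)                   # 2
print(read_in_order(rows, "doc1"))  # ['intro', 'background', 'body', 'summary']
print(sections)                     # ['intro', 'background', 'body', 'summary']
```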
It is always possible to extend an existing database model with new features and constraints through arbitrary implementations. But the result is undesirable if the extension goes beyond the limits of the model or lacks the support of a formal mathematical representation.

Propositions 3.1 and 3.2 of the manifesto say that "Third generation DBMSs must be accessible from multiple HLLs" and "Persistent X for a variety of Xs is a good idea. They will all be supported on top of a single DBMS by compiler extensions and a (more or less) complex run time system". In theory, I agree that next-generation databases (either third-generation or OODB) should be accessible from multiple HLLs (High Level Languages), and that the DBMS should provide run-time type translation between declarative query languages and those HLLs. However, it is impractical for a DBMS to support all HLLs. For example, many MIS programmers, given the opportunity, would rather adopt new object-oriented technologies (that is, object-oriented design methodologies and object-oriented programming languages) to implement new MIS applications than revamp and retrofit existing COBOL code. If they can also choose a DBMS to go with their new object-oriented applications, they will most likely use a fully supported OODB product because it is a better match. Many CAD/CAM and CASE software vendors have long since abandoned (extended) relational DBMSs because of poor performance and limited data modeling capabilities. Another important reason is that they have adopted object-oriented programming languages as their languages of choice. It is natural for these vendors to use OODBs rather than stretch existing (extended) relational systems to meet their needs. The advantage of OODBs and object-oriented programming languages is that they carry very little of the backward-compatibility burden that third-generation DBMSs do.
Providing a gateway from OODBs to old DBMSs is helpful, but it is not critical. A similar situation occurred when the computer industry adopted cheaper and faster RISC processors over the old CISC processors; history has proven the migration to be the correct choice. In relational databases, the type "impedance mismatch" between SQL and HLLs has long been criticized as inefficient and unnatural. Even if third-generation DBMSs provide brilliant ways to bridge the gap between all HLLs and SQL, new object-oriented users will always opt for OODBs because they are a natural fit.

As for OODBs, although the lack of declarative query languages and of a formal object algebra/calculus currently makes them less intuitive for end users, many researchers have proposed solutions to this deficiency. We must allow more time for this new technology to be refined and improved, just as it took more than a decade for relational databases to mature, become popular, and replace the old network and hierarchical databases.

In summary, I feel that there is room for both technologies to co-exist, and new database models will always be proposed to address existing deficiencies. We will probably see some fusion of the two approaches in the near future that will benefit database users. In the long run, I foresee OODBs replacing (extended) relational DBMSs in selected market segments. I would also predict that the next wave after OODBs will be fully integrated, intelligent database (or knowledge-based) systems that combine AI and database technologies.

Donovan Hsieh
Computer Science Lab
SRI International
Menlo Park, CA
rmy@beach.cis.ufl.edu (Rasthiyaadu Yakaa) (09/25/90)
Oh, how I agree... Having read both articles, The Object-Oriented Database System Manifesto and The Third-Generation Database System Manifesto, I was pretty dismayed by the approach taken in the latter. As someone in the midst of a Ph.D. in databases, and quite aware of current developments in the field (both research and applied), I found that the 3G DB manifesto repeatedly showed an anti-OODB slant, relying (as Hsieh's posting stated) on quite misleading arguments to make its case.

As the posting indicated, the 3G manifesto contained both JUSTIFIABLE and MISLEADING arguments. The JUSTIFIABLE arguments MUST be presented; after all, this is a technical paper, and that is to be expected. There are no reasons or excuses for presenting MISLEADING arguments. Clearly, the database community is the loser if articles take such positions. The OODB manifesto, on the other hand, made a valid attempt to define what features an OODB must contain. Clearly there is a place for extended relational systems, and The Third-Generation Database System Manifesto should have made its case accordingly, painting an unbiased portrait. Taking a dig at OODBMSs (whatever shortcomings they may have) is turning one's back on an emerging technology. If OODBMSs have major shortcomings, then either solutions to overcome them will appear and OODBMSs will reach a level where they are reasonably useful, or else OODBMSs will fade away. Judging from the plethora of researchers and organizations currently active in OODBMSs, OODBMSs are clearly alive and kicking. If extended relational systems were superior, I personally believe that far less research and development would be carried out on OODBMSs.

BTW, I am sometimes puzzled to see extended relational systems being called OODBMSs.
Recently, in this newsgroup, POSTGRES was described as an OODBMS by someone from the POSTGRES group at Berkeley. Is it possible that some extended relational systems can be both relational and object-oriented, BUT not vice versa (i.e. an object-oriented system is object-oriented but not relational)? Or are we simply playing with terminology?

yaseen