jack@odi.com (Jack Orenstein) (08/21/89)
Here are replies to some recent questions that have come up in the OO DBMS discussions. The answers are, for the most part, specific to the OO DBMS being built at Object Design, but will often, I believe, apply to competitors' products as well.

David Masterson writes:

> Based on Jack Orenstein's message, I have a couple of questions:
>
> 1. In implementing an OODB on top of C++ using the notion of persistent and transient type objects, when you refer to information in the OODB, is it always by an object identifier? How, therefore, would you find objects meeting some qualification if you don't know its identifier? Is this even a type of query you would ask in an OODB world? (you ALWAYS know the identifier because even a qualification would be wrapped in an object which contains the identifier?)

It will often be the case that object ids are known because they are stored in persistent variables. For example, a persistent variable of type part* stores the id of a part. In other cases, an id will not be known, but properties of the object can be described as part of a query. Queries are expressed using existing C++ syntax for control (i.e. boolean) expressions. For example, given a set of parts (which may contain both transient and persistent instances), queries can be written to ask for all parts whose weight exceeds a given amount, all parts containing a given sub-part, all parts contained in a given part, etc. Compound queries can be expressed also, e.g. find all parts containing a frammis-joint linkage that were manufactured by Acme.

> 2. Again using the architecture of persistent and transient objects, is a persistent object ever in memory? Or is it just a transient copy of a persistent object that is in memory? Then, how are persistent objects created?

Yes, the persistent objects themselves are manipulated by applications. Copying isn't good enough since a copy of an object has a different identity.
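A minimal sketch of the kind of query described in the answer to question 1, assuming a hypothetical Part type and helper functions (contains, select, heavier_than, acme_with_frammis). None of these names come from Object Design's product. The sketch uses present-day C++ lambdas as the boolean expressions, and it returns pointers to the matching objects themselves rather than copies, so identity is preserved:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical part type; the field names are illustrative only.
struct Part {
    std::string name;
    std::string maker;
    double weight;
    std::vector<const Part*> subparts;  // direct sub-parts
};

// True if `whole` contains a part named `wanted` at any depth.
bool contains(const Part& whole, const std::string& wanted) {
    for (const Part* sub : whole.subparts) {
        if (sub->name == wanted || contains(*sub, wanted)) return true;
    }
    return false;
}

// Generic selection over a set of parts. The result holds pointers to the
// original objects, not copies of them.
template <typename Pred>
std::vector<const Part*> select(const std::vector<const Part*>& parts, Pred pred) {
    std::vector<const Part*> result;
    std::copy_if(parts.begin(), parts.end(), std::back_inserter(result), pred);
    return result;
}

// "All parts whose weight exceeds a given amount."
std::vector<const Part*> heavier_than(const std::vector<const Part*>& parts, double limit) {
    return select(parts, [limit](const Part* p) { return p->weight > limit; });
}

// Compound query: parts made by Acme that contain a frammis-joint linkage.
std::vector<const Part*> acme_with_frammis(const std::vector<const Part*>& parts) {
    return select(parts, [](const Part* p) {
        return p->maker == "Acme" && contains(*p, "frammis-joint linkage");
    });
}
```

Each pointer in a result compares equal to the address of the original object, whereas a copy made with `Part copy = original;` lives at a different address and is therefore a different object in the C++ sense.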
(This might not be true in other languages, but the idea of equating an object's id with its address is fundamental to C++. Of course, it is possible to define a base "object" class, define it to have an "id" data member, redefine initialization, =, and == to work off this id, and then use "object" to derive all other classes, but the space and time overhead will be significant.) One example of the difficulties that arise is that pointers to an object do not point to copies of the object. Copies of objects can be made, as is usually the case in C++, and the semantics of C++ are preserved, i.e., the copy is a distinct object.

From D. C. Martin:

> Dan Weinreb of ODI writes:
>
>> There should not be any special declaration for "pointers to persistent" or "pointers to possibly persistent" data as distinct from ordinary pointers.
>
> It would be nice if no one ever had to consider if a pointer was persistent or non-persistent, but someone will have to build the access methods and other low-level interface routines to your storage manager in order to provide this type of "pointer swizzling" to the application developer. At UW-Madison the Exodus Project is developing a language called E, which is a persistent C++ language designed to allow an individual to write his or her own access methods, and to a certain extent pointers to resident objects are equivalent to persistent ones. However, for this equivalency the pointer types must be DB pointers, i.e. dbchar* != char*, but a persistent dbchar* is equivalent to a non-persistent dbchar*.

We are very familiar with the Exodus project, and with the E language. While the type system of E is far preferable to that of a typical host-language/DBMS combination, it still has two distinct, but "parallel", type systems, and programmers have to be careful about the use of db types. In our product, there will be a single type system, that of C++.
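The alternative mentioned in the parenthesis earlier in this reply, a base "object" class carrying an explicit id, might look roughly like the following. This is only a sketch under stated assumptions: the class names Object and Part, the 64-bit counter, and the choice that copies keep the original's id are all hypothetical, and the counter is not thread-safe.

```cpp
#include <cstdint>

// Hypothetical base class: identity is an explicit id, not the address.
class Object {
public:
    Object() : id_(next_id()) {}                        // new objects get a fresh id
    Object(const Object& other) = default;              // a copy keeps the same id
    Object& operator=(const Object& other) = default;   // assignment copies the id too
    virtual ~Object() = default;

    std::uint64_t id() const { return id_; }

    // Equality is defined on the id, so a copy compares equal to its source.
    friend bool operator==(const Object& a, const Object& b) { return a.id_ == b.id_; }
    friend bool operator!=(const Object& a, const Object& b) { return !(a == b); }

private:
    static std::uint64_t next_id() {
        static std::uint64_t counter = 0;  // assumption: single-threaded use
        return ++counter;
    }
    std::uint64_t id_;
};

// Every application class would have to derive from Object.
class Part : public Object {
public:
    explicit Part(double w) : weight(w) {}
    double weight;
};
```

Every instance now pays for the id field (plus a vtable pointer in this sketch), and every comparison goes through the id, which is the space and time overhead the reply warns about.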
There is no fundamental reason why persistent and transient types have to be distinguished in the language used by the application programmer. Unfortunately, the details of how "swizzling" works are proprietary, so I can't discuss the issue.

In particular, de-referencing a pointer has exactly the same semantics and syntax regardless of whether the objects are persistent or transient. In general, data manipulation (storing, fetching, testing, adding, printing, field extraction, function calling, casting) looks exactly as it does for normal C++.

> What about dereferencing a pointer to a 40mb image? Does this mean bringing the entire image into core? There must be some low-level routines to allow the application programmer to inform the language that certain special methods should be used to store, fetch, etc... for special datatypes.

I'll have to take the 5th again, but I will say that there is no need to bring in the entire 40mb image just to retrieve one byte of it.

Jack Orenstein
Object Design, Inc.
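The post does not say how Object Design avoids loading the whole image, and the mechanism is described as proprietary. As a generic illustration only, one well-known way to read a single byte of a large stored object without fetching all of it is demand paging: fetch fixed-size pages on first touch and cache them. The sketch below assumes a hypothetical PagedImage class, a 4096-byte page size, and a caller-supplied loader function; none of these reflect the actual product.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical demand-paged view over a large stored object.
class PagedImage {
public:
    static constexpr std::size_t kPageSize = 4096;  // assumed page size
    // The loader stands in for the storage manager: given a page index,
    // it returns that page's bytes (kPageSize of them).
    using PageLoader = std::function<std::vector<std::uint8_t>(std::size_t)>;

    PagedImage(std::size_t size, PageLoader loader)
        : size_(size), loader_(std::move(loader)) {}

    // Reads one byte, fetching only the page that holds it.
    // The caller must pass offset < size().
    std::uint8_t at(std::size_t offset) {
        const std::size_t page = offset / kPageSize;
        auto it = cache_.find(page);
        if (it == cache_.end()) {
            it = cache_.emplace(page, loader_(page)).first;  // first touch: load
        }
        return it->second[offset % kPageSize];
    }

    std::size_t size() const { return size_; }
    std::size_t pages_loaded() const { return cache_.size(); }

private:
    std::size_t size_;
    PageLoader loader_;
    std::unordered_map<std::size_t, std::vector<std::uint8_t>> cache_;
};
```

With this arrangement, reading one byte of a 40 MB object touches a single 4 KB page; a second read from the same page is served from the cache.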
davidm@cimshop.UUCP (David Masterson) (08/23/89)
>Unfortunately, the details of how "swizzling" works are proprietary,
>so I can't discuss the issue.

Let me just ask then, is there anything inherent in your design that prevents it from being put on top of a relational database system (you've expressed requirements that prevent it [performance], but not design constraints)? This question has some implications in what I am currently doing.

David Masterson
uunet!cimshop!davidm
415-691-6311
ballou@nebula.ACA.MCC.COM (Nat Ballou) (08/25/89)
In article <1989Aug21.132525.3179@odi.com>, jack@odi.com (Jack Orenstein) writes:
> We are very familiar with the Exodus project, and with the E language.
> While the type system of E is far preferable to that of a typical
> host-language/DBMS combination, it still has two distinct, but
> "parallel", type systems, and programmers have to be careful about the
> use of db types. In our product, there will be a single type system,
> that of C++. There is no fundamental reason why persistent and
> transient types have to be distinguished in the language used by the
> application programmer.
>

I am interested in how the Object Design people intend to do schema evolution in their forthcoming system. Suppose I make a persistent subclass of a non-persistent class. I then go on to populate the persistent class. How does one add/drop attributes, superclasses, indices, etc.? If a non-persistent superclass of my persistent class changes (i.e., an attribute/method is deleted/added, etc.), what happens to the instances in the database? Will all programs compiled against the schema have to be recompiled after such a change? I'm more interested in semantics and theory than in implementation details.

> Jack Orenstein
> Object Design, Inc.

Nat Ballou
Orion Project
MCC
dlw@odi.com (Dan Weinreb) (08/29/89)
In article <459@cimshop.UUCP> davidm@cimshop.UUCP (David Masterson) writes:
> Let me just ask then, is there anything inherent in your design that prevents
> it from being put on top of a relational database system (you've expressed
> requirements that prevent it [performance], but not design constraints)? This
> question has some implications in what I am currently doing.

Yes, there is; our design could not plausibly be implemented on top of
a relational database system. (That is, if you did, you'd lose most
of its benefits.) The architecture just doesn't fit together that
way.
Dan Weinreb Object Design, Inc. dlw@odi.com
craig@gpu.utcs.utoronto.ca (Craig Hubley) (09/04/89)
(John Orenstein requested analogies for 'adding a column' in O-O terms, sorry I haven't the article on hand)

I think this issue is important because it may help to define relational operations as a proper subset of object-oriented operations, and let OODBMS developers provide all standard relational capabilities in their systems, which I think is a worthwhile goal. I have also included some discussion of Linda, which is a DBMS in Krueger's sense, and well worth investigating as a unifying access metaphor.

DATA-BASED ANALOGY

If you accept the analogy of C++ classes to relational table definitions, which seems relatively sound to me, then you might also accept the analogy of 'adding a column' to mean adding a data item to each instance of the class. Of course, to be truly object-oriented, you are actually adding a set of *legal operations* on each object of that class, including the ability to alter and retrieve that data item. There may be other, more involved meanings, but that is a sort of default. Alternatively, you could think of inserting these 'more involved meanings', which is addressed below.

Clearly, you may also be adding a set of more involved or advanced operations or virtual capabilities, but if the data item or 'column' didn't exist before then you aren't relying on it anywhere. Presumably the implementation of other operations (methods) on the class might change due to the presence (or absence) of this new item, but that doesn't change the availability of these methods from outside, which in terms of the access algebra is the only issue.

So if I add 'Country' to a relational table containing company addresses, then by default I would add methods like 'SetCountry()' and 'GetCountry()', but these are optional. If the definition of 'SendMail()' changes because of the new 'Country' field, that is not an issue at the interface.
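As a rough illustration of the 'Country' example, here is a sketch in C++. The class name CompanyAddress, its other fields, and the choice to have SendMail() return the address label as a string are assumptions made for the example; only the method names SetCountry, GetCountry and SendMail come from the text above.

```cpp
#include <string>
#include <utility>

// Hypothetical class standing in for a row of a company-address table.
class CompanyAddress {
public:
    CompanyAddress(std::string company, std::string city)
        : company_(std::move(company)), city_(std::move(city)) {}

    // Operations that arrive together with the new 'Country' column.
    void SetCountry(std::string country) { country_ = std::move(country); }
    const std::string& GetCountry() const { return country_; }

    // Same signature as before the column existed; only the body changed
    // to take the country into account when one has been recorded.
    std::string SendMail() const {
        std::string label = company_ + ", " + city_;
        if (!country_.empty()) label += ", " + country_;
        return label;
    }

private:
    std::string company_;
    std::string city_;
    std::string country_;  // the added 'column'; empty means not recorded
};
```

Callers written before the change keep calling SendMail() exactly as they did; the new data item shows up only through the two added accessors.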
METHOD-BASED ANALOGY

Another interpretation of 'adding a column' is 'adding a method', and this is probably a more supportable analogy. Adding a new method may or may not involve adding new data to all instances of the class. You could think of 'inserting' this method in terms of defining a new function in an incremental compiler for an object-oriented language. In fact, it could be implemented this way.

LINDA

Finally, if one accepts the definition of a 'database' as 'shared access to persistent objects', then everyone should have a look at Linda, the Yale language extensions developed for parallel processing. In fact, Linda defines a 'tuple space', like a simple table of variable-length records, that in fact provides 'shared access to persistent objects'. It is assumed in Linda that the persistence is of short duration, but this assumption is in no way built into the interface. It would be relatively straightforward to extend Linda to full relational DBMS capabilities, and ultimately to OODBMS status, in the bargain gaining Linda's unification of all IPC and RPC schemes!

Linda's interface is quite simple, basically:

in(tuple pattern)     removes a tuple from the tuple space
out(tuple)            places a tuple in the tuple space
rd(tuple pattern)     reads a tuple from the space, without removing
eval(active tuple)    (one or more fields are actually a function call,
                      when finished the tuple becomes passive and
                      remains in the tuple space).

ref: "Linda in Context", April 1989 Communications of the ACM.

Linda's present conception of 'tuple pattern' is limited, and it inherits its datatypes from the language these primitives are added to (the one-language unification of DBMS and other programming constructs). However, the idea of tuples is very general, and were the system extended to work set-at-a-time, accepting tuple pattern definitions in relational algebra, and active tuple definitions as programs in relational algebra, Linda could just as easily extend SQL!
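To make the in/out/rd primitives concrete, here is a toy, single-process tuple space in C++17. It is a sketch under several assumptions and not any real Linda implementation: the names TupleSpace, Field, Tuple and Pattern are invented, fields are limited to int and string, a pattern field left empty acts as a wildcard, and in/rd return an empty optional instead of blocking until a match appears as real Linda does. eval is omitted.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical field, tuple and pattern types for the sketch.
using Field = std::variant<int, std::string>;
using Tuple = std::vector<Field>;
// A pattern field is either a concrete value to match or a wildcard (nullopt).
using Pattern = std::vector<std::optional<Field>>;

class TupleSpace {
public:
    // out: place a tuple in the tuple space.
    void out(Tuple t) { tuples_.push_back(std::move(t)); }

    // rd: return a copy of the first matching tuple, leaving it in the space.
    std::optional<Tuple> rd(const Pattern& p) const {
        for (const Tuple& t : tuples_) {
            if (matches(p, t)) return t;
        }
        return std::nullopt;
    }

    // in: remove the first matching tuple from the space and return it.
    std::optional<Tuple> in(const Pattern& p) {
        for (std::size_t i = 0; i < tuples_.size(); ++i) {
            if (matches(p, tuples_[i])) {
                Tuple found = std::move(tuples_[i]);
                tuples_.erase(tuples_.begin() + static_cast<std::ptrdiff_t>(i));
                return found;
            }
        }
        return std::nullopt;
    }

    std::size_t size() const { return tuples_.size(); }

private:
    // A pattern matches a tuple of the same length whose fields equal
    // every non-wildcard pattern field.
    static bool matches(const Pattern& p, const Tuple& t) {
        if (p.size() != t.size()) return false;
        for (std::size_t i = 0; i < p.size(); ++i) {
            if (p[i].has_value() && *p[i] != t[i]) return false;
        }
        return true;
    }

    std::vector<Tuple> tuples_;
};
```

A set-at-a-time extension of the kind suggested above would amount to a variant of rd that returns every matching tuple rather than the first one.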
For an object-oriented system, each field in the tuple can be thought of as an object, and the operations as retrieving or emitting a set of objects, or a container object holding the set, which would probably be better.

Linda is traditionally a preprocessor and implements its capabilities on top of existing shared-memory, semaphore, or message port schemes, but there are other approaches, including building 'tuple space' as a real DBMS. Contrary to its abstract appearance, Linda has been added efficiently to C, C++, PostScript, and other languages - one scalable supercomputer, the Cogent XTM, even uses Linda to do *all* its low-level IPC and RPC, even to the level of mouse moves. It is based on transputers and exploits their 1 microsecond context switches. Their version, Kernel Linda, defines a set of language-independent data types, though their primary interface is C++.

CONCLUSION

So it would seem that not only is it possible to define an access algebra that is very useful in traditional computing approaches, it seems possible to extend it to include database needs as well (that is, if we accept 'shared access to persistent objects' as a good working definition). Linda tuples can be effectively made very shared and very persistent, as evidenced by the Cogent implementation.

Although 'tuples' seem like a relational concept, they are in fact only lists of fields, and these fields could contain object tags as easily as anything else. In fact, since the Linda algebra is presently defined to be one-at-a-time, not set-at-a-time, it would be easier to have the primitives work with container objects than with sets. It seems thus more suited for object-oriented approaches. The more so since developers would have two selling points: the DBMS and the unified and explicit metaphor for dealing with transient objects too! As the Cogent system shows, optimizing for this single shared-access metaphor can be made very efficient and totally scalable.
This is the logical extension of the 'one-language' advantage that the Object Design people claim.

Craig Hubley
-------------------------------------
Craig Hubley & Associates
"Lead, follow, or get out of the way"
craig@gpu.utcs.utoronto.ca
-------------------------------------
craig@gpu.utcs.toronto.edu
mnetor!utgpu!craig@uunet.UU.NET
{allegra,bnr-vpa,decvax,mnetor!utcsri}!utgpu!craig
craig@utorgpu.bitnet
render@m.cs.uiuc.edu (09/05/89)
Written 5:23 pm Sep 3, 1989 by craig@gpu.utcs.utoronto.ca :
>(John Orenstein requested analogies for 'adding a column' in O-O terms,
>sorry I haven't the article on hand)
>
>I think this issue is important because it may help to define relational
>operations as a proper subset of object-oriented operations, and let
>OODBMS developers provide all standard relational capabilities in their
>systems, which I think is a worthwhile goal. I have also included some
>discussion of Linda, which is a DBMS in Krueger's sense, and well worth
>investigating as a unifying access metaphor.

There are other studies of the properties of changing OO schema and the effects on an active database:

Qing Li and Dennis McLeod, "Object Flavor Evolution in an Object-Oriented Database System." In _Proceedings of the Conference on Office Information Systems_, edited by Robert B. Allen. March 1988, pp. 265-275.

Jay Banerjee, Won Kim, Hyoung-Joo Kim, and Henry F. Korth, "Semantics and Implementation of Schema Evolution in Object-Oriented Databases." ACM SIGMOD Record, 16:3 (December 1987), pp. 311-322.

Gia-toan Nguyen and Dominique Rieu, "Schema Evolution in Object-Oriented Database Systems." Technical Report no. 947, Unite De Recherche, INRIA-Rocquencourt, December 1988.

hal.
perez@csc.ti.com (edward perez) (09/07/89)
>In article ???, Dan Weinreb writes:
>>In article <459@cimshop.UUCP> davidm@cimshop.UUCP (David Masterson) writes:
>>
>> Let me just ask then, is there anything inherent in your design that
>> prevents it from being put on top of a relational database system?
>>
>Yes, there is; our design could not plausibly be implemented on top of
>a relational database system. (That is, if you did, you'd lose most
>of its benefits.) The architecture just doesn't fit together that
>way.

so, without getting into your proprietary details, what is it about your design that prevented you from using an rdb? is it because you would be splitting up objects into many relations, or would you lack control of the underlying storage? what benefits would you lose from being on top of an rdb? has your company written any papers describing the architecture to show why you can't use an rdb? will you have any info at oopsla '89 on this?

"inquiring minds want to know."

edward perez          arpanet: perez@csc.ti.com
texas instruments     csnet: perez%ti-csl@relay.cs.net
dallas, tx.