vassos@utcsri.UUCP (06/27/87)
In response to my question as to the problems of "degree 3" consistency in connection with on-line transactions, Gary Puckering gave some interesting scenarios illustrating these problems. It seems to me that the main point of his examples is that the trouble is caused by transactions holding exclusive locks on records and/or indices for too long, thereby forcing other transactions to wait too much. One of the remedies he suggests is lowering the degree of consistency.

I disagree with this remedy for the following reasons: The only correct way of interleaving transactions in a general-purpose system is by enforcing serialisability. (Let me state the sense in which I use the word, as it differs a bit from the definition Gary gave: an execution of transactions is serialisable if it is computationally equivalent to a serial execution of the same transactions. Note that the equivalence in question refers not just to the updates of transactions, but also to what they read. So my use of "serialisability" takes care of phantoms -- see Gary's article for what these are.) Kung and Papadimitriou have proved a theorem stating, in effect, that in the absence of detailed knowledge about the integrity constraints on the database and the semantics of transactions, anything less than serialisability violates consistency. Thus, "degree 2 consistency" is really a degree of inconsistency!

If we accept that the integrity of data is valuable (which, as database people, we do!), we can't afford to let silly things like bad interleavings of transactions mess up the integrity of our data. It's bad enough having to cope with inconsistencies due to data entry errors and the like! This is the main reason why I think system designers who make serialisability the default in concurrency control have actually made the right choice.
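The inconsistency that "degree 2" permits can be made concrete with a toy illustration (hypothetical Python, not any particular DBMS): a degree-2 reader releases each shared lock as soon as the read completes, so a writer can commit between two reads and leave the reader holding a result that no serial execution could have produced.

```python
# Toy illustration (hypothetical, not any specific DBMS) of why a
# "degree 2" read can observe an inconsistent state.  Invariant the
# transactions maintain: x + y == 100.
db = {"x": 60, "y": 40}

# T2 is a degree-2 reader: it releases the shared lock on each item as
# soon as that item has been read, so a writer may commit in between.
read_x = db["x"]             # T2 reads x = 60; lock dropped immediately

# T1 now runs to completion and commits between T2's two reads,
# preserving the invariant from its own point of view.
db["x"] -= 10                # x = 50
db["y"] += 10                # y = 50

read_y = db["y"]             # T2 reads y = 50
total = read_x + read_y      # 110 -- a sum that no serial order of
                             # T1 and T2 could ever have produced
print(total)
```

Under degree 3 the shared lock on x would have been held until T2's end, blocking T1 and forcing a serialisable outcome (either 100 before T1 or 100 after it, never 110).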
In connection with the issue of performance, the following point (made by Jim Gray in his paper "A transaction model") is relevant: "In the experiments we have done, degree 3 consistency has a cost (throughput and processor overhead) indistinguishable from the lower degrees of consistency." Even though I know nothing about the experiments mentioned, I put much weight in this statement, coming as it does from the mouth (pen? keyboard?) of the person most responsible for the formulation of the theory of "degrees of consistency".

Of course, the performance problem Gary has pointed out is still there. One possible avenue that could lead to a reasonable solution without compromising serialisability is through the use of multiple versions. That is, each updater of a record does not destroy the old version of that record but creates a new one. There are various concurrency control algorithms that ensure serialisability while reducing the amount of waiting, by exploiting the existence of such multiple versions. Of course such an approach has its own costs (storage for the multiple versions, and complexity in the concurrency control algorithm), but it seems reasonable that one would have to pay a price for higher performance. In my opinion, however, that price should not be the price of inconsistent data.

Apparently, Prime has a database system that employs a multiversion technique. It would be interesting to know if there are other commercial systems that do, and whether this leads to appreciable performance improvements in situations of the sort described by Gary.
-- Vassos Hadzilacos  vassos@csri.toronto.edu
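The multi-version idea described above -- updaters create a new version rather than overwriting the old one -- can be sketched roughly as follows (a Python sketch with assumed names; this is not Prime's or anyone's actual design). Each reader sees the newest version committed at or before the moment its transaction began, so readers are never blocked by concurrent writers.

```python
# Minimal sketch of multi-version storage (hypothetical design): writes
# append versions instead of overwriting; reads pick the newest version
# committed no later than the reading transaction's start time.
import bisect

class MultiVersionStore:
    def __init__(self):
        self.versions = {}   # key -> sorted list of (commit_time, value)

    def write(self, key, value, commit_time):
        self.versions.setdefault(key, []).append((commit_time, value))
        self.versions[key].sort()

    def read(self, key, start_time):
        # Newest version with commit_time <= the reader's start time.
        vs = self.versions[key]
        i = bisect.bisect_right(vs, (start_time, float("inf")))
        if i == 0:
            raise KeyError("no version visible at this start time")
        return vs[i - 1][1]

store = MultiVersionStore()
store.write("balance", 100, commit_time=1)
store.write("balance", 90, commit_time=5)    # old version is retained

# A reader that began at time 3 is not blocked by the time-5 writer and
# still sees a consistent snapshot; a later reader sees the new value.
print(store.read("balance", start_time=3))   # -> 100
print(store.read("balance", start_time=6))   # -> 90
```

The costs the post mentions are visible even in this sketch: every key carries a growing version list (storage), and some garbage-collection policy for versions no active reader can see would be needed in practice.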
garyp@cognos.uucp (Gary Puckering) (07/31/87)
In a recent article in comp.database, utcsri!vassos (Vassos Hadzilacos) writes:

| In response to my question as to the problems of "degree 3"
| consistency in connection with on-line transactions, Gary
| Puckering gave some interesting scenarios illustrating these
| problems.
|
| It seems to me that the main point of his examples is that the
| trouble is caused by transactions holding exclusive locks on
| records and/or indices for too long, thereby forcing other
| transactions to wait too much. One of the remedies he suggests
| is lowering the degree of consistency.
|
| I disagree with this remedy for the following reasons: The only
| correct way of interleaving transactions in a general-purpose
| system is by enforcing serialisability.

This may come as a great surprise to thousands of designers who have built applications using more "primitive" database systems like IMS, TOTAL, ADABAS, etc. I can't help but wonder: if degree 3 consistency and protection against phantoms is so wonderful, how did we ever get along without it?

| I think system designers who make serialisability the default in
| concurrency control have actually made the right choice.

I think I can agree with this, the operative word being "default". Unfortunately, many relational database systems don't give you a choice, and that is what I object to. In a 4GL you can enforce consistency within an application *because* you have more knowledge of the semantics of transactions than does the underlying database. So, I want the option of using degree 2.

| In connection with the issue of performance, the following point
| (made by Jim Gray in his paper "A transaction model") is relevant:
|
|     "In the experiments we have done, degree 3 consistency has
|     a cost (throughput and processor overhead) indistinguishable
|     from the lower degrees of consistency."
|
| Even though I know nothing about the experiments mentioned, I put
| much weight in this statement, coming as it does from the mouth
| (pen? keyboard?) of the person most responsible for the formulation
| of the theory of "degrees of consistency".

Jim Gray's reputation speaks for itself. However, his opinions on the validity of degree 3 consistency can hardly be considered impartial. Theory and practice are not the same. Show me an application running on a commercial DBMS with enforced serializability which has higher transaction throughput than the same application using a more conventional DBMS. My experience, and the weight of evidence that I have seen, suggest that enforced serializability is a major cause of low concurrency in on-line transaction processing systems.

Maybe the problem has been that most implementations of serializability are bad. I can accept that the theory is right and the implementations are wrong. But I can't live with it. I need higher throughput, not lower.

| Of course, the performance problem Gary has pointed out is still
| there. One possible avenue that could lead to a reasonable solution
| without compromising serialisability is through the use of multiple
| versions.

I agree with you on this 100%. In fact, I've tested the performance of two commercial database systems available on VAX/VMS, one which used multi-versioning and one which didn't. This was the major architectural difference between the two systems. In high-concurrency situations, the single-version system generated numerous unnatural deadlocks which led to transaction failures. The multi-version system had no unnatural deadlocks. Consequently, its throughput rate was almost double that of the single-version system. (This was despite the fact that on straight I/O operations, the single-version system was somewhat faster.) This may be an example of a good implementation of enforced serializability.

At the recent SIGMOD conference, a representative from Oracle (R. Bamford, I believe) said that the new Oracle*XTP would employ multi-versioning techniques to achieve high throughput in on-line transaction processing applications. So, this seems to be the latest implementation technique for serializability.

The jury is still out, though, on whether these systems can ever outperform "classical" database systems. (I hope they can, I think they can, but I don't *know* they can!)
--
Gary Puckering                          3755 Riverside Dr.
Cognos Incorporated                     Ottawa, Ontario
(613) 738-1440                          CANADA K1G 3N3
{allegra,decvax,ihnp4,linus,pyramid}!utzoo!dciem!nrcaer!cognos!garyp
larry@xanadu.uucp (Larry Rowe) (08/06/87)
my understanding of the current state of rti/oracle xact mgt is the following:

1. oracle offers in-memory buffer versioning of pages. i think they only offer degree 3 (is this right?).

2. rti offers degree 1, 2, and 3 consistency, with degree 3 being the default.

with respect to jim gray's comments about the overhead, i believe he was referring to anecdotal evidence he gathered from observing the use of IMS by large xact applications. in essence, most people used degree 3 because the consistency was worth the performance penalty. one thing to remember is that IMS applications are programmer-developed and highly controlled applications. in other words, the application designers severely limit the kinds of queries that can be executed. the availability of SQL and forms-based end-user interfaces makes it possible to ask questions that touch data in very different patterns than a typical IMS application would allow.
	larry
UH2@PSUVM.BITNET (Lee Sailer) (08/10/87)
In article <1215@smokey.UUCP>, garyp@cognos.uucp (Gary Puckering) says:
>
>The jury is still out, though, on whether these systems can ever
>outperform "classical" database systems. (I hope they can, I think
>they can, but I don't *know* they can!)

Knowing that they can will be hard to establish. The ideal experiment would be to give about 50 different teams of professional programmers a big project to do, let them use several different approaches, assigned randomly, include a year of training for each team, and then see which teams (a) finish first, (b) finish best, (c) write the fastest code, etc. etc. etc.

Ah! This is a 50-million-dollar study, and not even the Pentagon will fund it.

The earliest place where this problem was discussed, that I know of, is Gerald Weinberg's The Psychology of Computer Programming. He did lots of little studies of students programming, and bemoaned the fact that we'll never be able to study *real programmers*. Oh well.
garyp@cognos.uucp (Gary Puckering) (08/21/87)
In article <18182UH2@PSUVM> UH2@PSUVM.BITNET (Lee Sailer) writes:
>In article <1215@smokey.UUCP>, garyp@cognos.uucp (Gary Puckering) says:
>>
>>The jury is still out, though, on whether these systems can ever
>>outperform "classical" database systems. (I hope they can, I think
>>they can, but I don't *know* they can!)
>
>Knowing that they can will be hard to establish. The ideal experiment
>would be to give about 50 different teams of professional programmers
>a big project to do, and let them use several different approaches,
>assigned randomly, and include a year for training for each team,
>and then see which teams (a) finish first, (b) finish best, (c) write
>fastest code, etc etc etc.
>
>Ah! This is a 50 million dollar study, and not even the Pentagon will
>fund it.

Let me have one more kick at the can (and then maybe I'll shut up for a while on this topic).

Dr. Codd, in his landmark paper "A Relational Model of Data for Large Shared Data Banks", proposed many interesting ideas, not the least of which was the notion of *consistency*. This idea was further developed by Jim Gray and others. Among the many ways in which a relational system differs from its network and hierarchical predecessors, the inclusion of a formal transaction management scheme is perhaps the most profound.

The implementation of degree-3 consistency rests on the assumption that the database management system does not have any knowledge of the semantics of the application (i.e. what it is doing in between database operations) and therefore must "protect" the concurrent transactions from each other. In this sense, it assumes the worst case. It assumes, in fact, that the application designer may not have taken concurrent transactions into account during design. This is clearly a very conservative position. I could argue that it is too conservative. I think degree-3 is a sensible default, but I also think an application designer should have the option of choosing degree-2.
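The operational difference between the two degrees under discussion comes down to when shared locks are released. A minimal sketch (a hypothetical lock table, not any real DBMS interface): degree 3 holds every shared lock until the transaction ends, while degree 2 drops each shared lock the moment the read is done.

```python
# Hypothetical lock table contrasting degree-2 and degree-3 read locking.
class LockTable:
    def __init__(self):
        self.shared = {}     # item -> set of transaction ids holding S-locks

    def s_lock(self, txn, item):
        self.shared.setdefault(item, set()).add(txn)

    def s_unlock(self, txn, item):
        self.shared.get(item, set()).discard(txn)

    def held(self, txn):
        return {i for i, txns in self.shared.items() if txn in txns}

def read(locks, txn, item, degree):
    locks.s_lock(txn, item)
    # ... fetch the record here ...
    if degree == 2:
        locks.s_unlock(txn, item)   # lock covers only the read itself;
                                    # degree 3 keeps it until commit

locks = LockTable()
read(locks, "T1", "emp.42", degree=3)
read(locks, "T2", "emp.42", degree=2)

print(locks.held("T1"))   # degree 3: still held, so later writers block
print(locks.held("T2"))   # degree 2: empty, so writers proceed at once
```

This is exactly the trade the post describes: the degree-2 reader stops obstructing writers, at the price that a writer may now change "emp.42" before T1's transaction is over.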
The obvious counter-argument is that humans are too fallible and we shouldn't give the designer a loaded gun. I figure an application designer can always subvert a relational system anyway, by doing things in two transactions that ought to be done in one. Nonetheless, I might concede that if humans were the only things that designed systems, then enforced degree-3 might be best.

But humans aren't. Computers design systems too. In particular, 4GLs design systems. So what I will argue is that makers of general-purpose database systems should provide degree-2 consistency, as an option at the call-level interface, because 4GL systems need it. My reasons are as follows:

1. A 4GL is a "computerized" application designer. It can be counted on to obediently generate applications according to the set of rules given it by its creators. If the creator makes a mistake in the rules, that mistake will be made over and over again, and is therefore almost certain to be spotted.

2. These rules provide a framework for consistency. That is, the 4GL can be counted on to generate an application in such a way that inconsistent transactions are not allowed.

3. Because of this, it is reasonable to allow the 4GL access to lower levels of isolation -- because a 4GL *does* understand the semantics of the class of applications it can generate.

Our 4GL, PowerHouse, supports access to several kinds of indexed file systems, a two-level network database system, a network-hierarchical style DBMS, and two different relational systems. Supporting the relational systems caused us all kinds of problems, primarily due to enforced degree-3 consistency. Allowing us to use a lower isolation level would have eliminated these problems without jeopardizing the integrity of the data (we ensure that) and produced a system capable of supporting more concurrent users.

Have any of you "purists" out there really built an on-line transaction processing application using degree-3 consistency?
How many concurrent users did it support without running into long wait times and unnatural deadlocks?

You know, I think relational databases have gotten a bad name for themselves, performance-wise, not because they are inherently slower than the "classical" systems (in fact, storage structures and access techniques have improved a lot over the last few years). I think they got a bad name because no one could get the same concurrent performance levels they used to get. And I think it's because of degree-3 consistency!
--
Gary Puckering                          3755 Riverside Dr.
Cognos Incorporated                     Ottawa, Ontario
(613) 738-1440                          CANADA K1G 3N3
{allegra,decvax,ihnp4,linus,pyramid}!utzoo!dciem!nrcaer!cognos!garyp
larry@xanadu.uucp (Larry Rowe) (08/25/87)
In article <1301@smokey.UUCP> garyp@cognos.UUCP (Gary Puckering) writes:
>Dr. Codd, in his landmark paper "A Relational Model of Data for Large
>Shared Data Banks" proposed many interesting ideas, not the least of
>which was the notion of *consistency*. This idea was further
>developed by Jim Gray and others. Among the many ways in which a
>relational system is different from its network and hierarchical
>predecessors, the inclusion of a formal transaction management scheme
>is perhaps the most profound.

Jim Gray will tell you that when he and his colleagues at ibm san jose developed the theory of transactions, they were just formalizing what had already been implemented in IMS. in fact, jim spent considerable time studying IMS on-line transaction systems -- particularly the occurrence of deadlocks. so, the ``theory'' that relational systems use was, in fact, implemented in IMS. moreover, DB2 uses the same transaction manager as IMS. consequently, the xact functionality issue is really an issue of what an individual system implementer chooses to implement. some relational systems support something other than degree 3 consistency, others don't.

another common problem with relational dbms applications that wasn't a problem with IMS/DBTG applications is that they can do long queries (e.g., find all employees that make more than their manager) which will probably lock the entire employee table (under degree 3 consistency) unless the implementers do something very sophisticated (e.g., scan locks, where a record/page lock is held only until the record/page has been read).
the problem is that this takes several lines of code to implement (don't forget the heuristics to look at the query and choose the appropriate locking). IMS/DBTG systems didn't have this problem because you couldn't issue a long query!
	larry
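The scan-lock idea larry mentions can be sketched roughly as follows (hypothetical code, not IMS or any real implementation): during a long read-only scan, the scanner holds at most one page lock at a time, releasing each page before moving to the next, so the query never ties up the whole table.

```python
# Sketch of a "scan lock" cursor (hypothetical): lock each page only for
# the duration of reading that page, never the whole table at once.
def scan_with_page_locks(pages, lock, unlock, predicate):
    """Yield matching rows while holding at most one page lock at a time."""
    results = []
    for page_id, rows in pages.items():
        lock(page_id)
        results.extend(r for r in rows if predicate(r))
        unlock(page_id)          # released before moving to the next page
    return results

held = set()                     # stand-in lock manager: set of held locks
pages = {1: [("alice", 50), ("bob", 80)], 2: [("carol", 90)]}

out = scan_with_page_locks(
    pages,
    lock=held.add,
    unlock=held.discard,
    predicate=lambda r: r[1] > 60,   # e.g. "salary above 60"
)
print(out)
print(held)    # empty: no locks survive the scan
```

The caveat larry raises applies here too: a row already scanned (and unlocked) can be updated before the scan finishes, so this trades away strict degree-3 behaviour for concurrency on long queries.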
garyp@cognos.uucp (Gary Puckering) (09/01/87)
In article <3391@zen.berkeley.edu> larry@xanadu.UUCP (Larry Rowe) writes:
>So, the ``theory'' that relational systems use was, in fact, implemented
>in IMS. moreover, DB2 uses the same transaction manager as IMS. consequently,
>the xact functionality issue is really an issue of what an individual
>system implementer chooses to implement. some relational systems support
>something other than degree 3 consistency, others don't.

Does this mean that degree-3 consistency can be obtained in IMS? If so, what method is used to eliminate "phantoms"? Most relational systems use index or table locking for this. What does IMS do?
--
Gary Puckering                          P.O. Box 9707
Cognos Incorporated                     3755 Riverside Dr.
VOICE: (613) 738-1440                   FAX: (613) 738-0002
Ottawa, Ontario                         UUCP: decvax!utzoo!dciem!nrcaer!cognos!garyp
CANADA K1G 3Z4
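For readers unfamiliar with the term, the phantom problem asked about here can be shown with a toy example (no real DBMS involved): record-level locks cannot conflict with a row that does not exist yet, so a row inserted between two identical range queries "appears from nowhere" in the second result.

```python
# Toy demonstration of the phantom problem: locking existing records
# cannot stop a not-yet-existing row from satisfying a repeated query.
table = [("smith", 30000), ("jones", 45000)]

def query(pred):
    return [r for r in table if pred(r)]

high_paid = lambda r: r[1] > 40000

first = query(high_paid)          # T1: finds one matching row

table.append(("brown", 50000))    # T2 inserts and commits: there was no
                                  # record for T1's locks to conflict with

second = query(high_paid)         # T1 repeats the query: a "phantom"
print(len(first), len(second))    # the result set has grown mid-transaction

# Index, range, or table locking prevents this by locking the *predicate*
# the query touched (here, salaries above 40000), so the insert would
# block until T1 finished.
```

This is also why the earlier definition of serialisability in this thread insists on equivalence of what transactions *read*, not just what they write.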
larry@postgres.uucp (Larry Rowe) (09/04/87)
In article <1381@smokey.UUCP> garyp@cognos.UUCP (Gary Puckering) writes: >Does this mean that degree-3 consistency can be obtained in IMS? If >so, what method is used to eliminate "phantoms". Most relational >systems use index or table locking for this. What does IMS do? i'm not sure since i've never used ims. i have some contacts at ibm that know more about this, i'll ask them and communicate the answer. if i had to hazard a guess, i'd say that ims doesn't handle phantoms. larry