[comp.databases] Degree 3 consistency

vassos@utcsri.UUCP (06/27/87)

In response to my question as to the problems of "degree 3"
consistency in connection with on-line transactions, Gary
Puckering gave some interesting scenarios illustrating these
problems.

It seems to me that the main point of his examples is that the
trouble is caused by transactions holding exclusive locks on
records and/or indices for too long, thereby forcing other
transactions to wait too much. One of the remedies he suggests
is lowering the degree of consistency.

I disagree with this remedy for the following reasons: The only
correct way of interleaving transactions in a general purpose
system is by enforcing serialisability. (Let me state the sense
in which I use the word, as it differs a bit from the definition
Gary gave: An execution of transactions is serialisable if it is
computationally equivalent to a serial execution of the same
transactions. Note that the equivalence in question refers, not
just to updates of transactions, but also to what they read.
So my use of "serialisability" takes care of phantoms -- see Gary's
article for what these are.) Kung and Papadimitriou have proved a
theorem in effect stating that in the absence of detailed knowledge
about the integrity constraints on the database and of the semantics
of transactions, anything less than serialisability violates
consistency. Thus, "degree 2 consistency" is really a degree of
inconsistency! If we accept that the integrity of data is valuable
(which, as database people, we do!), we can't afford to let silly
things like bad interleavings of transactions mess up the integrity
of our data.  It's bad enough having to cope with inconsistencies
due to data entry errors and the like! This is the main reason why
I think system designers who make serialisability the default in
concurrency control have actually made the right choice.
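To make the danger concrete, here is a small Python sketch (hypothetical, mine -- not from any system discussed in this thread) of the classic "lost update": an interleaving of two deposit transactions that is equivalent to no serial order, exactly the kind of bad interleaving serialisability rules out.

```python
# Hypothetical sketch of a non-serialisable interleaving: two deposit
# transactions each read the balance, add their amount, and write it
# back.  Serially the result is the same in either order; interleaved
# so that both read before either writes, one update is lost.

def run_serial(balance, deposits):
    """Each transaction runs to completion: read, add, write."""
    for d in deposits:
        balance = balance + d
    return balance

def run_interleaved(balance, deposits):
    """Both transactions read the same initial balance before either
    writes back -- the classic lost-update anomaly."""
    reads = [balance for _ in deposits]   # every transaction reads 100
    for r, d in zip(reads, deposits):
        balance = r + d                   # each write clobbers the last
    return balance

print(run_serial(100, [10, 20]))       # 130 -- same in either serial order
print(run_interleaved(100, [10, 20]))  # 120 -- the first deposit vanished
```

No serial execution of the two deposits can produce 120, which is precisely what "not serialisable" means.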

In connection to the issue of performance, the following point
(made by Jim Gray in his paper "A transaction model") is relevant:

	"In the experiments we have done, degree 3 consistency has
	a cost (throughput and processor overhead) indistinguishable
	from the lower degrees of consistency." 

Even though I know nothing about the experiments mentioned, I put
much weight on this statement, coming as it does from the mouth
(pen? keyboard?) of the person most responsible for the formulation
of the theory of "degrees of consistency".

Of course, the performance problem Gary has pointed out is still
there. One possible avenue that could lead to a reasonable solution
without compromising serialisability is through the use of multiple
versions. That is, each updater of a record does not destroy the old
version of that record but creates a new one. There are various
concurrency control algorithms that ensure serialisability while
reducing the amount of waiting, by exploiting the existence of such
multiple versions. Of course such an approach has its own costs
(storage for the multiple versions, and complexity in the concurrency
control algorithm) but it seems reasonable that one would have to pay
a price for higher performance. In my opinion, however, that price
should not be the price of inconsistent data. Apparently, Prime has
a database system that employs a multiversion technique. It would be
interesting to know if there are other commercial systems that do,
and whether this leads to appreciable performance improvements for
situations of the sort described by Gary.
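For illustration, here is a minimal multi-version sketch in Python (entirely hypothetical -- the class and method names are mine, not Prime's or any product's): each write appends a new version stamped with a transaction number, and a reader sees the newest version no later than its own snapshot, so readers never wait for writers.

```python
# Minimal multi-version sketch (hypothetical): old versions of a
# record are kept rather than overwritten, so a reader with an older
# snapshot is unaffected by concurrent updaters.

class MultiVersionStore:
    def __init__(self):
        self.versions = {}   # key -> list of (txn_no, value), ascending
        self.txn_no = 0

    def begin(self):
        self.txn_no += 1
        return self.txn_no   # snapshot point for this transaction

    def write(self, key, value):
        txn = self.begin()
        self.versions.setdefault(key, []).append((txn, value))

    def read(self, key, snapshot):
        # Scan backwards for the latest version visible to the snapshot.
        for txn, value in reversed(self.versions.get(key, [])):
            if txn <= snapshot:
                return value
        return None

store = MultiVersionStore()
store.write("acct", 100)
snap = store.begin()          # reader takes its snapshot here
store.write("acct", 250)      # concurrent update after the snapshot
assert store.read("acct", snap) == 100   # reader still sees 100
```

The storage cost is visible in `versions`: nothing is ever deleted here, which is the price (garbage collection of old versions is the obvious refinement).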
-- 
Vassos Hadzilacos
vassos@csri.toronto.edu

garyp@cognos.uucp (Gary Puckering) (07/31/87)

In a recent article in comp.databases, utcsri!vassos (Vassos 
Hadzilacos) writes:

| In response to my question as to the problems of "degree 3"
| consistency in connection with on-line transactions, Gary
| Puckering gave some interesting scenarios illustrating these
| problems.
| 
| It seems to me that the main point of his examples is that the
| trouble is caused by transactions holding exclusive locks on
| records and/or indices for too long, thereby forcing other
| transactions to wait too much. One of the remedies he suggests
| is lowering the degree of consistency.
| 
| I disagree with this remedy for the following reasons: The only
| correct way of interleaving transactions in a general purpose
| system is by enforcing serialisability. 

This may come as a great surprise to thousands of designers who have
built applications using more "primitive" database systems like IMS,
TOTAL, ADABAS, etc.

I can't help but wonder: if degree 3 consistency and protection
against phantoms is so wonderful, how did we ever get along without it?

| I think system designers who make serialisability the default in
| concurrency control have actually made the right choice.

I think I can agree with this, the operative word being "default".
Unfortunately, many relational database systems don't give you a
choice, and that is what I object to.  In a 4GL you can enforce
consistency within an application *because* you have more knowledge of
the semantics of transactions than does the underlying database.  
So, I want the option of using degree 2.
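For readers unfamiliar with the distinction being argued over: under degree 2 a transaction's read locks are released immediately, so a re-read may see a concurrent committed update; under degree 3 read locks are held to end of transaction, so re-reads are stable but updaters wait. A toy Python model (hypothetical, mine) of that trade-off:

```python
# Toy model (hypothetical): hold_read_lock=True stands for degree 3
# (read locks held to end of transaction), False for degree 2 (read
# locks released at once).

class Record:
    def __init__(self, value):
        self.value = value
        self.read_locked = False

def try_update(rec, new_value):
    """A concurrent updater: succeeds only if no read lock is held."""
    if rec.read_locked:
        return False               # degree 3: the updater must wait
    rec.value = new_value
    return True

def read_twice(rec, hold_read_lock):
    rec.read_locked = hold_read_lock
    first = rec.value
    updated = try_update(rec, first + 1)   # another txn tries to commit
    second = rec.value
    rec.read_locked = False                # end of transaction
    return first, second, updated

# Degree 2: the update slips in between the two reads.
assert read_twice(Record(5), hold_read_lock=False) == (5, 6, True)
# Degree 3: the re-read is stable; the updater had to wait.
assert read_twice(Record(5), hold_read_lock=True) == (5, 5, False)
```

The degree-2 transaction sees two different values, which is exactly the inconsistency Vassos objects to -- and the degree-3 transaction blocks the updater, which is exactly the throughput cost I object to.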

| In connection to the issue of performance, the following point
| (made by Jim Gray in his paper "A transaction model") is relevant:
| 
| 	"In the experiments we have done, degree 3 consistency has
| 	a cost (throughput and processor overhead) indistinguishable
| 	from the lower degrees of consistency." 
| 
| Even though I know nothing about the experiments mentioned, I put
| much weight in this statement, coming as it does from the mouth
| (pen? keyboard?) of the person most responsible for the formulation
| of the theory of "degrees of consistency".

Jim Gray's reputation speaks for itself.  However, his opinions of the
validity of degree 3 consistency can hardly be considered impartial.
Theory and practice are not the same.  Show me an application running
on a commercial dbms with enforced serializability which has higher
transaction throughput than the same application using a more
conventional dbms.  My experience, and the weight of evidence that I
have seen, suggests that enforced serializability is a major cause of
low concurrency in on-line transaction processing systems.

Maybe the problem has been that most implementations of
serializability are bad.  I can accept that the theory is right and
the implementations are wrong.  But I can't live with it.  I need
higher throughput, not lower.

| Of course, the performance problem Gary has pointed out is still
| there. One possible avenue that could lead to a reasonable solution
| without compromising serialisability is through the use of multiple
| versions. 

I agree with you on this 100%.  In fact, I've tested the performance
of two commercial database systems available on VAX/VMS, one which
used multi-versioning and one which didn't.  This was the major
architectural difference between the two systems.  In high concurrency
situations, the single-version system generated numerous unnatural
deadlocks which led to transaction failures.  The multi-version
system had no unnatural deadlocks.  Consequently, its throughput rate
was almost double that of the single-version system.  (This was
despite the fact that on straight i/o operations, the single-version
system was somewhat faster.)

This may be an example of a good implementation of enforced
serializability.  At the recent SIGMOD conference, a representative from
Oracle (R. Bamford, I believe) said that the new Oracle*XTP would
employ multi-versioning techniques to achieve high throughput in
on-line transaction processing applications.  So, this seems to be the
latest implementation technique for serializability.

The jury is still out, though, on whether these systems can ever
outperform "classical" database systems.  (I hope they can, I think
they can, but I don't *know* they can!)
-- 

Gary Puckering        3755 Riverside Dr.
Cognos Incorporated   Ottawa, Ontario    {allegra,decvax,ihnp4,linus,pyramid}
(613) 738-1440        CANADA  K1G 3N3    !utzoo!dciem!nrcaer!cognos!garyp

larry@xanadu.uucp (Larry Rowe) (08/06/87)

my understanding of the current state of rti/oracle xact mgt
is the following:

1. oracle offers in-memory buffer versioning of pages. i
think they only offer degree 3 (is this right?).

2. rti offers degree 1, 2, and 3 consistency with degree 3
being the default.

with respect to jim gray's comments about the overhead, i believe
he was referring to anecdotal evidence he gathered from
observing the use of IMS by large xact applications.  in essence,
most people used degree 3 because the added consistency was
worth the performance penalty.  one thing to remember is that IMS applications
are programmer developed and highly controlled applications.  in
other words, the application designers severely limit the kinds of 
queries that can be executed.  the availability of SQL and forms-based
end user interfaces makes it possible to ask questions that touch 
data in very different patterns than a typical IMS application would
allow.
	larry

UH2@PSUVM.BITNET (Lee Sailer) (08/10/87)

In article <1215@smokey.UUCP>, garyp@cognos.uucp (Gary Puckering) says:
>
>The jury is still out, though, on whether these systems can ever
>outperform "classical" database systems.  (I hope they can, I think
>they can, but I don't *know* they can!)
 --
 Knowing that they can will be hard to establish.  The ideal experiment
would be to give about 50 different teams of professional programmers
a big project to do, and let them use several different approaches,
assigned randomly, and include a year for training for each team,
and then see which teams (a) finish first, (b) finish best, (c) write
fastest code, etc etc etc.
     
Ah!  This is a 50 million dollar study, and not even the Pentagon will
fund it.
     
The earliest place where this problem was discussed, that I know of, is
Gerald Weinberg's The Psychology of Computer Programming.  He did lots
of little studies of students programming, and bemoaned the fact that
we'll never be able to study *real programmers*.
     
Oh well.
     

garyp@cognos.uucp (Gary Puckering) (08/21/87)

In article <18182UH2@PSUVM> UH2@PSUVM.BITNET (Lee Sailer) writes:
>In article <1215@smokey.UUCP>, garyp@cognos.uucp (Gary Puckering) says:
>>
>>The jury is still out, though, on whether these systems can ever
>>outperform "classical" database systems.  (I hope they can, I think
>>they can, but I don't *know* they can!)
> --
> Knowing that they can will be hard to establish.  The ideal experiment
>would be to give about 50 different teams of professional programmers
>a big project to do, and let them use several different approaches,
>assigned randomly, and include a year for training for each team,
>and then see which teams (a) finish first, (b) finish best, (c) write
>fastest code, etc etc etc.
>     
>Ah!  This is a 50 million dollar study, and not even the Pentagon will
>fund it.

Let me have one more kick at the can (and then maybe I'll shut up for
awhile on this topic).

Dr. Codd, in his landmark paper "A Relational Model of Data for Large
Shared Data Banks" proposed many interesting ideas, not the least of
which was the notion of *consistency*.  This idea was further
developed by Jim Gray and others.  Among the many ways in which a
relational system is different from its network and hierarchical
predecessors, the inclusion of a formal transaction management scheme
is perhaps the most profound.

The implementation of degree-3 consistency involves the assumption
that the database management system does not have any knowledge of the
semantics of the application (i.e. what it is doing in between
database operations) and therefore it must "protect" the concurrent
transactions from each other.  In this sense, it assumes the worst-case.
It assumes, in fact, that the application designer may not have taken
concurrent transactions into account during design.  This is clearly a
very conservative position.

I could argue that this is too conservative.  I think degree-3 is a
sensible default, but I also think an application designer should have
the option of choosing degree-2.  

The obvious counter-argument is that humans are too fallible and we
shouldn't give the designer a loaded gun.  I figure an application
designer can always subvert a relational system anyway, by doing
things in two transactions that ought to be done in one.  Nonetheless,
I might concede that if humans were the only things that designed
systems, then enforced degree-3 might be best.  But humans aren't.
Computers design systems too.  In particular, 4GL's design systems.

So what I will argue is that makers of general-purpose database systems
should provide degree-2 consistency, as an option at the call-level
interface, because 4GL systems need it.  My reasons are as follows:

  1.  A 4GL is a "computerized" application designer.  It can be
      counted on to obediently generate applications according to the
      set of rules given it by its creators.  If the creator makes a
      mistake in the rules, that mistake will be made over and over
      again.  It is almost certain to be spotted.

  2.  These rules provide a framework for consistency.  That is, the 
      4GL can be counted on to generate an application in such a way
      that inconsistent transactions are not allowed.

  3.  Because of this, it is reasonable to allow the 4GL to have access
      to lower levels of isolation -- because a 4GL *does* understand the
      semantics of the class of applications it can generate.

Our 4GL, PowerHouse, supports access to several kinds of indexed file
systems, a two-level network database system, a network-hierarchical
style DBMS, and two different relational systems.  Supporting the
relational systems caused us all kinds of problems, primarily due to
enforced degree-3 consistency.  Allowing us to use a lower isolation
level would have eliminated these problems without jeopardizing the
integrity of the data (we ensure that) and produced a system capable
of supporting more concurrent users.

Have any of you "purists" out there really built an on-line
transaction processing application using degree-3 consistency?  How
many concurrent users did it support without running into long wait
times and unnatural deadlocks?

You know, I think relational databases have gotten a bad name for
themselves, performance-wise, not because they are inherently slower
than the "classical" systems (in fact, storage structures and access
techniques have improved a lot over the last few years).  I think they
got a bad name because no one could get the same concurrent
performance levels they used to get.

And I think it's because of degree-3 consistency!
-- 

Gary Puckering        3755 Riverside Dr.
Cognos Incorporated   Ottawa, Ontario    {allegra,decvax,ihnp4,linus,pyramid}
(613) 738-1440        CANADA  K1G 3N3    !utzoo!dciem!nrcaer!cognos!garyp

larry@xanadu.uucp (Larry Rowe) (08/25/87)

In article <1301@smokey.UUCP> garyp@cognos.UUCP (Gary Puckering) writes:
>In article <18182UH2@PSUVM> UH2@PSUVM.BITNET (Lee Sailer) writes:
>>In article <1215@smokey.UUCP>, garyp@cognos.uucp (Gary Puckering) says:
>>>
>>>The jury is still out, though, on whether these systems can ever
>>>outperform "classical" database systems.  (I hope they can, I think
>>>they can, but I don't *know* they can!)
>> --
>Dr. Codd, in his landmark paper "A Relational Model of Data for Large
>Shared Data Banks" proposed many interesting ideas, not the least of
>which was the notion of *consistency*.  This idea was further
>developed by Jim Gray and others.  Among the many ways in which a
>relational system is different from its network and hierarchical
>predecessors, the inclusion of a formal transaction management scheme
>is perhaps the most profound.
>
Jim Gray will tell you that when he and his colleagues at ibm san jose
developed the theory of transactions, they were just formalizing what
had already been implemented in IMS.  in fact, jim spent considerable
time studying IMS on-line transaction systems -- particularly the
occurrence of deadlocks.  

So, the ``theory'' that relational systems use was, in fact, implemented
in IMS.  moreover, DB2 uses the same transaction manager as IMS.  consequently,
the xact functionality issue is really an issue of what an individual
system implementer chooses to implement.  some relational systems support
something other than degree 3 consistency, others don't.  

another common problem with relational dbms applications that wasn't a
problem with IMS/DBTG applications is that they can do long queries
(e.g., find all employees that make more than their manager) which will
probably lock the entire employee table (under degree 3 consistency)
unless the implementers do something very sophisticated (e.g., scan locks
where a record/page lock is held only until the record/page has been
read).  problem is that this takes several lines of code to implement
(don't forget the heuristics to look at the query and choose the appropriate
locking).  IMS/DBTG systems didn't have this problem because you couldn't
issue a long query!
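a rough sketch of that scan-lock idea (hypothetical python, mine -- not IMS or any real implementation): each record's lock is taken just long enough to read the record, then dropped, instead of being held to end of transaction as degree 3 would require.

```python
# hypothetical scan-lock sketch: during a long read-only scan, each
# record lock is released as soon as that record has been read, so
# concurrent updaters are blocked only briefly -- at the cost of the
# scan no longer being a degree-3 (repeatable) read.

class LockManager:
    def __init__(self):
        self.held = set()
    def acquire(self, key):
        self.held.add(key)         # in a real system this could block
    def release(self, key):
        self.held.discard(key)

def scan_with_scan_locks(table, predicate, lock_mgr):
    """Return matching records, holding each lock only briefly."""
    results = []
    for key, record in table.items():
        lock_mgr.acquire(key)          # short-duration read lock
        try:
            if predicate(record):
                results.append(record)
        finally:
            lock_mgr.release(key)      # dropped immediately, not at
                                       # end of transaction
    return results
```

on the employees-vs-managers query above, such a scan never holds more than one record lock at a time, where a degree-3 scan would end up locking the whole table.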
	larry

garyp@cognos.uucp (Gary Puckering) (09/01/87)

In article <3391@zen.berkeley.edu> larry@xanadu.UUCP (Larry Rowe) writes:
>So, the ``theory'' that relational systems use was, in fact, implemented
>in IMS.  moreover, DB2 uses the same transaction manager as IMS.  consequently,
>the xact functionality issue is really an issue of what an individual
>system implementer chooses to implement.  some relational systems support
>something other than degree 3 consistency, others don't.  

Does this mean that degree-3 consistency can be obtained in IMS?  If
so, what method is used to eliminate "phantoms"?  Most relational
systems use index or table locking for this.  What does IMS do?
-- 
Gary Puckering                                   P.O. Box 9707
Cognos Incorporated                              3755 Riverside Dr.
VOICE:  (613) 738-1440   FAX: (613) 738-0002     Ottawa, Ontario
UUCP: decvax!utzoo!dciem!nrcaer!cognos!garyp     CANADA  K1G 3Z4

larry@postgres.uucp (Larry Rowe) (09/04/87)

In article <1381@smokey.UUCP> garyp@cognos.UUCP (Gary Puckering) writes:
>Does this mean that degree-3 consistency can be obtained in IMS?  If
>so, what method is used to eliminate "phantoms"?  Most relational
>systems use index or table locking for this.  What does IMS do?

i'm not sure since i've never used ims.  i have some contacts at
ibm that know more about this, i'll ask them and communicate the
answer.  if i had to hazard a guess, i'd say that ims doesn't handle
phantoms.
	larry