pjs269@tijc02.UUCP (Paul Schmidt ) (03/14/88)
Distributed Database Management Systems and locking I am doing research into distributed databases for my Master's Thesis and I have a few comments and questions. First I feel there are two methods for locking distributed databases. One is to lock all resources before a transaction is started and the other is to lock resources as they are needed. Locking all resources first gaurantees deadlock prevention, but may lock resources that are not needed, and keep them locked for longer than they are needed. By locking resources when they are needed, one can avoid unnecessary locks but a deadlock detection scheme would have to be used. (Resource, in my view, is any peice of data; a relation, a tuple, or an attribute in a tuple.) Let me go over a few of the pros and cons of each method. Locking of all resources at the beginning of the transaction has the following advantages: ease of implementation and deadlock prevention. This will be easier to implement since a deadlock detection algorithm and transaction rollback scheme does not have to be implemented. Deadlock is prevented in this scheme. Locking of all resources at the beginning of each trasaction has the following disadvantages: possibility of wasted processing time at a node, all resources must be locked on transaction invocation, and unneeded resources may be locked. Wasted processing time can occur when a node has locked a resource but is not currently using it. Not all transactions can be clearly defined when they are invoked so forcing all resources to be locked at invocation will make some applications difficult, if not impossible, or may force the locking of unneeded resources. Locking of resources as needed has the following advantages: Resources are locked only when being used, and the total transaction does not need to be known at the start of the transaction. Since resources are only locked when they are being used, processing time will not be lost when another process is waiting on that resource. Locking of resources as needed has the following disadvantages: A deadlock detection and recovery scheme must be implemented, time may be wasted when a possible deadlock is recovered from, and it is much harder to implement. It appears that the first proposal is the simplest way of implementing a Distributed DBMS but it has some serious drawbacks. Specifically, it can hurt the performance of a distributed system by having unneeded resources locked. But the second proposal will have problems if deadlock is detected often and the recovery takes longer than the time we saved. Questions: Are there other methods? What are the distributed deadlock detection schemes? How can one tell when one method is better than the other? How do crash recovery and deadlock detection rollback relate? Are most of the recovery problems already solved by crash recovery? Any references and comments on this subject would help further my research greatly. Paul Schmidt - East Tennessee State University 502 West Pine Street Johnson City, TN 37604 EMAIL: mcnc!tijc02!pjs269
speegle@im4u.UUCP (Greg Speegle) (03/17/88)
In article <207@tijc02.UUCP> pjs269@tijc02.UUCP (Paul Schmidt ) writes: >Distributed Database Management Systems and locking > >I am doing research into distributed databases for my Master's Thesis >and I have a few comments and questions. > >First I feel there are two methods for locking distributed databases. Non-locking mechanisms also exist, such as distributed timestamps. >Locking of all resources at the beginning of the transaction has the >following advantages: ease of implementation and deadlock prevention. >... transaction rollback scheme does not have to be implemented. Transaction rollback must still be implemented in case of user aborts. One of the big advantages (or disadvantages of the lock as needed approach) is you don't have to check for non-existant deadlocks. Most deadlock detection schemes check whenever a deadlock MIGHT exist, thereby checking when one doesn't exist. >Questions: > >Are there other methods? See above. >What are the distributed deadlock detection schemes? There are so many distributed deadlock detection schemes out in the world that to even try to list them all would take forever. Check Sigmod and VLDB for the last 5 years (maybe more!) and you'll find plenty. Be careful, though, as incorrect algorithms are sometimes published. >How can one tell when one method is better than the other? This is an area of current research and depends on what you mean by better. Some are better in terms of messages sent, others are better in terms of time before a deadlock is detected. You'll have to establish your own criteria for what is better. Another idea would be to implement different algorithms and see which worked better. Does anyone know if this has been tried? I would think it has, but I don't know. >How do crash recovery and deadlock detection rollback relate? Are most of >the recovery problems already solved by crash recovery? A deadlock detection rollback is a system determined abort of a transaction. Any changes to the database must be "undone" by the system. However, some methods of crash recovery are based on the concept of "redo"-ing the transaction, while others are based on an "undo" scheme. So the answer is maybe, depending on the crash recovery algorithm used. >Any references and comments on this subject would help further my >research greatly. The best reference is the book by Ceri and Pelagatti(sp?) which is called "Distributed Databases" or something very similar. They cover all of the basics of concurrency, recovery and design of distributed databases. > Paul Schmidt - East Tennessee State University > 502 West Pine Street > Johnson City, TN 37604 > > EMAIL: mcnc!tijc02!pjs269 Greg Speegle -- {seismo,ihnp4,pyramid}!ut-sally!speegle speegle@sally.utexas.edu
john@geac.UUCP (John Henshaw) (03/18/88)
In article <207@tijc02.UUCP>, pjs269@tijc02.UUCP (Paul Schmidt) writes: > First I feel there are two methods for locking distributed databases. > [much text deleted] I'll confine most of my comments to LOCKING strategies. There are of course, several non-locking approaches, but I assume that you are focusing on locking only. Some comments further down reference other methods... The preemptive locking strategy (PRE) has the benefit that deadlock cannot occur. However, we rejected it as a viable method on "practical grounds". In particular, PRE requires that the programmer know *all* the resources that will be required *prior* to the initiation of the transaction. This is rarely the case. As far as I know, all non-PRE locking schemes require deadlock detection, which is a pain, but a necessary one. PRE will lock resources at the earliest point in time (transaction initiation) and hold them for the duration of the transaction. This means that the "growing phase" of the 2-phase locking activity happens very quickly, and the transaction can proceed without ever being blocked. This suggests that PRE may hold transaction locks for a SHORTER time than other schemes, not longer. I suspect that the real answer to this issue could be resolved via simulation and that the major elements affecting this would be the probability of conflict between transactions, and the level of concurrency (number of transactions executing at once). You write: > By > locking resources when they are needed, one can avoid unnecessary locks > but a deadlock detection scheme would have to be used. Could you please explain what is meant by an "unnecessary lock"? I imagine that you really meant to address the *timely* initiation of locks so that objects (resources) are locked for as short a period as possible. Otherwise, I will disagree: "If ya gotta lock it, ya gotta lock it...", hence all locks *are* necessary. You write: > Locking of all resources at the beginning of the transaction has the > following advantages: ease of implementation and deadlock prevention. > This will be easier to implement since a deadlock detection algorithm > and transaction rollback scheme does not have to be implemented. PRE, while deadlock free, does not free the implementor from transaction rollback. While it is clear that deadlock requires transaction rollback, there are other reasons why one may roll back a transaction. For example, a end-user may deliberately choose to ABORT a transaction, or the application software may decide, due to various conditions (data dependencies, data values, etc.) to ABORT. Either case requires rollback. You write: > Locking of resources as needed has the following advantages: Resources > are locked only when being used, and the total transaction does not need to > be known at the start of the transaction. Since resources are only locked > when they are being used, processing time will not be lost when another > process is waiting on that resource. This description is somewhat inexact. Resources, once locked, are locked until Commit time of the transaction. Having locked a resource/object, other processes will be blocked accessing that object. (This assumes that the object has been either written, or is assumed to be updatable at the time of lock initiation.) You point out a major advantage of standard 2 phase locking - that the programmer need not know which objects/resources will be locked at the Start Transaction juncture. It might be useful in your analysis to distinguish between the types of 2 phase locking that you have in mind. For example, when a lock is granted on an object which is read, do you assume that you are going to update it? or do you convert read locks to write locks at update time? A reasonably good paper involving this issue is: "The Performance of Concurrency Control Algorithms for Database Management Systems" by Michael Carey, and Michael Stonebreaker, 1984, VLDB Publications (Proceedings of the Tenth International Conference on Very Large Databases). It introduces a useful simulation framework that you may find helpful in your efforts. You ask: > Questions: > > Are there other methods? Sure: tree-locking, timestamp, optimistic, and multiversion are all written up. I suggest that you check out the library, looking at the last few years of: "Transactions on Database Systems". If you can get hold of "Distributed Systems: Volume 2", edited by Wesley Chu, ISBN 0-89006-213-7, chapter 2 talks about concurrency control, and other chapters address several case studies. Author names that may be useful are: Bernstein, Carey, Chu, Eager, Ellis, Goodman, Gray, Kung, Moss, Obermarck, Sevcik, Stonebreaker, Robertson, and Weiderhold, to name only a few! > What are the distributed deadlock detection schemes? The two I know about are "wait-for-strings", and "wait-for-graphs". Most systems that do deadlock detection, don't check for deadlock every time a lock is placed. Instead, a transaction, when blocked, times out, and then the check is made. In the distributed case, the lifetime of the average transaction may be quite long, and communication delays compound the difficulties that this approach already has. This approach of waiting is done in order to minimize the cost of checking graphs for cycles - usually a CPU-intensive process. > How can one tell when one method is better than the other? Good question. I think the answer to this question depends a lot on your criteria - something that differs for each situation. Reliability, robustness, efficiency, correctness, etc., etc., etc.... are all in the list somewhere. Judging DISTRIBUTED locking schemes is like judging CENTRALIZED locking schemes, except a few of the base rules/assumptions are changed. > How do crash recovery and deadlock detection rollback relate? Are most of > the recovery problems already solved by crash recovery? > Judging from your question, you are more interested in Distributed Transaction Management - not just "distributed locking". The "short" answer to this question is "yes", but the implementation details are where the real heart of the matter lies. Good luck with your work Paul. -john- -- John Henshaw, (mnetor, yunexus, utgpu !geac!john) Geac Computers Ltd. If we don't pay for education now, are we Markham, Ontario, Canada, eh? going to be able to pay for ignorance later?
lchirica@polyslo.UUCP (Laurian Chirica) (03/21/88)
In article <2465@geac.UUCP> john@geac.UUCP (John Henshaw) writes: > >In article <207@tijc02.UUCP>, pjs269@tijc02.UUCP (Paul Schmidt) writes: >> First I feel there are two methods for locking distributed databases. >> [much text deleted] > >I'll confine most of my comments to LOCKING strategies. There are of course, > [Much more good stuff deleted] >You write: >> By >> locking resources when they are needed, one can avoid unnecessary locks >> but a deadlock detection scheme would have to be used. > >Could you please explain what is meant by an "unnecessary lock"? I >imagine that you really meant to address the *timely* initiation of >locks so that objects (resources) are locked for as short a period as >possible. Otherwise, I will disagree: "If ya gotta lock it, ya gotta >lock it...", hence all locks *are* necessary. I am sure Paul can answer the question for himself but I volunteer the following explanation: What if the set of objects to be locked depends on some logical condition, say "if condition A then lock set X else lock set Y." Locking all resources at the beginning of a transaction require that we lock both sets X and Y. > [A lot more good stuff deleted] I fully agree with all the rest of your article, John. One more note, Jim Gray of Tandem (ex-IBM'er of IMS, System R fame and Transaction Management guru) wrote a paper sometime ago exposing the idea of deadlock prevention as a "straw man." If I remember correctly, he had analyzed some transaction logs from IBM installations which showed that the probability of deadlock was so small that it was not worth the expense of preventing deadlock. He, essentially said that we are better off (from a performance standpoint) to let it happen, and rollback one of the transactios involved in the deadlock. As you pointed out John, the rollback is needed anyway for recovery purposes. -- Laurian -- Laurian M. Chirica (lchirica@polyslo.UUCP) Computer Science Department California Polytechnic State University (CAL POLY) San Luis Obispo, CA 93407 - (805) 756-1332