mo@seismo.UUCP (Mike O'Dell) (03/13/85)
This locking business is another Undead Topic, but here goes my part.

(1) General Petri nets have been shown to be insufficient to represent all synchronizing problems, and a more powerful formalism has not been put forward to my knowledge. This means WE CAN'T REALLY SOLVE THE PROBLEM, so we are reduced to Engineering Solutions which solve some useful subset.

(2) One of the most difficult issues is what people expect locks to mean. In particular, when a process dies some horrid death while holding a lock (segmentation violation, inadvertent gunning-down by an aberrant operator), what should happen? If the lock means a simple append was being done, then breaking the lock is probably OK, if not desirable. But if the lock means "the database b-tree is inconsistent", then breaking the lock is the last thing you want to do. Moreover, in the latter case, such information should survive across system crashes if it is to be of any use whatsoever. So, we have quickly gotten from "simple file locking" to a stable, surviving lock which must be done with multi-phase commits, not to mention what happens across networks.

So what does this mean?? It means that concurrency control is an END-TO-END issue just like communications error control, and you CANNOT put a single mechanism into a system which will do the Right Thing all the time. People who want to update a simple file can use advisory file locks just fine without needing the hair of byte-span locks, although the byte-span lock might provide more concurrency - since we have an implementation, it is probably OK to use it. But people MUST understand what is NOT promised by locks!! This is all the more reason to make them discretionary. A non-discretionary lock is honored by all processes without their agreement, while discretionary locks are honored only by "consenting processes". This is VERY important where denial of service is considered a serious security issue.
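[A modern illustration of the advisory byte-span lock O'Dell describes, sketched here in Python on top of fcntl(2) record locking; the file name and update are made up for the demo. The key property is exactly his point: the lock binds only "consenting processes" - a writer that never asks for the lock is not stopped by it.]

```python
# Unix-only sketch of an *advisory* byte-span lock via fcntl.lockf.
# Only processes that also call lockf on the same span will ever
# block; a non-cooperating writer sails right past the lock.
import fcntl
import os
import tempfile

fd, path = tempfile.mkstemp()   # hypothetical scratch file
os.close(fd)

with open(path, "r+b") as f:
    f.write(b"0123456789")
    f.flush()
    # Exclusive lock on bytes 0..4 only; a cooperating process
    # could still lock and update bytes 5..9 concurrently - the
    # extra concurrency byte-span locks buy over whole-file locks.
    fcntl.lockf(f, fcntl.LOCK_EX, 5, 0, os.SEEK_SET)
    f.seek(0)
    f.write(b"ABCDE")           # the update done under the lock
    fcntl.lockf(f, fcntl.LOCK_UN, 5, 0, os.SEEK_SET)
    f.seek(0)
    contents = f.read()

os.unlink(path)
print(contents.decode())        # ABCDE56789
```

Note that the kernel drops fcntl locks automatically when the holder dies - which is precisely the "what does the broken lock mean?" problem above, since the data the lock was guarding may be half-written.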
Locking /etc/passwd, /bin, your enemy's home directory or some other well-chosen target would bring a system to its knees in an instant, or reduce it to a battlefield. This brings up the next point. Protection is tricky in a locking environment. Who has the authority to prevent someone from reading or writing a file they would otherwise have permission to operate upon?? To handle this anywhere close to right in the presence of non-discretionary locks requires adding several if not many more protection bits, thoroughly complicating the model. I claim this complexity buys nothing useful, particularly in distributed systems.

Well, enough of this for now. Locking is like the Camel sticking its nose through the tent flap - before you know it, the Camel is in the tent with you. The trip from "simple file locking" to "stable storage" is beguilingly straightforward to imagine, and damnably difficult to realize, not to mention the question of whether you even want that.

	-Mike O'Dell
sdo@u1100a.UUCP (Scott Orshan) (03/14/85)
In article <1685@seismo.UUCP> mo@seismo.UUCP (Mike O'Dell) writes:
>(2) One of the most difficult issues is what people expect locks to mean.
>In particular, when a process dies some horrid death while holding a lock
>(segmentation violation, inadvertent gunning-down by an aberrant operator),
>what should happen? If the lock means a simple append was being done,
>then breaking the lock is probably OK, if not desirable. But if the
>lock means "the database b-tree is inconsistent", then breaking the lock is
>the last thing you want to do.

There are ways to manipulate B-trees (or slight modifications of them) such that they are NEVER inconsistent. This assumes that disk writes are atomic - either nothing gets written or an entire block gets written. If this is not true, then you need two copies of the tree which get updated in a well-defined sequence such that one of them is always consistent even if a write fails in the middle. We once had a database implemented this way, where a shadow driver provided the duplication of I/O without the database code's awareness. Obviously, all writes must be immediate - not delayed in the buffer cache.
-- 
Scott Orshan
Bell Communications Research
201-981-3064
{ihnp4,allegra,bellcore,pyuxww}!u1100a!sdo
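[Orshan's two-copy scheme can be sketched roughly as follows; the record layout, 64-byte payload size, and use of a CRC to detect a torn write are all my own assumptions, standing in for whatever the shadow driver actually did. Copy A is written and forced to disk before copy B is touched, so at every instant at least one copy carries a valid checksum.]

```python
# Sketch of "two copies updated in a well-defined sequence":
# write slot A, fsync, then slot B, fsync.  A crash mid-write
# corrupts at most one slot; recovery reads both and keeps the
# newest slot whose checksum still verifies.
import os
import struct
import tempfile
import zlib

BLK = 64                       # payload size (assumption, small for demo)
REC = 4 + BLK + 4              # 4-byte seq + payload + 4-byte crc32

def pack(seq, payload):
    body = struct.pack("<I", seq) + payload.ljust(BLK, b"\0")
    return body + struct.pack("<I", zlib.crc32(body))

def unpack(rec):
    body, (crc,) = rec[:-4], struct.unpack("<I", rec[-4:])
    if zlib.crc32(body) != crc:
        return None                        # torn or failed write
    return struct.unpack("<I", body[:4])[0], body[4:].rstrip(b"\0")

def update(fd, seq, payload):
    rec = pack(seq, payload)
    os.pwrite(fd, rec, 0)                  # copy A first...
    os.fsync(fd)                           # ...forced out immediately,
    os.pwrite(fd, rec, REC)                # never left in the buffer
    os.fsync(fd)                           # cache, then copy B

def read_consistent(fd):
    candidates = [unpack(os.pread(fd, REC, off)) for off in (0, REC)]
    good = [c for c in candidates if c is not None]
    return max(good)[1]                    # newest valid copy wins

fd, path = tempfile.mkstemp()
update(fd, 1, b"old state")
# Simulate a crash halfway through writing copy A of update #2:
os.pwrite(fd, b"\xffgarbage", 0)
recovered = read_consistent(fd)
os.close(fd)
os.unlink(path)
print(recovered.decode())                  # old state
```

The "all writes must be immediate" requirement shows up as the fsync after each copy: if both copies could linger in the buffer cache, a crash could lose both and the invariant would be gone.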
mason@utcsri.UUCP (Dave Mason) (03/15/85)
Mike O'Dell claims that non-discretionary locking could bring a system to its knees. I suggest that the examples he gives are far from proof. The reason I disagree is that I see no reason why any operating system would allow someone to lock a file that they cannot write to. Therefore the only user who could lock, say, /etc/passwd is root (and we trust her, right? ;-). In general, the only people who can cause havoc with file A by locking it can presumably cause much worse problems by modifying it.
-- 
Usenet:	{dalcs dciem garfield musocs qucis sask titan trigraph ubc-vision
	 utzoo watmath allegra cornell decvax decwrl ihnp4 uw-beaver}
	!utcsri!mason		Dave Mason, U. Toronto CSRI
CSNET:	mason@Toronto
ARPA:	mason%Toronto@CSNet-Relay
guy@rlgvax.UUCP (Guy Harris) (03/16/85)
> The reason I disagree, is that I see no reason why any operating system would
> allow someone to lock a file that they cannot write to.

What if program A wants to read, say, "/etc/passwd", and wants to make sure nobody's writing on top of it while it reads?

Of course, another solution to this is to provide transaction atomicity on the operation "replace contents of file". In effect, you can create a "replacement" for the file which isn't visible until you've written the last block of the new copy and declared the transaction closed.

One way of doing this is to create a temporary file and rename it; this technique is used in some UNIX software, including the S3/S5 "ed". However, it won't preserve links, doesn't work if you don't have write permission in the containing directory (these two problems are handled by "ed" not using this technique if the file has links or is in an unwritable directory), and doesn't work if you can't give files away. Furthermore, if the system crashes before the rename, the temporary file is not cleaned up.

Another way is to support file "version numbers", Tenex/RSX/VMS style. In this case, you don't do a rename which deletes the old version; you just create a version with a higher version number. When the new version is being written out, it exists as the highest-numbered version. You probably want to have this file specially flagged or locked, so that nobody can open it until you're done. You may want to return an error on attempts to read it, or may want to block the process until the new version is ready (which means you have locked the file), or you may want to have the previous version opened.
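[The temporary-file-and-rename technique can be sketched like this; the function name and demo file are my own. rename(2) replaces the target atomically within one filesystem, so a reader sees either the complete old file or the complete new one, never a half-written mixture.]

```python
# Sketch of atomic "replace contents of file" via write-then-rename,
# the trick attributed above to the S3/S5 "ed".
import os
import tempfile

def replace_atomically(path, new_bytes):
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)      # temp file in the SAME filesystem
    try:
        os.write(fd, new_bytes)
        os.fsync(fd)                       # data on disk before the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)                   # the atomic switch-over
    # The caveats above still apply: hard links to the old file are
    # broken, and you need write permission on the directory.

fd, path = tempfile.mkstemp()
os.write(fd, b"old contents")
os.close(fd)
replace_atomically(path, b"new contents")
result = open(path, "rb").read()
os.unlink(path)
print(result.decode())                     # new contents
```

A crash before the rename leaves the old file untouched (plus a stray temporary, as noted); a crash after it leaves the new file complete. There is no window with a partial file visible under the real name.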
A third way is to have the new file not have any directory entry pointing to it until it's fully written out and you declare it "ready", at which point its inode/file header/whatever replaces the previous version's, and the previous version's blocks get returned to the free pool (unless somebody has that file open, in which case you wait until they're done). (Thanks to Rick McShane for this idea.)

This does raise the question of what to do if process A has the file open and process B starts updating it. Should process A just keep a handle to the old file with the now-stale data (which is probably acceptable in a lot of instances), or should it somehow be notified that the data has changed out from under it? (It is assumed that process B can do this kind of update; if you don't want it to, you have to do the sort of locking mentioned above.)

All these arguments also apply to record locking and transaction atomicity for record updates.
-- 
Guy Harris
{seismo,ihnp4,allegra}!rlgvax!guy