[net.arch] Locking, again

mo@seismo.UUCP (Mike O'Dell) (03/13/85)

This locking business is another Undead Topic, but here goes my part.

(1) General Petri nets have been shown to be insufficient to represent
all synchronizing problems, and a more powerful formalism has not been
put forward to my knowledge.  This means WE CAN'T REALLY SOLVE THE
PROBLEM, so we are reduced to Engineering Solutions which solve some
useful subset.

(2) One of the most difficult issues is what people expect locks to mean.
In particular, when a process dies some horrid death while holding a lock
(segmentation violation, inadvertent gunning-down by an aberrant operator),
what should happen?  If the lock means a simple append was being done,
then breaking the lock is probably OK, if not desirable.  But if the
lock means "the database b-tree is inconsistent", then breaking the lock is
the last thing you want to do.  Moreover, in the latter case, such
information should survive across system crashes if it is to be of any use
whatsoever.  So, now we have quickly gotten from "simple file locking"
to a stable, surviving lock which must be done with multi-phase commits,
not to mention what happens across networks.
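[Not in the original post: a rough sketch of what a crash-surviving lock
might look like, namely a small state record forced to stable storage
before the risky operation starts.  The path and record layout here are
invented purely for illustration.]

/* Hypothetical sketch: a "lock" that survives crashes by living on disk.
 * The record is written before a b-tree update starts and cleared only
 * after a clean finish; a recovery program checks it at startup. */
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define LOCKFILE "/usr/db/btree.lock"   /* made-up path */

struct db_lock {
	int  held;              /* nonzero => b-tree may be inconsistent */
	char holder[32];        /* who/what took the lock */
};

/* Force the lock record to disk before touching the b-tree. */
int mark_unsafe(who)
	char *who;
{
	struct db_lock l;
	int fd = open(LOCKFILE, O_WRONLY | O_CREAT | O_SYNC, 0600);

	if (fd < 0)
		return -1;
	l.held = 1;
	strncpy(l.holder, who, sizeof(l.holder) - 1);
	l.holder[sizeof(l.holder) - 1] = '\0';
	if (write(fd, (char *)&l, sizeof(l)) != sizeof(l)) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* At startup, refuse to open the database if the record says "held":
 * a crash while it was held means the tree may need repair first. */
int db_safe_to_open()
{
	struct db_lock l;
	int fd = open(LOCKFILE, O_RDONLY);

	if (fd < 0)
		return 1;       /* no record: assume consistent */
	if (read(fd, (char *)&l, sizeof(l)) != sizeof(l))
		l.held = 1;     /* unreadable record: be conservative */
	close(fd);
	return !l.held;
}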

So what does this mean?? It means that concurrency control is an END-TO-END
issue just like communications error control, and you CANNOT put a single
mechanism into a system which will do the Right Thing all the time.
People who want to update a simple file can use advisory file locks just
fine without needing the hair of byte-span locks, although the byte-span
lock might provide more concurrency - since we have an implementation,
it is probably OK to use it.  But people MUST understand what is NOT
promised by locks!!  This is all the more reason to make them discretionary.
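[For what it's worth, here is a minimal sketch of the "consenting
processes" style of advisory locking, using the 4.2BSD flock() call;
everything beyond the call itself - names, file, record format - is
illustrative only.]

/* Advisory (discretionary) whole-file locking with 4.2BSD flock().
 * Only processes that also call flock() are excluded; anyone else can
 * read or write the file regardless - that is the whole point. */
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

int append_record(path, rec, len)
	char *path, *rec;
	int len;
{
	int fd = open(path, O_WRONLY | O_APPEND);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX) < 0) {   /* block until we hold the lock */
		close(fd);
		return -1;
	}
	if (write(fd, rec, len) != len) {
		flock(fd, LOCK_UN);
		close(fd);
		return -1;
	}
	flock(fd, LOCK_UN);             /* also released on close or exit */
	return close(fd);
}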

A non-discretionary lock is honored by all processes without their agreement,
while discretionary locks are honored only by "consenting processes".
This is VERY important where denial of service is considered a serious
security issue.  Locking /etc/passwd, /bin, your enemy's home directory,
or some other well-chosen target would bring a system to its knees in
an instant, or reduce it to a battlefield.  This brings up the next point.
Protection is tricky in a locking environment.  Who has the authority to
prevent someone from reading or writing a file they would otherwise
have permission to operate upon??  To handle this anywhere close to right in
the presence of non-discretionary locks requires adding several if not
many more protection bits and thoroughly complicating the model.
I claim this complexity buys nothing useful and merely makes things more
complicated, particularly in distributed systems.

Well, enough of this for now.  Locking is like the Camel sticking its
nose through the tent flap - before you know it, the Camel is in the
tent with you.  The trip from "simple file locking" to "stable storage"
is beguilingly straightforward to imagine, and damnably difficult
to realize, not to mention the question of whether you even want that.

	-Mike O'Dell

sdo@u1100a.UUCP (Scott Orshan) (03/14/85)

In article <1685@seismo.UUCP> mo@seismo.UUCP (Mike O'Dell) writes:
>
>(2) One of the most difficult issues is what people expect locks to mean.
>In particular, when a process dies some horrid death while holding a lock
>(segmentation violation, inadvertent gunning-down by an aberrant operator),
>what should happen?  If the lock means a simple append was being done,
>then breaking the lock is probably OK, if not desirable.  But if the
>lock means "the database b-tree is inconsistent", then breaking the lock is
>the last thing you want to do.

There are ways to manipulate B-trees (or slightly modified variants of them)
such that they are NEVER inconsistent.  This assumes that disk writes
are atomic - either nothing gets written or an entire block gets written.
If this is not true, then you need to have two copies of the tree
which get updated in a well defined sequence such that one of them
is always consistent even if a write fails in the middle.
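[A hedged sketch of that two-copy idea, assuming atomic block writes and
a per-copy sequence number; all names, sizes, and layouts here are
invented for illustration, not taken from the database described below.]

/* Two-copy ("ping-pong") block update.  Each copy of a block carries a
 * sequence number; an update always overwrites the OLDER copy, so a
 * write that dies in the middle ruins at most one copy and the other
 * is still consistent.  Read error handling is omitted for brevity. */
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define BLKSIZE 512

struct blk {
	long seq;                       /* higher seq = newer copy */
	char data[BLKSIZE - sizeof(long)];
};

/* Copies of block n live at block offsets 2n and 2n+1; return the newer. */
int read_block(fd, n, out)
	int fd; long n; struct blk *out;
{
	struct blk a, b;

	lseek(fd, 2 * n * BLKSIZE, 0);
	read(fd, (char *)&a, BLKSIZE);
	lseek(fd, (2 * n + 1) * BLKSIZE, 0);
	read(fd, (char *)&b, BLKSIZE);
	*out = (a.seq >= b.seq) ? a : b;
	return 0;
}

/* Overwrite the older copy with new data and a newer seq, then force it
 * out; the newer copy is untouched until this write completes. */
int write_block(fd, n, newdata)
	int fd; long n; char *newdata;
{
	struct blk a, b, nw;

	lseek(fd, 2 * n * BLKSIZE, 0);
	read(fd, (char *)&a, BLKSIZE);
	lseek(fd, (2 * n + 1) * BLKSIZE, 0);
	read(fd, (char *)&b, BLKSIZE);

	nw.seq = (a.seq > b.seq ? a.seq : b.seq) + 1;
	memcpy(nw.data, newdata, sizeof(nw.data));

	lseek(fd, (a.seq <= b.seq ? 2 * n : 2 * n + 1) * BLKSIZE, 0);
	if (write(fd, (char *)&nw, BLKSIZE) != BLKSIZE)
		return -1;
	return fsync(fd);       /* writes must be immediate, not cached */
}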

We once had a database implemented this way where a shadow driver
provided the duplication of I/O without the database code's awareness.
Obviously, all writes must be immediate - not delayed in the buffer cache.

-- 

			Scott Orshan
			Bell Communications Research
			201-981-3064
			{ihnp4,allegra,bellcore,pyuxww}!u1100a!sdo

mason@utcsri.UUCP (Dave Mason) (03/15/85)

Mike O'Dell claims that non-discretionary locking could bring a system
to its knees.  I suggest that the examples he gives are far from proof.

The reason I disagree is that I see no reason why any operating system would
allow someone to lock a file that they cannot write to.  Therefore the only
user who could lock, say, /etc/passwd is root (and we trust her, right? ;-).
In general, the only people who can cause havoc with file A by locking it
can presumably cause much worse problems by modifying it.
-- 
Usenet:	{dalcs dciem garfield musocs qucis sask titan trigraph ubc-vision
 	 utzoo watmath allegra cornell decvax decwrl ihnp4 uw-beaver}
	!utcsri!mason		Dave Mason, U. Toronto CSRI
CSNET:	mason@Toronto
ARPA:	mason%Toronto@CSNet-Relay

guy@rlgvax.UUCP (Guy Harris) (03/16/85)

> The reason I disagree is that I see no reason why any operating system would
> allow someone to lock a file that they cannot write to.

What if program A wants to read, say, "/etc/passwd", and wants to make sure
nobody's writing on top of it while it reads?
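[That is exactly a case where a reader wants a shared lock on a file it
cannot write.  A minimal sketch using the System V fcntl() record-locking
interface, assuming all writers cooperate by taking exclusive locks;
the function and its parameters are invented for illustration.]

/* Take a shared (read) lock on the whole file before reading, so that
 * cooperating writers, which take exclusive locks, are held off until
 * we finish.  Non-cooperating writers are not affected at all. */
#include <fcntl.h>
#include <unistd.h>

int read_locked(path, buf, len)
	char *path, *buf;
	int len;
{
	struct flock fl;
	int fd = open(path, O_RDONLY);
	int n;

	if (fd < 0)
		return -1;
	fl.l_type = F_RDLCK;    /* shared: other readers still get in */
	fl.l_whence = 0;        /* from the start of the file ...      */
	fl.l_start = 0;
	fl.l_len = 0;           /* ... to the end: lock the whole file */
	if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* wait for writers */
		close(fd);
		return -1;
	}
	n = read(fd, buf, len);
	close(fd);              /* closing the file drops the lock */
	return n;
}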

Of course, another solution to this is to provide transaction atomicity
on the operation "replace contents of file".  In effect, you can create
a "replacement" for the file which isn't visible until you've written the
last block of the new copy and declared the transaction closed.

One way of doing this is to create a temporary file and rename it; this
technique is used in some UNIX software, including the S3/S5 "ed".  However,
it won't preserve links, doesn't work if you don't have write permission
in the containing directory (these two problems are handled by "ed" not
using this technique if the file has links or is in an unwritable directory),
and doesn't work if you can't give files away.  Furthermore, if the system
crashes before the rename, the temporary file is not cleaned up.
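[A minimal sketch of that first technique - write a temporary copy, then
rename it into place with the 4.2BSD rename() call; the function and its
arguments are invented for illustration.]

/* Atomically replace a file's contents by writing a temporary file in
 * the same directory and renaming it over the original.  Readers see
 * either the old contents or the new, never a mixture.  As noted above,
 * this loses links, needs write permission in the directory, and makes
 * the writer the new owner of the file. */
#include <fcntl.h>
#include <unistd.h>

int replace_file(path, tmppath, data, len)
	char *path, *tmppath, *data;
	int len;
{
	int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != len) {
		close(fd);
		unlink(tmppath);        /* clean up the partial copy */
		return -1;
	}
	if (close(fd) < 0) {
		unlink(tmppath);
		return -1;
	}
	/* rename() is the atomic "declare the transaction closed" step */
	if (rename(tmppath, path) < 0) {
		unlink(tmppath);
		return -1;
	}
	return 0;
}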

Another way is to support file "version numbers", Tenex/RSX/VMS style.
In this case, you don't do a rename which deletes the old version; you
just create a version with a higher version number.  When the new version
is being written out, it exists as the highest numbered version.  You
probably want to have this file specially flagged or locked, so that
nobody can open it until you're done.  You may want to return an error
on attempts to read it, block the process until the new version is ready
(which means you have locked the file), or have the open go to the
previous version instead.

A third way is to have the new file not have any directory entry pointing
to it until it's fully written out and you declare it "ready", at which
point its inode/file header/whatever replaces the previous version's, and
the previous version's blocks get returned to the free pool (unless somebody
has that file open, in which case you wait until they're done).  (Thanks
to Rick McShane for this idea.)

This does raise the question of what to do if process A has the file open
and process B starts updating it.  Should process A just keep a handle
to the old file with the now-stale data (which is probably acceptable in
a lot of instances), or should it somehow be notified that the data has
changed out from under it?  (It is assumed that process B can do this kind
of update; if you don't want it to, you have to do the sort of locking
mentioned above.)

All these arguments also apply to record locking and transaction
atomicity for record updates.
-- 
	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy