[net.arch] Locking, VMS Locking on Unix, INGRES

daveb@rtech.ARPA (Dave Brower) (03/28/85)
>> The VMS locking protocol supposedly works on VAX/VMS systems...
                            ^^^^^^^^^^
Well actually, it DOES work, very well.  I suspect the author's
orthodoxy has crept into this 'non-religious' discussion :-).

> Perhaps somebody from Relational Technology would be interested in
> discussing the problem of locking on UNIX, since they had to deal
> with it in porting Ingres to UNIX machines.
>				Phil Kos

For a good introduction to the issues, you should examine C.J. Date's
"An Introduction to Database Systems, Volume II" (Addison-Wesley). Chapter
3 is devoted to concurrency control.  He discusses in detail the
tradeoffs that are made between maximizing concurrency, repeatability,
and data integrity.

The drive to increase concurrency is the major reason for the use of
many different locking levels, such as EXCLUSIVE, SHARED,
INTENDED_SHARED, SHARED_INTENDED_EXCLUSIVE, and INTENDED_EXCLUSIVE.
These are easily handled via the magic-cookie lock manager, but are
difficult with simpler schemes (like lockf()).  This is not to say that
lockf() is inadequate for data integrity, only that it does not allow
the maximum possible concurrency.  Flock(2) could be extended to support
more lock levels but has insufficient granularity.
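
To make the lock levels concrete, here is a rough sketch of a mode
compatibility table in C.  It assumes the conventional multi-granularity
semantics for these five modes; the exact table used by the VMS lock
manager or by INGRES is not given in this post and may differ.  The
names lock_mode, compat and lock_compatible are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The five lock levels named in the text. */
enum lock_mode {
    INTENDED_SHARED,
    INTENDED_EXCLUSIVE,
    SHARED,
    SHARED_INTENDED_EXCLUSIVE,
    EXCLUSIVE,
    N_MODES
};

/*
 * compat[held][wanted] is true when a request for `wanted` can be
 * granted while another process holds `held` on the same object.
 * ASSUMPTION: conventional multi-granularity locking semantics.
 */
static const bool compat[N_MODES][N_MODES] = {
    /*            IS     IX     S      SIX    X    */
    /* IS  */ { true,  true,  true,  true,  false },
    /* IX  */ { true,  true,  false, false, false },
    /* S   */ { true,  false, true,  false, false },
    /* SIX */ { true,  false, false, false, false },
    /* X   */ { false, false, false, false, false },
};

/* Hypothetical helper: may `wanted` coexist with an existing `held`? */
bool lock_compatible(enum lock_mode held, enum lock_mode wanted)
{
    assert((int)held >= 0 && held < N_MODES);
    assert((int)wanted >= 0 && wanted < N_MODES);
    return compat[held][wanted];
}
```

With only one lock level, every cell but the trivial ones would be
"false"; the extra levels are what let two intention locks, or two
readers, proceed at the same time.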

OK, presume you've decided you need a cookie manager.  How do you get
one? An adequate magic-cookie lock manager comes with the VMS tape; on
Unix we need to do some work.  Locking can be seen as a specialized
inter-process communications problem.  Since we know how well different
Unices handle IPC :-), building a fast lock manager gets ugly very fast.

We can't presuppose any locking primitives (such as lockf()) beyond
the raw filesystem.  Uucp, tip, et al. do locking using files, which
works, but is much too slow to consider for a database.  The options are
shared memory (System V only), semaphores (SV only), named pipes (SV
only), sockets and server processes (BSD only), and device drivers.  
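
For reference, locking with files amounts to something like the
following.  This is a minimal sketch, assuming an open() that treats
O_CREAT|O_EXCL as an atomic "create only if absent"; it is not the
actual uucp or tip code, and the function names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take the lock by creating `path`.
 * Returns 0 if we created the file (lock acquired), -1 if it already
 * exists (someone else holds the lock) or on any other error.
 */
int filelock_acquire(const char *path)
{
    int fd;

    assert(path != NULL);
    /* O_EXCL makes the create fail if the file is already there. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Release the lock by removing the file.  Returns 0 on success. */
int filelock_release(const char *path)
{
    assert(path != NULL);
    return unlink(path);
}
```

Every acquire and release is a directory update, which is fine for a
dial-out line and far too expensive for per-record database locks.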

RTI currently does locking for INGRES under Unix using a pseudo-device
controlling 'magic-cookies.'  Two of the advantages of this approach are
(relative) universality among the different Unixes, and kernel-level
atomicity of lock operations.  The major disadvantage is the need to
install the pseudo-device driver in the kernel of any system to support
multi-user INGRES.  This is not a simple operation for many, many Unix
sites, nor is it always easy to have an OEM install a driver for you.
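
To give a feel for what a cookie manager keeps track of, here is a toy,
user-space sketch in C.  It is NOT the RTI driver: the table size, the
structure layout and the names cookie_lock and cookie_unlock are all
assumptions made for illustration, it handles exclusive locks only, and
it never blocks.  A real manager would live somewhere shared (such as a
driver) so that the check-and-set is atomic across processes.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS 64               /* hypothetical table size */

struct lock_slot {
    uint32_t cookie;            /* opaque id; meaning is up to the caller */
    int      owner;             /* id of the holding process */
    int      in_use;            /* 0 means the slot is free */
};

static struct lock_slot table[NSLOTS];

/*
 * Request an exclusive lock on `cookie` for `owner`.
 * Returns 1 if granted (or already held by `owner`), 0 if another
 * owner holds it or the table is full.
 */
int cookie_lock(uint32_t cookie, int owner)
{
    int i, free_slot = -1;

    assert(owner > 0);
    for (i = 0; i < NSLOTS; i++) {
        if (table[i].in_use && table[i].cookie == cookie)
            return table[i].owner == owner;
        if (!table[i].in_use && free_slot < 0)
            free_slot = i;      /* remember first free slot */
    }
    if (free_slot < 0)
        return 0;               /* table full */
    table[free_slot].cookie = cookie;
    table[free_slot].owner = owner;
    table[free_slot].in_use = 1;
    return 1;
}

/* Release `cookie` if `owner` holds it.  Returns 1 on success, else 0. */
int cookie_unlock(uint32_t cookie, int owner)
{
    int i;

    assert(owner > 0);
    for (i = 0; i < NSLOTS; i++) {
        if (table[i].in_use && table[i].cookie == cookie &&
            table[i].owner == owner) {
            table[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}
```

Note that nothing here touches the filesystem: a lock is just a number
in a table, which is the whole attraction of the cookie approach.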

There has been some discussion of the desirability of putting locks in
the filename space rather than the 'magic-cookie jar.'  I think this is
inappropriate for database locks for the following reasons:

  1) Creating names in the file space is irrelevant to an
application.  It just wants the lock.

  2) Creating and destroying a file-space name is always going to take
some time, even if it's just copying the magic-cookie name into the
directory file.  Given the number of potential locks in the database
space (one for each record), and the frequency of lock transactions,
this overhead is unacceptable in a high-performance application.

  3) In the network environment, there is not likely to be a common
namespace for files on all the connected machines.  This is why the BSD
IPC Unix domain (implemented using the filesystem namespace) is only
good on-machine.  The Internet domain for communication with other
machines is implemented using cookie-like network addresses.

My vote goes to very general advisory schemes that go VERY fast and
can work in a networked environment.

-dB

----------------
These opinions are my own, and do not necessarily represent those of my
employer.

VMS is a trademark of Digital Equipment Corporation.  Unix and
System V are trademarks of AT&T Bell Laboratories.
-- 
{ucbvax, decvax}!mtxinu \
           ihnp4!amdahl / !rtech!daveb

"If it worked, we wouldn't call it High Tech"