[net.unix-wizards] File locking on networks

gnu@hoptoad.uucp (John Gilmore) (01/10/86)

In article <1106@brl-tgr.ARPA>, gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
> > Note also that a serious file locking mechanism on a network must provide
> > a way for a user program to be notified that the system has broken its lock.
> > This situation occurs when a process locks a file on another machine, 
> > and a comm link between the two machines goes down.  You clearly can't
> > keep your database down for hours while AT&T (grin) puts your long line
> > back in service, so the lock arbiter reluctantly breaks the lock.  (It
> > can't tell if your machine crashed or whether it was just a comm
> > line failure anyway.)  Now everybody can get at the file OK, but when the
> > comm link comes back up, the process will think it owns the lock and
> > will muck with the file.  So far nobody has designed a mechanism to tell
> > the process that this has happened, which means to be safe the system must
> > kill -9 any such process when this happens (e.g. it must make it *look*
> > like the system or process really did crash, even though it was just a
> > comm link failure).  I'm not sure how you even *detect* this situation
> > though.
> 
> I don't see a big problem.  There are three possible cases of failure...
> (2)  Communication link crashes.  (3)  Remote system crashes after
> planting a lock.  Cases (2) and (3) are the interesting ones, but they
> can be easily handled by simply pinging the locking system when a lock
> conflict occurs.  (Various strategies could be used to reduce pinging
> frequency, if desired, but I don't think it would be necessary.)  If the
> locker denies knowledge of the lock, then void it locally and proceed.

I don't see how the above proposal solves anything.  Take case (2).
The system that contains the data notices a lock conflict.  It pings
the system holding the lock.  It gets "network not reachable".  It
voids the lock and the database is now accessible.  OK, but the
database is in an inconsistent state.  Maybe when it breaks the lock it
does a database cleanup.  OK, now suppose the comm link comes back up.
The system that was out of touch still thinks it holds the lock; it's
been pinging the server trying to get an I/O request in (for example).
When the link comes up, the I/O request will get thru.  What does the
server do with this request?  If it satisfies it, it has permitted the
database to be changed by someone who doesn't have the lock.  It must
reject the request (e.g. a Unix read() or write() call) specifying some
kind of lock failure error code.  The application program on the remote
machine thinks it owns the lock.  It must be written to go back to the
top of the transaction and try to obtain the lock again, when it gets
this error code.  There are no such provisions in the System V locking
facilities.  Thus programs written for those facilities will break when
moved onto networks.
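
To make the requirement concrete, here is a rough sketch in Python.  It is
not anything System V provides: the Server class, the generation token and
the LockBroken error are all invented for illustration.  The server refuses
I/O from anyone whose lock it has voided, and the client restarts its
transaction from the top when that happens.

```python
class LockBroken(Exception):
    """Hypothetical error: the server voided the caller's lock."""


class Server:
    """Holds the data and arbitrates one lock (all names are illustrative)."""

    def __init__(self):
        self.data = []
        self.holder = None      # client currently holding the lock, if any
        self.generation = 0     # bumped every time the lock is granted

    def lock(self, client):
        if self.holder is not None:
            return None         # lock conflict
        self.holder = client
        self.generation += 1
        return self.generation  # token the client presents with each request

    def break_lock(self):
        """Called when the holder appears to be unreachable."""
        self.holder = None

    def write(self, client, token, value):
        # Reject I/O from anyone who is not the current, live lock holder.
        if self.holder != client or token != self.generation:
            raise LockBroken("lock no longer held")
        self.data.append(value)

    def unlock(self, client, token):
        if self.holder == client and token == self.generation:
            self.holder = None


def run_transaction(server, client, values, max_tries=3):
    """Restart the whole transaction from the top if the lock was broken."""
    for _ in range(max_tries):
        token = server.lock(client)
        if token is None:
            continue            # a real client would wait before retrying
        try:
            for v in values:
                server.write(client, token, v)
        except LockBroken:
            continue            # go back to the top and reacquire the lock
        server.unlock(client, token)
        return True
    return False
```

The sketch does not undo a half-finished transaction; it assumes the server's
cleanup when it breaks the lock takes care of that.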

How can I make this clearer?  I'd be glad to be convinced that there is
no problem, but I think there really is...

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (01/12/86)

> I don't see how the above proposal solves anything.  Take case (2).
> The system that contains the data notices a lock conflict.  It pings
> the system holding the lock.  It gets "network not reachable".  It
> voids the lock and the database is now accessible.  OK, but the
> database is in an inconsistent state.  Maybe when it breaks the lock it
> does a database cleanup.  OK, now suppose the comm link comes back up.
> The system that was out of touch still thinks it holds the lock; it's
> been pinging the server trying to get an I/O request in (for example).
> When the link comes up, the I/O request will get thru.  What does the
> server do with this request?  If it satisfies it, it has permitted the
> database to be changed by someone who doesn't have the lock.  It must
> reject the request (e.g. a Unix read() or write() call) specifying some
> kind of lock failure error code.  The application program on the remote
> machine thinks it owns the lock.  It must be written to go back to the
> top of the transaction and try to obtain the lock again, when it gets
> this error code.  There are no such provisions in the System V locking
> facilities.  Thus programs written for those facilities will break when
> moved onto networks.

The model I have in mind requires the owner of the actual file
(where the data is stored) to be the master of the file's locks.
Whenever it has to communicate with any slave about the locked
region, if there is a problem it cancels that slave's lock.
Similarly, each time a slave accesses a locked region, it tells
the master about it, and in case of disagreement about the state
of the locks, the master so informs the slave, which must correct
its local records.

Clearly, this can (as you say) make locks go away if the comm link
is flaky, but you should be doing this on top of virtual circuits
anyway, so that long-lasting communication flakiness is as severe a
problem as losing a disk (something that happens a lot around here).

I agree with your analysis of the necessary actions on the slave
when a lock breaks.  The slave is either trying to free a lock
(which is already done by the comm link breakage) or is trying to
do I/O on the locked region, which should return an error if the
master and slave do not agree as to the status of the lock.
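
A minimal sketch of this master/slave model, in Python, might look like the
following.  The class and method names are made up for the example, and a
plain OSError stands in for whatever error the I/O call would really return.

```python
class Master:
    """Owner of the file; its lock table is authoritative (names illustrative)."""

    def __init__(self):
        self.locks = {}                      # region -> slave holding it

    def grant(self, slave, region):
        if self.locks.get(region, slave) != slave:
            return False                     # held by some other slave
        self.locks[region] = slave
        return True

    def comm_problem(self, slave):
        """Any communication trouble with a slave cancels that slave's locks."""
        self.locks = {r: s for r, s in self.locks.items() if s != slave}

    def access(self, slave, region):
        """Slave reports an access; the reply is the master's view of the lock."""
        return self.locks.get(region) == slave


class Slave:
    def __init__(self, name, master):
        self.name = name
        self.master = master
        self.held = set()                    # local record of locks

    def lock(self, region):
        if self.master.grant(self.name, region):
            self.held.add(region)
            return True
        return False

    def io(self, region):
        if not self.master.access(self.name, region):
            self.held.discard(region)        # correct the local record
            raise OSError("lock on region %r is no longer held" % (region,))
        return "ok"
```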

Are Gilmore and I the only ones who care about this?
Does anyone have an elegant solution to the problem?
(Disallowing locks is not elegant!)

jack@boring.UUCP (Jack Jansen) (01/14/86)

[The problem: 
 - Program on machine A has lock on file on B.
 - Link between A and B goes down.
 - B decides A has crashed, breaks lock, and does cleanup.
 - Link comes back up.
]
In my opinion, the best thing to do is that I/O operations done
by the program that still thinks it holds the lock return with EIO.

This is a condition that is understood by unix programs (for instance
when a disk goes offline), and if it is crucial to the application that
some action is taken when an operation is partially completed, it
will catch the error, and do something intelligent.
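
As a rough illustration in Python, a program that cares could check for EIO
like this.  The function names are invented for the example, stale_write
only simulates a server rejecting the request, and treating EIO as "the lock
was broken" is the assumption proposed here, not existing behaviour.

```python
import errno


def write_record(write, record):
    """Call write(record) and report what happened.

    `write` stands in for an I/O call on the remote file.  EIO is assumed
    to be what a stale lock holder gets back.
    """
    try:
        write(record)
        return "written"
    except OSError as e:
        if e.errno == errno.EIO:
            return "lock lost: abandon or restart the transaction"
        raise                       # some other error: not ours to handle


def stale_write(record):
    # Simulates the file server rejecting I/O from a stale lock holder.
    raise OSError(errno.EIO, "I/O error")
```
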
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

mishkin@apollo.uucp (Nathaniel Mishkin) (01/15/86)

Here's how we (Apollo) deal with locking.  It's not perfect, but in
practice (e.g. on our internetwork of 1000+ workstations on 7 networks)
it works quite well:

There are two nodes associated with every lock:  the home node (i.e.
the node the file lives on), and the locking node (i.e. the node that
the process requesting the lock is running on).  The existence of a lock
is registered on both the home node and the locking node.  However, the
information on the home node is the one that really matters to the world,
since every lock request for files on that node comes to it, not any other
locking nodes.  (Obviously, sometimes the home node and the locking node
can be identical, but this case is trivial, so I won't consider it.)

Locks are held in volatile storage (i.e. virtual memory, not disk) and
hence evaporate when a node goes down.  If a node is explicitly shut
down, many locks will be unlocked by virtue of processes holding locks
being killed.  Of any remaining locks, those held BY the node shutting
down are force-unlocked.  Then the node broadcasts an "unlock all" message
to all other nodes.  Recipients of such a message force-unlock all locks
held BY the recipient ON files on the node that sent the message.

When a node boots, it broadcasts an "unlock all" message too.
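
The bookkeeping described above can be sketched roughly as follows, in
Python.  This is only an illustration, not Apollo's code: the Node class, its
method names, and the dict standing in for the network are all assumptions,
and locks are tracked per node rather than per process.

```python
class Node:
    """One workstation; lock tables live in memory only (names illustrative)."""

    def __init__(self, name, network):
        self.name = name
        self.network = network          # dict: node name -> Node (reachable nodes)
        self.home_locks = {}            # file on this node -> locking node name
        self.held_locks = set()         # (home node name, file) held by this node
        network[name] = self

    def lock(self, home_name, file):
        home = self.network[home_name]
        if file in home.home_locks:
            return False                # "object is in use"
        home.home_locks[file] = self.name      # registered on the home node...
        self.held_locks.add((home_name, file)) # ...and on the locking node
        return True

    def unlock(self, home_name, file):
        home = self.network.get(home_name)
        if home is not None and home.home_locks.get(file) == self.name:
            del home.home_locks[file]
        self.held_locks.discard((home_name, file))

    def broadcast_unlock_all(self):
        for other in self.network.values():
            if other is not self:
                other.on_unlock_all(self.name)

    def on_unlock_all(self, sender):
        # Drop every lock this node holds on files that live on the sender.
        self.held_locks = {(h, f) for (h, f) in self.held_locks if h != sender}

    def shutdown(self):
        for home_name, file in list(self.held_locks):
            self.unlock(home_name, file)   # force-unlock locks held BY this node
        self.broadcast_unlock_all()
        self.home_locks.clear()            # volatile tables evaporate
        del self.network[self.name]

    def boot(self):
        self.network[self.name] = self
        self.broadcast_unlock_all()
```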

When a node N locks a remote file, it sends a message to the remote (home)
node asking if it is OK to lock.  If the home node says "no, because
process P on node M has the file locked", N sends a message to M asking
if he really has that file locked.  If M says he doesn't have the file
locked, N tells the home node to force-unlock the file, and then N tries
to lock the file again.  This strategy is helpful in case a node has
missed an "unlock all" message.  (Since broadcasts aren't propagated
across bridges between networks, this can happen.)  Note that if node
M is unreachable, this scheme doesn't help.
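
The conflict check in the previous paragraph could be sketched like this in
Python.  The function name and the two table arguments are hypothetical, the
messages between nodes are replaced by direct dictionary lookups, and the
process P is left out so that locks are tracked per node only.

```python
def try_lock(home_locks, node_tables, requester, file):
    """Try to lock `file` for `requester`; return True if the lock was obtained.

    home_locks:  the home node's table, file -> name of node holding the lock
    node_tables: reachable nodes only, node name -> set of files that node
                 believes it has locked
    """
    holder = home_locks.get(file)
    if holder is not None and holder != requester:
        if holder not in node_tables:
            return False                 # holder unreachable: scheme cannot help
        if file in node_tables[holder]:
            return False                 # holder confirms: genuine conflict
        del home_locks[file]             # holder denies it: force-unlock stale entry
    home_locks[file] = requester
    node_tables.setdefault(requester, set()).add(file)
    return True
```

With a home table of {"f": "M"} and a reachable M that no longer lists "f",
a request from N succeeds and the stale entry is replaced; if M still lists
"f", or M is missing from node_tables, the request is refused.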

So what do we do if you run into a "bad" case -- internet partition or
crashed node that hasn't been rebooted?  Well, someone will try to open
a file (and try to get a lock since all opens must be accompanied by
locks) but will get the error "object is in use".  We supply tools for
USERS to see who (what node and process) has the lock.  The user can
then decide whether it's safe to forcibly break the lock (there's another
tool to do that).  It's not a perfect scheme, but let's remember,
considering people run on Unix systems all the time with NO locking (even
in the local case), it's clearly a step up.

            -- Nat Mishkin
               Apollo Computer
               apollo!mishkin