[gnu.emacs] File-locking bug in NFS file-sharing environment

sdk@ra.cs.ucla.edu (Scott D Kalter) (11/23/88)

We have discovered a bug in the file-locking strategy Emacs uses when
it runs in a multi-machine, shared file-system environment.  I would
guess that we are not the first to discover it, but I have not seen it
discussed.

The problem occurs when two separate Emacs processes running on two
DIFFERENT CPUs access and attempt to modify the same NFS-shared file.
The second Emacs that attempts to lock the file will find the file
already locked.  The lock, of course, contains the PID of the process
that made it.  However, the second Emacs looks only at the machine it
is running on to see if that PID is valid (actually in use by a
running process).

Often, no process with that PID exists on the local machine, and Emacs
assumes that the lock's owner died at some point without cleaning up
the lock.  It then SILENTLY steals the lock!!  This completely defeats
the locking mechanism.
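
To make the failure concrete, here is a minimal simulation of the
check described above.  It is a sketch, not the Emacs code: the
function name, the PID values, and the use of plain sets to stand in
for each machine's process table are all assumptions made for
illustration.  (On a POSIX system a real local check might use
kill(pid, 0) instead of a set lookup.)

```python
def lock_looks_stale(lock_pid, local_pids):
    """Flawed check: consult only the LOCAL machine's process table."""
    return lock_pid not in local_pids


# Hypothetical process tables for two machines sharing files over NFS.
machine_a_pids = {101, 4242}   # the Emacs holding the lock is PID 4242 on A
machine_b_pids = {101, 977}    # machine B has no PID 4242

lock_pid = 4242                # PID recorded in the lock by the Emacs on A

# Machine B evaluates the lock against its own process table only.
stolen_by_b = lock_looks_stale(lock_pid, machine_b_pids)
print("B considers the lock stale:", stolen_by_b)
```

The owner is alive on machine A, yet machine B concludes the lock is
stale because it never looks beyond its own process table.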

After noticing this problem, a couple of people from our lab looked
into the code to see if we could fix this.  We needed to support
locking in our environment where the same set of shared files are
edited by several people on different machines.

We could not find a completely general/elegant solution to this
problem that could be implemented by simply making changes to Emacs.
We could think of many obscure hacks to make it work better.  One was
to change the contents of locks to include both PID and system-name (a
somewhat large change) and then perform some sort of check to see if
that process was running on the specified machine.  Unfortunately, we
could not think of a nice, clean way to check a PID on another machine
(any ideas?).
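
The PID-plus-system-name idea could be sketched roughly as follows.
This is only an illustration under stated assumptions: the Lock
record, the lock_status function, and its return strings are
hypothetical, and the sketch does not solve the remote-PID check.  It
sidesteps it by declaring a lock stale only when the lock was made on
the local machine, and by treating every lock from another machine as
still held.

```python
from collections import namedtuple

# Hypothetical lock contents: system name plus PID.
Lock = namedtuple("Lock", ["host", "pid"])


def lock_status(lock, my_host, my_pid, local_pids):
    """Classify a lock as seen from the process (my_host, my_pid).

    local_pids stands in for the local process table.
    """
    if lock.host != my_host:
        # We have no clean way to verify a PID on another machine,
        # so we never declare a remote lock stale.
        return "held-remote"
    if lock.pid == my_pid:
        return "mine"
    # Same machine: the local PID check is meaningful here.
    return "held" if lock.pid in local_pids else "stale"


# Machine "b" looks at a lock made by PID 4242 on machine "a".
print(lock_status(Lock("a", 4242), "b", 977, {101, 977}))
```

The cost of this choice is that a lock left behind by a crashed Emacs
on another machine would still have to be removed by hand.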

We settled for the following, quick but unhealthy solution:

We disabled the code that checked for the PID so that Emacs always
assumes that the locks are being used by existing processes.  Emacs
simply checks to see if it is the owner of the lock or not.  This has
two bad effects:

1.  The lock directory could fill up with locks left by dying Emacses,
since a stale lock is never reclaimed.  The lock directory would then
have to be cleaned out by hand every once in a while (a great while,
we hope).

2.  The worst case is that two Emacses are running on separate
machines with the same PID!!  In this case, the second Emacs to
attempt to lock the file will find a lock with its own PID in it.  It
will then assume that the lock is its own and simply continue,
effectively stealing the lock silently.
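
Both effects follow directly from the owner-only rule, as this small
sketch shows.  The function name and PID values are hypothetical, and
None stands in for "no lock exists"; it models the rule as described
here, not the actual patch.

```python
def can_proceed(lock_pid, my_pid):
    """Workaround rule: continue only if there is no lock, or the lock
    carries our own PID.  No check is made that the owner is alive."""
    return lock_pid is None or lock_pid == my_pid


# Effect 1: a lock left by a dead Emacs (PID 4242) blocks everyone
# until someone removes it by hand.
blocked_by_dead_owner = not can_proceed(4242, 977)

# Effect 2: an Emacs on another machine that happens to run as PID
# 4242 mistakes the lock for its own and carries on.
collision_steals_lock = can_proceed(4242, 4242)

print(blocked_by_dead_owner, collision_steals_lock)
```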

We are betting that under normal circumstances it will be unlikely for
two Emacses on different machines to have the same PID (this
assumption has problems after something like a power outage, when
multiple machines come up together, ready to hand out similar PIDs to
everyone).

We simply wanted to bring this problem to everyone's attention.
Shared file systems are becoming very commonplace so this may become a
more frequently cited problem.  The Emacs file-locking feature is
quite simple but very helpful under many circumstances.  We hope that
with some more minds at work an obviously simple solution (that we
missed) can be found.

Thanks for your attention,

-Scott Kalter  <sdk@cs.ucla.edu>