sdk@ra.cs.ucla.edu (Scott D Kalter) (11/23/88)
We have discovered a bug in the file-locking strategy Emacs uses in a multi-machine, shared file-system environment. I would guess that we are not the first to discover it, but I have not seen it discussed.

The problem occurs when two separate Emacs processes running on two DIFFERENT CPUs access and attempt to modify the same NFS-shared file. The second Emacs that attempts to lock the file will find the file already locked. The lock, of course, contains the PID of the process that made it. However, the second Emacs only looks at the machine it is running on to see whether that PID is valid (actually in use by a running process). Often, no such process exists there, and Emacs assumes that the lock's owner died at some point without cleaning up the lock. It then SILENTLY steals the lock!! This completely defeats the locking mechanism.

After noticing this problem, a couple of people from our lab looked into the code to see if we could fix it. We need locking to work in our environment, where the same set of shared files is edited by several people on different machines. We could not find a completely general/elegant solution that could be implemented simply by making changes to Emacs. We could think of many obscure hacks to make it work better, but these included changing the contents of locks to include both PID and system name (a somewhat large change) and then performing some sort of check to see whether that process was running on the specified machine. Unfortunately, we could not think of a nice, clean way to check a PID on another machine (any ideas?).

We settled for the following quick but unhealthy solution: we disabled the code that checks the PID, so that Emacs always assumes that locks are held by existing processes. Emacs simply checks whether or not it is the owner of the lock. This has two bad effects:

1. The lock directory could fill up with locks made by dying Emacses. It would then have to be cleaned out by hand every once in a while (a great while, we hope).

2. The worst case is that two Emacses are running on separate machines with the same PID!! In this case, the second Emacs to attempt to lock the file will find a lock with its own PID in it. It will then assume that the lock is its own and simply continue, effectively stealing the lock silently.

We are betting that under normal circumstances it will be unlikely for two Emacses on different machines to have the same PID. (This assumption has problems after something like a power outage, when multiple machines come up together, ready to hand out similar PIDs to everyone.)

We simply wanted to bring this problem to everyone's attention. Shared file systems are becoming very commonplace, so this may become a more frequently cited problem. The Emacs file-locking feature is quite simple but very helpful under many circumstances. We hope that with some more minds at work, an obviously simple solution (that we missed) can be found.

Thanks for your attention,
-Scott Kalter <sdk@cs.ucla.edu>
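
To make the PID-plus-system-name idea above concrete, here is a minimal sketch in Python. It is not the Emacs code and not a tested fix. The "PID@hostname" lock format and every function name here are made up for illustration. It does not solve the hard part (checking a PID on another machine); it only refuses to steal a lock that was made on a different host, which also avoids the same-PID-on-two-machines case.

```python
import errno
import os
import socket


def my_identity():
    """Return (pid, hostname) for the current process."""
    return os.getpid(), socket.gethostname()


def pid_alive_locally(pid):
    """Return True if a process with this PID exists on THIS machine.

    Assumes a Unix-like system, where signal 0 only tests for existence.
    """
    try:
        os.kill(pid, 0)
    except OSError as err:
        # EPERM: the process exists but belongs to another user.
        return err.errno == errno.EPERM
    return True


def format_lock(pid, host):
    """Build the (hypothetical) lock contents: 'PID@hostname'."""
    return "%d@%s" % (pid, host)


def classify_lock(lock_text, my_pid, my_host, pid_alive=pid_alive_locally):
    """Classify an existing lock as 'mine', 'held', or 'stale'."""
    pid_text, _, host = lock_text.strip().partition("@")
    owner_pid = int(pid_text)
    if host != my_host:
        # We cannot check a PID on another machine, so never steal.
        return "held"
    if owner_pid == my_pid:
        return "mine"
    # Same machine: the local PID check is meaningful here.
    return "held" if pid_alive(owner_pid) else "stale"
```

With this scheme, a lock reading "123@ra" seen by PID 123 on a different host is reported as held rather than as the caller's own. Locks left by Emacses that died on other machines would still need to be cleaned out by hand.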