papowell@vlsi.cs.umn.edu (Patrick Powell) (01/28/90)
I have heard that there have been discussions of some problems with the lock daemon and lockf. I have just found one here that appears to be very serious. Before I waste a lot of time, could somebody please post me the current status of the lockf and other file-locking problems?

Patrick Powell
Prof. Patrick Powell, Dept. Computer Science, EECS 4-192, University of Minnesota, Minneapolis, MN 55455 (612)625-3543/625-4002
dan@bbn.com (02/10/90)
With regard to problems with Sun's locking daemon and network locking in general, here are some problems we've found:

1. Sun's lockd and statd (3.4) are awful in the presence of other machines implementing NFS but not lockd, such as Ultrix machines running Ultrix < 3.0. If you NFS-mount such a machine's filesystem onto a Sun, then try (on the Sun) to lock a file on the Ultrix machine, your process will hang forever, unkillable. To get around this we had to write a test, preceding the locking call, that sends an inquiry to the host holding the file to be locked; if we learn that no lockd or statd exists (you must check for both), we fall back to local locking on the file instead.

2. Other vendors are no better. The Ultrix 3.0 lockd sometimes pauses for two minutes the first time a process tries to use it. We are still tracking this one down, but it seems to depend on configuration issues such as where you're getting your hostnames. A given configuration (i.e., /etc/svcorder, /etc/hosts, etc., and the up/down status of the other machines NFS-mounted to the one in question) will either always show this problem or never show it. A trace of the process shows it repeatedly sending a message to some other host and waiting 5 seconds for a response.

3. Another problem we have seen under Ultrix 3.0 is that it often takes several (non-blocking) fcntl calls before a file lock is granted. We're not sure what precipitates this behavior. We've seen it after locking and unlocking a file with one process: when we try to lock the same file from another process, several attempts are required. (To demonstrate this bug, run a test program that simply calls fcntl in a loop, reporting the number of iterations necessary to acquire a lock.) There are patches for this bug, once you realize what's going on. (On a DECstation, you should upgrade to 3.1 before applying the patches; they don't work so well under 3.0.)

4.
It's worth pointing out a SunOS fcntl locking "feature" that may not be obvious: if you open() two file descriptors on a single file, fd1 and fd2, establish a lock on fd1, and then close(fd2), the lock established through fd1 is lost.

Mark Sommer and Dan Franklin
guy@uunet.uu.net (Guy Harris) (02/23/90)
>4. It's worth pointing out a SunOS fcntl locking "feature" that may not be
>obvious: if you open() 2 file descriptors on a single file, fd1 and fd2,
>establish a lock on fd1 and then close(fd2), the lock established through
>fd1 is lost.

That's not a "SunOS fcntl locking feature", it's a System V "fcntl" locking feature. To quote SVID Issue 2, volume 3, FCNTL(BA_OS):

	...All locks associated with a file for a given process are removed
	when a file descriptor for that file is closed by that process or
	the process holding that file descriptor terminates.

Note that it says "a file descriptor", not "the file descriptor with which that lock was established" or something like that. In addition, note that POSIX says the same thing; in other words, it's not specific to SunOS, and you'd better be prepared for it to work that way on *all* UNIX systems (and even non-UNIX POSIX systems).