[comp.sys.sun] Lock Daemon, lockf fails

papowell@vlsi.cs.umn.edu (Patrick Powell) (01/28/90)

I have heard that there have been discussions of some problems with the
lock daemon and lockf.  I just found one here that appears to be very
serious.  Before I waste all sorts of time,  could somebody please post me
the current status of the lockf and other file locking problems?

Patrick Powell
Prof. Patrick Powell, Dept. Computer Science, EECS 4-192,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

dan@bbn.com (02/10/90)

With regard to problems with Sun's locking daemon and network locking in
general, here are some problems we've found:

1. Sun's lockd and statd (3.4) are awful in the presence of other machines
implementing NFS but not lockd, such as machines running Ultrix releases
before 3.0.
If you NFS-mount such a machine's filesystem onto a Sun, then try (on the
Sun) to lock a file on the Ultrix machine, your process will hang forever,
unkillable.  To get around this we had to write a test preceding the
locking call that sends an inquiry to the host holding the file to be
locked; if we learn that no lockd or statd exists (you must check for
both) then we use local locking on the file instead.

2. Other vendors are no better.  The Ultrix 3.0 lockd sometimes pauses for
2 minutes when you first try to use it in a process.  We are still
tracking this one down, but it seems to depend on configuration issues
like where you're getting your hostnames.  A given configuration (i.e.,
/etc/svcorder, /etc/hosts, etc.  and the up/down status of the other
machines NFS-mounted to the one in question) will either always show this
problem or never show it.  A trace of the process shows it repeatedly
sending a message to some other host and waiting 5 seconds for a response.

3. Another problem we have seen under Ultrix 3.0 is that it often takes
several (non-blocking) fcntl calls before a file lock is granted.  We're
not sure what precipitates this behavior.  We've seen it after locking and
unlocking a file with one process: when we then try to lock the same file
from another process, several attempts are required.  (To demonstrate
this bug, run a test program that simply calls fcntl in a loop, reporting
the number of iterations necessary to acquire a lock.) There are patches
for this bug, once you realize what's going on.  (On a DECstation, you
should upgrade to 3.1 before applying the patches; they don't work so well
under 3.0.)

4. It's worth pointing out a SunOS fcntl locking "feature" that may not be
obvious: if you open() 2 file descriptors on a single file, fd1 and fd2,
establish a lock on fd1 and then close(fd2), the lock established through
fd1 is lost.

Mark Sommer and Dan Franklin

guy@uunet.uu.net (Guy Harris) (02/23/90)

>4. It's worth pointing out a SunOS fcntl locking "feature" that may not be
>obvious: if you open() 2 file descriptors on a single file, fd1 and fd2,
>establish a lock on fd1 and then close(fd2), the lock established through
>fd1 is lost.

That's not a "SunOS fcntl locking feature", it's a System V "fcntl"
locking feature.  To quote SVID Issue 2, volume 3, FCNTL(BA_OS):

	...All locks associated with a file for a given process are
	removed when a file descriptor for that file is closed by that
	process or the process holding that file descriptor terminates.

Note that it says "a file descriptor", not "the file descriptor with which
that lock was established" or something like that.

In addition, note that POSIX says the same thing; in other words, it's not
specific to SunOS, and you'd better be prepared for it to work that way on
*all* UNIX systems (and even non-UNIX POSIX systems).