[comp.sys.sun] Problems gaining/releasing exclusive file locks in Sun NFS

angst@batserver.cs.uq.oz.au (10/08/90)

I have a data file that processes running on different clients (running
SunOS 4.1) must access with mutual exclusion.

I tried using lockf to gain and release exclusive locks on the file, but this
didn't work.  A process that attempted to gain a lock while another process
held it was meant to sleep until the holder released its lock.  This it did,
but when the holder tried to release its lock, both processes hung (and
eventually entered uninterruptible waits).

So I turned to fcntl.  This worked perfectly when the processes were
running on the same client.  When they were on different clients, the blocked
process would hang indefinitely, although the process holding the lock could
release it and go on its merry way.  Other processes could then gain locks,
even though the originally blocked process should have been given access long
before.

Here is the code I have used to test this :
---fcntltest.c---------
/*
    Tests the operation of fcntl when using it to gain exclusive locks on
    remote files.
*/
#include <stdio.h>
#include <stdlib.h>	/* malloc, exit */
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

#define	WHOLE_FILE	0
#define	LOCK_OBJECT	"fcntlobj"

#define or      : case
#define when    break; case

void
LockError (message, filename)
	char	*message, *filename;
{
	switch (errno) {
	when EBADF :
		fprintf (stderr, "%s %s :\n\t%s\n", message, filename,
		         "not valid open file descriptor");
	when EDEADLK :
		fprintf (stderr, "%s %s :\n\t%s\n", message, filename,
		         "deadlock would have occurred");
	when EFAULT or EINVAL :
		fprintf (stderr, "%s %s :\n\t%s\n", message, filename,
			 "arg to fcntl not valid");
	when EINTR :
		fprintf (stderr, "%s %s :\n\t%s\n", message, filename,
		         "interrupted while locking");
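	when ENOLCK :
		fprintf (stderr, "%s %s :\n\t%s\n", message, filename,
		         "no more file lock entries available");
		break;
	default :
		/* Any errno not listed above would otherwise exit silently. */
		fprintf (stderr, "%s %s :\n\terrno %d\n", message, filename,
		         errno);
	}

	exit (1);	/* non-zero status: the lock operation failed */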
}

void
GainLock (lock_file, filename)
	FILE		*lock_file;
	char		*filename;
/*
    Attempts to gain an exclusive lock on open file "lock_file", if there is
    contention the loser sleeps until the lock is released.
*/
{
	struct flock	*lock_str;

        lock_str = (struct flock *)malloc(sizeof(struct flock));

        lock_str->l_type = (short) F_WRLCK;
        lock_str->l_whence = (short) SEEK_SET;
        lock_str->l_start = (long) 0;
        lock_str->l_len = (long) WHOLE_FILE;
	lock_str->l_pid = (pid_t) 0;

        if (fcntl (fileno (lock_file), F_SETLKW, lock_str) < 0)
		LockError ("Error while attempting to lock", filename);
}

void
ReleaseLock (lock_file, filename)
	FILE		*lock_file;
	char		*filename;
/*
    Releases an exclusive lock gained previously in GainLock above.
*/
{
	struct flock	*lock_str;

        lock_str = (struct flock *)malloc(sizeof(struct flock));

        lock_str->l_type = (short) F_UNLCK;
        lock_str->l_whence = (short) SEEK_SET;
        lock_str->l_start = (long) 0;
        lock_str->l_len = (long) WHOLE_FILE;
	lock_str->l_pid = (pid_t) 0;

        if (fcntl (fileno (lock_file), F_SETLK, lock_str) < 0)
		LockError ("Error while attempting to unlock", filename);
}

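int
main ()
{
	FILE		*lock_file;
	int		ch;	/* getchar returns int, not char */

	lock_file = fopen (LOCK_OBJECT, "r+");
	if (lock_file == NULL) {
		/* "r+" does not create the file; it must already exist. */
		perror (LOCK_OBJECT);
		exit (1);
	}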

	printf ("Attempting to lock ... "); fflush (stdout);
	GainLock (lock_file, LOCK_OBJECT);
	printf ("done\n");

	printf ("\nPress a key :"); fflush (stdout);
	ch = getchar ();

	printf ("\nUnlocking ... "); fflush (stdout);
	ReleaseLock (lock_file, LOCK_OBJECT);
	printf ("done\n");

	fclose (lock_file);

	printf ("Exiting.\n");
}
---fcntltest.c---------

I have run this simultaneously on different hosts (and in different windows
on the same host) to test the behaviour of fcntl.  The results are as
described above.

I tried changing the last two lines of GainLock to this code fragment :

        if (fcntl (fileno (lock_file), F_SETLK, lock_str) < 0) {
                if ((errno == EACCES) || (errno == EAGAIN)) {
                        fprintf (stderr, "it's already locked\n");
                        exit (0);
                }
                else
                        LockError ("Error while attempting to lock", filename);
        }

so that if a lock already exists a message to that effect is printed and
the process exits.  When I tried this version in a similar manner, the
"loser" still became hung; fcntl should have returned immediately with -1
and errno set to EACCES (or possibly EAGAIN).  It did not.
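
The expected non-blocking behaviour can be checked on a single host with a
sketch along the following lines.  The function name contention_refused is
hypothetical, and the sketch only shows what F_SETLK is documented to do
when two processes on one machine contend for the file; it says nothing
about how the NFS lock daemons behave between clients.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if a second process is refused the lock
   immediately, 0 if it was not refused, -1 if the check could not be set
   up. */
int
contention_refused (const char *path)
{
	struct flock	fl;
	int		fd, status;
	pid_t		child;

	fd = open (path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -1;

	memset (&fl, 0, sizeof fl);
	fl.l_type = F_WRLCK;		/* exclusive lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* 0 means "to end of file" */
	if (fcntl (fd, F_SETLK, &fl) < 0) {
		close (fd);
		return -1;
	}

	child = fork ();
	if (child < 0) {
		close (fd);
		return -1;
	}
	if (child == 0) {
		/* fcntl locks are not inherited across fork, so the child
		   contends for the file like any unrelated process. */
		int	cfd = open (path, O_RDWR);

		if (cfd < 0)
			_exit (2);
		if (fcntl (cfd, F_SETLK, &fl) < 0
		    && (errno == EACCES || errno == EAGAIN))
			_exit (0);	/* refused at once, as expected */
		_exit (1);		/* got the lock or failed otherwise */
	}

	if (waitpid (child, &status, 0) < 0) {
		close (fd);
		return -1;
	}
	close (fd);			/* closing drops the parent's lock */
	if (!WIFEXITED (status) || WEXITSTATUS (status) == 2)
		return -1;
	return WEXITSTATUS (status) == 0;
}
```

A result of 1 on a local file, combined with a hang when the two processes
run on different clients, would point at the network lock service rather
than at the calling code.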

Can any Sun guru out there explain this ?  Has this cropped up before ?
Is there a workaround (apart from minimising the chance of access
contention) ?

dhesi%cirrusl@oliveb.atc.olivetti.com (Rahul Dhesi) (10/27/90)

Re the report from angst@batserver.cs.uq.oz.au about problems with lockf
and fcntl for locking:

Many moons ago I experimented with locking mechanisms under SunOS 3.5.  I
found that network-wide locking did not appear to work correctly, even
though the lock daemons were up and running.  I fell back to the old
standard technique of creating a lock file.  Despite rumors that file
creation over NFS might not be truly exclusive and atomic, I have had no
problems.  Since then, I've often seen complaints posted to Usenet about
lockf and fcntl not working correctly for locking over NFS, and have
always been relieved that I decided not to use them.

Of course, when you use file creation for network-wide locking, you
must store both host name and process id in the lock file, and
supersede an existing lock if the locking process was on the local host
and has crashed.  You must also use a secondary lock before
manipulating the primary lock, so any race condition is minimized to
infinitesimal probability.
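
The lock-file technique described above might be sketched roughly as
follows.  This is an illustration under stated assumptions, not the code I
actually used: the function names, the HOST_MAX bound and the
"hostname pid" file layout are all hypothetical, the secondary lock is left
out, and exclusive creation with O_EXCL is exactly the step whose atomicity
over NFS the rumors call into question.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define HOST_MAX 256	/* assumed upper bound on host name length */

/* Try to create the lock file.  Returns 0 on success, -1 if the lock is
   already held or on any other error. */
int
acquire_lock_file (const char *path)
{
	char	host[HOST_MAX], line[HOST_MAX + 32];
	int	fd, len;

	/* With O_EXCL, open fails with EEXIST if the file already exists. */
	fd = open (path, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0)
		return -1;

	if (gethostname (host, sizeof host) < 0)
		strcpy (host, "unknown");
	host[sizeof host - 1] = '\0';

	/* Record the owner as "hostname pid". */
	len = snprintf (line, sizeof line, "%s %ld\n", host, (long) getpid ());
	if (write (fd, line, (size_t) len) != len) {
		close (fd);
		unlink (path);
		return -1;
	}
	return close (fd);
}

/* Returns 1 if the lock was taken on this host by a process that no longer
   exists, 0 otherwise (including when the owner is on another host). */
int
lock_is_stale (const char *path)
{
	char	host[HOST_MAX], owner[HOST_MAX];
	long	pid;
	FILE	*fp;
	int	fields;

	if ((fp = fopen (path, "r")) == NULL)
		return 0;
	fields = fscanf (fp, "%255s %ld", owner, &pid);
	fclose (fp);
	if (fields != 2 || pid <= 0)
		return 0;
	if (gethostname (host, sizeof host) < 0)
		return 0;
	host[sizeof host - 1] = '\0';
	if (strcmp (host, owner) != 0)
		return 0;	/* cannot probe a process on a remote host */

	/* Signal 0 only tests for existence; ESRCH means no such process. */
	return kill ((pid_t) pid, 0) < 0 && errno == ESRCH;
}

/* Release the lock by removing the file.  Returns 0 on success. */
int
release_lock_file (const char *path)
{
	return unlink (path);
}
```

A caller that fails to acquire the lock would check lock_is_stale and, only
if it returns 1, remove the file and try again.  That remove-and-retry step
is itself a race between competing processes, which is what the secondary
lock is meant to guard against.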

Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi