[comp.unix.wizards] Unix File Locks

david@bdt.UUCP (David Beckemeyer) (04/05/89)

For years I've seen a lot of Unix code that uses "lockfiles".  It is
often of the form:

	if (stat(lockpath, &statbuf) == 0) 
		return(-1);	/* lock failed */	

	if ((lock_fd = creat(lockpath, 0600)) < 0)
		return(-1);	/* error */

	/* lock succeeded */

My question is:  Isn't there a race condition between the stat and creat?

I've seen this type of code all over the place and I've never understood
why it works.

Excuse me if this has come up thousands of times before; I've never seen it.
-- 
David Beckemeyer (david@bdt.UUCP)	| "Adios amigos.  And, as they say when 
Beckemeyer Development Tools		| the boys are scratching the bad ones,
478 Santa Clara Ave. Oakland, CA 94610	| 'Stay a long time, Cowboy!'"
UUCP: {uunet,ucbvax}!unisoft!bdt!david	|                  - Jo Mora

chris@mimsy.UUCP (Chris Torek) (04/06/89)

In article <538@bdt.UUCP> david@bdt.UUCP (David Beckemeyer) writes:
>For years I've seen a lot of Unix code that uses "lockfiles".  It is
>often of the form:
>
>	if (stat(lockpath, &statbuf) == 0) 
>		return(-1);	/* lock failed */	
>	if ((lock_fd = creat(lockpath, 0600)) < 0)
>		return(-1);	/* error */
>	/* lock succeeded */
>
>My question is:  Isn't there a race condition between the stat and creat?

Yes.

>I've seen this type of code all over the place and I've never understood
>why it works.

For the same reason that code without locks `works': actual conflicts are
rare.

Two better mechanisms, which still leave dead locks behind in the
presence of system crashes (including power failures and the like),
are

	if ((lock_fd = open(lockpath, O_CREAT|O_EXCL, 0666)) < 0)
		... failed ...
	else
		... succeeded ...

and

	if (link(temppath, lockpath) < 0)
		... failed ...
	else
		... succeeded ...

A combination of either of these and advisory locks (file locks in
4BSD or byte-span locks in Other Leading Brands) works best.  (Pure
advisory locks allow anyone who can open the file to lock it, permanently.)
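
As a rough illustration of the link() variant above, here is a minimal
sketch.  The function names and the "<lockpath>.<pid>" temporary name are
made up for the example; the only part taken from the text is that
link() fails if lockpath already exists, which makes it an atomic
test-and-set.  The temporary file must live on the same filesystem as
the lock.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the lock was taken, -1 otherwise.
 * The temporary name "<lockpath>.<pid>" is an assumption of this sketch.
 */
int acquire_link_lock(const char *lockpath)
{
	char temppath[1024];
	int fd, n, rc;

	n = snprintf(temppath, sizeof temppath, "%s.%ld", lockpath,
	    (long)getpid());
	if (n < 0 || n >= (int)sizeof temppath)
		return -1;

	/* Create a private file that only this process will name. */
	if ((fd = open(temppath, O_CREAT | O_WRONLY | O_TRUNC, 0644)) < 0)
		return -1;
	close(fd);

	/* link() fails with EEXIST if lockpath is already there. */
	rc = link(temppath, lockpath);
	unlink(temppath);	/* the temporary name is no longer needed */
	return rc < 0 ? -1 : 0;
}

/* Hypothetical helper: drop the lock by removing the lock file. */
void release_link_lock(const char *lockpath)
{
	unlink(lockpath);
}
```

As with the open() form, a crash between acquire and release leaves the
lock file behind.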
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

rwhite@nusdhub.UUCP (Robert C. White Jr.) (04/07/89)

in article <538@bdt.UUCP>, david@bdt.UUCP (David Beckemeyer) says:
> My question is:  Isn't there a race condition between the stat and creat?

Sure there is, but the lockfile mechanism is defined so that "he who
creates the lockfile first gets the device."  The stat is mostly useful
only if the code does a cleanup/verify attempt.  Since your example
is simply check1-or-exit, check2&lock-or-exit, there are multiple paths
along which the race condition produces an undefined result.  The "most
correct" example I can think of for intelligent locking is the HDB
uucp method, which goes somewhat like this: (pseudocode)

{
if stat succeeds {
	attempt validate-and-remove by reading the contents
	of the lock file and trying kill(pid, 0) on the process
	number therein.  If the process is running {return
	failure} else {remove lockfile}}
(At this point, if more than one program is contending, the remove may
meet with an error but we don't care; the first one to create
a valid lockfile wins.  If you get to here, it is a free-for-all; if
you didn't get to here you already failed, so the race condition is moot.)

if create fails { return failure }

continue processing
}
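
The pseudocode above might look roughly like this in C.  This is only a
sketch, not the actual uucp source: the function name is made up, and it
assumes the lock file holds the owner's process ID as decimal text
followed by a newline.  The final create uses O_CREAT|O_EXCL so that the
create step itself is the atomic part.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if we now hold the lock, -1 otherwise. */
int acquire_pid_lock(const char *lockpath)
{
	char buf[32];
	ssize_t n;
	long pid;
	int fd, len;

	/* Validate-and-remove: is there a lock, and is its owner alive? */
	if ((fd = open(lockpath, O_RDONLY)) >= 0) {
		n = read(fd, buf, sizeof buf - 1);
		close(fd);
		if (n > 0) {
			buf[n] = '\0';
			pid = atol(buf);
			/*
			 * Signal 0 delivers nothing; it only tests for
			 * existence.  EPERM means the process exists but
			 * belongs to someone else.
			 */
			if (pid > 0 &&
			    (kill((pid_t)pid, 0) == 0 || errno == EPERM))
				return -1;	/* owner is running */
		}
		unlink(lockpath);	/* stale lock; errors ignored */
	}

	/* The race is decided here: only one creator can succeed. */
	fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0644);
	if (fd < 0)
		return -1;

	len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
	if (write(fd, buf, (size_t)len) != (ssize_t)len) {
		close(fd);
		unlink(lockpath);
		return -1;
	}
	close(fd);
	return 0;
}
```

Note that a window remains between the unlink and the create: a
contender that judged the old lock stale can remove a lock that another
contender has just created.  An empty lock file (created but not yet
written) is also treated as stale here.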


Since all resource locking is a race condition by nature, this structure
makes the important part of the race occur only within the create call,
which is about the tightest granularity you can hope for.  A slightly
tighter method is to place a lock on the region of the old file before
getting to the meat of the kill(pid, 0); then, if there is no such process,
you can simply place your process number in the existing file.  The
lock-and-alter method removes the window of vulnerability between the
subsequent commands, thereby tightening the algorithm.  The downside
of this is that EVERY program which contends for the resource MUST
use the lock method (with manners), set the mandatory locking flag
during creation, or both, something many programmers might
not think of checking or doing without very specific instructions.
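
A sketch of the lock-and-alter idea, using an fcntl() advisory record
lock over the whole lock file.  The function name and the decimal-PID
file format are assumptions made for illustration, and the sketch only
protects against other programs that take the same fcntl() lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if our PID is now in the file, else -1. */
int takeover_pid_lock(const char *lockpath)
{
	struct flock fl = { 0 };
	char buf[32];
	ssize_t n;
	long pid;
	int fd, len;

	if ((fd = open(lockpath, O_RDWR | O_CREAT, 0644)) < 0)
		return -1;

	/* Write-lock the whole file; fail at once if someone else has it. */
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "to end of file" */
	if (fcntl(fd, F_SETLK, &fl) < 0) {
		close(fd);
		return -1;
	}

	/* With the lock held, check whether the recorded owner is alive. */
	n = read(fd, buf, sizeof buf - 1);
	if (n > 0) {
		buf[n] = '\0';
		pid = atol(buf);
		if (pid > 0 && (kill((pid_t)pid, 0) == 0 || errno == EPERM)) {
			close(fd);	/* closing releases the fcntl lock */
			return -1;
		}
	}

	/* No live owner: put our process number in the existing file. */
	len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
	if (lseek(fd, 0, SEEK_SET) < 0 || ftruncate(fd, 0) < 0 ||
	    write(fd, buf, (size_t)len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```

Because the check and the rewrite both happen under the record lock,
there is no unlink-then-create window, but a program that ignores the
fcntl() lock can still trample the file.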

Share and Enjoy,
Rob.

guy@auspex.auspex.com (Guy Harris) (04/07/89)

>My question is:  Isn't there a race condition between the stat and
>creat?

Yes.

>I've seen this type of code all over the place and I've never understood
>why it works.

If it works, it works because people are lucky and win the race.

A slightly better version is:

	if ((lock_fd = creat(lockpath, 0400)) < 0)
		return(-1);	/* error */

without the "stat".  This creates the file read-only, which means that a
subsequent attempt to "creat" it by anybody but the super-user will
fail, since they won't have write permission.

An even better version is:

	#include <fcntl.h>	/* yes, even on BSD, the documentation */
				/* notwithstanding.  Trust me. */

	if ((lock_fd = open(lockpath, O_CREAT|O_EXCL|O_WRONLY, 0600)) < 0)
		return(-1);	/* error */

which works on more recent UNIX systems - System III, System V,
4.[23]BSD, and systems derived from one or more of those.  The O_CREAT
makes the "open" create the file if it doesn't exist (subsuming
"creat"); the O_EXCL makes the "open" fail if it *does* exist, even if
you're the super-user.  (The O_WRONLY is a nicer way of saying "leave
the resulting file descriptor open for writing only" than "1" is, these
days.  If you plan to read it, make it O_RDWR instead.)
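
One refinement worth sketching: with O_EXCL, errno tells "somebody else
holds the lock" (EEXIST) apart from a real error such as a missing
directory.  The function name and the three-way return convention below
are invented for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper.  Returns 1 and stores the descriptor in *lock_fd
 * if the lock was created, 0 if the lock file already exists, and -1 on
 * any other error.
 */
int try_excl_lock(const char *lockpath, int *lock_fd)
{
	int fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0600);

	if (fd >= 0) {
		*lock_fd = fd;
		return 1;		/* lock succeeded */
	}
	return errno == EEXIST ? 0 : -1;	/* held, or a real error */
}
```

A caller would typically retry or give up on 0, and report the failure
on -1 rather than spin on a lock that can never be created.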