david@bdt.UUCP (David Beckemeyer) (04/05/89)
For years I've seen a lot of Unix code that uses "lockfiles". It is
often of the form:

    if (stat(lockpath, &statbuf) == 0)
        return(-1);    /* lock failed */
    if ((lock_fd = creat(lockpath, 0600)) < 0)
        return(-1);    /* error */
    /* lock succeeded */

My question is: Isn't there a race condition between the stat and creat?

I've seen this type of code all over the place and I've never understood
why it works. Excuse me if this has come up thousands of times before;
I've never seen it.
--
David Beckemeyer (david@bdt.UUCP)       | "Adios amigos. And, as they say when
Beckemeyer Development Tools            | the boys are scratching the bad ones,
478 Santa Clara Ave. Oakland, CA 94610  | 'Stay a long time, Cowboy!'"
UUCP: {uunet,ucbvax}!unisoft!bdt!david  | - Jo Mora
chris@mimsy.UUCP (Chris Torek) (04/06/89)
In article <538@bdt.UUCP> david@bdt.UUCP (David Beckemeyer) writes:
>For years I've seen a lot of Unix code that uses "lockfiles". It is
>often of the form:
>
>    if (stat(lockpath, &statbuf) == 0)
>        return(-1);    /* lock failed */
>    if ((lock_fd = creat(lockpath, 0600)) < 0)
>        return(-1);    /* error */
>    /* lock succeeded */
>
>My question is: Isn't there a race condition between the stat and creat?

Yes.

>I've seen this type of code all over the place and I've never understood
>why it works.

For the same reason that code without locks `works': actual conflicts
are rare.

Two better mechanisms, which still leave dead locks behind in the
presence of system crashes (including power failures and the like), are

    if ((lock_fd = open(lockpath, O_CREAT|O_EXCL, 0666)) < 0)
        ... failed ...
    else
        ... succeeded ...

and

    if (link(temppath, lockpath) < 0)
        ... failed ...
    else
        ... succeeded ...

A combination of either of these and advisory locks (file locks in 4BSD
or byte-span locks in Other Leading Brands) works best. (Pure advisory
locks allow anyone who can open the file to lock it, permanently.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris
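The link() variant above can be fleshed out roughly as follows. This is only a sketch under stated assumptions: the names link_lock and link_unlock are hypothetical, the caller is assumed to pick a temppath that is unique to the process and on the same filesystem as lockpath, and writing the process ID into the file is an illustrative choice rather than something the post prescribes.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: take the lock at lockpath by hard-linking a
 * private temporary file to it.  Returns 0 on success, -1 if the lock
 * already exists or on any other error.
 * Assumption: temppath is unique to this process and lives on the same
 * filesystem as lockpath, since link() cannot cross filesystems.
 */
int link_lock(const char *temppath, const char *lockpath)
{
    char buf[32];
    int fd, n, rc;

    fd = open(temppath, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Record our pid so a human (or a cleanup pass) can see the owner. */
    n = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (write(fd, buf, (size_t)n) != n) {
        close(fd);
        unlink(temppath);
        return -1;
    }
    close(fd);

    /*
     * link() fails with EEXIST when lockpath already exists, so the
     * "does it exist?" test and the creation happen in one system call.
     */
    rc = link(temppath, lockpath);
    unlink(temppath);           /* the temporary name is not needed either way */
    return rc < 0 ? -1 : 0;
}

/* Hypothetical helper: release the lock by removing the lock file. */
int link_unlock(const char *lockpath)
{
    return unlink(lockpath);
}
```

As the post says, a crash between link_lock and link_unlock still leaves the lock file behind; nothing in this sketch cleans that up.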
rwhite@nusdhub.UUCP (Robert C. White Jr.) (04/07/89)
in article <538@bdt.UUCP>, david@bdt.UUCP (David Beckemeyer) says:
> My question is: Isn't there a race condition between the stat and creat?

Sure there is, but the lockfile mechanism is defined so that "he who
creates the lockfile first gets the device." The stat is mostly useful
only if the code does a cleanup/verify attempt. Since your example is
simply check1-or-exit, check2&lock-or-exit, there are multiple paths
along which the race condition produces an undefined result.

The "most correct" example I can think of for intelligent locking is
the HDB uucp method, which goes somewhat like this:

(pseudocode)
{
    if stat succeeds {
        attempt validate-and-remove by reading the contents of the
        lock file and probing the process number therein with
        kill(pid, 0).
        if the process is running { return failure }
        else { remove lockfile }
    }
    (At this time, if more than one program is contending, the remove
    may meet with an error, but we don't care; the first program to
    create a valid lockfile wins. If you get to here, it is a
    free-for-all; if you didn't get to here you already failed, so the
    race condition is moot.)
    if create fails { return failure }
    continue processing
}

Since all resource locking is a race condition by nature, this
structure makes the important part of the race occur only within the
create call, which is about the tightest granularity you can hope for.

A slightly tighter method is to place a lock on the region of the old
file before getting to the meat of the kill(pid, 0) probe; then, if
there is no such process, you can simply place your process number in
the existing file. The lock-and-alter method removes the window of
vulnerability between the subsequent commands, thereby tightening the
algorithm. The down side of this is that EVERY program which contends
for the resource MUST use the lock method (with manners) and/or set the
mandatory locking flag during creation, something many programmers
might not think of checking or doing without very specific
instructions.
Share and Enjoy, Rob.
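The validate-and-remove pseudocode above might look something like this in C. It is a sketch, not the actual uucp source: the function names are hypothetical, the lock file is assumed to hold the owner's pid as ASCII text, and the create step is assumed to be an exclusive open so that only one contender can win it. A window remains between the liveness check and the create, which is the window the lock-and-alter variant is meant to close.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if lockpath names a live process,
 * 0 if the file is absent, unreadable, or does not hold a usable pid.
 * Assumption: the lock file contains the owner's pid as ASCII text.
 */
static int lock_holder_alive(const char *lockpath)
{
    char buf[32];
    ssize_t n;
    long pid;
    int fd = open(lockpath, O_RDONLY);

    if (fd < 0)
        return 0;
    n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return 0;
    buf[n] = '\0';

    pid = strtol(buf, NULL, 10);
    if (pid <= 0)
        return 0;               /* garbage, or a pid we must not signal */

    /*
     * Signal 0 delivers nothing; it only checks that the pid exists.
     * EPERM means the process exists but belongs to another user.
     */
    return kill((pid_t)pid, 0) == 0 || errno == EPERM;
}

/*
 * Hypothetical helper: returns an open descriptor on the new lock file,
 * or -1 if a live process holds the lock or the create fails.
 */
int pid_lock(const char *lockpath)
{
    char buf[32];
    int fd, n;

    if (lock_holder_alive(lockpath))
        return -1;
    unlink(lockpath);           /* may fail if a contender removed it first; ignored */

    /* The exclusive create is the one step only a single contender wins. */
    fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    n = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (write(fd, buf, (size_t)n) != n) {
        close(fd);
        unlink(lockpath);
        return -1;
    }
    return fd;
}
```

The caller would close the descriptor and unlink the lock file when finished with the resource.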
guy@auspex.auspex.com (Guy Harris) (04/07/89)
>My question is: Isn't there a race condition between the stat and
>creat?

Yes.

>I've seen this type of code all over the place and I've never understood
>why it works.

If it works, it works because people are lucky and win the race.

A slightly better version is:

    if ((lock_fd = creat(lockpath, 0400)) < 0)
        return(-1);    /* error */

without the "stat". This creates the file read-only, which means that
a subsequent attempt to "creat" it by anybody but the super-user will
fail, since they won't have write permission.

An even better version is:

    #include <fcntl.h>    /* yes, even on BSD, the documentation */
                          /* notwithstanding.  Trust me. */

    if ((lock_fd = open(lockpath, O_CREAT|O_EXCL|O_WRONLY, 0600)) < 0)
        return(-1);    /* error */

which works on more recent UNIX systems - System III, System V,
4.[23]BSD, and systems derived from one or more of those. The O_CREAT
makes the "open" create the file if it doesn't exist (subsuming
"creat"); the O_EXCL makes the "open" fail if it *does* exist, even if
you're the super-user. (The O_WRONLY is a nicer way of saying "leave
the resulting file descriptor open for writing only" than "1" is, these
days. If you plan to read it, make it O_RDWR instead.)
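For illustration, the O_EXCL version can be wrapped so the caller can tell "somebody else holds the lock" apart from a real error. The names excl_lock and excl_unlock and the three-way return value are assumptions made for this sketch, not part of the post.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper around the O_CREAT|O_EXCL open.
 * Returns 1 if the lock was acquired (*fdp receives the descriptor),
 * 0 if the lock file already exists, -1 on any other error.
 */
int excl_lock(const char *lockpath, int *fdp)
{
    int fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0600);

    if (fd >= 0) {
        *fdp = fd;
        return 1;
    }
    /* EEXIST is the "lost the race" case; anything else is a real error. */
    return errno == EEXIST ? 0 : -1;
}

/* Hypothetical helper: close the descriptor and remove the lock file. */
int excl_unlock(const char *lockpath, int fd)
{
    close(fd);
    return unlink(lockpath);
}
```

Unlike the read-only creat() trick, this open fails on an existing file for the super-user as well, since the existence test is done by O_EXCL rather than by the permission check.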