[comp.mail.misc] Will ELM ever use lockf

chip@tct.com (Chip Salzenberg) (04/09/91)

According to gemini@geminix.in-berlin.de (Uwe Doering):
>Following this patch, there is another one that shows how to convince
>Smail 3.1.20 to use lock files instead of lockf() under SysVr3. Maybe
>this works for earlier Smail releases, too.

The patches posted by Uwe for Elm are much appreciated.

His patches for Smail, however, are *far* more than is required.
Smail 3 already has a knob that controls mailbox locking specifically.
In conf/EDITME, and in conf/os/<your-os-here>, do *not* set
FLOCK_MAILBOX.  That's it.  (See the comments in conf/os/sys5* for
further description of FLOCK_MAILBOX.)
-- 
Brand X Industries Custodial, Refurbishing and Containment Service:
         When You Never, Ever Want To See It Again [tm]
     Chip Salzenberg   <chip@tct.com>, <uunet!pdn!tct!chip>

les@chinet.chi.il.us (Leslie Mikesell) (04/09/91)

In article <2800BA22.1CAE@tct.com> chip@tct.com (Chip Salzenberg) writes:

>His patches for Smail, however, are *far* more than is required.
>Smail 3 already has a knob that controls mailbox locking specifically.
>In conf/EDITME, and in conf/os/<your-os-here>, do *not* set
>FLOCK_MAILBOX.  That's it.  (See the comments in conf/os/sys5* for
>further description of FLOCK_MAILBOX.)

You should be aware though that the smail3 mailbox locking code is
not particularly robust when using lockfiles.  The usual problem
of removing a different lockfile than you tested is made fairly
likely in cases where the file being tested actually belonged to
an exiting smail.  Try setting smail's delivery mode to background
and arrange to have several messages delivered while you hit the
'$' in elm to re-sync, and you will have a pretty good chance of
losing messages.  I tried to fix this by stat'ing the lockfile and
refusing to delete it unless it is fairly old, but I'm still not
sure it is perfect.
Failure scenario:
  Process A owns lockfile - Processes B & C are contending for one.
  B reads A's PID from lockfile.
  A finishes, removes lockfile and exits.
  B sends signal 0 to A's PID, notes that process is gone.
  C notes that no lockfile is present and creates one.
  B removes lockfile (now belonging to C) and creates one.
  At this point both B and C think they have exclusive access to
  the mailbox.
Note that the trick of sleep()ing a few seconds after deleting a
stale file does not help at all in this scenario, since the exiting
program deleted its own file.  I recently saw HDB uucico's debug
output say something like:
    mlock  tty64 succeeded
    failed to lock device tty64
indicating that the lockfile code had in fact failed to keep two
processes from accessing the same device concurrently, but the
kernel lock succeeded.  Thus anyone who thinks the HDB lockfile 
locking code is foolproof is mistaken.
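
A minimal sketch of the age test described above, assuming a helper
named lockfile_is_old() and a caller-chosen minimum age (both are
illustrative, not taken from smail3 or elm).  It only narrows the
window: a lockfile that fails the test is simply left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: true only if `path` exists and was last modified
 * at least `min_age` seconds before `now`.  Any stat() failure counts
 * as "not old", so the caller never removes a file it could not inspect. */
bool lockfile_is_old(const char *path, time_t now, time_t min_age)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return false;
    return now - st.st_mtime >= min_age;
}
```

A caller would consider breaking the lock only when this returns true
and the recorded PID is also gone; a freshly created lockfile belonging
to another contender then survives.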

Les Mikesell
  les@chinet.chi.il.us

tron@Veritas.COM (Ronald S. Karr) (04/11/91)

In article <1991Apr09.160512.1300@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
>You should be aware though that the smail3 mailbox locking code is
>not particularly robust when using lockfiles.
>...
>Failure scenario:
>  Process A owns lockfile - Processes B & C are contending for one.
>  B reads A's PID from lockfile.
>  A finishes, removes lockfile and exits.
>  B sends signal 0 to A's PID, notes that process is gone.
>  C notes that no lockfile is present and creates one.
>  B removes lockfile (now belonging to C) and creates one.
>  At this point both B and C think they have exclusive access to
>  the mailbox.

However, both smail3 and elm create the files using the O_EXCL flag,
indicating that the OS should ensure that only one of the two creates
should succeed.

Thus, failure of the locking mechanism in the way that you describe
requires that the underlying operating system fail to obey the O_EXCL
semantics.
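
For reference, an exclusive create looks roughly like the sketch below.
The function name and the choice to store the PID as decimal text are
assumptions for illustration, not the actual smail3 or elm code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create `lockpath` exclusively and record our PID.
 * Returns 0 on success, -1 if the file already exists or on any error. */
int create_lockfile(const char *lockpath)
{
    char buf[32];
    int fd, len;

    assert(lockpath != NULL);
    /* O_CREAT|O_EXCL: fail with EEXIST if the name is already present. */
    fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0444);
    if (fd < 0)
        return -1;
    len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (write(fd, buf, (size_t)len) != len) {
        close(fd);
        unlink(lockpath);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(lockpath);
        return -1;
    }
    return 0;
}
```

On a local filesystem a second call with the same path fails until the
first holder unlinks the file.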

An important case to point out is that NFS does not obey the O_EXCL
semantics between two different client machines.  As such, mailbox locking
is broken on machines that do not use lockf(), which includes all sun
machines as far as I know.  The only solution is to either change smail
and all of the mail readers to use lockf(), or to never send and receive
mail from two clients sharing the same NFS /usr/mail directory.

Landon Noll and I (the two smail3 authors) carefully considered this
problem some years ago and determined that the only solution was for
sun to admit that there was a problem and for them to start using
lockf(), just like 4.3bsd does.  However, the people that we talked to
didn't believe us, always mounted their mailbox directory over NFS and
had never seen the problem.  Of course, the reason people almost never
see the problem is that the window is extremely small given that an
average mailbox receives only a few messages (up to 100) per day, and
given that the window of vulnerability is typically sub-second.
-- 
	tron |-<=>-|		ARPAnet:  veritas!tron@apple.com
      tron@veritas.com		UUCPnet:  {amdahl,apple,pyramid}!veritas!tron

les@chinet.chi.il.us (Leslie Mikesell) (04/12/91)

In article <1991Apr11.024849.29924@Veritas.COM> tron@Veritas.COM (Ronald S. Karr) writes:
>>...
>>Failure scenario:
>>  Process A owns lockfile - Processes B & C are contending for one.
>>  B reads A's PID from lockfile.
>>  A finishes, removes lockfile and exits.
>>  B sends signal 0 to A's PID, notes that process is gone.
>>  C notes that no lockfile is present and creates one.
>>  B removes lockfile (now belonging to C) and creates one.
>>  At this point both B and C think they have exclusive access to
>>  the mailbox.

>However, both smail3 and elm create the files using the O_EXCL flag,
>indicating that the OS should ensure that only one of the two creates
>should succeed.

>Thus, failure of the locking mechanism in the way that you describe
>requires that the underlying operating system fail to obey the O_EXCL
>semantics.

No, look at the scenario again.  The real problem is that the process
that decides a lockfile is stale has no way to tell that the file
that it unlink()'s is the same one that it tested.  In the scenario
above, A has removed its own lockfile (which wasn't really stale but
looked that way because the process exited before B's signal arrived)
before C creates its lockfile with no apparent conflict.  Then
B's unlink() removes the one that C created, so O_EXCL doesn't come
into play at all.  There are lots of other ways this can happen but
I suspect this is the most likely.
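
The gap can be seen in a sketch of the test-then-unlink sequence.  This
deliberately reproduces the flawed protocol; the function name and the
lockfile format (a decimal PID) are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper showing the race on purpose.
 * Returns 1 if it unlinked the lockfile, 0 if the owner looks alive,
 * -1 if the file could not be read or removed. */
int break_lock_if_owner_dead(const char *lockpath)
{
    FILE *fp;
    long pid;
    int got;

    assert(lockpath != NULL);
    fp = fopen(lockpath, "r");
    if (fp == NULL)
        return -1;
    got = fscanf(fp, "%ld", &pid);
    fclose(fp);
    if (got != 1 || pid <= 0)
        return -1;

    /* Step 1: test the owner.  Only ESRCH means "no such process". */
    if (kill((pid_t)pid, 0) == 0 || errno != ESRCH)
        return 0;

    /* RACE WINDOW: the old owner may already have removed its own file,
     * and a third process may have created a new one under this name. */

    /* Step 2: remove by NAME.  Nothing ties this to the file read above. */
    return unlink(lockpath) == 0 ? 1 : -1;
}
```

Nothing between step 1 and step 2 prevents the name from being rebound
to a different file, which is why O_EXCL on the create side does not
help.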

>An important case to point out is that NFS does not obey the O_EXCL
>semantics between two different client machines.  As such, mailbox locking
>is broken on machines that do not use lockf(), which includes all sun
>machines as far as I know.  The only solution is to either change smail
>and all of the mail readers to use lockf(), or to never send and receive
>mail from two clients sharing the same NFS /usr/mail directory.

Personally, I think the "right" solution is to deliver new mail into
individual files per message with a naming convention so incomplete
temporary files could be ignored by the mail reader.  There would be
several other advantages as well:
 You could require the destination directory to already exist, which
 would allow you to detect an unmounted NFS/RFS mount point and defer
 delivery instead of writing in a directory that will become hidden
 when the net comes back up.
 You could deliver to multiple recipients on the same machine with
 a simple link.  This would make mailing lists about as cheap as
 a newsgroup and easier to maintain since the disk space would
 automatically be released when the last recipient's copy is deleted.
This isn't likely to happen, of course, since the changes to the
user agents and the transports would have to be synchronized, but it
would not be too difficult to add this support to either one.  Ideally,
the reader would consolidate the new messages into some other format
if they are to be stored after reading.
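
A rough sketch of such a delivery, assuming a made-up convention where
a message is written as "tmp.<id>" and becomes "msg.<id>" only once it
is complete.  The names, the function, and the use of link() to publish
the file are illustrative choices, not an existing transport's code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical convention: "tmp.<id>" while writing, "msg.<id>" when done.
 * Returns 0 on success, -1 on failure (the caller should defer delivery). */
int deliver_to_dir(const char *dir, const char *id, const char *body, size_t len)
{
    char tmp[1024], msg[1024];
    int fd, n;

    assert(dir != NULL && id != NULL && body != NULL);
    n = snprintf(tmp, sizeof tmp, "%s/tmp.%s", dir, id);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    n = snprintf(msg, sizeof msg, "%s/msg.%s", dir, id);
    if (n < 0 || (size_t)n >= sizeof msg)
        return -1;

    /* The directory is never created here; if it is missing, open() fails. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, body, len) != (ssize_t)len) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    /* Publish under the final name; link() will not clobber an existing one. */
    if (link(tmp, msg) != 0) {
        unlink(tmp);
        return -1;
    }
    unlink(tmp);
    return 0;
}
```

A reader that skips every name beginning with "tmp." never sees a
partially written message, and no lockfile is involved.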

>Of course, the reason people almost never
>see the problem is that the window is extremely small given that an
>average mailbox receives only a few messages (up to 100) per day, and
>given that the window of vulnerability is typically sub-second.

For the scenario I mentioned, you need at least three processes, so
using a delivery mode where the foreground process just queues the
file and a single daemon process handles all deliveries should make
it safe where O_EXCL works. But, since the removal of lockfile is
never safe I think it is worth testing its age anyway.  Unless something
has gone drastically wrong you should never need to remove another
process' lockfile.   

Les Mikesell
  les@chinet.chi.il.us

gemini@geminix.in-berlin.de (Uwe Doering) (04/13/91)

chip@tct.com (Chip Salzenberg) writes:

>According to gemini@geminix.in-berlin.de (Uwe Doering):
>>Following this patch, there is another one that shows how to convince
>>Smail 3.1.20 to use lock files instead of lockf() under SysVr3. Maybe
>>this works for earlier Smail releases, too.
>
>The patches posted by Uwe for Elm are much appreciated.
>
>His patches for Smail, however, are *far* more than is required.
>Smail 3 already has a knob that controls mailbox locking specifically.
>In conf/EDITME, and in conf/os/<your-os-here>, do *not* set
>FLOCK_MAILBOX.  That's it.  (See the comments in conf/os/sys5* for
>further description of FLOCK_MAILBOX.)

You're right. I read the comments in conf/os/template again and came
to the same conclusion. I checked conf/EDITME and conf/os/sys5.3 and
found out that by default, the FLOCK_MAILBOX define is _not_ set.
Obviously, for SysVr3, nothing needs to be changed in smail 3.1 to use
lock files for the mailbox (provided that one didn't define FLOCK_MAILBOX
deliberately). So all one has to do to use Smail 3.1 together with
Elm 2.3 is to patch Elm's lock() function. Right?

     Uwe
-- 
Uwe Doering  |  INET : gemini@geminix.in-berlin.de
Berlin       |----------------------------------------------------------------
Germany      |  UUCP : ...!unido!fub!geminix.in-berlin.de!gemini

time@ice.com (Tim Endres) (04/13/91)

In article <1991Apr11.024849.29924@Veritas.COM>, tron@Veritas.COM (Ronald S. Karr) writes:
> >Failure scenario:
> >  Process A owns lockfile - Processes B & C are contending for one.
> >  B reads A's PID from lockfile.
> >  A finishes, removes lockfile and exits.
> >  B sends signal 0 to A's PID, notes that process is gone.
> >  C notes that no lockfile is present and creates one.
> >  B removes lockfile (now belonging to C) and creates one.
> >  At this point both B and C think they have exclusive access to
> >  the mailbox.
> 
> However, both smail3 and elm create the files using the O_EXCL flag,
> indicating that the OS should ensure that only one of the two creates
> should succeed.

I do not understand this. If B is removing the lock file created by
C, how would the O_EXCL flag stop B from creating its lockfile?

tim.

-------------------------------------------------------------
Tim Endres                |  time@ice.com
ICE Engineering           |  uupsi!ice.com!time
8840 Main Street          |  Voice            FAX
Whitmore Lake MI. 48189   |  (313) 449 8288   (313) 449 9208

tron@Veritas.COM (Ronald S. Karr) (04/15/91)

In article <1991Apr11.181909.13503@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
>In article <1991Apr11.024849.29924@Veritas.COM> tron@Veritas.COM (Ronald S. Karr) writes:
>>However, both smail3 and elm create the files using the O_EXCL flag,
>>indicating that the OS should ensure that only one of the two creates
>>should succeed.
>
>No, look at the scenario again.  The real problem is that the process
>that decides a lockfile is stale has no way to tell that the file
>that it unlink()'s is the same one that it tested.

Oops.  Sorry, although I carefully checked through smail3's and elm's
locking code, I wasn't as careful with the message.  You are correct
that there is a three-process hand-shaking problem with the locking
protocol.  In general, using true inode-based locking done atomically
by the operating system, with automatic lock releases on close or
process kills, is a substantially better solution.  Barring that, locking
schemes using the link() system call generally work better than
conventions using just file creation and removal.
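
One common shape for a link()-based lock is sketched below.  The
function name and the caller-supplied `tag` (something unique such as
"host.pid") are assumptions for illustration.  The link count check
covers the case where link() reports failure even though the link was
actually made.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: take `lockpath` by hard-linking a private temp file
 * to it.  `tag` must be unique per contender.  Returns 0 if we now hold
 * the lock, -1 otherwise. */
int acquire_link_lock(const char *lockpath, const char *tag)
{
    char tmp[1024];
    struct stat st;
    int fd, n, ok;

    assert(lockpath != NULL && tag != NULL);
    n = snprintf(tmp, sizeof tmp, "%s.%s", lockpath, tag);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* Private file: nobody else uses this name, so no race here. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0444);
    if (fd < 0)
        return -1;
    n = (int)write(fd, tag, (size_t)(n - n));  /* empty write keeps fd valid */
    if (n != 0 || close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* link() fails if lockpath already exists. */
    ok = (link(tmp, lockpath) == 0);
    /* If the reply was lost, a link count of 2 still proves success. */
    if (!ok && stat(tmp, &st) == 0 && st.st_nlink == 2)
        ok = 1;
    unlink(tmp);
    return ok ? 0 : -1;
}
```

Releasing the lock is a plain unlink() of lockpath by the holder.  The
stale-lock problem discussed in this thread is untouched by this
sketch; it only replaces the create step.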

>Personally, I think the "right" solution is to deliver new mail into
>individual files per message with a naming convention so incomplete
>temporary files could be ignored by the mail reader.

Every modern OS has file locking capabilities which are entirely
sufficient for mail.  They are just unused for mail in UNIX OS's,
except in 4.3BSD and its derivatives.  Also, I believe that NFS lacks
a true atomic exclusive create, so NFS (but not RFS) has a window
of vulnerability that will break many (though not all) message file
naming conventions.

You are correct that a directory would solve the problem of partial
mail delivery before a crash.  The UNIX file model, with no layering of
conventions (e.g., transactions), is only reliable when files are
modified through a copy to a new file followed by a rename.  Depending
upon your file system and UNIX, even this may not be completely
reliable.

>There would be
>several other advantages as well:
> You could require the destination directory to already exist, which
> would allow you to detect an unmounted NFS/RFS mount point and defer
> delivery instead of writing in a directory that will become hidden
> when the net comes back up.

There is no difference between directories and single files here, since
a mailer can just as easily check for existence of a file as existence
of a directory.  It is just a matter of coming up with the agreed upon
conventions to accomplish this task.

> You could deliver to multiple recipients on the same machine with
> a simple link.  This would make mailing lists about as cheap as
> a newsgroup and easier to maintain since the disk space would
> automatically be released when the last recipient's copy is deleted.

As with news, the relative efficiency of this solution is a function
of the average message size, the average number of recipients and the
overhead required for a file.  When I used mh some time ago, and stored
all of my mail in single files, I found the overhead to be far too
great to consider the solution viable.  You have to at least migrate
messages into groups stored in single files; otherwise, searches become
too slow, and the storage efficiency gets too low.

>For the scenario I mentioned, you need at least three processes, so
>using a delivery mode where the foreground process just queues the
>file and a single daemon process handles all deliveries should make
>it safe where O_EXCL works. But, since the removal of lockfile is
>never safe I think it is worth testing its age anyway.  Unless something
>has gone drastically wrong you should never need to remove another
>process' lockfile.

You are correct.  The smail3 locking strategy is probably broken in
this respect, and should introduce a wait based on file modification
time.

PS: I suppose I should stop reading news about now and start working on
my taxes.
-- 
	tron |-<=>-|		ARPAnet:  veritas!tron@apple.com
      tron@veritas.com		UUCPnet:  {amdahl,apple,pyramid}!veritas!tron

chip@tct.com (Chip Salzenberg) (04/15/91)

According to les@chinet.chi.il.us (Leslie Mikesell):
>Personally, I think the "right" solution is to deliver new mail into
>individual files per message ...
>This isn't likely to happen, of course, since the changes to the
>user agents and the transports would have to be synchronized...

The transport work is done.  Smail 3's "appendfile" transport driver
supports a "dir" attribute.  If this attribute is set to a directory
name, each delivered message will be put into a uniquely-named file in
the given directory.
-- 
Brand X Industries Custodial, Refurbishing and Containment Service:
         When You Never, Ever Want To See It Again [tm]
     Chip Salzenberg   <chip@tct.com>, <uunet!pdn!tct!chip>

les@chinet.chi.il.us (Leslie Mikesell) (04/16/91)

In article <1991Apr15.085531.14547@Veritas.COM> tron@Veritas.COM (Ronald S. Karr) writes:

>>Personally, I think the "right" solution is to deliver new mail into
>>individual files per message with a naming convention so incomplete
>>temporary files could be ignored by the mail reader.

>Every modern OS has file locking capabilities which are entirely
>sufficient for mail.  They are just unused for mail in UNIX OS's,
>except in 4.3BSD and its derivatives.  Also, I believe that NFS lacks
>a true atomic exclusive create, so NFS (but not RFS) has a window
>of vulnerability that will break many (though not all) message file
>naming conventions.

Something like the time and inode number encoded into base64 should
work...  Of course you need to create the file under some other name
to get the inode number, but something based on the hostname and pid
should work for that.
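
A sketch of such a name generator follows.  The function names, the
fixed width of 11 digits, and the alphabet are assumptions: standard
base64 contains '/', which cannot appear in a file name, so '-' and '_'
are swapped in for '+' and '/'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Filename-safe 64-character alphabet (an assumption, see above). */
static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Write v in radix 64, most significant digit first, fixed width 11. */
static void radix64(uint64_t v, char out[12])
{
    int i;

    for (i = 10; i >= 0; i--) {
        out[i] = ALPHABET[v & 63];
        v >>= 6;
    }
    out[11] = '\0';
}

/* Hypothetical helper: build "<time>.<inode>" into out.
 * Returns 0 on success, -1 if the buffer is too small. */
int make_message_name(time_t when, ino_t ino, char *out, size_t outlen)
{
    char t[12], i[12];
    int n;

    assert(out != NULL);
    radix64((uint64_t)when, t);
    radix64((uint64_t)ino, i);
    n = snprintf(out, outlen, "%s.%s", t, i);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The inode number would come from an fstat() on the temporary file,
which is why the file has to exist under some other name first.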

>> You could require the destination directory to already exist, which
>> would allow you to detect an unmounted NFS/RFS mount point and defer
>> delivery instead of writing in a directory that will become hidden
>> when the net comes back up.

>There is no difference between directories and single files here, since
>a mailer can just as easily check for existence of a file as existence
>of a directory.  It is just a matter of coming up with the agreed upon
>conventions to accomplish this task.

Yes, I suppose you could force the UA to leave a 0 length file when the
mailbox is empty, but I was thinking of what currently happens if you
mount /usr/mail from a different machine and the other machine goes
down.  Creating /usr/mail/you succeeds anyway, but the file won't be
seen again after the network mount comes back.  Creating /usr/mail/you/tmp
would fail in a way that is easily detected.  Another side effect would
be that the mail user agent would not need any special permissions if
the directory belonged to the user.
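
A sketch of how a transport might tell the two cases apart, assuming a
per-user directory and a temp file named "tmp.<id>" inside it.  The
function name, the return codes, and the file naming are illustrative
only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

enum { DELIVER_DEFER = -2, DELIVER_FAIL = -1 };

/* Hypothetical helper: returns an open fd on success, DELIVER_DEFER if the
 * user's directory is missing (for example because the network mount is
 * down), and DELIVER_FAIL for any other error. */
int open_temp_in_userdir(const char *userdir, const char *id)
{
    char path[1024];
    int fd, n;

    assert(userdir != NULL && id != NULL);
    n = snprintf(path, sizeof path, "%s/tmp.%s", userdir, id);
    if (n < 0 || (size_t)n >= sizeof path)
        return DELIVER_FAIL;

    /* No mkdir() here on purpose: the directory must already exist. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0)
        return fd;
    return (errno == ENOENT || errno == ENOTDIR) ? DELIVER_DEFER
                                                 : DELIVER_FAIL;
}
```

With a flat /usr/mail the equivalent create would succeed on the bare
mount point, which is the hidden-file problem described above.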

>> You could deliver to multiple recipients on the same machine with
>> a simple link.  This would make mailing lists about as cheap as
>> a newsgroup and easier to maintain since the disk space would
>> automatically be released when the last recipient's copy is deleted.

>As with news, the relative efficiency of this solution is a function
>of the average message size, the average number of recipients and the
>overhead required for a file.  When I used mh some time ago, and stored
>all of my mail in single files, I found the overhead to be far too
>great to consider the solution viable.  You have to at least migrate
>messages into groups stored in single files; otherwise, searches become
>too slow, and the storage efficiency gets too low.

Ideally, the UA would gather up the new messages and consolidate the
ones you want to keep into a format better suited for long-term storage.
There should not be any contention for the files at this point.
For me, the most common action is to read new messages fairly frequently,
deleting most after reading and possibly printing them.  The
message-per-file model is a win when you want to delete one and leave
the rest alone.

Les Mikesell
  les@chinet.chi.il.us

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (04/18/91)

The usual technique for locking with a lock file, and storing the pid
in the file, has two problems.  One is the window in which two processes
could each find a dead lock file, and both could remove it and create
their own: a race condition.  The other problem is that exclusive
creates do not work over NFS.  Solution follows.

Date:    22 Jan 91 07:41:25 GMT
From:    dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi)
Newsgroups: comp.lang.perl
Subject: Re: Locking files across NFS
References: <1991Jan15.203815.25561@uvaarpa.Virginia.EDU>
Sender:  cirrusl!news
Organization: Cirrus Logic Inc.

In <1991Jan15.203815.25561@uvaarpa.Virginia.EDU> worley@compass.com
(Dale Worley) writes:

>Beware that Sun's locking daemons don't always work correctly.

Sun's locking daemons have never worked correctly whenever I have tried
them.  I finally decided that it would be better to rely on the
standard reliable UNIX method:  create a lock file.  I used this
successfully for a while.  Then discovered with a shock that NFS has no
mechanism for ensuring exclusive creation of a file even if the O_EXCL
flag is given to open().  NFS does make symbolic links correctly.
I think it may even make hard links correctly.  The following algorithm
assumes that hard links are correctly created atomically.

So the only reliable mechanism that exists to do file locking over NFS
is the following or its equivalent.  If you want reliable locking that
is reasonably immune to locks being held by dead processes, I see no
way of making this algorithm any simpler.

int get_a_lock()
{
     if (create(symlink called MUTEX that points anywhere) == failed) {
	die("serious problem -- can't create MUTEX");
     }
     /* reach here when gained exclusive access */
     attempts = 0;
     while (++attempts < SOME_LIMIT) {
	if (create(some unique temp file called $TMP) == succeeded) {
	   to $TMP write our host name and pid;
	   break; /* done with while loop */
	} else {
	   sleep (a few seconds);
	}
      }
      if (attempts == SOME_LIMIT) {
	 die("serious problem -- can't create $TMP");
      }
   try_again:
      {
        static int loop_breaker;
	if (++loop_breaker > SOME_OTHER_LIMIT) {
	   loop_breaker = 0;
	   unlink($TMP);
	   unlink(MUTEX);
	   return LOCK_ATTEMPT_FAILED; /* or die here */
	 }
      }
      if (create(link from $TMP to LOCK) == success) {
	 /* we have the lock!! */
	 unlink($TMP);  /* not needed, link is now LOCK */
	 unlink(MUTEX); /* not needed, done its work */
	 return GOT_A_LOCK;
      } 
      /* failed to create link;  see if it's a stray link */
      if (LOCK doesn't exist) {
	 unlink($TMP);
	 unlink(MUTEX);
	 die("serious problem, LOCK nonexistent but can't create");
      }
      if (read(contents of LOCK) == failed) {
	 unlink($TMP);
	 unlink(MUTEX);
	 die("serious problem, can't read existing LOCK");
      }
      lock_host = name of host read from LOCK;
      lock_pid = pid read from LOCK;
      if (lock_host is our current host) {
	 /* see if process still alive */
	 if (kill(lock_pid, SIG_SEE_IF_IT'S_THERE) == ENO_SUCH_PROCESS) {
	    unlink(LOCK); /* must have been stray */
	    goto try_again;
	 } 
      }
      /* LOCK is already held by existing process on this host
      or is on some other host; clean up before giving up */
      unlink($TMP);
      unlink(MUTEX);
      return LOCK_ATTEMPT_FAILED;
}
--
Rahul Dhesi <dhesi@cirrus.COM>
UUCP:  oliveb!cirrusl!dhesi

berg@marvin.e17.physik.tu-muenchen.de (Stephen R. van den Berg) (04/18/91)

In article <3064@cirrusl.UUCP> Rahul Dhesi <dhesi@cirrus.COM> writes:
>Sun's locking daemons have never worked correctly whenever I have tried
>them.  I finally decided that it would be better to rely on the
>standard reliable UNIX method:  create a lock file.  I used this
>successfully for a while.  Then discovered with a shock that NFS has no
>mechanism for ensuring exclusive creation of a file even if the O_EXCL
>flag is given to open().  NFS does make symbolic links correctly.
>I think it may even make hard links correctly.  The following algorithm
>assumes that hard links are correctly created atomically.

If hardlinks are assumed to be made correctly (which I suspect they are
indeed, why shouldn't they?), why are you then going through all this
trouble with the symlink (MUTEX)?   Why not use one hardlink to the lockfile?
--
Sincerely,                 berg@marvin.e17.physik.tu-muenchen.de
           Stephen R. van den Berg.
"I code it in 5 min, optimize it in 90 min, because it's so well optimized:
it runs in only 5 min.  Actually, most of the time I optimize programs."