[comp.mail.elm] Will ELM ever use lockf

dean@coplex.uucp (Dean Brooks) (03/26/91)

Hello,

   We have been running ELM for several years now, and are currently
at the most recent patchlevel (2.3 PL11).  However, we are having
some serious locking problems with elm.

   We are running what is, in my opinion, a very standard SYSV-3.2
system using Smail3.1.19 as our transport agent.  Everything has
been configured correctly in ELM and Smail, but there are some
REAL discrepancies on how to do mailbox locking.

   Elm wants to use "flock()" or ".lock" files.  Smail wants to
use "flock()", "lockf()" or its own weird type of lock files.

   Now, we do not have "flock()" on our system (and many other V3.2
systems).  Since smail doesn't recognize elm's locking, and vice
versa, I have a REAL nasty problem.

   If elm were to support "lockf()" all my problems would be gone.  Are
there any hopes in the future of this being included in the elm source?

				Losing mail left and right,
					Dean Brooks

P.S.  If anyone has placed a hack for this in elm, please send me a
	copy if you would; I am desperate, and don't feel like hacking
	on elm much...

--
dean@coplex.UUCP   Dean A. Brooks
                   Copper Electronics, Inc.
                   Louisville, Ky
UUCP: !uunet!coplex!dean

sfreed@ariel.unm.edu (Steven Freed CIRT) (03/26/91)

In article <1991Mar25.193741.12360@coplex.uucp>, dean@coplex.uucp (Dean Brooks) writes:
->    Now, we do not have "flock()" on our system (and many other V3.2
-> systems).  Since smail doesn't recognize elm's locking, and vice
-> versa, I have a REAL nasty problem.
-> 
->    If elm were to support "lockf()" all my problems would be gone.  Are
-> there any hopes in the future of this being included in the elm source?

We have talked about this here at UNM but we have yet to take any serious
action on it. The reason we are interested in doing something is because of
the locking problems with NFS. Lockf() is the *only* way to guarantee
file locking with NFS. I can point out several scenarios where lock files
will fail. To make the change means that everything that writes any mailboxes
on the system including /bin/mail must use lockf().

Right now, mail is collected on each machine, but we would like to centralize
it to one partition on one machine. Doing this means either hacking lockf()
into everything including elm or hacking POP into elm. Both options are being
considered.

I would be interested to hear any thoughts on this.


-- 

Steve.                    sfreed@ariel.unm.edu

k2@bl.physik.tu-muenchen.de (Klaus Steinberger) (03/26/91)

sfreed@ariel.unm.edu (Steven Freed CIRT) writes:


>In article <1991Mar25.193741.12360@coplex.uucp>, dean@coplex.uucp (Dean Brooks) writes:
>->    Now, we do not have "flock()" on our system (and many other V3.2
>-> systems).  Since smail doesn't recognize elm's locking, and vice
>-> versa, I have a REAL nasty problem.
>-> 
>->    If elm were to support "lockf()" all my problems would be gone.  Are
>-> there any hopes in the future of this being included in the elm source?

>We have talked about this here at UNM but we have yet to take any serious
>action on it. The reason we are interested in doing something is because of
>the locking problems with NFS. Lockf() is the *only* way to guarantee
>file locking with NFS. I can point out several scenarios where lock files
>will fail. To make the change means that everything that writes any mailboxes
>on the system including /bin/mail must use lockf().
Sure, but a good system implementation (like RiscOS) emulates flock through
lockf.

>Right now, mail is collected on each machine, but we would like to centralize
>it to one partition on one machine. Doing this means either hacking lockf()
>into everything including elm or hacking POP into elm. Both options are being
>considered.

I'm very interested in this topic, because we have some machines that
have lockf but no flock (arghhhh!), and we want to mount their /usr/mail
directory. Our other machines use flock, but according to the manual their
flock is implemented through lockf, so NFS will be no problem on them.
POP would also be a solution; it has the advantage that our PC users
would be happy too.

Sincerely,
Klaus Steinberger

--
Klaus Steinberger               Beschleunigerlabor der TU und LMU Muenchen
Phone: (+49 89)3209 4287        Hochschulgelaende
FAX:   (+49 89)3209 4280        D-8046 Garching, Germany
BITNET: K2@DGABLG5P             Internet: k2@bl.physik.tu-muenchen.de

phil@ls.com (Phil Eschallier) (03/26/91)

In article <1991Mar25.193741.12360@coplex.uucp> dean@coplex.uucp (Dean Brooks) writes:
>Hello,
>
>   Elm wants to use "flock()" or ".lock" files.  Smail wants to
>use "flock()", "lockf()" or its own weird type of lock files.
>
>   Now, we do not have "flock()" on our system (and many other V3.2
>systems).  Since smail doesn't recognize elm's locking, and vice
>versa, I have a REAL nasty problem.
>
>   If elm were to support "lockf()" all my problems would be gone.  Are
>there any hopes in the future of this being included in the elm source?
>

	correct me if i am wrong but this seems like a trivial problem --
	there are several places to get code that emulates flock() using
	lockf().  the first that comes to mind is the support directory
	in the distribution of sendmail 5.65 + IDA ... the code is not
	public domain but it is freely distributed.  if using this would
	break some law that i am not aware of, i'll write a 10 line procedure
	to do the equivalent.

	then -- couldn't you just tell elm's configuration that you will
	use flock() and link in this procedure ??


-- 
Phil Eschallier     |  E-Mail to:                    US Mail to:
                    |   INET: phil@ls.com             248B Union Street
Lagniappe Systems   |   UUCP: ...!uunet!lgnp1!phil    Doylestown, PA  18901
Computer Services   |    CIS: 71076,1576              VOICE: +1 215 348 9721
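
A wrapper of the kind Phil describes could look roughly like the sketch
below.  It is only an illustration, not the sendmail + IDA code: the name
emu_flock() and the EMU_LOCK_* flags are made up here, and since lockf()
has no shared locks, a shared request is simply treated as exclusive.
lockf() locks from the current file offset, so the sketch rewinds before
locking and restores the offset afterwards.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations such as lockf() */
#endif
#include <errno.h>
#include <unistd.h>

/* Hypothetical operation flags, modelled on the BSD flock() ones. */
#define EMU_LOCK_SH 1   /* shared lock (treated as exclusive here) */
#define EMU_LOCK_EX 2   /* exclusive lock */
#define EMU_LOCK_NB 4   /* fail instead of blocking */
#define EMU_LOCK_UN 8   /* release the lock */

/*
 * emu_flock: hypothetical flock() stand-in built on lockf().
 * Returns 0 on success, -1 with errno set on failure.
 */
int emu_flock(int fd, int op)
{
    off_t here;
    int cmd, rc, saved;

    if (op & EMU_LOCK_UN)
        cmd = F_ULOCK;
    else if (op & (EMU_LOCK_SH | EMU_LOCK_EX))
        cmd = (op & EMU_LOCK_NB) ? F_TLOCK : F_LOCK;
    else {
        errno = EINVAL;
        return -1;
    }

    /* Remember the caller's offset, then rewind so that the lock
       (length 0 = "to end of file") covers the whole file. */
    here = lseek(fd, (off_t)0, SEEK_CUR);
    if (here == (off_t)-1)
        return -1;
    if (lseek(fd, (off_t)0, SEEK_SET) == (off_t)-1)
        return -1;

    rc = lockf(fd, cmd, (off_t)0);

    /* Put the offset back without disturbing lockf()'s errno. */
    saved = errno;
    (void)lseek(fd, here, SEEK_SET);
    errno = saved;
    return rc;
}
```

Note that lockf() needs a descriptor opened for writing, so a caller that
opens the mailbox read-only would still fail with this wrapper.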

chip@tct.uucp (Chip Salzenberg) (03/28/91)

According to dean@coplex.uucp (Dean Brooks):
>   Elm wants to use "flock()" or ".lock" files.  Smail wants to
>use "flock()", "lockf()" or its own weird type of lock files.

Check the Smail source code -- specifically, sysdep.c.  Smail does use
".lock" files, just like Elm.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
   "All this is conjecture of course, since I *only* post in the nude.
    Nothing comes between me and my t.b.  Nothing."   -- Bill Coderre

andy@jhunix.HCF.JHU.EDU (Andy S Poling) (03/30/91)

In article <5062@lgnp1.ls.com> phil@ls.com (Phil Eschallier) writes:
>In article <1991Mar25.193741.12360@coplex.uucp> dean@coplex.uucp (Dean Brooks) writes:
[...]
>>   If elm were to support "lockf()" all my problems would be gone.  Are
>>there any hopes in the future of this being included in the elm source?
>>
>
>	correct me if i am wrong but this seems like a trivial problem --
>	there are several places to get code that emulates flock() using
>	lockf().  
[..]
>	then -- couldn't you just tell elm's configuration that you will
>	use flock() and link in this procedure ??

Be careful - lockf() requires that the file be opened for writing, not just
reading.  You really should use fcntl(F_SETLK) etc. (which is what most
lockf()s use anyway...).

IMHO Elm should offer lockf() as one of the locking options - that's what
our SYSV local mailer uses.  It ain't gonna do much good to lock the mailbox
if the mailer and Elm are usin' different locking methods...

-Andy
--
Andy Poling                              Internet: andy@gollum.hcf.jhu.edu
UNIX Systems Programmer                  Bitnet: ANDY@JHUNIX
Homewood Academic Computing              Voice: (301)338-8096    
Johns Hopkins University                 UUCP: uunet!mimsy!aplcen!jhunix!andy
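
Andy's suggestion of going straight to fcntl() might be sketched as
follows.  lock_whole_file() is a hypothetical helper name; the struct
flock fields and the F_SETLK/F_SETLKW commands are the standard ones.
A read lock (F_RDLCK) only needs the descriptor open for reading, which
is the case that trips up lockf().

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict compilers */
#endif
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * lock_whole_file: hypothetical helper around fcntl() record locking.
 *   type: F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release)
 *   wait: nonzero to block until the lock is granted
 * Returns 0 on success, -1 with errno set on failure.
 */
int lock_whole_file(int fd, int type, int wait)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = (short)type;
    fl.l_whence = SEEK_SET;   /* offsets are measured from the start of the file */
    fl.l_start = 0;
    fl.l_len = 0;             /* zero length means "through end of file" */

    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}
```

Unlike the lockf() route, this does not depend on the current file
offset, so nothing has to be rewound before locking.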

dean@coplex.uucp (Dean Brooks) (03/31/91)

chip@tct.uucp (Chip Salzenberg) writes:

>According to dean@coplex.uucp (Dean Brooks):
>>   Elm wants to use "flock()" or ".lock" files.  Smail wants to
>>use "flock()", "lockf()" or its own weird type of lock files.

>Check the Smail source code -- specifically, sysdep.c.  Smail does use
>".lock" files, just like Elm.

Yeah, but ".lock" files don't hack it for 20 messages all trying
to arrive in your mailbox at the same time.  The window of
vulnerability seems quite small, but after trying tests with smail3.1
and elm2.3.11, it dropped about 3 messages out of 20 into oblivion.

I basically ran 20 smail's at the same time, with the basic command
of "smail dean < /etc/termcap".  They all started immediately into the
background; on the average, three of the 20 never ever made it to my
mailbox.

Of course, this symptom probably only occurs when mail is received in
large quantities on a fast system.  

Regardless, I hacked the elm code to use "lockf()" (quite an easy
task, after I found the syntax of the flock() command), and
everything worked perfectly on our SYSV system.

					Thanks... Dean
--
dean@coplex.UUCP   Dean A. Brooks
                   Copper Electronics, Inc.
                   Louisville, Ky
UUCP: !uunet!coplex!dean

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (04/03/91)

In <1991Mar26.055600.27712@ariel.unm.edu> sfreed@ariel.unm.edu (Steven
Freed CIRT) writes:

>The reason we are interested in doing something is because of
>the locking problems with NFS. Lockf() is the *only* way to guarantee
>file locking with NFS. I can point out several scenarios where lock files
>will fail. To make the change means that everything that writes any mailboxes
>on the system including /bin/mail must use lockf().

It turns out that lockf() uses a lock daemon and, under SunOS at least,
this doesn't always work.  Lock files, on the other hand, are
guaranteed to work if you do them right.  Copy of previous Usenet
posting follows.

======
Date:    22 Jan 91 07:41:25 GMT
From:    dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi)
Newsgroups: comp.lang.perl
Subject: Re: Locking files across NFS
References: <1991Jan15.203815.25561@uvaarpa.Virginia.EDU>
Sender:  cirrusl!news
Organization: Cirrus Logic Inc.

In <1991Jan15.203815.25561@uvaarpa.Virginia.EDU> worley@compass.com
(Dale Worley) writes:

>Beware that Sun's locking daemons don't always work correctly.

Sun's locking daemons have never worked correctly whenever I have tried
them.  I finally decided that it would be better to rely on the
standard reliable UNIX method:  create a lock file.  I used this
successfully for a while.  Then discovered with a shock that NFS has no
mechanism for ensuring exclusive creation of a file even if the O_EXCL
flag is given to open().  NFS does make symbolic links correctly.
I think it may even make hard links correctly.  The following algorithm
assumes that hard links are correctly created atomically.

So the only reliable mechanism that exists to do file locking over NFS
is the following or its equivalent.  If you want reliable locking that
is reasonably immune to locks being held by dead processes, I see no
way of making this algorithm any simpler.

int get_a_lock()
{
     if (create(symlink called MUTEX that points anywhere) == failed) {
	die("serious problem -- can't create MUTEX");
     }
     /* reach here when gained exclusive access */
     attempts = 0;
     while (++attempts < SOME_LIMIT) {
	if (create(some unique temp file called $TMP) == succeeded) {
	   to $TMP write our host name and pid;
	   break; /* done with while loop */
	} else {
	   sleep (a few seconds);
	}
      }
      if (attempts == SOME_LIMIT) {
	 unlink(MUTEX); /* don't leave the mutex behind */
	 die("serious problem -- can't create $TMP");
      }
   try_again:
      {
        static int loop_breaker;
	if (++loop_breaker > SOME_OTHER_LIMIT) {
	   loop_breaker = 0;
	   unlink($TMP);
	   unlink(MUTEX);
	   return LOCK_ATTEMPT_FAILED; /* or die here */
	 }
      }
      if (create(link from $TMP to LOCK) == success) {
	 /* we have the lock!! */
	 unlink($TMP);  /* not needed, link is now LOCK */
	 unlink(MUTEX); /* not needed, done its work */
	 return GOT_A_LOCK;
      } 
      /* failed to create link;  see if it's a stray link */
      if (LOCK doesn't exist) {
	 unlink($TMP);
	 unlink(MUTEX);
	 die("serious problem, LOCK nonexistent but can't create");
      }
      if (read(contents of LOCK) == failed) {
	 unlink($TMP);
	 unlink(MUTEX);
	 die("serious problem, can't read existing LOCK");
      }
      lock_host = name of host read from LOCK;
      lock_pid = pid read from LOCK;
      if (lock_host is our current host) {
	 /* see if process still alive */
	 if (kill(lock_pid, SIG_SEE_IF_IT'S_THERE) == ENO_SUCH_PROCESS) {
	    unlink(LOCK); /* must have been stray */
	    goto try_again;
	 } 
      }
      /* LOCK is already held by existing process on this host
      or is on some other host */
      return LOCK_ATTEMPT_FAILED;
}
--
Rahul Dhesi <dhesi@cirrus.COM>
UUCP:  oliveb!cirrusl!dhesi
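
The heart of the algorithm above is the link() step.  A stripped-down C
sketch of just that step is shown here, under the same assumption Rahul
states (hard links are created atomically).  try_link_lock() and the
temp-file naming scheme are invented for illustration; the MUTEX symlink
and the stale-lock check are left out.  Because the result of a link()
call can be misreported over NFS, the sketch ignores the return value
and checks the temp file's link count instead.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations such as gethostname() */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * try_link_lock: hypothetical helper, one attempt at taking lock_path.
 * Returns 0 if the lock was acquired, -1 otherwise.
 */
int try_link_lock(const char *lock_path)
{
    char host[256], tmp[1024], buf[320];
    struct stat st;
    int fd, len, got;

    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';

    /* Unique temp name in the lock's directory, tagged with host and pid. */
    len = snprintf(tmp, sizeof tmp, "%s.%s.%ld", lock_path, host, (long)getpid());
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    /* Record who holds the lock, as in the pseudocode above. */
    len = snprintf(buf, sizeof buf, "%s %ld\n", host, (long)getpid());
    if (len < 0 || write(fd, buf, (size_t)len) != (ssize_t)len) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    close(fd);

    /* link() fails if lock_path already exists.  Do not trust its return
       value; a link count of 2 on the temp file means the link was made. */
    (void)link(tmp, lock_path);
    got = (stat(tmp, &st) == 0 && st.st_nlink == 2);

    unlink(tmp);   /* the temp name is no longer needed either way */
    return got ? 0 : -1;
}
```

Releasing the lock is then just unlink(lock_path) by the holder.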

chip@tct.com (Chip Salzenberg) (04/04/91)

According to dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi):
>It turns out that lockf() uses a lock daemon and, under SunOS at least,
>this doesn't always work.

Okay, so SunOS is broken.  This is news?  :-(

>NFS does make symbolic links correctly.

It makes them correctly, but not reliably.  Network lossage and/or
resource exhaustion can cause a successful symlink() call to return
failure.

My advice?  Use lockf() and avoid SunOS.
-- 
Chip Salzenberg                    <chip@tct.com>, <uunet!pdn!tct!chip>
  Brand X Industries Custodial, Refurbishing and Containment Service
            When You Never, Ever Want To See It Again [tm]

chip@tct.com (Chip Salzenberg) (04/04/91)

According to dean@coplex.uucp (Dean Brooks):
>chip@tct.uucp (Chip Salzenberg) writes:
>>Check the Smail source code -- specifically, sysdep.c.  Smail does use
>>".lock" files, just like Elm.
>
>Yeah, but ".lock" files don't hack it for 20 messages all trying
>to arrive in your mailbox at the same time.

Correspondence with Dean reveals that Smail seems to work okay, and
that it's Elm that's dropping the ball.

Of course, there is no such thing as reliable elimination of a stale
lock file under UNIX: how do you know that you're removing the stale
one?  So lockf() is really the best solution.
-- 
Chip Salzenberg                    <chip@tct.com>, <uunet!pdn!tct!chip>
  Brand X Industries Custodial, Refurbishing and Containment Service
            When You Never, Ever Want To See It Again [tm]

gemini@geminix.in-berlin.de (Uwe Doering) (04/05/91)

chip@tct.com (Chip Salzenberg) writes:

>According to dean@coplex.uucp (Dean Brooks):
>>chip@tct.uucp (Chip Salzenberg) writes:
>>>Check the Smail source code -- specifically, sysdep.c.  Smail does use
>>>".lock" files, just like Elm.
>>
>>Yeah, but ".lock" files don't hack it for 20 messages all trying
>>to arrive in your mailbox at the same time.
>
>Correspondence with Dean reveals that Smail seems to work okay, and
>that it's Elm that's dropping the ball.
>
>Of course, there is no such thing as reliable elimination of a stale
>lock file under UNIX: how do you know that you're removing the stale
>one?  So lockf() is really the best solution.

You may be right that lockf() is the best solution. However, in this
case there is simply a bug in Elm's lock() function. It happily blows
away every lock file that has a PID of another process in it, regardless
of whether this process is still alive or not. Took a while until I
found this out. Anyway, below there is the necessary patch that
corrects this problem. Following this patch, there is another one that
shows how to convince Smail 3.1.20 to use lock files instead of
lockf() under SysVr3. Maybe this works for earlier Smail releases, too.

Have fun.

     Uwe

Here's the Elm 2.3 PL11 patch for $ELMSRC/src/leavembox.c:

---------------------------- cut here ------------------------------
*** leavembox.c.00	Sat Mar 30 04:01:47 1991
--- leavembox.c	Sat Mar 30 04:01:58 1991
***************
*** 704,711 ****
  	if (read(create_fd, pid_buffer, SHORT) > 0) {
  	  create_iteration = atoi(pid_buffer);
  	  if (create_iteration) {
! 	    if (kill(create_iteration, 0)) {
! 	      close(create_fd);
  	      if (unlink(lock_name) != 0) {
  		dprint(1, (debugfile,
  		  "Error %s (%s)\n\ttrying to unlink file %s (%s)\n", 
--- 704,710 ----
  	if (read(create_fd, pid_buffer, SHORT) > 0) {
  	  create_iteration = atoi(pid_buffer);
  	  if (create_iteration) {
! 	    if (kill(create_iteration, 0) && errno == ESRCH) {
  	      if (unlink(lock_name) != 0) {
  		dprint(1, (debugfile,
  		  "Error %s (%s)\n\ttrying to unlink file %s (%s)\n", 
***************
*** 714,719 ****
--- 713,719 ----
  		  "\n\rCouldn't remove the current lock file %s\n\r", lock_name);
  		PutLine2(LINES, 0, "** %s - %s **\n\r", error_name(errno),
  		  error_description(errno));
+ 		close(create_fd);
  		if (direction == INCOMING)
  		  leave();
  		else
***************
*** 722,727 ****
--- 722,728 ----
  	    }
  	  }
  	}
+ 	close(create_fd);
  	create_iteration = 0;
        }
  #endif
---------------------------- cut here ------------------------------

And this is the Smail 3.1.20 patch for file $SMAILSRC/conf/os/sys5.3:

---------------------------- cut here ------------------------------
*** sys5.3.00	Sat Feb 23 11:18:59 1991
--- sys5.3	Thu Mar 28 11:51:19 1991
***************
*** 21,37 ****
  OSNAMES=UNIX_SYS5_3:UNIX_SYS5:UNIX
  
  # LOCKING_PROTOCOL - macros for efficient file locking
! LOCKING_PROTOCOL="\
! #include <unistd.h>
! #define LOCK_REQUIRES_WRITE
! #define lock_fd(fd)		(lockf((fd), F_TLOCK, 0L) < 0? FAIL: SUCCEED)
! #define lock_fd_wait(fd)	(lockf((fd), F_LOCK, 0L) < 0? FAIL: SUCCEED)
! #define unlock_fd(fd)		((void) lockf((fd), F_ULOCK, 0L))
! #define unlock_fd_wait(fd)	((void) lockf((fd), F_ULOCK, 0L))
! #define USE_FCNTL_RD_LOCK
! #define lock_fd_rd_wait(fd)	(fcntl_rd_lock(fd))
! extern int fcntl_rd_lock();
! "
  
  # MAILBOX_DIR - in which directory are user mailbox files
  MAILBOX_DIR=/usr/mail
--- 21,28 ----
  OSNAMES=UNIX_SYS5_3:UNIX_SYS5:UNIX
  
  # LOCKING_PROTOCOL - macros for efficient file locking
! LOCKING_PROTOCOL=
! LOCK_BY_NAME=TRUE	# compatible with ELM
  
  # MAILBOX_DIR - in which directory are user mailbox files
  MAILBOX_DIR=/usr/mail
---------------------------- cut here ------------------------------
-- 
Uwe Doering  |  INET : gemini@geminix.in-berlin.de
Berlin       |----------------------------------------------------------------
Germany      |  UUCP : ...!unido!fub!geminix.in-berlin.de!gemini
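
The test Uwe's Elm patch introduces comes down to one condition: only
treat the lock as stale when kill(pid, 0) fails with ESRCH.  A small
self-contained version of that check, with the hypothetical name
lock_owner_is_gone(), might read:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose POSIX declarations such as kill() */
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * lock_owner_is_gone: hypothetical helper.
 * Returns 1 only when the kernel reports that no such process exists,
 * 0 in every other case (alive, or alive but owned by someone else).
 */
int lock_owner_is_gone(pid_t pid)
{
    if (pid <= 0)
        return 0;   /* kill() treats 0 and negative values as process groups */

    /* Signal 0 performs the permission and existence checks only. */
    return kill(pid, 0) == -1 && errno == ESRCH;
}
```

With the unpatched test, a kill() that fails with EPERM (the process
exists but belongs to another user) would also count as dead, which
would fit the behaviour Uwe describes.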

chip@tct.com (Chip Salzenberg) (04/09/91)

According to gemini@geminix.in-berlin.de (Uwe Doering):
>Following this patch, there is another one that shows how to convince
>Smail 3.1.20 to use lock files instead of lockf() under SysVr3. Maybe
>this works for earlier Smail releases, too.

The patches posted by Uwe for Elm are much appreciated.

His patches for Smail, however, are *far* more than is required.
Smail 3 already has a knob that controls mailbox locking specifically.
In conf/EDITME, and in conf/os/<your-os-here>, do *not* set
FLOCK_MAILBOX.  That's it.  (See the comments in conf/os/sys5* for
further description of FLOCK_MAILBOX.)
-- 
Brand X Industries Custodial, Refurbishing and Containment Service:
         When You Never, Ever Want To See It Again [tm]
     Chip Salzenberg   <chip@tct.com>, <uunet!pdn!tct!chip>