dean@coplex.uucp (Dean Brooks) (03/26/91)
Hello,
We have been running ELM for several years now, and are currently at the most recent patchlevel (2.3 PL11). However, we are having some serious locking problems with elm. We are running what is, in my opinion, a very standard SYSV-3.2 system, using Smail3.1.19 as our transport agent. Everything has been configured correctly in ELM and Smail, but there are some REAL discrepancies in how to do mailbox locking.
Elm wants to use "flock()" or ".lock" files. Smail wants to use "flock()", "lockf()" or its own weird type of lock files.
Now, we do not have "flock()" on our system (and many other V3.2 systems). Since smail doesn't recognize elm's locking, and vice versa, I have a REAL nasty problem.
If elm were to support "lockf()" all my problems would be gone. Are there any hopes in the future of this being included in the elm source?
Losing mail left and right,
Dean Brooks
P.S. If anyone has put a hack for this into elm, please send me a copy if you would; I am desperate, and don't feel like hacking on elm much...
--
dean@coplex.UUCP  Dean A. Brooks
Copper Electronics, Inc.  Louisville, Ky
UUCP: !uunet!coplex!dean
sfreed@ariel.unm.edu (Steven Freed CIRT) (03/26/91)
In article <1991Mar25.193741.12360@coplex.uucp>, dean@coplex.uucp (Dean Brooks) writes:
-> Now, we do not have "flock()" on our system (and many other V3.2
-> systems). Since smail doesn't recognize elm's locking, and vice
-> versa, I have a REAL nasty problem.
->
-> If elm were to support "lockf()" all my problems would be gone. Are
-> there any hopes in the future of this being included in the elm source?
We have talked about this here at UNM but we have yet to take any serious
action on it. The reason we are interested in doing something is because of
the locking problems with NFS. Lockf() is the *only* way to guarantee
file locking with NFS. I can point out several scenarios where lock files
will fail. To make the change means that everything that writes any mailboxes
on the system including /bin/mail must use lockf().
Right now, mail is collected on each machine, but we would like to centralize
it to one partition on one machine. To do this either means hacking lockf()
into everything including elm or hack POP into elm. Both options are being
considered.
I would be interested to hear any thoughts on this.
--
Steve. sfreed@ariel.unm.edu
k2@bl.physik.tu-muenchen.de (Klaus Steinberger) (03/26/91)
sfreed@ariel.unm.edu (Steven Freed CIRT) writes:
>In article <1991Mar25.193741.12360@coplex.uucp>, dean@coplex.uucp (Dean Brooks) writes:
>-> Now, we do not have "flock()" on our system (and many other V3.2
>-> systems). Since smail doesn't recognize elm's locking, and vice
>-> versa, I have a REAL nasty problem.
>->
>-> If elm were to support "lockf()" all my problems would be gone. Are
>-> there any hopes in the future of this being included in the elm source?
>We have talked about this here at UNM but we have yet to take any serious
>action on it. The reason we are interested in doing something is because of
>the locking problems with NFS. Lockf() is the *only* way to guarantee
>file locking with NFS. I can point out several scenarios where lock files
>will fail. To make the change means that everything that writes any mailboxes
>on the system including /bin/mail must use lockf().
Sure, but a good system implementation (like RiscOS) emulates flock through lockf.
>Right now, mail is collected on each machine, but we would like to centralize
>it to one partition on one machine. To do this either means hacking lockf()
>into everything including elm or hack POP into elm. Both options are being
>considered.
I'm very interested in this topic, because we have some machines which have lockf but no flock (arghhhh!), and we want to mount their /usr/mail directory. Our other machines use flock, but according to the manual their flock is implemented through lockf, so NFS will be no problem on them.
POP would also be a solution; it has the advantage that our PC users will be happy too.
Sincerely, Klaus Steinberger
--
Klaus Steinberger          Beschleunigerlabor der TU und LMU Muenchen
Phone: (+49 89)3209 4287   Hochschulgelaende
FAX: (+49 89)3209 4280     D-8046 Garching, Germany
BITNET: K2@DGABLG5P        Internet: k2@bl.physik.tu-muenchen.de
phil@ls.com (Phil Eschallier) (03/26/91)
In article <1991Mar25.193741.12360@coplex.uucp> dean@coplex.uucp (Dean Brooks) writes:
>Hello,
>
> Elm wants to use "flock()" or ".lock" files. Smail wants to
>use "flock()", "lockf()" or its own weird type of lock files.
>
> Now, we do not have "flock()" on our system (and many other V3.2
>systems). Since smail doesn't recognize elm's locking, and vice
>versa, I have a REAL nasty problem.
>
> If elm were to support "lockf()" all my problems would be gone. Are
>there any hopes in the future of this being included in the elm source?
>
correct me if i am wrong but this seems like a trivial problem -- there are several places to get code that emulates flock() using lockf(). the first that comes to mind is the support directory in the distribution of sendmail 5.65 + IDA ... the code is not public domain but it is freely distributed. if using this would break some law that i am not aware of, i'll write a 10 line procedure to do the equivalent.
then -- couldn't you just tell elm's configuration that you will use flock() and link in this procedure ??
--
Phil Eschallier   | E-Mail to:                   US Mail to:
                  | INET: phil@ls.com            248B Union Street
Lagniappe Systems | UUCP: ...!uunet!lgnp1!phil   Doylestown, PA 18901
Computer Services | CIS: 71076,1576              VOICE: +1 215 348 9721
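For illustration, here is a minimal sketch of the kind of "10 line procedure" Phil describes, assuming a system whose <unistd.h> provides lockf() with F_LOCK, F_TLOCK and F_ULOCK. The name flock_emul() and the EMUL_LOCK_* constants are hypothetical (chosen so they cannot clash with a real <sys/file.h>); this is not the sendmail+IDA support code.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>      /* O_RDWR etc. for opening the mailbox */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flock()-style operation bits (values are arbitrary). */
#define EMUL_LOCK_EX 2  /* exclusive lock */
#define EMUL_LOCK_NB 4  /* fail instead of blocking */
#define EMUL_LOCK_UN 8  /* release the lock */

/* Emulate flock(fd, op) with lockf().  lockf() has no shared mode, so
   only exclusive locks and unlocks are supported, and fd must be open
   for writing.  lockf() works from the current offset, so we seek to 0,
   lock with length 0 (whole file, however it grows), then restore the
   offset. */
int flock_emul(int fd, int op)
{
    int cmd, rc, saved_errno;
    off_t saved;

    if (op & EMUL_LOCK_UN)
        cmd = F_ULOCK;
    else if (op & EMUL_LOCK_EX)
        cmd = (op & EMUL_LOCK_NB) ? F_TLOCK : F_LOCK;
    else {
        errno = EINVAL;             /* a shared lock cannot be expressed */
        return -1;
    }

    saved = lseek(fd, 0L, SEEK_CUR);
    if (saved == (off_t) -1 || lseek(fd, 0L, SEEK_SET) == (off_t) -1)
        return -1;
    rc = lockf(fd, cmd, 0L);
    saved_errno = errno;
    (void) lseek(fd, saved, SEEK_SET);
    errno = saved_errno;
    return rc;
}
```

Because lockf() cannot express a shared lock and needs a writable descriptor, a program that opens its mailbox read-only, or that asks for a shared lock, is not served by a wrapper like this one.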
chip@tct.uucp (Chip Salzenberg) (03/28/91)
According to dean@coplex.uucp (Dean Brooks):
> Elm wants to use "flock()" or ".lock" files. Smail wants to
>use "flock()", "lockf()" or its own weird type of lock files.
Check the Smail source code -- specifically, sysdep.c. Smail does use ".lock" files, just like Elm.
--
Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip>
"All this is conjecture of course, since I *only* post in the nude.
Nothing comes between me and my t.b. Nothing." -- Bill Coderre
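As a rough illustration of the ".lock" convention, here is a sketch that creates <mailbox>.lock with O_CREAT|O_EXCL and writes the owner's PID into it. The exact "<mailbox>.lock" naming and the dotlock_* function names are assumptions for the example, not taken from Elm's or Smail's sources. Stale-lock removal is deliberately left out, and O_EXCL is only atomic on a local file system.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build the (assumed) lock name "<mailbox>.lock"; -1 if it won't fit. */
static int dotlock_name(const char *mbox, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s.lock", mbox);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Try up to `tries` times, one second apart, to create the lock file
   exclusively, then record our PID in it so others can check for a
   stale lock.  Returns 0 on success, -1 on failure. */
int dotlock_acquire(const char *mbox, int tries)
{
    char name[1024], pid[32];
    int fd, n;

    if (dotlock_name(mbox, name, sizeof name) < 0)
        return -1;
    while (tries-- > 0) {
        fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0) {
            n = snprintf(pid, sizeof pid, "%ld\n", (long) getpid());
            if (write(fd, pid, (size_t) n) != n) {
                close(fd);
                unlink(name);
                return -1;
            }
            close(fd);
            return 0;
        }
        if (errno != EEXIST)
            return -1;          /* a real error, not contention */
        if (tries > 0)
            sleep(1);           /* someone else holds it; wait */
    }
    return -1;
}

/* Release the lock by removing the lock file. */
int dotlock_release(const char *mbox)
{
    char name[1024];

    if (dotlock_name(mbox, name, sizeof name) < 0)
        return -1;
    return unlink(name);
}
```

Every program that touches the mailbox has to follow the same convention (same name, same O_EXCL create) for this to exclude anything.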
andy@jhunix.HCF.JHU.EDU (Andy S Poling) (03/30/91)
In article <5062@lgnp1.ls.com> phil@ls.com (Phil Eschallier) writes:
>In article <1991Mar25.193741.12360@coplex.uucp> dean@coplex.uucp (Dean Brooks) writes:
[...]
>> If elm were to support "lockf()" all my problems would be gone. Are
>>there any hopes in the future of this being included in the elm source?
>>
>
> correct me if i am wrong but this seems like a trivial problem --
> there are several places to get code that emulates flock() using
> lockf(). [..]
> then -- couldn't you just tell elm's configuration that you will
> use flock() and link in this procedure ??
Be careful - unlike flock(), lockf requires that the file be opened with write access (write-only or read/write). Really should use fcntl(F_SETLK), etc. (which is what most lockf()s use anyway...).
IMHO Elm should offer lockf() as one of the locking options - that's what our SYSV local mailer uses. It ain't gonna do much good to lock the mailbox if the mailer and Elm are usin' different locking methods...
-Andy
--
Andy Poling                    Internet: andy@gollum.hcf.jhu.edu
UNIX Systems Programmer        Bitnet: ANDY@JHUNIX
Homewood Academic Computing    Voice: (301)338-8096
Johns Hopkins University       UUCP: uunet!mimsy!aplcen!jhunix!andy
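A minimal sketch of the fcntl() approach Andy recommends: one POSIX record lock covering the whole file. The helper name fcntl_lock_whole() is hypothetical. Unlike lockf(), fcntl() can take a shared (F_RDLCK) lock on a descriptor opened read-only; an exclusive F_WRLCK still needs write access.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Lock or unlock the whole file with a POSIX record lock.
   type is F_RDLCK, F_WRLCK or F_UNLCK.  wait != 0 uses F_SETLKW (block
   until granted); otherwise F_SETLK fails at once (EACCES or EAGAIN)
   if a conflicting lock is held.  F_WRLCK needs an fd open for writing,
   F_RDLCK one open for reading. */
int fcntl_lock_whole(int fd, short type, int wait)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;     /* offsets measured from start of file */
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 = through end of file, however it grows */
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}
```

One caveat that applies to both lockf() and fcntl(): these locks belong to the process, and closing *any* descriptor for the file releases them, not only the descriptor that took the lock.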
dean@coplex.uucp (Dean Brooks) (03/31/91)
chip@tct.uucp (Chip Salzenberg) writes:
>According to dean@coplex.uucp (Dean Brooks):
>> Elm wants to use "flock()" or ".lock" files. Smail wants to
>>use "flock()", "lockf()" or its own weird type of lock files.
>Check the Smail source code -- specifically, sysdep.c. Smail does use
>".lock" files, just like Elm.
Yeah, but ".lock" files don't hack it for 20 messages all trying to arrive in your mailbox at the same time.
The window of vulnerability seems quite small, but in tests with smail3.1 and elm2.3.11, about 3 messages out of 20 were dropped into oblivion. I basically ran 20 smails at the same time, each with the command "smail dean < /etc/termcap". They all went immediately into the background; on average, three of the 20 never made it to my mailbox. Of course, this symptom probably only occurs when mail is received in large quantities on a fast system.
Regardless, I hacked the elm code to use "lockf()" (quite an easy task, after I found the syntax of the flock() command), and everything worked perfectly on our SYSV system.
Thanks...
Dean
--
dean@coplex.UUCP  Dean A. Brooks
Copper Electronics, Inc.  Louisville, Ky
UUCP: !uunet!coplex!dean
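Here is a sketch of a lockf()-protected append, together with a fork()-based stress test in the spirit of Dean's "20 smails at once" experiment. This is not Dean's actual Elm change. append_locked(), stress() and count_lines() are hypothetical names. The writer deliberately uses lseek()+write() rather than O_APPEND, as a program copying a message in several writes would, so that the lock is what keeps concurrent deliveries from overwriting each other.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Append msg to mbox while holding an exclusive lockf() lock.  The fd
   is opened for writing (lockf() requires that) and the lock is taken
   at offset 0 with length 0, i.e. over the whole file. */
int append_locked(const char *mbox, const char *msg)
{
    size_t len = strlen(msg);
    int fd, rc = -1;

    fd = open(mbox, O_WRONLY | O_CREAT, 0600);  /* offset starts at 0 */
    if (fd < 0)
        return -1;
    if (lockf(fd, F_LOCK, 0L) == 0) {           /* blocks until free */
        /* No O_APPEND: seek and write by hand; only the lock keeps
           two writers from landing on the same end-of-file offset. */
        if (lseek(fd, 0L, SEEK_END) != (off_t) -1 &&
            write(fd, msg, len) == (ssize_t) len)
            rc = 0;
        (void) lseek(fd, 0L, SEEK_SET);         /* unlock the same range */
        (void) lockf(fd, F_ULOCK, 0L);
    }
    close(fd);
    return rc;
}

/* Start n writers at once, like 20 background smails, and reap them.
   Returns 0 if every child was started and reported success. */
int stress(const char *mbox, int n)
{
    int i, status, failed = 0;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(append_locked(mbox, "From test\n") == 0 ? 0 : 1);
    }
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    return failed ? -1 : 0;
}

/* Count '\n' characters, i.e. delivered one-line messages. */
long count_lines(const char *path)
{
    char buf[4096];
    ssize_t n, i;
    long lines = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    close(fd);
    return lines;
}
```

With every writer taking the same lockf() lock, all 20 lines should arrive. The lock only helps if the MTA's delivery agent uses the same mechanism, which is Andy's point above.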
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (04/03/91)
In <1991Mar26.055600.27712@ariel.unm.edu> sfreed@ariel.unm.edu (Steven Freed CIRT) writes:
>The reason we are interested in doing something is because of
>the locking problems with NFS. Lockf() is the *only* way to guarantee
>file locking with NFS. I can point out several scenarios where lock files
>will fail. To make the change means that everything that writes any mailboxes
>on the system including /bin/mail must use lockf().
It turns out that lockf() uses a lock daemon and, under SunOS at least, this doesn't always work. Lock files, on the other hand, are guaranteed to work if you do them right. A copy of a previous Usenet posting follows.
======
Date: 22 Jan 91 07:41:25 GMT
From: dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi)
Newsgroups: comp.lang.perl
Subject: Re: Locking files across NFS
References: <1991Jan15.203815.25561@uvaarpa.Virginia.EDU>
Sender: cirrusl!news
Organization: Cirrus Logic Inc.
In <1991Jan15.203815.25561@uvaarpa.Virginia.EDU> worley@compass.com (Dale Worley) writes:
>Beware that Sun's locking daemons don't always work correctly.
Sun's locking daemons have never worked correctly whenever I have tried them. I finally decided that it would be better to rely on the standard reliable UNIX method: create a lock file. I used this successfully for a while. Then I discovered with a shock that NFS has no mechanism for ensuring exclusive creation of a file, even if the O_EXCL flag is given to open().
NFS does make symbolic links correctly. I think it may even make hard links correctly. The following algorithm assumes that hard links are correctly created atomically.
So the only reliable mechanism that exists to do file locking over NFS is the following or its equivalent. If you want reliable locking that is reasonably immune to locks being held by dead processes, I see no way of making this algorithm any simpler.
int get_a_lock()
{
    /* MUTEX keeps two lockers from examining LOCK at the same time */
    attempts = 0;
    while (create(symlink called MUTEX that points anywhere) == failed) {
        if (++attempts >= SOME_LIMIT)
            die("serious problem -- can't create MUTEX");
        sleep(a few seconds);
    }
    /* reach here when gained exclusive access */

    attempts = 0;
    while (++attempts < SOME_LIMIT) {
        if (create(some unique temp file called $TMP) == succeeded) {
            to $TMP write our host name and pid;
            break;                  /* done with while loop */
        } else {
            sleep(a few seconds);
        }
    }
    if (attempts == SOME_LIMIT) {
        unlink(MUTEX);
        die("serious problem -- can't create $TMP");
    }

    loop_breaker = 0;               /* per call, so successes don't accumulate */
try_again:
    if (++loop_breaker > SOME_OTHER_LIMIT) {
        unlink($TMP);
        unlink(MUTEX);
        return LOCK_ATTEMPT_FAILED; /* or die here */
    }

    if (create(link from $TMP to LOCK) == success) {
        /* we have the lock!! */
        unlink($TMP);               /* not needed, link is now LOCK */
        unlink(MUTEX);              /* not needed, done its work */
        return GOT_A_LOCK;
    }

    /* failed to create link; see if it's a stray link */
    if (LOCK doesn't exist)
        goto try_again;             /* holder released it meanwhile */
    if (read(contents of LOCK) == failed) {
        unlink($TMP);
        unlink(MUTEX);
        die("serious problem, can't read existing LOCK");
    }
    lock_host = name of host read from LOCK;
    lock_pid = pid read from LOCK;
    if (lock_host is our current host) {
        /* see if process still alive: kill(pid, 0) fails with ESRCH if not */
        if (kill(lock_pid, 0) == -1 && errno == ESRCH) {
            unlink(LOCK);           /* must have been stray */
            goto try_again;
        }
    }
    /* LOCK is already held by existing process on this host or is on some other host */
    unlink($TMP);
    unlink(MUTEX);                  /* otherwise every later caller is blocked */
    return LOCK_ATTEMPT_FAILED;
}
--
Rahul Dhesi <dhesi@cirrus.COM>
UUCP: oliveb!cirrusl!dhesi
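For readers who want something compilable, here is a C sketch of the core of the algorithm above: a unique temp file holding "host pid", a hard link() to LOCK as the atomic test, and a same-host liveness check before discarding a stray LOCK. It is a simplification, not Rahul's code. The MUTEX symlink step is omitted, get_a_lock() takes the lock path as a parameter, the retry limit of 3 is arbitrary, and the temp-file naming is an assumption. Removing a stale LOCK is still subject to the race Chip describes later in the thread.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return codes named after the pseudocode above. */
enum { GOT_A_LOCK = 0, LOCK_ATTEMPT_FAILED = -1 };

/* Take `lock` by hard-linking a unique temp file to it.  link() fails
   with EEXIST if `lock` already exists, giving an atomic test-and-create.
   A lock left by a dead process on this host is removed and the link
   retried (at most 3 tries).  No MUTEX step in this sketch. */
int get_a_lock(const char *lock)
{
    char tmp[1024], host[256], buf[512], lock_host[256];
    long lock_pid;
    int fd, n, tries;
    ssize_t got;

    if (gethostname(host, sizeof host) != 0)
        return LOCK_ATTEMPT_FAILED;
    host[sizeof host - 1] = '\0';

    /* Hypothetical naming "<lock>.<pid>.tmp": same directory, so the
       hard link does not cross file systems. */
    n = snprintf(tmp, sizeof tmp, "%s.%ld.tmp", lock, (long) getpid());
    if (n < 0 || (size_t) n >= sizeof tmp)
        return LOCK_ATTEMPT_FAILED;
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return LOCK_ATTEMPT_FAILED;
    n = snprintf(buf, sizeof buf, "%s %ld\n", host, (long) getpid());
    if (write(fd, buf, (size_t) n) != n) {
        close(fd);
        unlink(tmp);
        return LOCK_ATTEMPT_FAILED;
    }
    close(fd);

    for (tries = 0; tries < 3; tries++) {
        if (link(tmp, lock) == 0) {
            unlink(tmp);            /* LOCK is now the only name */
            return GOT_A_LOCK;
        }
        if (errno != EEXIST)
            break;
        fd = open(lock, O_RDONLY);  /* who holds it? */
        if (fd < 0)
            continue;               /* released meanwhile: retry */
        got = read(fd, buf, sizeof buf - 1);
        close(fd);
        if (got <= 0)
            break;
        buf[got] = '\0';
        if (sscanf(buf, "%255s %ld", lock_host, &lock_pid) != 2)
            break;
        /* Liveness can only be checked for a holder on this host. */
        if (strcmp(lock_host, host) == 0 && lock_pid > 0 &&
            kill((pid_t) lock_pid, 0) == -1 && errno == ESRCH)
            unlink(lock);           /* stray lock: holder is gone */
        else
            break;                  /* live local holder, or remote */
    }
    unlink(tmp);
    return LOCK_ATTEMPT_FAILED;
}
```

Releasing the lock is simply unlink() of the lock path by the holder.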
chip@tct.com (Chip Salzenberg) (04/04/91)
According to dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi):
>It turns out that lockf() uses a lock daemon and, under SunOS at least,
>this doesn't always work.
Okay, so SunOS is broken. This is news? :-(
>NFS does make symbolic links correctly.
It makes them correctly, but not reliably. Network lossage and/or resource exhaustion can cause a symlink() call that actually succeeded on the server to return failure.
My advice? Use lockf() and avoid SunOS.
--
Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>
Brand X Industries Custodial, Refurbishing and Containment Service
When You Never, Ever Want To See It Again [tm]
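Chip's warning about lost replies applies to link() as well as symlink(). One commonly documented remedy (described, for example, in the O_EXCL notes of the Linux open(2) manual page) is not to trust a failed link() outright. Instead, stat() the unique temp file: if its link count has become 2, the link was created and the lock is held. Here is a sketch with hypothetical helper names:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or reset) the unique temp file; it should live on the same
   file system as the lock.  O_EXCL is avoided since NFS may not honour it. */
int make_tmp(const char *tmp)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Returns 1 if `lock` now names our temp file (we hold the lock), else 0.
   A failed link() may still have been performed by the server (e.g. the
   reply to a retransmitted request was lost), so check the temp file's
   link count: 2 means `lock` is the second name for it.  This assumes
   nothing else ever links to our temp file. */
int link_lock(const char *tmp, const char *lock)
{
    struct stat st;

    if (link(tmp, lock) == 0)
        return 1;
    if (stat(tmp, &st) == 0 && st.st_nlink == 2)
        return 1;
    return 0;
}
```

The holder later removes both names; a competitor whose temp file still has a link count of 1 knows it lost.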
chip@tct.com (Chip Salzenberg) (04/04/91)
According to dean@coplex.uucp (Dean Brooks):
>chip@tct.uucp (Chip Salzenberg) writes:
>>Check the Smail source code -- specifically, sysdep.c. Smail does use
>>".lock" files, just like Elm.
>
>Yeah, but ".lock" files don't hack it for 20 messages all trying
>to arrive in your mailbox at the same time.
Correspondence with Dean reveals that Smail seems to work okay, and that it's Elm that's dropping the ball.
Of course, there is no such thing as reliable elimination of a stale lock file under UNIX: how do you know that the file you're removing is still the stale one, and not a fresh lock that another process created after you checked? So lockf() is really the best solution.
--
Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>
Brand X Industries Custodial, Refurbishing and Containment Service
When You Never, Ever Want To See It Again [tm]
gemini@geminix.in-berlin.de (Uwe Doering) (04/05/91)
chip@tct.com (Chip Salzenberg) writes:
>According to dean@coplex.uucp (Dean Brooks):
>>chip@tct.uucp (Chip Salzenberg) writes:
>>>Check the Smail source code -- specifically, sysdep.c. Smail does use
>>>".lock" files, just like Elm.
>>
>>Yeah, but ".lock" files don't hack it for 20 messages all trying
>>to arrive in your mailbox at the same time.
>
>Correspondence with Dean reveals that Smail seems to work okay, and
>that it's Elm that's dropping the ball.
>
>Of course, there is no such thing as reliable elimination of a stale
>lock file under UNIX: how do you know that you're removing the stale
>one? So lockf() is really the best solution.
You may be right that lockf() is the best solution. However, in this case there is simply a bug in Elm's lock() function. It happily blows away every lock file that has the PID of another process in it, regardless of whether that process is still alive or not. It took a while until I found this out.
Anyway, below is the patch that corrects this problem. Following it, there is another one that shows how to convince Smail 3.1.20 to use lock files instead of lockf() under SysVr3. Maybe this works for earlier Smail releases, too.
Have fun.
Uwe
Here's the Elm 2.3 PL11 patch for $ELMSRC/src/leavembox.c:
---------------------------- cut here ------------------------------
*** leavembox.c.00 Sat Mar 30 04:01:47 1991
--- leavembox.c Sat Mar 30 04:01:58 1991
***************
*** 704,711 ****
          if (read(create_fd, pid_buffer, SHORT) > 0) {
            create_iteration = atoi(pid_buffer);
            if (create_iteration) {
!             if (kill(create_iteration, 0)) {
!               close(create_fd);
                if (unlink(lock_name) != 0) {
                  dprint(1, (debugfile,
                    "Error %s (%s)\n\ttrying to unlink file %s (%s)\n",
--- 704,710 ----
          if (read(create_fd, pid_buffer, SHORT) > 0) {
            create_iteration = atoi(pid_buffer);
            if (create_iteration) {
!             if (kill(create_iteration, 0) && errno == ESRCH) {
                if (unlink(lock_name) != 0) {
                  dprint(1, (debugfile,
                    "Error %s (%s)\n\ttrying to unlink file %s (%s)\n",
***************
*** 714,719 ****
--- 713,719 ----
                    "\n\rCouldn't remove the current lock file %s\n\r", lock_name);
                  PutLine2(LINES, 0, "** %s - %s **\n\r", error_name(errno),
                    error_description(errno));
+                 close(create_fd);
                  if (direction == INCOMING)
                    leave();
                  else
***************
*** 722,727 ****
--- 722,728 ----
              }
            }
          }
+         close(create_fd);
          create_iteration = 0;
        }
  #endif
---------------------------- cut here ------------------------------
And this is the Smail 3.1.20 patch for file $SMAILSRC/conf/os/sys5.3:
---------------------------- cut here ------------------------------
*** sys5.3.00 Sat Feb 23 11:18:59 1991
--- sys5.3 Thu Mar 28 11:51:19 1991
***************
*** 21,37 ****
  OSNAMES=UNIX_SYS5_3:UNIX_SYS5:UNIX

  # LOCKING_PROTOCOL - macros for efficient file locking
! LOCKING_PROTOCOL="\
! #include <unistd.h>
! #define LOCK_REQUIRES_WRITE
! #define lock_fd(fd) (lockf((fd), F_TLOCK, 0L) < 0? FAIL: SUCCEED)
! #define lock_fd_wait(fd) (lockf((fd), F_LOCK, 0L) < 0? FAIL: SUCCEED)
! #define unlock_fd(fd) ((void) lockf((fd), F_ULOCK, 0L))
! #define unlock_fd_wait(fd) ((void) lockf((fd), F_ULOCK, 0L))
! #define USE_FCNTL_RD_LOCK
! #define lock_fd_rd_wait(fd) (fcntl_rd_lock(fd))
! extern int fcntl_rd_lock();
! "

  # MAILBOX_DIR - in which directory are user mailbox files
  MAILBOX_DIR=/usr/mail
--- 21,28 ----
  OSNAMES=UNIX_SYS5_3:UNIX_SYS5:UNIX

  # LOCKING_PROTOCOL - macros for efficient file locking
! LOCKING_PROTOCOL=
! LOCK_BY_NAME=TRUE   # compatible with ELM

  # MAILBOX_DIR - in which directory are user mailbox files
  MAILBOX_DIR=/usr/mail
---------------------------- cut here ------------------------------
--
Uwe Doering  |  INET : gemini@geminix.in-berlin.de
Berlin       |----------------------------------------------------------------
Germany      |  UUCP : ...!unido!fub!geminix.in-berlin.de!gemini
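A few lines show why the `errno == ESRCH` test in Uwe's patch matters. kill(pid, 0) sends no signal. It fails with ESRCH if no such process exists, but it also fails with EPERM when the process exists and runs under another user ID. Treating any failure as "dead", as the unpatched test did, therefore removes live locks. lock_holder_is_dead() is a hypothetical helper, not part of Elm.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 only if the process that wrote a lock file is provably gone.
   kill(pid, 0) performs the existence and permission checks without
   sending anything: ESRCH = no such process; EPERM = it exists but is
   not ours, so it must be treated as alive. */
int lock_holder_is_dead(pid_t pid)
{
    if (pid <= 0)
        return 0;           /* 0 and negative pids address process groups */
    return kill(pid, 0) == -1 && errno == ESRCH;
}
```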
chip@tct.com (Chip Salzenberg) (04/09/91)
According to gemini@geminix.in-berlin.de (Uwe Doering):
>Following this patch, there is another one that shows how to convince
>Smail 3.1.20 to use lock files instead of lockf() under SysVr3. Maybe
>this works for earlier Smail releases, too.
The patches posted by Uwe for Elm are much appreciated. His patches for Smail, however, are *far* more than is required.
Smail 3 already has a knob that controls mailbox locking specifically. In conf/EDITME, and in conf/os/<your-os-here>, do *not* set FLOCK_MAILBOX. That's it. (See the comments in conf/os/sys5* for further description of FLOCK_MAILBOX.)
--
Brand X Industries Custodial, Refurbishing and Containment Service:
When You Never, Ever Want To See It Again [tm]
Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>