worley@compass.com (Dale Worley) (11/08/89)
We have been having some problems using sendmail in a network of diskless Sun workstations that all mount the same filesystems via NFS from a set of file servers. Our fundamental desire is to have all of the workstations see the same file system, and particularly, to see the same set of mailboxes. As part of this philosophy, we have centralized system management and have enabled superuser access over the network. (Of course, each workstation has the same system of user names, implemented via Yellow Pages.)

In the past we have had all the workstations mount one /usr/spool/mail directory, and each workstation's sendmail (and any mail readers) write and read the mailboxes from that directory. In practice, we haven't had any trouble. However, people have warned me that in such a case the sendmails don't interlock each other correctly, and sometimes lose mail. A torture test (46 sendmails on 46 workstations trying to send ten messages to the same mailbox simultaneously) showed that while they usually interlock correctly, they don't always. (In fact, they work better than would be expected if they didn't interlock each other at all -- why is this?)

Trying to figure out the best way to solve this has raised a number of questions, which I'm trying to find answers to. Please mail replies to worley@compass.com, and I will compose a summary if people are interested.

1. One solution to the problem is to set the "OR" option in sendmail.cf. This causes each sendmail to SMTP messages to the sendmail on the disk server which provides the /usr/spool/mail directory. Sendmails on this one machine will interlock each other correctly, so it eliminates the simultaneous-access problem. Unfortunately, in SunOS 4.0.3, OR is buggy:

   (1) it causes the client sendmails to send *all* mail to the sendmail on the mailbox server, not just mail to be delivered locally, and

   (2) it makes sendmail unable to figure out the sending user name when its stdin/etc. are pipes.

2. It appears that sendmail uses the flock() file locking mechanism rather than the lockf() mechanism. According to the manual pages, flock() only locks the file on a particular CPU -- it does not interlock across the network. On the other hand, lockf() appears to be a variant of the fcntl() locking mechanism, which does work across the network, using the services of the lockd locking daemon. Why does sendmail use flock() rather than lockf()? How much work would it be to convert sendmail (and all the mail readers) to use lockf()?

3. I have read somewhere that sendmail delivers mail into mailboxes by exec-ing /bin/mail. If this is so, then the "local" mailer entry in sendmail.cf could be redirected to use a different program, so that the conversion to lockf() could be done without modifying sendmail itself. Is this really true?

4. Why are there two entirely independent locking mechanisms, and why does only one work over the network? This seems to be a very strange "feature" of NFS.

Dale Worley
Compass, Inc.
worley@compass.com
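
To make question 2 concrete, here is a minimal sketch of what appending to a mailbox under an fcntl() record lock might look like. This is not sendmail's or /bin/mail's actual code. The helper name append_locked() is hypothetical, and the sketch assumes that the lock daemon is running on both client and server and that every program touching the mailbox (delivery agent and mail readers alike) takes the same kind of lock. Since fcntl() locks are advisory, a program that still uses flock() would not be excluded.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of sendmail): append len bytes of msg
 * to the mailbox at path while holding an exclusive fcntl() lock on
 * the whole file.  Returns 0 on success, -1 on any error.
 */
int append_locked(const char *path, const char *msg, size_t len)
{
    struct flock fl;
    int fd;
    int rc = 0;

    assert(path != NULL && msg != NULL);

    fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Describe an exclusive (write) lock covering the entire file. */
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    /* F_SETLKW blocks until the lock can be granted. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        close(fd);
        return -1;
    }

    /* Write the whole message, allowing for short writes. */
    while (len > 0) {
        ssize_t n = write(fd, msg, len);
        if (n < 0) {
            rc = -1;
            break;
        }
        msg += n;
        len -= (size_t)n;
    }

    /* Release the lock explicitly, then close the descriptor. */
    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A delivery program along the lines of question 3 would call something like append_locked("/usr/spool/mail/user", text, strlen(text)) once per message. Whether that is sufficient over NFS depends on the lock daemon behaving correctly on every machine involved, which is exactly what would need testing.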