nsb@THUMPER.BELLCORE.COM (Nathaniel Borenstein) (07/11/90)
Excerpts from internet.info-andrew: 10-Jul-90 Re: messages - a confession
Bill Janssen@parc.xerox. (882)

> My pet peeve is message drop-off time between `sendmessage' and
> /usr/lib/sendmail. It hangs for dozens of seconds -- unlike /bin/Mail,
> or GNU Emacs's version of sendmail. Why is that? And why should the
> mail-reading tool be locked up during that time? Why isn't that forked
> off in the background? Are there some sendmail switches set wrong?

Yes -- there are sendmail switches set wrong for all other mailers in the world, and only AMS sets them right. When AMS tells you "your message has been sent" it means it. Other mailers quickly come back and pretend everything is OK when it isn't really.

If you're willing to live with the risks that other mailers give you -- notably that sendmail will die right away for lack of memory and your mail will never get sent and you'll never hear about it -- it is very easy to make things work faster. Just use the "OldSendmailProgram" AndrewSetup variable to point to something other than /usr/lib/sendmail for your "sendmail" program. The thing it points to can be a shell script that calls sendmail with whatever options you like; just don't complain to me when you find that your mail occasionally disappears into a black hole.

Actually, while I'm confessing things, I think I'd like to go on record as believing that many of Messages' performance problems can be traced to the fact that we tried to make it extremely reliable, more so than other mailers. I just had an exchange of personal mail with Bill Cattey on the topic, and I'll quote from what I told him on the subject of why it sometimes takes a long time to incorporate new mail from your mailbox into the AMS database:

---------------- Begin long quotation from mail to Bill Cattey ----------------

I hate to say it, but I think the real problem is that the code is too careful, and thus tends to amplify any file system performance problems.
Consider the basic "inc" scenario where you take a single piece of mail out of /usr/spool/mail/xxx and put it into .MESSAGES/yyy/+zzz (this comes from memory so it may miss a step or two):

1. Open /usr/spool/mail/xxx.
2. Lock /usr/spool/mail/xxx.
3. Open .MESSAGES/yyy/.MS_MsgDir
4. Lock .MESSAGES/yyy/.MS_MsgDir
5. Open .MESSAGES/yyy/.AMS_DIRMOD (This will be our trace if things get aborted mid-operation.)
6. Open .MESSAGES/yyy/+zzz (the body file)
7. Write .MESSAGES/yyy/+zzz (the body file)
8. Close .MESSAGES/yyy/+zzz (the body file)
9. Write .MESSAGES/yyy/.MS_MsgDir
10. Fsync .MESSAGES/yyy/.MS_MsgDir (We don't close it now because we may be processing multiple things, but we need to fsync it to make the data safe.)
11. Unlink .MESSAGES/yyy/.AMS_DIRMOD
12. Truncate /usr/spool/mail/xxx to zero length.
13. Close .MESSAGES/yyy/.MS_MsgDir

Now, all of these steps can be justified as necessary in terms of reliability. But consider what a mail interface that was willing to trade off a little reliability for some performance could do instead (and I believe this is what many mailers actually do):

1. Open /usr/spool/mail/xxx.
2. Open .MESSAGES/yyy/.MS_MsgDir
3. Open .MESSAGES/yyy/+zzz (the body file)
4. Write .MESSAGES/yyy/+zzz (the body file)
5. Close .MESSAGES/yyy/+zzz (the body file)
6. Write .MESSAGES/yyy/.MS_MsgDir
7. Truncate /usr/spool/mail/xxx to zero length.
8. Close .MESSAGES/yyy/.MS_MsgDir

So by eliminating all the locking and synchronization, we've also eliminated over a third of the network file system calls.

Now consider an interface like MH, which (if I recall correctly) doesn't have any index files. Its file system calls could be reduced to the following:

1. Open /usr/spool/mail/xxx.
2. Open .MESSAGES/yyy/+zzz (the body file)
3. Write .MESSAGES/yyy/+zzz (the body file)
4. Close .MESSAGES/yyy/+zzz (the body file)
5. Truncate /usr/spool/mail/xxx to zero length.
Now, actually I think that MH (like nearly all mailers) does lock the /usr/spool/mail file, but doesn't have any index files and doesn't lock anything else. This means that it does 6 file system operations where AMS does 13. I seriously doubt that you need to look any further to find the performance differences on the "inc" operation.

---------------- End quotation ----------------

Now, for a mail system explicitly designed for large-scale bboard support, a lack of index files would be crazy. And we were just dead serious about making the system really reliable -- possibly too serious, but I still don't believe that. The bottom line is that if you try to build a reliable database on top of a distributed UNIX file system, it's going to be VERY slow. Most mailers give up on reliability; we gave up some speed. You pays your money & you takes your choice...

Now, one thing that wouldn't be too hard to add to AMS would be a "LiveDangerously" preference. It could avoid a lot of file locking and fsync'ing and could give sendmail the "fork and be happy" option. There would really be a minimal amount of coding necessary to provide such an option, and AMS would then be as unreliable as any other mailer, but I'd really hate to do it -- people would lose mail and then complain about AMS losing their mail!

Well, the above diatribe may not make AMS run any faster for you, but perhaps it will give you some more insight into what it's doing when it's too slow...

-- Nathaniel
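The trade-off can be made concrete with a minimal Python sketch of the hypothetical "LiveDangerously" preference (an illustration, not AMS code). `inc()` performs the thirteen-step "inc" sequence from the quoted mail, or, with `live_dangerously=True`, drops the two locks, the .AMS_DIRMOD trace file, the fsync and the trace unlink, leaving the eight-step sequence; it returns the steps it took so the counts can be checked. `sendmail_argv()` and `submit()` show the sendmail side: the careful path asks for interactive delivery (`-odi`) and waits for sendmail's exit status, while the dangerous path asks for background delivery (`-odb`), which returns sooner but can no longer report a failed delivery. The function names, the index-line format and the /usr/lib/sendmail path are assumptions; `fcntl.flock` makes it Unix-only.

```python
import fcntl
import os
import subprocess


def inc(spool_path, folder, body_name, live_dangerously=False):
    """Move the spool file's contents into one body file; return the steps taken.

    Careful mode mirrors the 13-step list quoted above. live_dangerously=True
    (a hypothetical preference) skips the locks, the .AMS_DIRMOD trace file
    and the fsync, leaving the 8-step list.
    """
    careful = not live_dangerously
    steps = []
    msgdir_path = os.path.join(folder, ".MS_MsgDir")
    trace_path = os.path.join(folder, ".AMS_DIRMOD")
    body_path = os.path.join(folder, body_name)

    with open(spool_path, "r+b") as spool:
        steps.append("open spool")
        if careful:
            fcntl.flock(spool, fcntl.LOCK_EX)
            steps.append("lock spool")
        with open(msgdir_path, "ab") as msgdir:
            steps.append("open index")
            if careful:
                fcntl.flock(msgdir, fcntl.LOCK_EX)
                steps.append("lock index")
                open(trace_path, "wb").close()  # trace left behind if we crash
                steps.append("open trace")
            data = spool.read()
            with open(body_path, "wb") as body:
                steps.append("open body")
                body.write(data)
                steps.append("write body")
            steps.append("close body")
            # A made-up one-line entry, not the real .MS_MsgDir format.
            msgdir.write(f"{body_name} {len(data)}\n".encode())
            steps.append("write index")
            if careful:
                msgdir.flush()
                os.fsync(msgdir.fileno())  # index on disk before spool is emptied
                steps.append("fsync index")
                os.unlink(trace_path)
                steps.append("unlink trace")
            spool.truncate(0)
            steps.append("truncate spool")
        steps.append("close index")
    return steps


def sendmail_argv(recipients, live_dangerously=False, sendmail="/usr/lib/sendmail"):
    # -oi:  a line holding a single "." does not end the message.
    # -odi: deliver interactively; sendmail returns after attempting delivery.
    # -odb: deliver in the background; sendmail forks and returns early, so a
    #       failure in the delivering child is not reported to the caller.
    mode = "-odb" if live_dangerously else "-odi"
    return [sendmail, "-oi", mode] + list(recipients)


def submit(message, recipients, live_dangerously=False, sendmail="/usr/lib/sendmail"):
    """Report success only if sendmail exits with status 0."""
    argv = sendmail_argv(recipients, live_dangerously, sendmail)
    return subprocess.run(argv, input=message).returncode == 0
```

Like the quoted list, the careful path fsyncs only the index; a stricter version might fsync the body file as well, at the cost of yet another file system call.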
nsb@THUMPER.BELLCORE.COM (Nathaniel Borenstein) (07/13/90)
Actually, if you're running a very large site with AFS and lots of users, and performance is slow, you should consider doing what I've been recommending CMU do for years, without success: Have everyone run AMS with a remote messageserver on one (or a small number of) dedicated messageserver machines. Instead of having every workstation at the site beating on the mail system, you'll have all its fs activity centralized, maybe even on the local disk of the messageserver machine. Using preferences such as AMS_RemoteServer and AMS_RemoteLogin, and the "-S" switch to Messages, you can make this very painless -- it should work just like it does now, only faster.

-- NB
janssen@parc.xerox.com (Bill Janssen) (07/13/90)
I hate to say it, but of the three mailers I've really had experience with (VAX/VMS mail, GNU Emacs RMAIL, and Andrew messages), `messages' is the only one that has lost mail on me. I'm not sure the code that takes the UNIX mail spool file and puts it into ~/Mailbox is all that careful, as in one case I lost about 12 messages because I chose "Check New Messages" while my home disk was (inadvertently) full. They just disappeared somewhere, and I immediately added a new check to my trouble light in `console'.

In another instance, I missed many messages because they had been appended to the preceding message. The code that splits the mail spool file was either too picky or not picky enough, I don't remember which, and all the odd names in the Xerox mail messages gave it grief.

Both problems were with version 6.xx of messages; I haven't had problems with 7.14 yet.

Bill
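Bill's second problem, messages merged into the one before them, comes down to how a spool file is cut into messages. A common heuristic, sketched here in Python (not the AMS code, whose rules aren't given in this thread), starts a new message at a line beginning with "From " plus a sender token, but only when that line follows a blank line or opens the file. The name `split_spool` is hypothetical.

```python
import re

# "From user@host Tue Jul 10 12:00:00 1990" style separator: "From ", a
# non-blank sender token, then a space.
FROM_LINE = re.compile(r"^From \S+ ")


def split_spool(text):
    """Split mbox-style spool text into messages (heuristic sketch)."""
    messages, current = [], []
    prev_blank = True  # the start of the file counts as "after a blank line"
    for line in text.splitlines(keepends=True):
        if prev_blank and FROM_LINE.match(line) and current:
            messages.append("".join(current))
            current = []
        current.append(line)
        prev_blank = line.strip() == ""
    if current:
        messages.append("".join(current))
    return messages
```

Either direction of error is possible with rules like these: a stricter separator test (for example, also demanding a well-formed date) can miss separators that carry unusual sender names, gluing messages together, while a looser one splits a body paragraph that happens to start with "From " after a blank line.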
nsb@THUMPER.BELLCORE.COM (Nathaniel Borenstein) (07/13/90)
Sigh... I guess I asked for that. I also guess I'd not be surprised to find that the least robust piece of AMS, as far as the possibility of losing mail goes, is the piece that takes things out of /usr/spool/mail and puts them into the Mailbox directory, since that part isn't used at CMU and was only added for the non-CMU release. It's too bad you didn't report the problem when it happened; we might have stood a better chance of tracking it down while the trail was warm. Nonetheless, I'm going to go take a peek at that code and see if there's anything that might be a bug of this sort, lingering since version 6.xx...
pive@BANRUC01.BITNET (07/13/90)
Maybe this can be a suggestion for further releases: I've noticed that validating the recipients is very slow. So one could implement the following scenario: when I use send/post from the menu, the message is saved in a directory (e.g. ~/.Mailqueue); then, when leaving messages (or choosing another menu option), a new process is started. This process validates the recipients and mails the message; when this fails for one reason or another, it mails the message back to the user so he/she can correct his/her mistakes. (I know, this is a very rough scenario.)

P. Verhaeghe
University of Antwerp, RUCA
Algebra / Geometry
Groenenborgerlaan 171
B-2020 Antwerpen, Belgium
Tel: +32 3 2180308
Fax: +32 3 2180217
Telex: RUCABI 33362
E-mail: pive@banruc01.bitnet (or pive@ccu.uia.ac.be)
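The proposed scenario is essentially a local submission queue. A minimal Python sketch, with hypothetical names throughout: `enqueue()` writes each outgoing message into a queue directory such as ~/.Mailqueue, and `drain()`, which a background process could run after the user leaves messages, validates each message, then sends it or mails it back, and only then removes it. The `validate`, `send` and `bounce` callables are placeholders for whatever validation and delivery code a real implementation would use.

```python
import os
import time
import uuid


def enqueue(queue_dir, message):
    """Save an outgoing message in the queue directory and return its path."""
    os.makedirs(queue_dir, exist_ok=True)
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}"  # roughly time-ordered
    path = os.path.join(queue_dir, name)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(message)
    # rename() is atomic within one file system, so drain() never reads a
    # half-written message.
    os.rename(tmp, path)
    return path


def drain(queue_dir, validate, send, bounce):
    """Validate and send every queued message, bouncing the ones that fail."""
    for name in sorted(os.listdir(queue_dir)):
        if name.endswith(".tmp"):
            continue  # still being written
        path = os.path.join(queue_dir, name)
        with open(path) as f:
            message = f.read()
        problems = validate(message)  # e.g. a list of bad recipients
        if problems:
            bounce(message, problems)  # e.g. mail it back to the sender
        else:
            send(message)
        # Removed only after send() or bounce() returns; if either raises,
        # the message stays queued for the next run.
        os.unlink(path)
```

Deleting the file last is the design choice that matters here: a crash between validation and delivery leaves the message in the queue rather than losing it.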
Craig_Everhart@TRANSARC.COM (07/14/90)
This is how most mail systems handle user name validation: they don't validate anything at composition/submission time, and let erroneously addressed messages get mailed back to the submitter. You can achieve these semantics with AMS by disabling the validation of addresses as you see fit, generally in one of the AndrewSetup files. The setup.help file describes the validations.

In brief, there are two kinds of things getting validated: local user names and remote mail destination names. Local user names are validated according to the settings for the options:

    AMS_WPValidation
    AMS_PasswdValidation
    AMS_LocalDatabaseValidation
    AMS_AliasesValidation

(Non-local) destination mail domain names are validated according to the settings for the options:

    AMS_ValidateDestHosts
    AMS_HardHostValidationErrors
    AMS_DeliveryViaDomainMXAddress
    AMS_DeliveryViaDomainAddress
    AMS_DeliveryViaGethostbyname
    AMS_DeliveryViaHostTable

To turn off validation of local user names, set all the AMS_xxxValidation values to zero. To turn off validation of remote destination mail domain names, set AMS_ValidateDestHosts to zero. Once this is done, validation will happen only in the mail delivery agent, which you can generally get to mail error messages back to you.

Maybe it's only one part of the recipient validation that's taking a long time, or something that's slightly misconfigured for your environment. Validation is supposed to be fast enough that you're willing to have it tell you right away if you made a typing mistake, but if it's getting in your way, turn it off.

Craig
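The on/off logic described above can be pictured with a small Python sketch. The option names come from the message; the function name is hypothetical, and treating a missing option as enabled (1) is an assumption made for illustration only, since the real defaults and the AndrewSetup file syntax are covered in setup.help. The DeliveryVia options, which select how a remote domain is looked up, are left out.

```python
LOCAL_USER_OPTIONS = (
    "AMS_WPValidation",
    "AMS_PasswdValidation",
    "AMS_LocalDatabaseValidation",
    "AMS_AliasesValidation",
)


def enabled_validations(setup):
    """List the validation checks a setup would run (illustrative sketch).

    `setup` maps option names to integers. A missing option is assumed to
    be enabled, which may not match the real AMS defaults.
    """
    checks = [name for name in LOCAL_USER_OPTIONS if setup.get(name, 1) != 0]
    if setup.get("AMS_ValidateDestHosts", 1) != 0:
        checks.append("AMS_ValidateDestHosts")
    return checks
```

With every one of these options set to 0 the list comes back empty, which corresponds to leaving all address checking to the mail delivery agent.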
wdc@ATHENA.MIT.EDU (Bill Cattey) (07/14/90)
My recollection is that we had figured out the problem with getting stuff from /usr/spool/mail: AMS reads from the beginning of the spool file and takes all the newer messages. Then it truncates the /usr/spool/mail file. The mail that went down the black hole was any mail that the user had put back in the /usr/spool/mail file during previous sessions with mailers that allowed that to be done. That mail was not taken by AMS because it was old, and it was not saved anywhere, because AMS truncated the file without looking at all of it.

The solution was to make AMS refuse to take input from /usr/spool/mail if it noticed that the hold flag was set in either the person's local or the system-wide mail configuration file.

As a side note, a couple of new AMS users have been confused when they got the error message complaining about this condition. We've been telling them to clear the hold bit. I think we may occasionally forget to mention to them that mail put back in /usr/spool will be lost if you use AMS on it. (Some history just seems to fall through the cracks... :-) )

I always felt that AMS should have been made smarter about what it was actually doing to the spool file, but I didn't say anything at the time, because I didn't want to ask for work on adding hair to a subsystem that didn't affect me directly.

-wdc
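The fix described above, refusing to read /usr/spool/mail when the hold flag is set, might look roughly like this Python sketch. It assumes Berkeley Mail style configuration lines (`set hold`, `unset hold`), with a system-wide file read first and the user's ~/.mailrc read second; the file locations, the simplified parsing and the function names are assumptions, not the AMS code.

```python
import os


def hold_is_set(config_paths):
    """Return True if the mail config files, read in order, leave 'hold' set.

    A later 'unset hold' overrides an earlier 'set hold', as when a user's
    ~/.mailrc follows the system-wide file. Missing files are skipped.
    """
    hold = False
    for path in config_paths:
        if not os.path.exists(path):
            continue
        with open(path) as f:
            for line in f:
                words = line.split()
                if not words:
                    continue
                if words[0] == "set" and "hold" in words[1:]:
                    hold = True
                elif words[0] == "unset" and "hold" in words[1:]:
                    hold = False
    return hold


def may_inc_from_spool(config_paths):
    """Only take mail from the spool file when nothing is being held there."""
    return not hold_is_set(config_paths)
```

Refusing outright is the conservative choice: messages deliberately held in the spool file are never touched, at the price of the error message that confuses new users until they clear the hold bit.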