[comp.soft-sys.andrew] Some REAL confessions about AMS speed

nsb@THUMPER.BELLCORE.COM (Nathaniel Borenstein) (07/11/90)

Excerpts from internet.info-andrew: 10-Jul-90 Re: messages - a
confession Bill Janssen@parc.xerox. (882)

> My pet peeve is message drop-off time between `sendmessage' and
> /usr/lib/sendmail.  It hangs for dozens of seconds -- unlike /bin/Mail,
> or GNU Emacs's version of sendmail.  Why is that?  And why should the
> mail-reading tool be locked up during that time?  Why isn't that forked
> off in the background?  Are there some sendmail switches set wrong?

Yes -- there are sendmail switches set wrong for all other mailers in
the world, and only AMS sets them right.  When AMS tells you "your
message has been sent" it means it.  Other mailers quickly come back and
pretend everything is OK when it isn't really.

If you're willing to live with the risks that other mailers give you --
notably that sendmail will die right away for lack of memory and your
mail will never get sent and you'll never hear about it -- it is very
easy to make things work faster.  Just use the "OldSendmailProgram"
AndrewSetup variable to point to something other than /usr/lib/sendmail
for your "sendmail" program.  The thing it points to can be a shell
script that calls sendmail with whatever options you like; just don't
complain to me when you find that your mail occasionally disappears
into a black hole.
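
A minimal sketch of such a wrapper, written in Python rather than shell. It assumes that the program named by OldSendmailProgram receives the same arguments and standard input that /usr/lib/sendmail would; nothing in this thread confirms that. -odb is sendmail's "deliver in background" mode. The names build_command and main are hypothetical.

```python
import subprocess

REAL_SENDMAIL = "/usr/lib/sendmail"


def build_command(args, sendmail=REAL_SENDMAIL):
    """Prepend sendmail's background-delivery option to the caller's arguments."""
    # -odb: return as soon as the message is accepted and deliver in the
    # background. This is the "fast but risky" behaviour discussed above.
    return [sendmail, "-odb", *args]


def main(argv):
    """Run the real sendmail; stdin (the message) is inherited unchanged."""
    return subprocess.run(build_command(argv[1:])).returncode

# Installed as a script, the file would end with:
#     import sys; sys.exit(main(sys.argv))
```

With this wrapper the caller regains control sooner, at the cost that a later delivery failure is no longer reported at send time.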

Actually, while I'm confessing things, I think I'd like to go on record
as believing that many of Messages' performance problems can be traced
to the fact that we tried to make it extremely reliable, more so than
other mailers.  I just had an exchange of personal mail with Bill Cattey
on the topic, and I'll quote from what I told him on the subject of why
it sometimes takes a long time to incorporate new mail from your mailbox
into the AMS database:

---------------- Begin long quotation from mail to Bill Cattey ----------------

I hate to say it, but I think the real problem is that the code is too
careful, and thus tends to amplify any file system performance problems.
Consider the basic "inc" scenario where you take a single piece of mail
out of /usr/spool/mail/xxx and put it into .MESSAGES/yyy/+zzz (this
comes from memory so it may miss a step or two):

    1.  Open /usr/spool/mail/xxx.

    2.  Lock  /usr/spool/mail/xxx.

    3.  Open .MESSAGES/yyy/.MS_MsgDir

    4.  Lock .MESSAGES/yyy/.MS_MsgDir

    5.  Open .MESSAGES/yyy/.AMS_DIRMOD (This will be our trace if
        things get aborted mid-operation)

    6.  Open .MESSAGES/yyy/+zzz (the body file)

    7.  Write  .MESSAGES/yyy/+zzz (the body file)

    8.  Close  .MESSAGES/yyy/+zzz (the body file)

    9.  Write .MESSAGES/yyy/.MS_MsgDir

    10.  Fsync .MESSAGES/yyy/.MS_MsgDir  (We don't close it now
        because we may be processing multiple things, but we need to
        fsync it to make the data safe.)

    11.  Unlink .MESSAGES/yyy/.AMS_DIRMOD

    12.  Truncate /usr/spool/mail/xxx to zero length.

    13.  Close .MESSAGES/yyy/.MS_MsgDir
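
A minimal sketch of the thirteen steps above in Python, for illustration only. It assumes flock()-style advisory locks, treats the whole spool file as a single message, and writes a one-line stand-in for the index entry. The real .MS_MsgDir format and the locking AMS actually uses are not described in this thread, and the name careful_inc is hypothetical.

```python
import fcntl
import os


def careful_inc(spool_path, folder_dir, body_name):
    """Move the spool contents into folder_dir/body_name, step by step."""
    msgdir_path = os.path.join(folder_dir, ".MS_MsgDir")
    marker_path = os.path.join(folder_dir, ".AMS_DIRMOD")
    body_path = os.path.join(folder_dir, body_name)

    spool_fd = os.open(spool_path, os.O_RDWR)                    # step 1
    try:
        fcntl.flock(spool_fd, fcntl.LOCK_EX)                     # step 2
        msgdir_fd = os.open(
            msgdir_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600
        )                                                        # step 3
        try:
            fcntl.flock(msgdir_fd, fcntl.LOCK_EX)                # step 4
            # Step 5: the marker stays behind if anything below fails.
            os.close(os.open(marker_path, os.O_WRONLY | os.O_CREAT, 0o600))

            chunks = []
            while True:
                chunk = os.read(spool_fd, 65536)
                if not chunk:
                    break
                chunks.append(chunk)

            with open(body_path, "xb") as body:                  # step 6
                body.write(b"".join(chunks))                     # step 7
            # Leaving the "with" block closes the body file.     # step 8

            # Stand-in index entry: just the body file name.
            os.write(msgdir_fd, body_name.encode() + b"\n")      # step 9
            os.fsync(msgdir_fd)                                  # step 10
            os.unlink(marker_path)                               # step 11
            os.ftruncate(spool_fd, 0)                            # step 12
        finally:
            os.close(msgdir_fd)                                  # step 13
    finally:
        os.close(spool_fd)  # not in the list; also drops the spool lock
```

The spool is truncated only after the index has been fsync'ed, so a crash at any earlier point leaves the mail in the spool and the marker file in place.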

Now, all of these steps can be justified as necessary in terms of
reliability.  But consider what a mail interface that was willing to
trade off a little reliability for some performance could do instead
(and I believe this is what many mailers actually do):

        1.  Open /usr/spool/mail/xxx.

        2.  Open .MESSAGES/yyy/.MS_MsgDir

        3.  Open .MESSAGES/yyy/+zzz (the body file)

        4.  Write  .MESSAGES/yyy/+zzz (the body file)

        5.  Close  .MESSAGES/yyy/+zzz (the body file)

        6.  Write .MESSAGES/yyy/.MS_MsgDir

        7.  Truncate /usr/spool/mail/xxx to zero length.

        8.  Close .MESSAGES/yyy/.MS_MsgDir

So by eliminating all the locking and synchronization, we've also
eliminated over a third of the network file system calls.  Now consider
an interface like MH, which (if I recall correctly) doesn't have any
index files.  Its file system calls could be reduced to the following:

    1.  Open /usr/spool/mail/xxx.

    2.  Open .MESSAGES/yyy/+zzz (the body file)

    3.  Write  .MESSAGES/yyy/+zzz (the body file)

    4.  Close  .MESSAGES/yyy/+zzz (the body file)

    5.  Truncate /usr/spool/mail/xxx to zero length.

Now, actually I think that MH (like nearly all mailers) does lock the
/usr/spool/mail file, but doesn't have any index files and doesn't lock
anything else.  This means that it does 6 file system operations where
AMS does 13.  I seriously doubt that you need to look any further to
find the performance differences on the "inc" operation.
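
The counts quoted above can be checked mechanically. The sketch below encodes the thirteen careful steps as (operation, target) pairs and derives the two shorter sequences by filtering; the labels "spool", "index", "marker" and "body" are shorthand introduced here, not AMS terminology.

```python
# The careful sequence, in order, as (operation, target) pairs.
CAREFUL = [
    ("open", "spool"), ("lock", "spool"),
    ("open", "index"), ("lock", "index"),
    ("open", "marker"),
    ("open", "body"), ("write", "body"), ("close", "body"),
    ("write", "index"), ("fsync", "index"),
    ("unlink", "marker"),
    ("truncate", "spool"),
    ("close", "index"),
]

# Drop all locking, fsync'ing and the abort marker.
FAST = [
    (op, target) for op, target in CAREFUL
    if op not in ("lock", "fsync") and target != "marker"
]

# No index files at all, but the spool is still locked.
MH_LIKE = [(op, target) for op, target in CAREFUL if target in ("spool", "body")]

print(len(CAREFUL), len(FAST), len(MH_LIKE))   # prints: 13 8 6
saved = (len(CAREFUL) - len(FAST)) / len(CAREFUL)
print(f"{saved:.0%}")                          # prints: 38%
```

Five of the thirteen calls exist only for safety, which is the "over a third" figure; the MH-like case comes out at the 6 operations mentioned.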

---------------- End quotation ----------------

Now, for a mail system explicitly designed for large-scale bboard
support, a lack of index files would be crazy.  And we were just dead
serious about making the system really reliable -- possibly too serious,
but I still don't believe that.  The bottom line is that if you try to
build a reliable database on top of a distributed UNIX file system, it's
going to be VERY slow.  Most mailers give up on reliability; we gave up
some speed.  You pays your money & you takes your choice...

Now, one thing that wouldn't be too hard to add to AMS would be a
"LiveDangerously" preference.  It could avoid a lot of file locking and
fsync'ing and could give sendmail the "fork and be happy" option.  There
would really be a minimal amount of coding necessary to provide such an
option, and AMS would then be as unreliable as any other mailer, but I'd
really hate to do it -- people would lose mail and then complain about
AMS losing their mail!

Well, the above diatribe may not make AMS run any faster for you, but
perhaps it will give you some more insight into what it's doing when
it's too slow...    -- Nathaniel

nsb@THUMPER.BELLCORE.COM (Nathaniel Borenstein) (07/13/90)

Actually, if you're running a very large site with AFS and lots of
users, and performance is slow, you should consider doing what I've been
recommending CMU do for years, without success:  Have everyone run AMS
with a remote messageserver on one (or a small number of) dedicated
messageserver machines.  Instead of having every workstation at the site
beating on the mail system, you'll have all its fs activity centralized,
maybe even on the local disk of the messageserver machine.   Using
preferences such as AMS_RemoteServer and AMS_RemoteLogin, and the "-S"
switch to Messages,  you can make this very painless -- it should work
just like it does now, only faster.  -- NB

janssen@parc.xerox.com (Bill Janssen) (07/13/90)

I hate to say it, but between the three mailers I've really had
experience with (VAX/VMS mail, GNU Emacs RMAIL, and Andrew messages),
`messages' is the only one that has lost mail on me.  I'm not sure the
code that takes the UNIX mail spool file and puts it into ~/Mailbox is
all that careful, as in one case I lost about 12 messages because I
chose "Check New Messages" while my home disk was (inadvertently) full. 
They just disappeared somewhere, and I immediately added a new check to
my trouble light in `console'.  In another instance, I missed many
messages because they had been appended to the preceding message.  The
code that splits the mail spool file was either too picky, or not
enough, I don't remember, and all the odd names in the Xerox mail
messages gave it grief.

Both problems were with version 6.xx of messages; I haven't had problems
with 7.14 yet.

Bill

nsb@THUMPER.BELLCORE.COM (Nathaniel Borenstein) (07/13/90)

Sigh... I guess I asked for that.

I also guess I'd not be surprised to find that the least robust piece of
AMS, as far as the possibility of losing mail goes, is the piece that
takes things out of /usr/spool/mail and puts them into the Mailbox
directory, since that part isn't used at CMU and was only added for the
non-CMU release.

It's too bad you didn't report the problem when it happened; we might
have stood a better chance of tracking it down when the trail was warm. 
Nonetheless I'm going to go take a peek at that code and see if there's
anything that might be a bug of this sort, lingering since version
6.xx...

pive@BANRUC01.BITNET (07/13/90)

Maybe this can be a suggestion for further releases:
I've noticed that validating the recipients is very slow.  So one could
implement the following scenario: when I use send/post from the menu,
the message is saved in a directory (e.g. ~/.Mailqueue).  Then, when
leaving messages (or choosing another menu option), a new process is
started.  This process validates the recipients and mails the message;
if this fails for one reason or another, it mails the message back to
the user so he/she can correct his/her mistakes.  (I know, this is a
very rough scenario.)
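
A rough sketch of this scenario in Python, under stated assumptions: the queue is a directory of small JSON files, and the validator, the delivery step and the "mail it back" step are passed in as functions. All names here (enqueue, flush_queue, is_valid, deliver, bounce) are hypothetical and are not part of AMS.

```python
import json
import os
import tempfile


def enqueue(queue_dir, recipients, body):
    """Save a message to the queue directory without validating anything."""
    os.makedirs(queue_dir, exist_ok=True)
    fd, path = tempfile.mkstemp(dir=queue_dir, suffix=".msg")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"to": list(recipients), "body": body}, f)
    return path


def flush_queue(queue_dir, is_valid, deliver, bounce):
    """Validate and send every queued message; return failures to the user."""
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".msg"):
            continue
        path = os.path.join(queue_dir, name)
        with open(path, encoding="utf-8") as f:
            msg = json.load(f)
        bad = [r for r in msg["to"] if not is_valid(r)]
        if bad:
            bounce(msg["to"], msg["body"], "invalid recipients: " + ", ".join(bad))
        else:
            try:
                deliver(msg["to"], msg["body"])
            except OSError as err:
                bounce(msg["to"], msg["body"], f"delivery failed: {err}")
        # The message has been either delivered or returned; drop it.
        os.unlink(path)
```

In this arrangement flush_queue would run in the separate process started when the user leaves messages, so the slow validation never blocks the interface.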




P. Verhaeghe
University of Antwerp, RUCA
Algebra / Geometry
Groenenborgerlaan 171
B-2020 Antwerpen, Belgium

Tel: +32 3 2180308
Fax: +32 3 2180217
Telex: RUCABI 33362

E-mail: pive@banruc01.bitnet (or pive@ccu.uia.ac.be)

Craig_Everhart@TRANSARC.COM (07/14/90)

This is how most mail systems handle user name validation: they don't
validate anything at composition/submission time, and let
erroneously-addressed messages get mailed back to the submitter.  You
can achieve these semantics with AMS by disabling the validation of
addresses as you see fit, generally in one of the AndrewSetup files. 
The setup.help file describes the validations.

In brief, there are two kinds of things getting validated: local user
names and remote mail destination names.  Local user names are validated
according to the settings for the options:
	AMS_WPValidation
	AMS_PasswdValidation
	AMS_LocalDatabaseValidation
	AMS_AliasesValidation

(Non-local) destination mail domain names are validated according to the
settings for the options:
	AMS_ValidateDestHosts
	AMS_HardHostValidationErrors
	AMS_DeliveryViaDomainMXAddress
	AMS_DeliveryViaDomainAddress
	AMS_DeliveryViaGethostbyname
	AMS_DeliveryViaHostTable

To turn off validation of local user names, set all the
AMS_xxxValidation values to zero.  To turn off validation of remote
destination mail domain names, set AMS_ValidateDestHosts to zero.

Once this is done, validation will happen only in the mail delivery
agent, which you can generally get to mail error messages back to you.

Maybe it's only one part of the recipient validation that's taking a
long time, or something that's slightly mis-configured for your
environment.  Validation is supposed to be fast enough that you're
willing to have it tell you right away if you made a typing mistake, but
if it's getting in your way, turn it off.

		Craig

wdc@ATHENA.MIT.EDU (Bill Cattey) (07/14/90)

My recollection of the problem with getting stuff from /usr/spool/mail
was that we now understood what it all was:

AMS reads from the beginning of the spool file, and takes all the newer
messages.

Then it truncates the /usr/spool/mail file.

The mail that went down the black hole was any mail that the user put
back in the /usr/spool/mail file from previous interactions with mailers
that allowed that to be done.  That mail was not taken by AMS because it
was old, and was not saved away, because AMS ostensibly truncated the
file without looking at all of it.

The solution was to make AMS not take input from /usr/spool/mail if it
noticed that the hold flag was set in either the person's local or
system wide mail configuration file.
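
A sketch of what such a check might look like, in Python. It assumes the hold flag is the Berkeley Mail / mailx "hold" option, set with "set hold" and cleared with "unset hold" or "set nohold" in a mailrc-style file. How AMS actually parses these files is not described here, and the function names are hypothetical.

```python
def hold_is_set(mailrc_text):
    """Return True if the mailrc text leaves the 'hold' option switched on."""
    hold = False
    for line in mailrc_text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        if words[0] == "set":
            for word in words[1:]:
                if word == "hold":
                    hold = True
                elif word == "nohold":
                    hold = False
        elif words[0] == "unset" and "hold" in words[1:]:
            hold = False
    return hold


def may_read_spool(local_rc_text, system_rc_text):
    """Refuse to take mail from the spool if either file sets hold."""
    return not (hold_is_set(local_rc_text) or hold_is_set(system_rc_text))
```

Following the description above, the two files are checked independently, so a system-wide "set hold" blocks the spool even if the user's own file clears it.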

As a side note, a couple of new AMS users are confused when they get the
error message complaining about this condition.  We've been telling them
to clear the hold bit.  I think we may occasionally forget to mention to
them that mail put back in /usr/spool will be lost if you use AMS on it.
(Some history just seems to fall through the cracks...  :-)  )

I always felt that AMS should have been made smarter to actually
understand what all it was doing to the spool file, but I didn't say
anything at the time, because I didn't want to ask for work on adding
hair to a subsystem that didn't affect me directly.

-wdc