[net.bugs.4bsd] Final word on sendmail dropping mail.

mark@tove.UUCP (Mark Weiser) (12/05/84)

I am tempting the net by claiming this to be the last word, but
lots of new information about the sendmail queuing bug has
come in, particularly from John Pierce, and I have had several requests 
for more information.  So here is the scoop:

------------------------------------------------------------------
Symptom:
Some mail (only local?) is thrown away when sendmail decides the load
is too high for immediate delivery.

--------------------------------------------------------------------
Repeat By:
Crank up the load over 12 or so.  Send local mail.  Run sendmail -q
to force delivery of queued mail.  Wait for delivery....

----------------------------------------------------------------
Fix by:
The problem is not necessarily sendmail's fault, and can be fixed either in
sendmail or in /bin/mail.  There is a possibility that your system
does not have this bug: some /bin/mail's out there seem to have been fixed
and no one knows how, who, when, or where.  The symptom of the bug
is that local mail vanishes during the period of high load.

One fix is to set the threshold for queuing so high that queuing never
happens.  This is easy to do (though it probably requires recompiling
sendmail), but has the disadvantage that at moments of high load an
overactive sendmail will push your load still higher.
The change is to line 110 (your line numbers may vary) of sendmail/src/conf.c.
Mine now reads:

int	QueueLA =	99;	/* load avg > QueueLA -> just queue */

For binary only sites:  supposedly one can accomplish this same thing
by placing the following line in your sendmail.cf file:

Ox99

I never actually got this to work for me.
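
For anyone unsure what the threshold controls, here is a minimal sketch of
the comparison described by the comment on QueueLA.  It is not sendmail's
actual code: the enum and the function choose_delivery() are hypothetical
names made up for illustration, and only QueueLA and its comment come from
conf.c.

```c
/* Hypothetical result type; not a sendmail identifier. */
enum delivery { DELIVER_NOW, QUEUE_ONLY };

int	QueueLA =	99;	/* load avg > QueueLA -> just queue */

/*
 * Hypothetical helper: decide what to do with a message given the
 * current load average.  Mirrors the comment on QueueLA: strictly
 * greater than the threshold means the message is only queued.
 */
enum delivery
choose_delivery(int load_avg)
{
	return (load_avg > QueueLA) ? QUEUE_ONLY : DELIVER_NOW;
}
```

With QueueLA at 99, a load of 12 (or even 50) still yields DELIVER_NOW, so
the queued-delivery path that loses the mail is never taken.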

The second fix comes from John Pierce at ucsd.
To quote from his unix-wizards posting (in case you missed it):

> In the function bulkmail():
> 
> 	/*
> 	 * When we fall out of this, argv[1] should be first name,
> 	 * argc should be number of names + 1.
> 	 */
> 
> 	while (argc > 1 && *argv[1] == '-') {
> 		cp = *++argv;
> 		argc--;
> 		switch (cp[1]) {
> 		case 'r':
> 			if (argc <= 0) {
> 				usage("bulkmail, case r, first");
> 				done();
> 			}
> /*
>  * If this code is left in, local mail passed on by 'sendmail' after it has
>  * been queued fails because then 'sendmail' passes "-r jwp -d jwp".  I don't
>  * understand why, since for non-queued mail it passes only "-d jwp" (as
>  * nearly as I can tell).  Probably the correct fix would be to get the
>  * effective UID at the time 'my_name' is established and check that against
>  * root's UID (and possibly others'), but it's too late to think about that
>  * now.
>  *							--jwp, 20Jul84, 0320
>  *			if (strcmp(my_name, "root") &&
>  *			    strcmp(my_name, "uucp") &&
>  *			    strcmp(my_name, "daemon") &&
>  *			    strcmp(my_name, "network")) {
>  *				usage(my_name ? my_name : "NULL");
>  *				done();
>  *			}
>  */
> 			gaver++;
> 			strcpy(truename, argv[1]);
> 			fgets(line, LSIZE, stdin);
> 			if (strcmpn("From", line, 4) == 0)
> 				line[0] = '\0';
> 			argv++;
> 			argc--;
> 			break;
> 
> 		case 'h':
> -----------------------------------------------------------------------------
> 
> I don't know of a way for binary-only sites to fix this.  It's possible that
> it could be fixed in sendmail.cf, but I make no pretense of understanding
> that very well and I leave it alone as much as possible.  Getting from a
> sendmail.cf to sendmail.fc is sufficiently buggy that I never know whether
> I've screwed something up or it just doesn't work right.
-- 
Spoken: Mark Weiser 	ARPA:	mark@maryland	Phone: (301) 454-7817
CSNet:	mark@umcp-cs 	UUCP:	{seismo,allegra}!umcp-cs!mark
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742

dave@uwvax.UUCP (Dave Cohrs) (12/06/84)

> --------------------------------------------------------------------
> Repeat By:
> Crank up the load over 12 or so.  Send local mail.  Run sendmail -q
> to force delivery of queued mail.  Wait for delivery....
> 
> ----------------------------------------------------------------

	Will a load of 50 do (recorded today)?

> Re: taking out the 'user' checks.

This isn't the best thing to do.  The problem here really is that
/bin/mail (like all good [grrrr] BSD programs) does a getlogin() instead
of a getpwuid(getuid()).  If getpwuid(getuid()) is used instead, the
checks function fine as is.  If these checks are taken out, *ANY USER*
can send mail and say it's from WHOMEVER THEY WANT to say it's from
(Note: sites that have implemented this 'fix' now have very insecure
mail systems).  /bin/mail has enough holes in it; it doesn't need more!
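
A minimal sketch of the getpwuid(getuid()) approach, assuming the same
four trusted names as the commented-out check.  The helper names
is_trusted_name() and caller_may_use_r() are hypothetical and do not
appear in /bin/mail; this only illustrates looking the caller up by real
uid rather than by login name.

```c
#include <pwd.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Names allowed to use -r, copied from the commented-out check. */
static const char *trusted[] = { "root", "uucp", "daemon", "network" };

/* Hypothetical helper: 1 if name is on the trusted list, else 0. */
int
is_trusted_name(const char *name)
{
	size_t i;

	if (name == NULL)
		return 0;
	for (i = 0; i < sizeof(trusted) / sizeof(trusted[0]); i++)
		if (strcmp(name, trusted[i]) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical helper: may the invoking process use -r?
 * The name comes from the password entry for the real uid,
 * not from getlogin(), which reports the login name tied to
 * the terminal and need not match the uid the process runs as.
 */
int
caller_may_use_r(void)
{
	struct passwd *pw = getpwuid(getuid());

	return pw != NULL && is_trusted_name(pw->pw_name);
}
```

In the 'r' case, the existing strcmp() chain on my_name would then be
replaced by a test along the lines of "if (!caller_may_use_r())", keeping
the usage()/done() calls as they are.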

-- 
(Bug?  What bug?  That's a feature!)

Dave Cohrs
...!{allegra,heurikon,ihnp4,seismo,uwm-evax}!uwvax!dave
dave@wisc-rsch.arpa