[comp.mail.sendmail] Heavily loaded hosts

lindberg@cs.chalmers.se (Gunnar Lindberg) (07/21/89)

At times our mail gateway, chalmers.se [129.16.1.1], a VAX 11/750,
gets very heavily loaded by people on other (faster, :-) machines
sending large volumes of data via:

    foreach f ($FILES)
	mail foo%bar.se@chalmers.se < $f
    end

Now, that's completely legal, but I would still like chalmers.se to
say "Wait, I'm too busy" at such times. I looked into the "sendmail"
code, and basically it does:

    if (load > OX)			/* getla() > RefuseLA */
	deny_all_connections_for_5_seconds();

I don't want to offend anyone, but that code definitely isn't very
"polite", :-). Besides, "getla()" didn't work on a VAX with a frozen
configuration. We corrected that and changed the code into:

    accept();				/* never deny connection */
    if ( ! fork())
	if (load > OX)
	    exit(message("421", "I'm too busy"));

Using this code on a Sequent we found that it counts load in milli-jobs
(possibly that's why it never gets overloaded, :-) so getla() had
to do some extra tricks (#ifdef sequent).
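
As a rough illustration of the scaling issue (not part of the posted
patch): the three helpers below mirror the three return expressions in
the conf.c diff, one per kernel representation of the load average.
The function names and the SKETCH_FSHIFT value of 8 are assumptions
made for this sketch; on a real system FSHIFT and FSCALE come from the
kernel headers.

```c
#include <assert.h>

/* Assumed for illustration only; real values come from the kernel headers. */
#define SKETCH_FSHIFT 8
#define SKETCH_FSCALE (1L << SKETCH_FSHIFT)

/* Fixed-point kernels (the "sun" branch): add half a unit, drop the fraction bits. */
int la_from_fixed(long raw)
{
    assert(raw >= 0);
    return (int) ((raw + SKETCH_FSCALE / 2) >> SKETCH_FSHIFT);
}

/* Kernels that count in thousandths of a job (the "sequent" branch). */
int la_from_milli(long raw)
{
    assert(raw >= 0);
    return (int) ((raw + 500) / 1000);
}

/* Kernels that export a double (the default branch). */
int la_from_double(double raw)
{
    assert(raw >= 0.0);
    return (int) (raw + 0.5);
}
```

Under these assumptions a raw Sequent value of 1500 and a raw
fixed-point value of 384 (1.5 * 256) both round to a load of 2, so the
comparison against RefuseLA sees the same number on either machine.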

All the diffs follow below.

	Gunnar Lindberg

sendmail 5.60:
===================================================================
RCS file: conf.c,v
retrieving revision 1.2
diff -c -r1.2 conf.c
*** /tmp/,RCSt1000790	Thu Jul 13 16:06:11 1989
--- conf.c	Thu Jul 13 16:03:45 1989
***************
*** 429,434 ****
--- 429,445 ----
  		if (Nl[0].n_type == 0)
  			return (-1);
  	}
+ 	/*
+ 	 * Gunnar Lindberg, lindberg@cs.chalmers.se:
+ 	 * When using ".fc" ("-bz") files all sorts of funny things may
+ 	 * happen, e.g. you may find "kmem > 0", but no longer a valid
+ 	 * file descriptor for "/dev/kmem". If you close it you will block
+ 	 * access to its current file (whatever *that* may be). The only
+ 	 * reasonable(?) thing to do is just to re-open "/dev/kmem".
+ 	 */
+ 	if (lseek(kmem, (off_t) Nl[X_AVENRUN].n_value, 0) == -1)
+ 		kmem = open("/dev/kmem", 0, 0);
+ 
  	if (lseek(kmem, (off_t) Nl[X_AVENRUN].n_value, 0) == -1 ||
  	    read(kmem, (char *) avenrun, sizeof(avenrun)) < sizeof(avenrun))
  	{
===================================================================
RCS file: conf.c,v
retrieving revision 1.3
diff -c -r1.3 conf.c
*** /tmp/,RCSt1a01977	Fri Jul 21 09:03:01 1989
--- conf.c	Thu Jul 13 17:55:52 1989
***************
*** 412,418
  getla()
  {
  	static int kmem = -1;
! # ifdef sun
  	long avenrun[3];
  # else
  	double avenrun[3];

--- 412,418 -----
  getla()
  {
  	static int kmem = -1;
! # if	defined(sun) || defined sequent
  	long avenrun[3];
  # else
  	double avenrun[3];
***************
*** 425,430
  		if (kmem < 0)
  			return (-1);
  		(void) ioctl(kmem, (int) FIOCLEX, (char *) 0);
  		nlist("/vmunix", Nl);
  		if (Nl[0].n_type == 0)
  			return (-1);

--- 425,433 -----
  		if (kmem < 0)
  			return (-1);
  		(void) ioctl(kmem, (int) FIOCLEX, (char *) 0);
+ # ifdef	sequent
+ 		nlist("/dynix", Nl);
+ # else	sequent
  		nlist("/vmunix", Nl);
  # endif	sequent
  		if (Nl[0].n_type == 0)
***************
*** 426,431
  			return (-1);
  		(void) ioctl(kmem, (int) FIOCLEX, (char *) 0);
  		nlist("/vmunix", Nl);
  		if (Nl[0].n_type == 0)
  			return (-1);
  	}

--- 429,435 -----
  		nlist("/dynix", Nl);
  # else	sequent
  		nlist("/vmunix", Nl);
+ # endif	sequent
  		if (Nl[0].n_type == 0)
  			return (-1);
  	}
***************
*** 449,454
  # ifdef sun
  	return ((int) (avenrun[0] + FSCALE/2) >> FSHIFT);
  # else
  	return ((int) (avenrun[0] + 0.5));
  # endif
  }

--- 453,461 -----
  # ifdef sun
  	return ((int) (avenrun[0] + FSCALE/2) >> FSHIFT);
  # else
+ # ifdef sequent
+ 	return ((int) ((avenrun[0] + 500)/1000));
+ # else	sequent
  	return ((int) (avenrun[0] + 0.5));
  # endif	sequent
  # endif
***************
*** 450,455
  	return ((int) (avenrun[0] + FSCALE/2) >> FSHIFT);
  # else
  	return ((int) (avenrun[0] + 0.5));
  # endif
  }
  

--- 457,463 -----
  	return ((int) ((avenrun[0] + 500)/1000));
  # else	sequent
  	return ((int) (avenrun[0] + 0.5));
+ # endif	sequent
  # endif
  }
===================================================================
RCS file: srvrsmtp.c,v
retrieving revision 1.3
diff -c -r1.3 srvrsmtp.c
*** /tmp/,RCSt1000790	Thu Jul 13 16:06:18 1989
--- srvrsmtp.c	Thu Jul 13 16:02:41 1989
***************
*** 144,149 ****
--- 144,152 ----
  		/* this must be us!! */
  		CurHostName = MyHostName;
  	}
+ 
+ 	hostbusy();	/* Accept connection? Non-returning if not. */
+ 
  	expand("\001e", inp, &inp[sizeof inp], CurEnv);
  	message("220", inp);
  	SmtpPhase = "startup";
***************
*** 532,537 ****
--- 535,558 ----
  			break;
  		}
  	}
+ }
+ 
+ /*
+  * Gunnar Lindberg, lindberg@cs.chalmers.se:
+  * Check that we are not too busy to accept the connection (OXnn).
+  * If we are, we just say "421 I'm too busy" and close. Now, that's
+  * not very polite, but we have to show him we mean it.
+  */
+ static
+ hostbusy()
+ {
+     if (getla() > RefuseLA)
+     {
+ 	message("421", "%s too busy, please try later", MyHostName);
+ 	if (InChild)
+ 	    ExitStat = EX_QUIT;
+ 	finis();
+     }
  }
  /*
  **  SKIPWORD -- skip a fixed word.
===================================================================
RCS file: daemon.c,v
retrieving revision 1.4
diff -c -r1.4 daemon.c
*** /tmp/,RCSt1000790	Thu Jul 13 16:06:25 1989
--- daemon.c	Thu Jul 13 16:06:04 1989
***************
*** 179,187 ****
--- 179,195 ----
  		struct sockaddr_in otherend;
  		extern int RefuseLA;
  
+ 		/*
+ 		 * Gunnar Lindberg, lindberg@cs.chalmers.se:
+ 		 * Since we now test load average in the child and reply
+ 		 * "421 I'm too busy" if we are, we don't have to reject
+ 		 * connections here any more.
+ 		 */
+ #ifdef	notdef
  		/* see if we are rejecting connections */
  		while (getla() > RefuseLA)
  			sleep(5);
+ #endif	notdef
  
  		/* wait for a connection */
  		do
===================================================================

paul@uxc.cso.uiuc.edu (07/25/89)

Re: modifying sendmail to return 421 too busy

This is not a good idea.  On machines where fork() takes significant
resources, having the child return the 421 means that the process
image has already been duplicated just to return an error message.

It then gets worse.  When the sending sendmail gets an open time-out
from a loaded remote machine with vanilla sendmail, it skips the remaining
messages to the same site during that queue run.  Returning a 421
error causes the sending sendmail to skip to the next message instead.
Thus the receiver will fork(), issue 421, and exit() for each message
in the sender's queue.  This can bring a loaded uni-processor VAX to 
its knees.


Paul Pomes
Univ of Illinois, CSO

lindberg@cs.chalmers.se (Gunnar Lindberg) (07/25/89)

To summarize: There are a number of good reasons *not* to implement my
"brilliant" idea of sendmail replying "421 I'm too busy":

    1)	Paul Pomes <paul@uxc.cso.uiuc.edu>:
	Thus the receiver will fork(), issue 421, and exit() for
	each message in the sender's queue.  This can bring a loaded
	uni-processor VAX to its knees.
    
    2)	Brian Kantor <brian@ucsd.edu>:
	If you have multiple mail servers (i.e., more than one MX host),
	refusing connections on one of them will cause incoming mail to
	be redirected to another of them. If instead you accept the
	connection and "421" it, the mail gets requeued on the sender
	and the other MX hosts are not tried.
    
    3)	Both:
	Current sendmails note that a host is refusing connections and
	on the current queue run will avoid trying to make additional
	connections to it.
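
The sender-side behaviour described in points 1 and 3 can be modelled
with a toy sketch. It is not sendmail code; the enum and function names
are hypothetical, and the only rule encoded is the one stated above: a
failed open marks the host as unavailable for the rest of the queue
run, while a 421 reply defers only the current message.

```c
#include <assert.h>

/* How the busy receiver treats an incoming connection (hypothetical names). */
enum outcome { OPEN_FAILED, REPLY_421 };

/*
 * Count how many connection attempts reach the busy receiver during one
 * queue run on the sender, given `queued` messages for that receiver.
 */
int attempts_in_queue_run(int queued, enum outcome busy_behaviour)
{
    int host_marked_down = 0;
    int attempts = 0;
    int i;

    assert(queued >= 0);
    for (i = 0; i < queued; i++) {
        if (host_marked_down)
            continue;               /* skip remaining mail for this host */
        attempts++;
        if (busy_behaviour == OPEN_FAILED)
            host_marked_down = 1;   /* remembered for the rest of the run */
        /* REPLY_421: only this message is deferred; the next one retries */
    }
    return attempts;
}
```

For a sender with 50 queued messages, this model gives 1 attempt when
the open fails and 50 attempts (each costing the receiver a fork())
when every connection is accepted and answered with 421.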

Anyway, forget about my changes to "srvrsmtp.c" and "daemon.c" - the
original code does a much better job than I did!

However, I do think the changes to "conf.c", to make "getla()" work on
hosts with a frozen configuration and on Sequent are still valid.

	Gunnar Lindberg