[comp.bugs.4bsd] sendmail 5 times faster!?!?

sow@cad.luth.se (Sven-Ove Westberg) (10/30/87)

I have found a big performance pig in sendmail. This patch works
fine on a Sun3, but it is hard to test because the changes
only affect the timeout handling. My code is 4-5 times faster on
big mails.


Description of the problem.

Original code:

collect()
{
	for all lines in message {
		sfgets
		write line
	}
}

sfgets()
{
	set up read time out
	read
	reset read time out
}


The pig is the set and reset of timeout for EACH line in the message.

My code:

collect()
{
	set up read time out
	for all lines in message {
		qsfgets
		write line
	}
	reset read time out
}

qsfgets()
{
	readline
}
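
The difference between the two loops can be sketched in modern C. This
is only an illustration, not sendmail code: alarm() stands in for
sendmail's setevent()/clrevent(), and the names set_timeout,
collect_per_line, collect_whole_message and timer_calls are made up for
the example. It counts how often the timer is armed or disarmed in each
variant; it does not try to reproduce the measured speedup.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

long timer_calls;	/* how many times the timer was armed or disarmed */

/* Hypothetical stand-in for setevent()/clrevent(); 0 disarms the timer. */
void set_timeout(unsigned secs)
{
	timer_calls++;
	alarm(secs);
}

/* Original shape: arm and disarm the timer around every line read. */
long collect_per_line(FILE *in, unsigned secs)
{
	char buf[1024];
	char *p;
	long lines = 0;

	assert(in != NULL);
	for (;;) {
		set_timeout(secs);
		p = fgets(buf, sizeof buf, in);
		set_timeout(0);
		if (p == NULL)
			break;
		lines++;
	}
	return lines;
}

/* Patched shape: one timer covers the whole read loop. */
long collect_whole_message(FILE *in, unsigned secs)
{
	char buf[1024];
	long lines = 0;

	assert(in != NULL);
	set_timeout(secs);
	while (fgets(buf, sizeof buf, in) != NULL)
		lines++;
	set_timeout(0);
	return lines;
}
```

For an input of 1000 short lines, the first function makes 2002 timer
calls (one arm and one disarm per read attempt, including the final
attempt that hits end of file) and the second makes 2.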


I would be very pleased if someone could comment on this patch and tell
me whether it is dangerous in some cases. Yes, I know that I redefine the
meaning of the read time out parameter in sendmail.cf. But how many people
know that what sendmail.cf defines is the time out for each line?
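
One case worth checking follows directly from the patch: the single
event armed in collect() now bounds the whole message, not the gap
between two lines. A minimal sketch compares the two policies. The
function names are hypothetical and the arrival gaps (in seconds) are
invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-line policy: fires only if one single gap exceeds the limit. */
bool per_line_times_out(const unsigned *gaps, size_t n, unsigned limit)
{
	assert(gaps != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (gaps[i] > limit)
			return true;
	return false;
}

/* Whole-message policy: fires once the accumulated time exceeds the limit. */
bool whole_message_times_out(const unsigned *gaps, size_t n, unsigned limit)
{
	unsigned long total = 0;

	assert(gaps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		total += gaps[i];
		if (total > limit)
			return true;
	}
	return false;
}
```

With gaps of 30, 30, 30, 30 seconds and a limit of 60, the per-line
policy never fires but the whole-message policy does, so a big mail over
a slow link could be cut off by a value that was safe before.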

For sites without source that want to do a quick test: set the read time
out to 0 in sendmail.cf. Don't blame me if your sendmail doesn't
time out when you have bad SMTP connections.
E.g.
Or0m


Sven-Ove Westberg, CAD, University of Lulea, S-951 87 Lulea, Sweden.
Tel:     +46-920-91677  (work)                 +46-920-48390  (home)
UUCP:    {uunet,mcvax}!enea!cad.luth.se!sow
Internet: sow@cad.luth.se


------- collect.c -------
*** /tmp/da4921	Tue Oct 27 20:30:36 1987
--- collect.c	Tue Oct 27 20:19:01 1987
***************
*** 34,39 ****
--- 34,46 ----
  **		The from person may be set.
  */
  
+ static	jmp_buf	CtxCollectTimeout;
+ 
+ #ifndef ETIMEDOUT
+ #define ETIMEDOUT EINTR
+ #endif
+ 
+ 
  collect(sayok)
  	bool sayok;
  {
***************
*** 41,46 ****
--- 48,55 ----
  	char buf[MAXFIELD+2];
  	register char *p;
  	extern char *hvalue();
+ 	register EVENT *CollectEvent = NULL;
+ 	extern collecttimeout();
  
  	/*
  	**  Create the temp file name and create the file.
***************
*** 62,78 ****
  	if (sayok)
  		message("354", "Enter mail, end with \".\" on a line by itself");
  
  	/*
  	**  Try to read a UNIX-style From line
  	*/
  
! 	(void) sfgets(buf, sizeof buf, InChannel);
  	fixcrlf(buf, FALSE);
  # ifndef NOTUNIX
  	if (!SaveFrom && strncmp(buf, "From ", 5) == 0)
  	{
  		eatfrom(buf);
! 		(void) sfgets(buf, sizeof buf, InChannel);
  		fixcrlf(buf, FALSE);
  	}
  # endif NOTUNIX
--- 71,100 ----
  	if (sayok)
  		message("354", "Enter mail, end with \".\" on a line by itself");
  
+     /*
+ 	**  If we are reading from a smtp connection setup read timeout.
+ 	*/
+ 
+ 	if(ReadTimeout != 0 && OpMode == MD_SMTP) {
+ 		if (setjmp(CtxCollectTimeout) != 0) {
+ 			errno = ETIMEDOUT;
+ 			syserr("net timeout");
+ 			goto  NetTimeout;  /* Do the net error routine */
+ 		}
+ 		CollectEvent = setevent((time_t) ReadTimeout, collecttimeout, 0);
+ 	}
+ 
  	/*
  	**  Try to read a UNIX-style From line
  	*/
  
! 	(void) qsfgets(buf, sizeof buf, InChannel);
  	fixcrlf(buf, FALSE);
  # ifndef NOTUNIX
  	if (!SaveFrom && strncmp(buf, "From ", 5) == 0)
  	{
  		eatfrom(buf);
! 		(void) qsfgets(buf, sizeof buf, InChannel);
  		fixcrlf(buf, FALSE);
  	}
  # endif NOTUNIX
***************
*** 113,119 ****
  			p = &buf[strlen(buf)];
  			*p++ = '\n';
  			*p++ = c;
! 			if (sfgets(p, MAXFIELD - (p - buf), InChannel) == NULL)
  				break;
  			fixcrlf(p, TRUE);
  		}
--- 135,141 ----
  			p = &buf[strlen(buf)];
  			*p++ = '\n';
  			*p++ = c;
! 			if (qsfgets(p, MAXFIELD - (p - buf), InChannel) == NULL)
  				break;
  			fixcrlf(p, TRUE);
  		}
***************
*** 128,134 ****
  
  		if (bitset(H_EOH, chompheader(buf, FALSE)))
  			break;
! 	} while (sfgets(buf, MAXFIELD, InChannel) != NULL);
  
  # ifdef DEBUG
  	if (tTd(30, 1))
--- 150,156 ----
  
  		if (bitset(H_EOH, chompheader(buf, FALSE)))
  			break;
! 	} while (qsfgets(buf, MAXFIELD, InChannel) != NULL);
  
  # ifdef DEBUG
  	if (tTd(30, 1))
***************
*** 137,143 ****
  
  	/* throw away a blank line */
  	if (buf[0] == '\0')
! 		(void) sfgets(buf, MAXFIELD, InChannel);
  
  	/*
  	**  Collect the body of the message.
--- 159,165 ----
  
  	/* throw away a blank line */
  	if (buf[0] == '\0')
! 		(void) qsfgets(buf, MAXFIELD, InChannel);
  
  	/*
  	**  Collect the body of the message.
***************
*** 167,173 ****
  		fputs("\n", tf);
  		if (ferror(tf))
  			tferror(tf);
! 	} while (sfgets(buf, MAXFIELD, InChannel) != NULL);
  	if (fflush(tf) != 0)
  		tferror(tf);
  	(void) fclose(tf);
--- 189,195 ----
  		fputs("\n", tf);
  		if (ferror(tf))
  			tferror(tf);
! 	} while (qsfgets(buf, MAXFIELD, InChannel) != NULL);
  	if (fflush(tf) != 0)
  		tferror(tf);
  	(void) fclose(tf);
***************
*** 175,180 ****
--- 197,203 ----
  	/* An EOF when running SMTP is an error */
  	if ((feof(InChannel) || ferror(InChannel)) && OpMode == MD_SMTP)
  	{
+ NetTimeout:
  		syserr("collect: unexpected close, from=%s", CurEnv->e_from.q_paddr);
  
  		/* don't return an error indication */
***************
*** 185,190 ****
--- 208,217 ----
  		finis();
  	}
  
+ 	/* clear the net read event if it has not sprung */
+ 	if(OpMode == MD_SMTP)
+ 		clrevent(CollectEvent);
+ 
  	/*
  	**  Find out some information from the headers.
  	**	Examples are who is the from person & the date.
***************
*** 218,223 ****
--- 245,257 ----
  	if ((CurEnv->e_dfp = fopen(CurEnv->e_df, "r")) == NULL)
  		syserr("Cannot reopen %s", CurEnv->e_df);
  }
+ 
+ static
+ collecttimeout()
+ {
+ 	longjmp(CtxCollectTimeout, 1);
+ }
+ 
  /*
  **  TFERROR -- signal error on writing the temporary file.
  **

------- util.c -------
*** /tmp/da4924	Tue Oct 27 20:30:38 1987
--- util.c	Tue Oct 27 20:06:46 1987
***************
*** 763,768 ****
--- 763,814 ----
  	longjmp(CtxReadTimeout, 1);
  }
  /*
+ **  QSFGETS -- "quick safe" fgets -- ignores random interrupts.
+ **
+ **	Parameters:
+ **		buf -- place to put the input line.
+ **		siz -- size of buf.
+ **		fp -- file to read from.
+ **
+ **	Returns:
+ **		NULL on error.  This will also leave buf containing a null string.
+ **		buf otherwise.
+ **
+ **	Side Effects:
+ **		none.
+ */
+ 
+ char *
+ qsfgets(buf, siz, fp)
+ 	char *buf;
+ 	int siz;
+ 	FILE *fp;
+ {
+ 	register char *p;
+ 
+ 	/* try to read */
+ 	p = NULL;
+ 	while (p == NULL && !feof(fp) && !ferror(fp))
+ 	{
+ 		errno = 0;
+ 		p = fgets(buf, siz, fp);
+ 		if (errno == EINTR)
+ 			clearerr(fp);
+ 	}
+ 
+ 	/* clean up the books and exit */
+ 	LineNumber++;
+ 	if (p == NULL)
+ 	{
+ 		buf[0] = '\0';
+ 		return (NULL);
+ 	}
+ 	for (p = buf; *p != '\0'; p++)
+ 		*p &= ~0200;
+ 	return (buf);
+ }
+ 
+ /*
  **  FGETFOLDED -- like fgets, but know about folded lines.
  **
  **	Parameters:
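
For readers who do not read K&R C, the qsfgets() added in the util.c
hunk can be sketched in ANSI C roughly as follows. This is a reading aid
under stated assumptions, not a replacement for the patch: the name
retry_fgets is hypothetical, the LineNumber bookkeeping is left out, and
unlike the patch the sketch only calls clearerr() when fgets() actually
failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Read one line, retrying when a signal interrupts the read. */
char *retry_fgets(char *buf, int siz, FILE *fp)
{
	char *p = NULL;

	assert(buf != NULL && siz > 0 && fp != NULL);

	while (p == NULL && !feof(fp) && !ferror(fp)) {
		errno = 0;
		p = fgets(buf, siz, fp);
		if (p == NULL && errno == EINTR)
			clearerr(fp);	/* interrupted, not a real error: retry */
	}

	if (p == NULL) {
		buf[0] = '\0';	/* leave an empty string on EOF or error */
		return NULL;
	}

	/* Strip the high bit of every byte, as the 1987 code does. */
	for (p = buf; *p != '\0'; p++)
		*p &= 0x7f;
	return buf;
}
```

No timer is touched here at all; in the patch the time out is armed once
in collect() and this routine only reads.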

roy@phri.UUCP (Roy Smith) (11/01/87)

In article <751@cad.luth.se> Sven-Ove Westberg <sow@cad.luth.se> writes:
> I have found a big performance pig in sendmail.  This patch works fine on
> a Sun3.  But it is hard to test the patch since the changes only affect
> the timeout handling.  My code is 4-5 times faster on big mails.

	Not being a sendmail guru (I shudder every time I contemplate editing
a sendmail.cf file) I can't say if Sven-Ove's patch is good or bad, but I
thought I'd mention a (possibly) related sendmail problem that has been
bugging us, namely mail storms (akin to the famous IP broadcast storms).

	The basic setup is a 4.3 Vax-11/750 doing uucp for a bunch of (mostly
diskless) 3.[02] Sun-3's.  Mail for root, news, usenet, postmaster, etc is
all forwarded to the same two people (I'm one of them).  We each get our mail
on a different diskless 3/50, but with both / and /usr/spool/mail directories
on the same server.

	The problem crops up when a whole bunch of mail comes in at once; the
most common cause being some downstream news site running out of disk space
and dumping 20 or 30 "rnews: execution failed" messages on us in a single
uucp connection.  Each one generates two sendmail connections (one for my
copy and one for the other person's copy) driving the loads on the receiving
workstations (and their joint server) through the roof.  Eventually, the
workstations can't keep up and the sendmail, ND, and/or NFS connections start
to time out.  Somebody's probably doing a lot of YP too.

	Now the real fun begins.  Each timed-out connection generates another
error message mailed to root (or maybe postmaster?), which in turn gets
forwarded to both of us on our already over-loaded machines.  At this point,
the system has become unstable, with error messages (times 2) being generated
faster than they can be delivered.  Usually, the end result is the load on
the beleaguered Suns going up and up and up until they crash (often with
"panic: sbflush 2").  Ever see a perfmeter on a 3/50 roll over to the 0-32
load scale?  It's not a pretty sight.  Once the clients crash, things tend to
quiet down; I think when the vax sendmail tries to connect to a machine that
is down, it just queues the message without generating a mailed error
message; only when it gets the unexpected errors from the receiving daemon in
the middle of a connection does it freak out and generate more mail.

	Of course, by the time the server has been floundering for 10
minutes, other people think their diskless workstations have crashed and try
rebooting; all those nodes screaming for tftp connections and rarping to find
out their names doesn't help the situation, but that's a different story.

	I don't know what the proper solution is, but something has to be
added somewhere to keep sendmail from going super-critical like this.  I note
with mild interest that this is the only time I've ever seen our 750 do
something which our Suns couldn't keep up with.  Maybe if I ran YP on the vax
I could slow down sendmail enough to provide the needed damping? :-)
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

sch@sequent.UUCP (Steve Hemminger) (11/02/87)

[Description of how collect() in sendmail uses sfgets() which sets a timeout]

Also, this code causes local mail input to time out (which I consider a bug).

One change I have seen is to use pseudo code like:

	char (*getline)() = IsSmtp ? sfgets : fgets;