[net.unix-wizards] 4.2 lost mail

Alan Parker <parker@nrl-css.ARPA> (11/26/84)

Is anyone aware of a problem that causes local mail not to be delivered
on a 4.2 system?  Once a week or so, I receive a report from someone
who is sure that local mail was not delivered.  No error indication is
returned to the sender and no trace of the message can be found.  I've
confirmed a few cases, so I'm sure that not all cases are user errors.
Any ideas?

-Alan

greenber@acf4.UUCP (11/27/84)

<>

As a side thought, what the heck caused this???:
      .-----------------------------------------
      |
      |
     \ /
From ??? Mon Nov 26 20:11:58 1984
Received: by NYU-ACF4.ARPA; Mon, 26 Nov 84 20:11:55 est
Date: Mon, 26 Nov 84 20:11:55 est
From: lbw9518
To: greenber
Status: RO



Ross M. Greenberg  @ NYU   ---->  allegra!cmcl2!acf4!greenber  <----

kolling@magic.ARPA (11/29/84)

>Is anyone aware of a problem that causes local mail not to be delivered
>on a 4.2 system?  Once a week or so, I receive a report from someone
>who is sure that local mail was not delivered.  No error indication is
>returned to the sender and no trace of the message can be found.  I've
>confirmed a few cases, so I'm sure that not all cases are user errors.
>Any ideas?

Both mh and mhe users can lose mail if they use emacs to compose
messages and write the message to a file before sending it.  The write
breaks the connection between the buffer and the file mh/mhe expects
the message to be in, so an essentially empty buffer is mailed, and even
the empty buffer doesn't get to the intended recipient since the To field
is empty also.  However, this does "leave a trace".  In the
log files you'll find a message-id line and a From line, but no To
line.

Karen (kolling@decwrl.arpa or circus::kolling)

mark@tove.UUCP (Mark Weiser) (12/01/84)

	Is anyone aware of a problem that causes local mail not to be delivered
	on a 4.2 system?  Once a week or so, I receive a report from someone
	who is sure that local mail was not delivered.  No error indication is
	returned to the sender and no trace of the message can be found.  I've
	confirmed a few cases, so I'm sure that not all cases are user errors.
	Any ideas?
	-Alan

Sendmail checks the load average and queues up messages for later
delivery if the load is over some N (around 8 or so, I think).
This mode does not work--the messages are mostly lost forever
when the load goes over N.  You can change this magic constant
in one of the .h files to 9999, make a new sendmail, and delivery
on busy Vaxes gets much more reliable.

I find the presence of this bug to be amazing, considering all
the sendmails that are out there.  I posted this bug report a 
few months ago.  I guess most people have quiet machines.
-- 
Spoken: Mark Weiser 	ARPA:	mark@maryland	Phone: (301) 454-7817
CSNet:	mark@umcp-cs 	UUCP:	{seismo,allegra}!umcp-cs!mark
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742

pedz@smu.UUCP (12/01/84)

Look in /usr/src/usr.lib/sendmail/src/conf.c (the first few
path components may be different on your system) and there are two
defines called QueueLA and RefuseLA which are set to some "low"
constants (like 10 and 16).  These are used as trigger points.
When the system load average gets above these numbers then
messages are queued (in the case of the QueueLA constant) or
SMTP connections are refused (in the case of the RefuseLA
constant).  The problem you are experiencing is that when
messages are queued, they eventually just get lost.  The only
way I know of to fix this problem is to bump the QueueLA up to
some unreal number (like 1000).  It may be that the problem is
caused because the queueing option has not been compiled in or
it may be that the code is actually bad.  I do not know and I
have not had time to fully explore what is wrong.  I just know
that bumping up the constant cures the problem.
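
A minimal sketch in C of the trigger-point logic described above.  The
names choose_action, queue_la and refuse_la are hypothetical stand-ins,
not the actual sendmail source; the defaults of 10 and 16 are only the
example values mentioned above, and the strict "above" comparison is an
assumption.

```c
/* Hypothetical sketch; names and default values are illustrative only. */
enum mail_action { DELIVER_NOW, QUEUE_FOR_LATER, REFUSE_CONNECTION };

static int queue_la = 10;   /* stand-in for QueueLA */
static int refuse_la = 16;  /* stand-in for RefuseLA */

/* Decide what to do with a message given the current load average.
   Refusal only applies to incoming SMTP connections. */
enum mail_action choose_action(int load_avg, int is_smtp_connection)
{
    if (is_smtp_connection && load_avg > refuse_la)
        return REFUSE_CONNECTION;
    if (load_avg > queue_la)
        return QUEUE_FOR_LATER;
    return DELIVER_NOW;
}
```

With queue_la raised to 1000, choose_action(20, 0) returns DELIVER_NOW,
so the workaround sidesteps the queue rather than repairing it.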

The reason I wrote this as a response is basically to get
feedback on the (implied) questions above.  Is there a better fix
to the problem?

Perry
convex!smu!pedz

chuqui@nsc.UUCP (Cheshire Chuqui) (12/02/84)

In article <53@tove.UUCP> mark@tove.UUCP (Mark Weiser) writes:
>Sendmail checks the load average and queues up messages for later
>delivery if the load is over some N (around 8 or so, I think).
>This mode does not work--the messages are mostly lost forever
>when the load goes over N.
>
>I find the presence of this bug to be amazing, considering all
>the sendmails that are out there.

I find the presence of this bug to be amazing also, considering that I've
never seen it on either of my vaxes. Both are running mostly standard 4.2
with no changes to sendmail (that I know of... :-). If you are losing stuff
in the mail queue then the chances are that you don't have a sendmail
clearing the queue out on a regular basis in your /etc/rc. Ours looks like
this:

if [ -f /usr/lib/sendmail ]; then
	(cd /usr/spool/mqueue; rm -f lf*)
	/usr/lib/sendmail -bd -q30m & echo -n ' sendmail'	>/dev/console
fi

This will spawn a sendmail that sits sleeping in the background. It will
wake up every half an hour (-q30m) and flush out anything in the queue. It
IS a misfeature that once something goes into the sendmail queue this is
the only sendmail that will dequeue it (actually, typing '/usr/lib/sendmail
-q' will, too, I believe, if you have the proper privileges) but it is
definitely a feature, not a bug.

You really don't want to set things up to always send mail-- I've seen a
780 go to its knees because of sendmail (a loadaverage of 7 went to a
loadaverage of about 25 because of 4 simultaneous sendmails to large
mailing lists). I DO wish that the people who wrote sendmail had decided
that any sendmail could flush out the queue if the loadaverage was
appropriate rather than having to set one up in /etc/rc.

chuq (sendmail is a hog, but a useful one...)

-- 
From the center of a Plaid pentagram:		Chuq Von Rospach
{cbosgd,decwrl,fortune,hplabs,ihnp4,seismo}!nsc!chuqui  nsc!chuqui@decwrl.ARPA

  ~But you know, monsieur, that as long as she wears the claw of the dragon
  upon her breast you can do nothing-- her soul belongs to me!~

wls@astrovax.UUCP (William L. Sebok) (12/03/84)

> Sendmail checks the load average and queues up messages for later
> delivery if the load is over some N (around 8 or so, I think).
> This mode does not work--the messages are mostly lost forever
> when the load goes over N.  You can change this magic constant
> in one of the .h files to 9999, make a new sendmail, and delivery
> on busy Vaxes gets much more reliable.
> 
> I find the presence of this bug to be amazing, considering all
> the sendmails that are out there.  I posted this bug report a 
> few months ago.  I guess most people have quiet machines.
>
> Spoken: Mark Weiser 	ARPA:	mark@maryland	Phone: (301) 454-7817

We have anything but a quiet machine, with the load average often over 10.
However, I haven't seen sendmail lose mail. The mail does get queued.  At least
the above statement about messages being mostly lost would have to be false here.
-- 
Bill Sebok			Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!wls

jim@haring.UUCP (12/04/84)

    > You really don't want to set things up to always send mail-- I've seen a
    > 780 go to its knees because of sendmail (a loadaverage of 7 went to a
    > loadaverage of about 25 because of 4 simultaneous sendmails to large
    > mailing lists).
One thing which helps make this true is the 'sync' which the syslog daemon
would do after receiving every message, which zapped the buffer cache. It
should be taken out.

    > chuq (sendmail is a hog, but a useful one...)
Agreed.

Jim McKie    Centrum voor Wiskunde en Informatica, Amsterdam    mcvax!jim

holmes@dalcs.UUCP (Ray Holmes) (12/07/84)

[]
	We also have experienced the "lost mail" syndrome. It only happens
on queued mail. I, however, have noticed the following events:

Mail is sent to multiple recipients (both local and remote).
The mail is queued (a single copy, of course) due to the load average.
The mail is received by all non-local recipients and is never seen again
locally.

It's got to be the gremlins, eh?

				Ray

loverso@sunybcs.UUCP (John Robert LoVerso) (12/09/84)

> Look in /usr/src/usr.lib/sendmail/src/conf.c (the first few
> path components may be different on your system) and there are two
> defines called QueueLA and RefuseLA which are set to some "low"
> constants (like 10 and 16).  These are used as trigger points.
> When the system load average gets above these numbers then
> messages are queued (in the case of the QueueLA constant) or
> SMTP connections are refused (in the case of the RefuseLA
> constant).
> 
> Is there a better fix to the problem?
--
If you look through the Version.c file, delta 372 refers to the options
"x" and "X", which alter the values of QueueLA and RefuseLA respectively.
Other than those 4 lines, they are undocumented.  At least you don't have
to recompile the code to change them (I put them in the config file).
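
As a rough illustration of how a pair of single-letter options could map
onto the two thresholds, here is a small C sketch.  The function
set_load_option and the variables queue_la and refuse_la are
hypothetical names invented for this example; this is not the option
parser in the sendmail source, and the defaults are only the example
values quoted above.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for QueueLA and RefuseLA. */
static int queue_la = 10;
static int refuse_la = 16;

/* Apply one option letter and its value.
   Returns 1 if the letter was recognized, 0 otherwise. */
int set_load_option(char opt, const char *val)
{
    switch (opt) {
    case 'x':                   /* queue-only threshold */
        queue_la = atoi(val);
        return 1;
    case 'X':                   /* refuse-connections threshold */
        refuse_la = atoi(val);
        return 1;
    default:
        return 0;               /* not a load-average option */
    }
}
```

Calling set_load_option('x', "1000") in this sketch has the same effect
as the recompile-with-a-big-constant workaround, without rebuilding.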

It's agreed sendmail is a dog.  John Quarterman at UT told me about an
alternate mailer called "SM" (small mailer) which functionally replaces
most of sendmail and is written in pieces (rather than one huge program).
It's from Lucasfilm.  Does anybody know more about it (as in, where/how can
we get it)?

..John
--
John Robert LoVerso @ SUNY Buffalo (716-636-3004)
	LoVerso%Buffalo@CSNET-RELAY
-or-	CSDJLV@SUNYABVA.BITNET
-or-	..!{decvax|watmath|rocksanne}!sunybcs!loverso

"Home is where you can where your hat..."

pedz@smu.UUCP (12/14/84)

There seems to be a question left unanswered.  At our site and at
the site that originated the base note to this response, the queue
system for the mail simply does not work.  (We do have sendmail
in the daemon mode in the background.)  But at other sites the
queue seems to work.  So the question is why?  Any ideas?  I
checked and the queue option at our site is on.

Perry

yenbut@uw-beaver (Voradesh Yenbut) (12/21/84)

> []
> 	We also have experienced the "lost mail" syndrome. It only happens
> on queued mail. I, however, have noticed the following events:
> 
> Mail is sent to multiple recipients (both local and remote).
> The mail is queued (a single copy, of course) due to the load average.
> The mail is received by all non-local recipients and is never seen again
> locally.
> 
> It's got to be the gremlins, eh?
> 
> 				Ray

I found that sendmail, compiled with DBM defined, does not call initaliases()
before it processes queued mail.  So it will fail when looking up the alias
name of a local recipient.  As quoted from Version.c of sendmail,

> D 4.16	83/10/16 16:08:08	eric	382	381
> MRs:	
> Postpone opening the alias DBM file until after the fork in srvrsmtp so
> that the alias database is as current as possible; thanks to dagobah!efo
> (Eben Ostby) for this one.

it looks like sendmail version 4.16 and above may have this kind of problem.
My fix in queue.c is as follows:

==================================================================
  SCCSID(@(#)queue.c	4.2		3/11/84	(no queueing));

*** /tmp/,RCSt1002663	Fri Dec 21 12:02:11 1984
--- queue.c	Mon Dec  3 09:35:01 1984
***************
*** 256,261
  		/* child -- double fork */
  		if (fork() != 0)
  			exit(EX_OK);
  	}
  # ifdef LOG
  	if (LogLevel > 11)

--- 258,266 -----
  		/* child -- double fork */
  		if (fork() != 0)
  			exit(EX_OK);
+ 
+ 		/* open the alias database */
+ 		initaliases(AliasFile, FALSE);
  	}
  # ifdef LOG
  	if (LogLevel > 11)

crp@ccivax.UUCP (Chuck Privitera) (01/16/85)

I know this article shouldn't go to net.unix-wizards, but the
original was posted there. 

In article <276@uw-beaver> Voradesh Yenbut writes:

> I found that sendmail, compiled with DBM defined, does not call
> initaliases() before it processes queued mail.  So it will fail
> when looking up the alias name of a local recipient.  As quoted from
> Version.c of sendmail,

>> D 4.16	83/10/16 16:08:08	eric	382	381
>> MRs:	
>> Postpone opening the alias DBM file until after the fork in srvrsmtp so
>> that the alias database is as current as possible; thanks to dagobah!efo
>> (Eben Ostby) for this one.

> it looks like sendmail version 4.16 and above may have this kind of
> problem.

I don't know about you, but my sendmail does alias expansion when
the mail is queued. In the mainline code for sendmail, there is a
switch which does alias initialization only if:
	1. You are running as newaliases (or -bi option given)
	2. You are not running in daemon mode.
Thus, alias expansions are done when you submit mail, so the
only time the daemon has to worry about doing alias expansions
is when mail is submitted via SMTP. And as the note in Version.c
says, initaliases is called in srvrsmtp.

My question is this: has anybody installed this and is it really
necessary?

crp@ccivax.UUCP (Chuck Privitera) (04/11/85)

Description:
	A couple months back in article <276@uw-beaver>, Voradesh
	Yenbut stated:

>	I found that sendmail, compiled with DBM defined, does not
>	call initaliases() before it processes queued mail.  So
>	it will fail when looking up the alias name of a local
>	recipient.

	Shortly thereafter in article <230@ccivax.UUCP> I rebutted:

>	I don't know about you, but my sendmail does alias expansion when
>	the mail is queued. In the mainline code for sendmail, there is a
>	switch which does alias initialization only if:
>		1. You are running as newaliases (or -bi option given)
>		2. You are not running in daemon mode.
>	Thus, alias expansions are done when you submit mail, so the
>	only time the daemon has to worry about doing alias expansions
>	is when mail is submitted via SMTP. And as the note in Version.c
>	says, initaliases is called in srvrsmtp.

	I ended this article with the question: Is Voradesh's
	fix really necessary? Nobody has responded and I have
	been bitten by NOT installing his fix.

	We have implemented a hashed password file as described in
	the Toronto Usenix conference talk on improving system
	performance.  To make sendmail use the hashed password file, we
	used mdbm as distributed on the net (so you can have more than
	one dbm database open at one time).  With mdbm, you reference
	a database through a pointer to a struct mdbm (much like
	stdio's FILE pointers), and this is how I got bitten: the
	database was never opened, so the pointer to the struct mdbm
	was null.  While my argument above is entirely valid, sendmail
	still tries to do alias expansion on queue runs.  So if you
	use plain ol' dbm, accesses to the database will quietly
	fail but the mail will still be delivered because the
	expansion is already complete.  But for future considerations
	(I think 4.2+, 4.3 or whatever will use a hashed password
	file), you should install Voradesh's fix.
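
	The failure mode can be sketched in C as follows.  This is
	only an illustration under stated assumptions: struct alias_db,
	open_aliases and lookup_alias are hypothetical names, not the
	mdbm API or the sendmail source.  The point is that a handle
	which is only set on the submit path is still NULL when the
	queue-run path does a lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an mdbm-style database handle. */
struct alias_db {
    const char *key;
    const char *value;
};

static struct alias_db demo_table = { "staff", "alice,bob" };
static struct alias_db *alias_handle = NULL;   /* NULL until opened */

/* Plays the role of initaliases(): open the database, set the handle. */
void open_aliases(void)
{
    alias_handle = &demo_table;
}

/* Look up an alias.  Returns NULL if the name is unknown or the
   database was never opened.  Without the NULL check, a queue run
   that skipped open_aliases() would dereference a null pointer. */
const char *lookup_alias(const char *name)
{
    if (alias_handle == NULL)
        return NULL;
    if (strcmp(alias_handle->key, name) == 0)
        return alias_handle->value;
    return NULL;
}
```

	In this sketch, calling open_aliases() at the start of the
	queue run plays the same role as the added initaliases() call.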
Repeat-by:
	If you really want, use mdbm instead of dbm in sendmail
	and queue a mail message (to anybody, not just an alias).
Fix:
	For those of you who missed it:


> ==================================================================
>   SCCSID(@(#)queue.c	4.2		3/11/84	(no queueing));
> 
> *** /tmp/,RCSt1002663	Fri Dec 21 12:02:11 1984
> --- queue.c	Mon Dec  3 09:35:01 1984
> ***************
> *** 256,261
>   		/* child -- double fork */
>   		if (fork() != 0)
>   			exit(EX_OK);
>   	}
>   # ifdef LOG
>   	if (LogLevel > 11)
> 
> --- 258,266 -----
>   		/* child -- double fork */
>   		if (fork() != 0)
>   			exit(EX_OK);
> + 
> + 		/* open the alias database */
> + 		initaliases(AliasFile, FALSE);
>   	}
>   # ifdef LOG
>   	if (LogLevel > 11)
>