[comp.mail.sendmail] Sendmail 5.6[45] WARNING

piet@NIC.EU.net (Piet Beertema) (02/07/91)

After installing 5.64, I ran into the problem that on some
systems sendmail would dump core when using the frozen config
file. I made some patches that appeared to cure the problem
and posted them to this newsgroup.

After installing 5.65, I encountered the same problem, even
though I found that my previous patches to 5.64 were present
in 5.65. I found another bug and posted a patch for it.

After that I had no more problems, although I had the
strong feeling that all the patches I had posted so far were
only workarounds and that something fundamental had gone
wrong in sendmail, introduced in 5.64, that corrupts the
malloc() structures under seemingly random conditions when
the frozen config is used.

Today the problem hit real hard on this system, mcsun.EU.net,
which is the central backbone site of EUnet and where sendmail
is very heavily used. The symptom was that the load had gone
up to 170 (ever seen a SUN-4/280 with a load of 170 ?!?),
caused by numerous *running* sendmail processes, all SMTP
sessions and all except a few in the "result wait" state.
Finally it was found that sendmail was looping in malloc().
What initiated the problem was a small and in itself harmless
change in a file that is used to construct a class in my
sendmail.cf. The problem went away when I removed the frozen
config. And it didn't come back when I made another minor
change in said file and then used the frozen config again.

This makes it clear that my suspicion was right: there IS
indeed something fundamentally wrong in sendmail 5.6[45]
that under certain circumstances causes the malloc()
structures to be corrupted when using the frozen config.
Usually this will
lead to core dumps, but in this particular case it led to
numerous looping sendmails.

So be warned that this can happen if you run 5.6[45]!

Afterwards it appeared that the problem had also affected
other machines: since all those sendmails kept running, the
remote sendmails they were talking to stayed "active" too
and thus kept all their files open; that caused at least
one machine to run out of file descriptors....

So far I haven't found what it is that is so fundamentally
wrong, so I still don't have a real patch for it either.
Needless to say, I would appreciate any pointers (or,
better of course, a patch that really cures it!).


-- 
	Piet Beertema, EUnet-NIC, Amsterdam   (piet@mcsun.EU.net)

smart@manta.mel.dit.CSIRO.AU (Robert Smart) (02/12/91)

In article <2420@mcsun.eu.net> piet@NIC.EU.net (Piet Beertema) writes:
>
>So far I haven't found what it is that is so fundamentally
>wrong, so I still don't have a real patch for it either.
>Needless to say, I would appreciate any pointers (or,
>better of course, a patch that really cures it!).
>
Me too. We find that Sendmail 5.6[45] behaves better if we
don't run the queue in daemon mode. I.e. we start the daemon
with plain sendmail -bd (no -q interval) and run the queue
from crontab. But this is obviously a hack. [Mind you, I
never quite understood why sendmail isn't run from inetd with
the queue run from cron: that would seem more Unix style than
a monolithic program sitting there doing everything.]
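
A minimal sketch of that setup, for anyone who wants to try it.
The path /usr/lib/sendmail and the 30-minute interval are
assumptions, not something fixed by sendmail; adjust both for
your system.

```shell
# Boot script: start the SMTP listener only. With no -q<interval>
# argument the daemon never forks its own queue runs.
/usr/lib/sendmail -bd

# root's crontab (assumed schedule): process the queue once,
# on the hour and at half past.
0,30 * * * * /usr/lib/sendmail -q
```

The usual combined form is something like sendmail -bd -q30m;
splitting it this way just moves the queue runs out of the
long-lived daemon process.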

Bob Smart