piet@NIC.EU.net (Piet Beertema) (02/07/91)
After installing 5.64, I ran into the problem that on some systems sendmail would dump core when using the frozen config file. I made some patches that appeared to cure the problem and posted them to this newsgroup. After installing 5.65, I encountered the same problem, even though I found that my previous patches to 5.64 were present in 5.65. I found another bug and posted a patch for it. Thereafter I didn't have problems anymore, although I had the strong feeling that all the patches I had posted so far were only workarounds and that there was something very basically wrong in sendmail, introduced in 5.64, that messes up the malloc() structure under seemingly random conditions when the frozen config is used.

Today the problem hit really hard on this system, mcsun.EU.net, which is the central backbone site of EUnet and where sendmail is very heavily used. The symptom was that the load had gone up to 170 (ever seen a SUN-4/280 with a load of 170 ?!?), caused by numerous *running* sendmail processes, all SMTP sessions and all except a few in the "result wait" state. Finally it was found that sendmail was looping in malloc().

What initiated the problem was a small and in itself harmless change in a file that is used to construct a class in my sendmail.cf. The problem went away when I removed the frozen config. And it didn't come back when I made another minor change in said file and then used the frozen config again.

This makes it clear that my suspicion was right: there IS indeed something very basically wrong in sendmail 5.6[45] that under certain circumstances causes the malloc() structure to be messed up when using the frozen config. Usually this will lead to core dumps, but in this particular case it led to numerous looping sendmails. So be warned that this can happen if you run 5.6[45]!
Afterwards it appeared that the problem had also affected other machines: since all those sendmails kept running, the remote sendmails they were talking to stayed "active" too and thus kept all their files open; that caused at least one machine to run out of file descriptors....

Until now I haven't found what it is that is so basically wrong, so I still don't have a real patch for it either. Needless to say that I would appreciate any pointers (or better of course: a patch that really cures it!).

-- Piet Beertema, EUnet-NIC, Amsterdam (piet@mcsun.EU.net)
smart@manta.mel.dit.CSIRO.AU (Robert Smart) (02/12/91)
In article <2420@mcsun.eu.net> piet@NIC.EU.net (Piet Beertema) writes:
>
>Until now I haven't found what it is that is so basically
>wrong, so I still don't have a real patch for it either.
>Needless to say that I would appreciate any pointers (or
>better of course: a patch that really cures it!).
>

Me too. We find that sendmail 5.6[45] behaves better if we don't run the queue in daemon mode. I.e. we do sendmail -bd and run the queue from crontab. But this is obviously a hack.

[Mind you, I never quite understood why sendmail isn't run from inetd with the queue run from cron: that would seem more Unix style than a monolithic program sitting there doing everything.]

Bob Smart
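
The arrangement Bob describes might look roughly like the sketch below. This is only an illustration: the /usr/lib/sendmail path, the rc.local location and the 30-minute interval are assumptions, not details from the posts. Only the -bd flag (listen as an SMTP daemon) and the -q flag (process the queue once) are taken as standard sendmail options.

```
# Boot-time startup (for example in rc.local; location assumed):
# listen for SMTP only. There is no -q<interval> argument, so the
# daemon never runs the queue itself.
/usr/lib/sendmail -bd

# root crontab entry: run the queue once every 30 minutes
# (interval assumed; choose whatever suits the site).
0,30 * * * * /usr/lib/sendmail -q
```

With this split, each queue run is a separate short-lived process started by cron that exits when the run finishes, while the long-lived daemon only accepts incoming SMTP connections.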