[comp.mail.sendmail] sendmail 5.64 hangs

woods@ncar.ucar.edu (Greg Woods) (07/03/90)

I have a problem with the newly-announced sendmail 5.64 from Berkeley.
I FTPed the code, fixed the usual timezone brain damage (so that the
timezone prints as "MDT" instead of "-0600") in arpadate.c (why doesn't
that code work on Suns? It looks like they went through a fair amount
of trouble to make it portable). Then I installed it on our Sun servers
here (mostly 3/280's and 4/280's running SunOS 4.0.3, if it makes a difference).
What happens is that after a while, all mail coming into the machine hangs.
If I telnet to the SMTP port on one of these machines, I can type the HELO
command, and RCPT TO, and I never get a response to the RCPT TO command.
This is also borne out by ps(1), which shows many sendmail processes,
all but the daemon in the HELO or RCPT TO states. Since one of the
things they did to improve efficiency was to go to flock(2) calls instead
of lock files (locks are used to prevent simultaneous processing of the
same queued message by multiple instances of the queue daemon, a very common
occurrence on our central post office machine which processes about 3000
messages a day), I was most anxious to install this new version. I also
suspect that all the sendmail processes are hung in flock(2) on some file
(probably the aliases database, since the only flock calls I see in the
source involve the queue file or the aliases file).
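
One quick way to check the flock(2) theory on a hung machine, without
waiting for core files, might be a small probe that tries a non-blocking
exclusive lock on the suspect file. The following is only a sketch, not
code from the 5.64 source: probe_lock() is a made-up helper name, and it
assumes the BSD flock(2) interface with LOCK_NB.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of sendmail): try to take an exclusive
 * flock(2) on `path` without blocking.
 *
 * Returns  0 if the lock was free (it is released again before returning),
 *          1 if some other open file already holds a conflicting lock,
 *         -1 on any other error (for example, the file cannot be opened).
 */
int probe_lock(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        /* Nobody held it; give it straight back. */
        flock(fd, LOCK_UN);
        rc = 0;
    } else if (errno == EWOULDBLOCK) {
        /* Somebody else is sitting on the lock. */
        rc = 1;
    } else {
        rc = -1;
    }

    close(fd);
    return rc;
}
```

Pointing this at the aliases database files and at a queue file while the
SMTP sessions are stuck would show whether something is sitting on one of
those locks. It reports only that a lock is held, not which process holds it.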

In order to debug this, it will be necessary to reinstall the 5.64 version
on at least one machine and wait for it to hang, so I can get some core
files and determine where it was hanging, and then go through the source
and try to come up with a fix. Before I do this, has anyone else seen this
problem and/or come up with a fix?

--Greg

amanda@mermaid.intercon.com (Amanda Walker) (07/03/90)

In article <7852@ncar.ucar.edu>, woods@ncar.ucar.edu (Greg Woods) writes:
> after a while, all mail coming into the machine hangs.
> If I telnet to the SMTP port on one of these machines, I can type the HELO
> command, and RCPT TO, and I never get a response to the RCPT TO command.

This is exactly the problem I run into on a DG AViiON, without even having
to wait a while :-(.  I haven't tracked it down yet, but my current theory
is that it's hanging in a flock() somewhere...

--
Amanda Walker
InterCon Systems Corporation
--
"I can only assume this is not the first-class compartment."
	--Hitchhiker's Guide to the Galaxy