woods@ncar.ucar.edu (Greg Woods) (07/03/90)
I have a problem with the newly announced sendmail 5.64 from Berkeley. I FTPed the code and fixed the usual timezone brain damage in arpadate.c (so that the timezone prints as "MDT" instead of "-0600"). Why doesn't that code work on Suns? It looks like they went to a fair amount of trouble to make it portable. Then I installed it on our Sun servers here (mostly 3/280s and 4/280s running SunOS 4.0.3, if it makes a difference).

What happens is that after a while, all mail coming into the machine hangs. If I telnet to the SMTP port on one of these machines, I can type the HELO command and RCPT TO, and I never get a response to the RCPT TO command. This is also borne out by ps(1), which shows many sendmail processes, all but the daemon in the HELO or RCPT TO states.

I was most anxious to install this new version, since one of the things they did to improve efficiency was to go to flock(2) calls instead of lock files. (Locks are used to prevent simultaneous processing of the same queued message by multiple instances of the queue daemon, a very common occurrence on our central post office machine, which processes about 3000 messages a day.) I suspect that all the sendmail processes are hung in flock(2) on some file (probably the aliases database, since the only flock calls I see in the source involve the queue file or the aliases file).

In order to debug this, it will be necessary to reinstall the 5.64 version on at least one machine and wait for it to hang, so I can get some core files and determine where it was hanging. Then I will go through the source and try to come up with a fix.

Before I do this, has anyone else seen this problem and/or come up with a fix?

--Greg
amanda@mermaid.intercon.com (Amanda Walker) (07/03/90)
In article <7852@ncar.ucar.edu>, woods@ncar.ucar.edu (Greg Woods) writes:
> after a while, all mail coming into the machine hangs.
> If I telnet to the SMTP port on one of these machines, I can type the HELO
> command, and RCPT To, and I never get a response to the RCPT TO command.

This is exactly the problem I run into on a DG AViiON, without even having to wait a while :-(. I haven't tracked it down yet, but my current theory is that it's hanging in a flock() somewhere...

--
Amanda Walker
InterCon Systems Corporation
--
"I can only assume this is not the first-class compartment."
--Hitchhiker's Guide to the Galaxy
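
The flock() theory raised in this thread could be checked from outside sendmail with a small probe that tries to take the same kind of lock without blocking. The following is only a sketch: try_lock_file() is a hypothetical helper, not code from the sendmail source, and which file to probe (a queue file or the aliases database) is an assumption the reader has to fill in for the local installation.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the sendmail source.
 * Opens `path` and tries to take an exclusive flock(2) lock
 * without blocking.
 *
 * Returns an open, locked descriptor on success,
 *         -2 if some other open file already holds a conflicting lock,
 *         -1 on any other error (errno is preserved).
 */
int try_lock_file(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* LOCK_NB makes flock() fail with EWOULDBLOCK instead of hanging. */
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return fd;              /* caller holds the lock until close(fd) */

    int saved = errno;
    close(fd);
    errno = saved;
    return (saved == EWOULDBLOCK) ? -2 : -1;
}
```

A return value of -2 while mail is hung would be consistent with some process sitting on the lock; a successful lock would suggest looking elsewhere. The caller should close() the returned descriptor promptly so the probe does not itself hold up mail.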