dplatt@teknowledge-vaxc.UUCP (02/19/87)
I'm running into a strange sendmail abort and haven't been able to pin it down... can anybody give me a hint?

The situation is as follows: I'm on a Sun 3/52 workstation, running SunOS 3.2. My sendmail daemon has been invoked with the "-bd -q15m" options. Occasionally, the queue-running daemon aborts (the forked child, not the parent).

The conditions appear to be the following:

1) I've sent a message to a host that is down or unreachable.
2) sendmail has made repeated tries to deliver the message.
3) The system has been up (without a reboot) for at least a day.
4) The abort typically occurs between midnight and 8 AM.

The symptoms appear to be:

a) sendmail aborts quietly and dumps core; it doesn't write a message to the system log or to the console.
b) The "d" (data) file of the undelivered mailgram remains in the /usr/spool/mqueue directory.
c) The "l" (lock) file is apparently being left in the mqueue directory, as the next queue run generates an "id: locked" message in the syslog. This happens only once, though... the queue run 30 minutes after the abort does not report "id: locked", so it appears that somebody is deleting the lock file.
d) The "q" (control) file is being deleted at some point, although I'm not sure when; it's gone when I come to work.

My sendmail.cf is derived from the "sendmail.cf.subsidiary" file that came with SunOS 3.2, with a couple of mods:

- I use the "or10m" option to cause SMTP connections to time out if the foreign host doesn't respond within 10 minutes.
- I have two mailers ("ether" and "localether") which are defined with the P=[IPC], A=IPC options. Ruleset 0 selects the "localether" mailer for outbound mail being sent to hosts that don't have a domain specification (i.e. are on our local Ethernet), and "ether" for hosts with a domain spec. The "localether" mailer delivers mail directly; the "ether" mailer passes the mail to our local Internet relay host for delivery, and hacks the "From:" address to include the relay host's name rather than the sending Sun's name (which isn't registered on the Internet).
- I have a "frozen" sendmail.fc, derived from the sendmail.cf after the last set of changes were made.

Any ideas what might be going on here? I've seen some symptoms in the past that lead me to suspect that the SunOS 3.2 sendmail may begin to suffer from "bit decay" after the system has been up for a prolonged period of time (strange aborts, curable only by a reboot... killing all copies of sendmail and restarting the daemon does NOT cure the problem... sticky pages damaged, perhaps?). Anybody else seen these symptoms, or have a cure or a diagnosis procedure?

As a possible workaround, I've removed the "-q15m" from the daemon invocation in /etc/rc.local, and have added a queue-running command in crontab. It'll be interesting to see if the problem goes away!

Dave Platt
Internet: dplatt@teknowledge-vaxc.arpa
Usenet: {hplabs|sun|ucbvax}!dplatt%teknowledge-vaxc.arpa
Voice: (415) 424-0500
USnail: Teknowledge, Inc.
        1850 Embarcadero Road
        Palo Alto, CA 94303
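As a possible diagnosis aid for the leftover-file pattern described in this post (a "d" file surviving after its "q" file is gone, and a lock file with no control file), the queue directory could be scanned periodically, for instance alongside the crontab queue run. The sketch below is a hypothetical helper, not part of sendmail. It assumes BSD-style queue file names in which a two-letter prefix ("qf" control, "df" data, "lf" lock, "tf" temporary, "xf" transcript) is followed by the queue ID, e.g. dfAA01234. The names in your own /usr/spool/mqueue should be checked before relying on it.

```python
import os
from collections import defaultdict

# Hypothetical helper, not part of sendmail. Assumes BSD-style queue file
# names: a two-letter type prefix followed by the queue ID (e.g. "dfAA01234").
PREFIXES = {
    "qf": "control",
    "df": "data",
    "lf": "lock",
    "tf": "temporary",
    "xf": "transcript",
}


def group_queue_files(names):
    """Map each queue ID to the set of known prefixes present for it."""
    groups = defaultdict(set)
    for name in names:
        prefix, qid = name[:2], name[2:]
        if prefix in PREFIXES and qid:  # skip unrelated files and bare prefixes
            groups[qid].add(prefix)
    return dict(groups)


def find_suspects(names):
    """Return (orphaned_data, lock_without_control) as sorted lists of IDs.

    orphaned_data: a data file exists but its control file is gone.
    lock_without_control: a lock file exists but its control file is gone.
    """
    groups = group_queue_files(names)
    orphaned_data = sorted(
        q for q, p in groups.items() if "df" in p and "qf" not in p
    )
    lock_without_control = sorted(
        q for q, p in groups.items() if "lf" in p and "qf" not in p
    )
    return orphaned_data, lock_without_control


def scan_queue(queue_dir="/usr/spool/mqueue"):
    """List queue_dir and report suspicious queue IDs (see find_suspects)."""
    return find_suspects(os.listdir(queue_dir))
```

Calling `scan_queue()` at intervals overnight and logging the two lists might help narrow down when the control file disappears relative to the abort, since a core dump alone doesn't say which file was removed first.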
mouse@mcgill-vision.UUCP (03/01/87)
In article <9721@teknowledge-vaxc.ARPA>, dplatt@teknowledge-vaxc.ARPA (Dave Platt) writes:
> I'm running into a strange sendmail abort and haven't been able to pin
> it down... can anybody give me a hint?
> The situation is as follows: I'm on a Sun 3/52 workstation, [...]
> My sendmail.cf is derived from the "sendmail.cf.subsidiary" file that
> came with SunOS 3.2, with a couple of mods:
[...]
> - I have a "frozen" sendmail.fc, derived from the sendmail.cf after
> the last set of changes were made.

I thought Sun sendmail didn't work with frozen config files (maybe this has been fixed in 3.2 - Guy?). Try removing the frozen config and see if it quits dying (when running with the unfrozen config).

der Mouse
USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!musocs!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!musocs!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu