[comp.sys.sun] Incoming mail running wild

pcl%robots.oxford.ac.uk@nsfnet-relay.ac.uk (Paul Leyland) (03/02/90)
Can anyone explain the problem described below, and suggest a fix for it,
rather than the palliative I knocked together this morning?

We have a Sun 4/390 running SunOS 4.0.3, with all the sendmail patches that
were available at the end of February applied.  We used to see this on our
old 3/280 too, so I don't think it has anything to do with the CPU
architecture.

Every so often, a process runs wild and takes all available CPU time until
killed, thus preventing highly-niced background jobs from doing anything
useful.  Last night, one ran for 352 minutes before being stopped by hand.
The process name, as shown by "ps ax", is of the form "-AA#####" where
##### is the PID.  Sometimes, but not always, there is a /usr/lib/sendmail
running full-tilt as well.  In /var/spool/mqueue, there are corresponding
spool files.  They're normally empty, but occasionally they have a
fragment of incoming mail headers.

I've not noticed any correlation with other system activity, nor with the
state of health of other machines on the local ethernet.

I've taken to running the following script from cron every 15 minutes
which, while being mildly gruesome, does at least let us get some work
done.
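
The only arithmetic in it is turning the CPU time that ps prints (minutes,
a colon, then seconds) into a number of seconds to compare against the
limit.  As a rough illustration of that step alone, here is a Bourne shell
sketch; to_seconds is a made-up helper name and awk is swapped in for the
sed and "@" used in the script itself, so treat it as an assumption-laden
example rather than part of the fix.

```shell
#!/bin/sh
# Sketch only: convert a ps TIME field of the form MM:SS into seconds.
# to_seconds is a hypothetical helper, not something the csh script defines.
to_seconds() {
	# Split on the colon; awk converts "08" to the decimal number 8.
	echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Last night's 352-minute runaway, with 17 seconds picked as an example:
to_seconds 352:17	# prints 21137
```

A limit of 600 given to the script would therefore mean ten minutes of
CPU time.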

Paul Leyland
8<----------------------------------------------------------------
#! /usr/bin/csh -fb
#  Flush free-running mail queue items.   2-Mar-1990 by P.C. Leyland

if ($#argv != 1) then
	echo "Usage: $0 time"
	exit 1
endif

if ($1 <= 20) then
	echo Time must be greater than 20 seconds, you gave $1
	exit 1
endif

while (1)
	set COMMAND = `ps ax | grep 'AA[0-9][0-9][0-9][0-9][0-9]' | head -1`
	if ($#COMMAND == 0) exit 0

# There is one.  Now, has it been running for too long?

	set noglob
	set RUNTIME = `echo $COMMAND[3] | sed -e 's?:? * 60 + ?'`
	unset noglob

	@ SECONDS = $RUNTIME
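	if ($SECONDS >= $1) then
		kill -9 $COMMAND[1]
# Don't let an unmatched pattern abort the script with "No match".
		set nonomatch
		rm -f /var/spool/mqueue/*{$COMMAND[1]}
		unset nonomatch
	else
# Not over the limit yet.  Wait before looking again rather than spinning.
		sleep 5
	endif
end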