jf@ap.co.umist.ac.uk (John Forrest) (03/02/91)
We're finally seeing one of those problems I knew was there but hoped would never occur in real life. Basically, all the machines in our cluster share a common /usr/spool/mailq - not as bad as it seems, because it's on the same disk as the maildrops. Anyway, there is a finite chance of a clash if two sendmail processes on different machines have the same pid. This appears to have happened.

I am guessing this is the fault, but it does ring true - one machine was trying to add something to the queue while another complained that it couldn't, and both were using the same queue number.

One way round is to give each machine a separate mailq, but I don't really want to do this because we would have to run a queue flush process periodically on each. Another way might be to add the creating node's name to the queue file name. Has anybody got a fix that does the latter, before I try to write one? A third way might be to check the code that is supposed to avoid duplicating an existing queue number - I presume this exists for when the mail queue fills up.

We run Sendmail v5.65a (IDA version) on a network of Apollos running OS 10.1.

John Forrest
Dept of Computation
UMIST
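
For anyone wanting to see the "add the creating node's name" idea in concrete terms, here is a minimal sketch in Python. It is not sendmail's own code or naming scheme: the functions `queue_file_name` and `create_queue_file`, the `qf` prefix and the `host.pid.seq` layout are all made up for illustration. It also assumes that exclusive creation (`O_EXCL`) is honoured on the shared filesystem, which would need checking on the real cluster.

```python
import os
import socket


def queue_file_name(prefix, host, pid, seq):
    """Build a hypothetical queue file name that embeds the creating host."""
    return f"{prefix}{host}.{pid:05d}.{seq:02d}"


def create_queue_file(queue_dir, prefix="qf", host=None, pid=None, max_tries=100):
    """Create a new, empty queue file and return its path.

    The host name keeps machines with equal pids apart; the sequence
    number plus O_EXCL keeps one process from reusing its own name.
    """
    host = host or socket.gethostname()
    pid = os.getpid() if pid is None else pid
    for seq in range(max_tries):
        path = os.path.join(queue_dir, queue_file_name(prefix, host, pid, seq))
        try:
            # O_EXCL makes the open fail if the name is already taken
            # (assumed to be reliable on the shared disk).
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # name in use, try the next sequence number
        os.close(fd)
        return path
    raise RuntimeError("no free queue file name found")
```

With this layout, two machines whose sendmail processes happen to share a pid still get different file names, so a single shared queue directory can stay in place.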