alexis@panix.uucp (Alexis Rosen) (12/07/90)
I've just spent an hour tracking down a truly nasty problem in A/UX. I haven't seen it before, and given its flaky behavior, it's lucky that I even noticed it (thanks to a note to "postmaster" from a user on another system).

The problem is this: Every once in a while, incoming mail won't work. The person sending the mail will get a bounce that looks like this:

    From: panix!MAILER-DAEMON@cmcl2.NYU.EDU (Mail Delivery Subsystem)
    Subject: Returned mail: unknown mailer error 2
    Message-Id: <9012061822.AA02026@panix>
    To: liberty.cs.columbia.edu!travis@cmcl2.NYU.EDU

       ----- Transcript of session follows -----
    mail: /usr/mail/mara.lock not creatable after 10 tries
    554 mara... unknown mailer error 2

       ----- Unsent message follows -----
    [etc.]

Now it seems to me that only a few things could cause this error, if the message is correct.

1) Permissions are wrong on /usr/mail.
2) mail can't create new files.
3) The lock file already exists and can't be overwritten.

I'm sure #1 isn't the cause: mail works 99% of the time. #3 seems impossible to me: if that were so, it wouldn't be an error, it would be a lock. This leaves us only with option #2.

And now, another interesting bit of info: Cnews has been throwing up batches of news, too. It complains about there not being enough space on the device. Which is bull: there's at least 25MB free for non-root users, 40MB free for root. BUT that cnews complaint is _also_ the symptom of a different problem: a lack of available free file descriptors. I used to see this all the time, because of the known bug in A/UX's uuxqt, which forgets to close a file for every job it does. However, I fixed this (with a script I posted quite a while ago) and I haven't lost a newsbatch in months. Until last night.

After I got the note about the mail bounce, I looked through the uucp logfile, and I noticed that at least 9 more messages had been bounced in the last 24 hours. I verified that at least the last four (the ones that were still in the spool) were in fact the same kind of bounce.

So it seems very likely to me that the problem is that things (like rmail and cunbatch) are running out of file descriptors. What could be causing this???

One other note: the system is now busier than it ever has been before, and it's handling more mail than before. That probably explains why life is getting more difficult...

Thanks,
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
{cmcl2,apple}!panix!alexis
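
[Editorial sketch] The three candidate causes listed in the post above can in principle be told apart by the error code the kernel returns when the lock file is created. The following is a minimal Python illustration of that idea, not the way A/UX mail actually takes its lock (the post does not say how it does). The function name `try_create_lock` and the reason strings are hypothetical, and the mapping from errno values to the post's numbered causes is an assumption made for illustration.

```python
import errno
import os


def try_create_lock(path):
    """Try to create a lock file atomically; return (ok, reason).

    Hypothetical helper: it only illustrates how the failure modes
    discussed in the post map onto distinct errno values.
    """
    try:
        # O_EXCL makes the create fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as exc:
        reasons = {
            errno.EACCES: "permissions on the spool directory (cause 1)",
            errno.EEXIST: "lock file already exists (cause 3)",
            errno.EMFILE: "this process is out of file descriptors (cause 2)",
            errno.ENFILE: "system-wide file table is full (cause 2)",
            errno.ENOSPC: "no space or no inodes left on the device (cause 2)",
        }
        # Fall back to the system's own message for anything unexpected.
        return False, reasons.get(exc.errno, os.strerror(exc.errno))
    os.close(fd)
    return True, "created"
```

A mailer that logged the reason alongside "not creatable after 10 tries" would make a full file table distinguishable from a stale lock at a glance.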
alexis@panix.uucp (Alexis Rosen) (12/10/90)
I've gotten some mail about this too... It seems I was right in my guess that they were running out of file descriptors. Changing the appropriate kernel parameters fixed things.

What I'm really curious about is why we didn't get messages on the console immediately, as soon as we ran out the first time (we did start getting them eventually).

(As this is now an A/UX topic only, followups to comp.unix.aux)

---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
{cmcl2,apple}!panix!alexis
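
[Editorial sketch] The symptom described in this thread, opens that start failing once descriptors run out, can be reproduced safely on a modern Unix by lowering a process's own limit. This is only an approximation. It exercises the per-process limit (EMFILE), whereas the kernel parameters mentioned above presumably govern system-wide tables (where the error would be ENFILE), and the post does not say which parameters were changed. The function name `open_until_failure` and the limit of 64 are hypothetical choices.

```python
import errno
import os
import resource


def open_until_failure(soft_limit=64):
    """Lower the per-process descriptor limit, then open files until it fails.

    Returns (number_opened, errno_name). The original limit is restored
    and every descriptor opened here is closed before returning.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only lower the limit; never try to raise it above its current value.
    if old_soft == resource.RLIM_INFINITY or old_soft > soft_limit:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        # The first failed open tells us which limit was hit.
        return len(fds), errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```

Calling `open_until_failure()` returns the number of descriptors it managed to open and the symbolic name of the error that stopped it. Any program that opens a file at that moment, including one that is only trying to create a lock or spool file, fails in the same way.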