[comp.unix.aux] Big, nasty bug in A/UX rmail and/or uuxqt

alexis@panix.uucp (Alexis Rosen) (12/07/90)

I've just spent an hour tracking down a truly nasty problem in A/UX. I
haven't seen it before, and given its flaky behavior, it's lucky that I
even noticed it (thanks to a note to "postmaster" from a user on another
system).

The problem is this: every once in a while, delivery of incoming mail fails.
The person sending the mail gets a bounce that looks like this:

From: panix!MAILER-DAEMON@cmcl2.NYU.EDU (Mail Delivery Subsystem)
Subject: Returned mail: unknown mailer error 2
Message-Id: <9012061822.AA02026@panix>
To: liberty.cs.columbia.edu!travis@cmcl2.NYU.EDU
 
   ----- Transcript of session follows -----
mail: /usr/mail/mara.lock not creatable after 10 tries
554 mara... unknown mailer error 2
 
   ----- Unsent message follows -----
[etc.]

Now it seems to me that only a few things could cause this error, if the
message is correct.
1) Permissions are wrong on /usr/mail.
2) mail can't create new files.
3) the lock file already exists and can't be overwritten.

I'm sure #1 isn't the cause- mail works 99% of the time.
#3 seems impossible to me- if that were so, it wouldn't be an error, it would
be a lock.
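
The three candidate causes can be told apart by the errno from the failed create. Below is a minimal sketch of an exclusive lock-file create with retries, written in Python for illustration. It is not the A/UX mail source, which the post does not show. The function name try_lock, the retry count, and the delay are hypothetical.

```python
import errno
import os
import tempfile
import time


def try_lock(lock_path, tries=10, delay=0.0):
    """Try to create lock_path exclusively, up to `tries` times.

    Returns (True, 0) on success, or (False, errno of the last failure).
    Hypothetical helper; the real mail program's logic is not known.
    """
    last_err = 0
    for _ in range(tries):
        try:
            # O_EXCL makes the create fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except OSError as exc:
            last_err = exc.errno  # remember why this attempt failed
            time.sleep(delay)
            continue
        os.close(fd)
        return True, 0
    return False, last_err


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "mara.lock")
    print(try_lock(path, tries=2))  # first create succeeds
    ok, err = try_lock(path, tries=2)  # lock already held
    print(ok, errno.errorcode.get(err))
```

With a leftover lock file, every attempt fails with EEXIST. With wrong permissions it would be EACCES, and with file descriptors or file table slots exhausted it would be EMFILE or ENFILE. A message like "not creatable after 10 tries" hides that distinction.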

This leaves us only with option #2. And now, another interesting bit of info:
Cnews has been throwing up batches of news, too. It complains that there is
not enough space on the device. Which is bull- there's at least 25MB
free for non-root users, 40MB free for root. BUT- that cnews complaint is
_also_ the symptom of a different problem: a lack of available free file
descriptors. I used to see this all the time, because of the known bug in
A/UX's uuxqt, which forgets to close a file for every job it does. However,
I fixed this (with a script I posted quite a while ago) and I haven't lost
a newsbatch in months. Until last night.
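
To illustrate how a leak of one descriptor per job ends in failed opens, here is a small sketch in Python. It is an assumption-laden stand-in, not the uuxqt code: it lowers the per-process limit (RLIMIT_NOFILE) and leaks descriptors until open fails with EMFILE. Exhausting a system-wide file table, which is what a kernel parameter would control, would be reported as ENFILE instead. The function name leak_until_failure and the limit of 64 are hypothetical.

```python
import errno
import os
import resource


def leak_until_failure(limit=64):
    """Open files without closing them until open() fails.

    Temporarily lowers the soft RLIMIT_NOFILE to `limit` so the failure
    arrives quickly, then closes everything and restores the old limit.
    Returns (number of descriptors leaked, errno of the failing open).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY:
        limit = min(limit, soft)  # never raise the soft limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    leaked = []
    err = 0
    try:
        while True:
            try:
                # Each "job" opens a file and forgets to close it.
                leaked.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                err = exc.errno
                break
    finally:
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(leaked), err


if __name__ == "__main__":
    count, err = leak_until_failure()
    print(count, errno.errorcode.get(err))
```

Once the limit is reached, any later open fails, including ones that have nothing to do with the leaking code. A program that does not report the errno carefully can then print a misleading message.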

After I got the note about the mail bounce, I looked through the uucp logfile,
and I noticed that at least 9 more messages had been bounced in the last 24
hours. I verified that at least the last four (the ones that were still in
the spool) were in fact the same kind of bounce.

So it seems very likely to me that the problem is that things (like rmail
and cunbatch) are running out of file descriptors. What could be causing
this???

One other note- the system is now busier than it ever has been before, and
it's handling more mail than before. That probably explains why life is
getting more difficult...

Thanks,
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
{cmcl2,apple}!panix!alexis

alexis@panix.uucp (Alexis Rosen) (12/10/90)

I've gotten some mail about this too...

It seems I was right in my guess that rmail and cunbatch were running out
of file descriptors. Changing the appropriate kernel parameters fixed things.

What I'm really curious about is why we didn't get messages on the
console immediately, as soon as we ran out the first time (we did start
getting them eventually).

(As this is now an A/UX topic only, followups to comp.unix.aux)

---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
{cmcl2,apple}!panix!alexis