roy@phri.UUCP (Roy Smith) (12/01/88)
We're trying to get sendmail 5.59 running here. We thought we
had it all going when Bill Russell pointed out to me that if you send
mail to "roy@garbage", sendmail claims to deliver it OK and the mail
gets filed in that great mailbox in the sky (needless to say there is
no host named "garbage"). After some dbxing, I tracked the problem down
to the following call near the top of getmxrr() in domain.c of the sendmail
source:
    n = res_search(host, C_IN, T_MX, (char *)&answer, sizeof(answer));
    if (n < 0) {
#ifdef DEBUG
        if (tTd(8, 1))
            printf("getmxrr: res_search failed (errno=%d, h_errno=%d)\n",
                errno, h_errno);
#endif
        switch(h_errno) {
        case NO_DATA:
        case NO_RECOVERY:
            goto punt;
        case HOST_NOT_FOUND:
            *rcode = EX_NOHOST;
            break;
        case TRY_AGAIN:
            *rcode = EX_TEMPFAIL;
            break;
        }
        return(-1);
    }
The problem is that res_search, when handed a non-existent name (perhaps
only a non-existent unqualified name?), returns -1, like it should, but
also leaves h_errno set to 0, like it shouldn't. Since there is no default:
clause in the switch, *rcode never gets set. This in turn causes deliver()
(which called getmxrr()) to mess up: it sees that getmxrr() returned -1,
so it doesn't deliver the mail; later it sees that rcode is 0, so it
claims to have delivered it properly. Bad coding in both deliver and
getmxrr, in my opinion. Of course, res_search is not documented on the
resolver(3) man page, so a good case could be made that it should never
have been called directly, and that sendmail should instead have called
some other, documented resolver interface.
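The interaction described above can be reproduced in isolation. What follows is only a minimal sketch, not the sendmail source: the SK_ names and both functions are hypothetical stand-ins, the constants are local copies of the usual <netdb.h> and <sysexits.h> values, and "goto punt" is simplified to a plain return.

```c
/* Local stand-ins so the sketch compiles anywhere. The values mirror
 * <netdb.h> and <sysexits.h>; the SK_ names themselves are hypothetical. */
enum { SK_HOST_NOT_FOUND = 1, SK_TRY_AGAIN = 2,
       SK_NO_RECOVERY = 3, SK_NO_DATA = 4 };
enum { SK_EX_OK = 0, SK_EX_NOHOST = 68, SK_EX_TEMPFAIL = 75 };

/* Mimics only the error branch of getmxrr(): given the h_errno value
 * left behind by the resolver, maybe set *rcode and return -1. */
int sk_getmxrr_error(int herr, int *rcode)
{
    switch (herr) {
    case SK_NO_DATA:
    case SK_NO_RECOVERY:
        return 0;               /* simplified stand-in for "goto punt" */
    case SK_HOST_NOT_FOUND:
        *rcode = SK_EX_NOHOST;
        break;
    case SK_TRY_AGAIN:
        *rcode = SK_EX_TEMPFAIL;
        break;
    }
    /* herr == 0 matches no case, so *rcode is left untouched. */
    return -1;
}

/* Mimics the caller: it skips delivery when the lookup fails, but the
 * status it finally reports is taken from rcode alone. */
int sk_deliver_status(int herr)
{
    int rcode = SK_EX_OK;

    if (sk_getmxrr_error(herr, &rcode) < 0) {
        /* no delivery is attempted here */
    }
    return rcode;
}
```

With herr equal to 0, sk_deliver_status() reports SK_EX_OK even though nothing was delivered, which is the silent loss seen with "roy@garbage".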
Anyway, the problem in res_search seems to be as follows. Near the top
of res_search, h_errno is set to HOST_NOT_FOUND. A few lines later it calls
res_querydomain(), which fails (returning -1). It tests to see if it
simply couldn't connect to the name server; if it could, it then tests
what error the name server returned in h_errno. The test is a bit
complex:
    if ((h_errno != HOST_NOT_FOUND && h_errno != NO_DATA) ||
        (_res.options & RES_DNSRCH) == 0)
        break;
    h_errno = 0;
At this point h_errno is 1 (HOST_NOT_FOUND) so the test becomes:
    if ((0 && xxx) || (_res.options & RES_DNSRCH) == 0)
        break;
    h_errno = 0;
which in turn reduces to:
    if ((_res.options & RES_DNSRCH) == 0)
        break;
    h_errno = 0;
At this point, _res.options does have the RES_DNSRCH bit set (it is
actually 0x2C1 and RES_DNSRCH is 0x200), so the test is false, h_errno
gets cleared, and it stays that way until res_search eventually returns -1.
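To check the reduction above by hand, the quoted test can be pulled out into a small pure function. This is a sketch only: rs_herrno_after_test() and the RS_ names are hypothetical, the constants are the values quoted in this post, and the function models just the one test rather than the whole search loop.

```c
/* Values as quoted above; the RS_ names are hypothetical stand-ins. */
enum { RS_HOST_NOT_FOUND = 1, RS_NO_DATA = 4 };
#define RS_RES_DNSRCH 0x200u

/* Returns the value h_errno would hold right after the quoted test:
 * unchanged if the loop would "break", 0 if control falls through to
 * the "h_errno = 0" statement. */
int rs_herrno_after_test(int herr, unsigned int options)
{
    if ((herr != RS_HOST_NOT_FOUND && herr != RS_NO_DATA) ||
        (options & RS_RES_DNSRCH) == 0)
        return herr;            /* "break": h_errno is preserved */
    return 0;                   /* "h_errno = 0" */
}
```

Calling rs_herrno_after_test(RS_HOST_NOT_FOUND, 0x2C1) gives 0, matching the behaviour traced above; with the RES_DNSRCH bit clear (options 0x0C1) the HOST_NOT_FOUND value survives. Note that the parentheses around the mask matter: in C, == binds more tightly than &.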
That's as far as I go. It's pretty clear to me that this has to be
a bug in the resolver. The behaviour just doesn't make any sense, but
since I'm far from an expert on resolver/name-server issues (a rank novice
is more like it) I won't suggest any specific fixes. I also don't really
understand what all the _res.options bits do. It's also clear that even
after the resolver problem gets solved, the sendmail code really should be
fixed. At two different levels in the code, a problem at a lower level
wasn't caught simply because a switch() didn't have a default case. Sloppy,
sloppy, sloppy. Either just fold all unknown error returns into
HOST_NOT_FOUND, or at least issue a syslog warning about it. But for
chrissake, don't just assume it can't happen!
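One way the suggested defensive handling might look is sketched below. It is an illustration, not a patch against sendmail: fx_rcode_for() and the FX_ names are hypothetical, the constants are local copies of the usual <netdb.h> and <sysexits.h> values, and the NO_DATA/NO_RECOVERY cases are left out because the original code sends them to "punt" rather than setting rcode.

```c
#include <syslog.h>

/* Local copies of the usual values; the FX_ names are hypothetical. */
enum { FX_HOST_NOT_FOUND = 1, FX_TRY_AGAIN = 2 };
enum { FX_EX_NOHOST = 68, FX_EX_TEMPFAIL = 75 };

/* Maps a resolver error to an exit code. Anything unrecognized,
 * including 0, is logged and folded into "no such host" so that a
 * failed lookup can never be mistaken for success. */
int fx_rcode_for(int herr)
{
    switch (herr) {
    case FX_HOST_NOT_FOUND:
        return FX_EX_NOHOST;
    case FX_TRY_AGAIN:
        return FX_EX_TEMPFAIL;
    default:
        syslog(LOG_WARNING,
            "getmxrr: unexpected h_errno %d, treating as host not found",
            herr);
        return FX_EX_NOHOST;
    }
}
```

Whether an unknown error is better treated as permanent (EX_NOHOST) or temporary (EX_TEMPFAIL) is a judgment call; the sketch follows the suggestion above of folding it into HOST_NOT_FOUND.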
--
Roy Smith, System Administrator
Public Health Research Institute
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net
"The connector is the network"