[comp.mail.sendmail] Problem with res_search

roy@phri.UUCP (Roy Smith) (12/01/88)

	We're trying to get sendmail 5.59 running here.  We thought we
had it all going when Bill Russell pointed out to me that if you send
mail to "roy@garbage", sendmail claims to deliver it OK and the mail
gets filed in that great mailbox in the sky (needless to say there is
no host named "garbage").  After some dbxing, I tracked the problem down
to the following call near the top of getmxrr() in domain.c of the sendmail
source:

	n = res_search(host, C_IN, T_MX, (char *)&answer, sizeof(answer));
        if (n < 0) {
#ifdef DEBUG
                if (tTd(8, 1))
                        printf("getmxrr: res_search failed (errno=%d, h_errno=%d)\n",
                            errno, h_errno);
#endif
                switch(h_errno) {
                case NO_DATA:
                case NO_RECOVERY:
                        goto punt;
                case HOST_NOT_FOUND:
                        *rcode = EX_NOHOST;
                        break;
                case TRY_AGAIN:
                        *rcode = EX_TEMPFAIL;
                        break;
                }
                return(-1);
        }

The problem is that res_search, when handed a nonexistent name (perhaps
only a nonexistent unqualified name?), returns -1, like it should, but
also leaves h_errno set to 0, like it shouldn't.  Since there is no default:
clause in the switch, rcode is never set.  This in turn causes deliver()
(which called getmxrr()) to mess up.  It sees that getmxrr() returned -1,
so it doesn't deliver the mail, but later it sees that rcode is 0, so it
claims to have delivered it properly.  That's bad coding in both deliver()
and getmxrr(), in my opinion.  Of course, res_search is not documented on the resolver(3)
man page, so a good case could be made that it should never have been
called directly, but instead sendmail should have called some other
documented resolver interface.

Anyway, the problem in res_search seems to be as follows.  Near the top
of res_search, h_errno is set to HOST_NOT_FOUND.  A few lines later it does
a res_querydomain() which fails (returning -1).  It first checks whether it
simply couldn't connect to the name server; if it could, it tests what error
the name server reported in h_errno.  The test is a bit
complex:

	if ((h_errno != HOST_NOT_FOUND && h_errno != NO_DATA) ||
	    (_res.options & RES_DNSRCH) == 0)
		break;
	h_errno = 0;

At this point h_errno is 1 (HOST_NOT_FOUND), so the test becomes:

	if ((0 && xxx) || (_res.options & RES_DNSRCH) == 0)
		break;
	h_errno = 0;

which in turn reduces to:

	if ((_res.options & RES_DNSRCH) == 0)
		break;
	h_errno = 0;

At this point, _res.options does have the RES_DNSRCH bit set (it is
actually 0x2C1 and RES_DNSRCH is 0x200), so h_errno gets cleared, and it
stays that way until eventually res_search returns -1.

	That's as far as I go.  It's pretty clear to me that this has to be
a bug in the resolver.  The behaviour just doesn't make any sense, but
since I'm far from an expert on resolver/name-server issues (a rank novice
is more like it) I won't suggest any specific fixes.  I also don't really
understand what all the _res.options flags do.  It's also clear that even
after the resolver problem gets solved, the sendmail code really should be
fixed.  At two different levels in the code, a problem at a lower level
wasn't caught simply because a switch() didn't have a default case.  Sloppy,
sloppy, sloppy.  Either just fold all unknown error returns into
HOST_NOT_FOUND, or at least issue a syslog warning about it.  But for
chrissake, don't just assume it can't happen!
-- 
Roy Smith, System Administrator
Public Health Research Institute
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net
"The connector is the network"