roy@phri.UUCP (Roy Smith) (12/01/88)
We're trying to get sendmail 5.59 running here. We thought we had it all going when Bill Russell pointed out to me that if you send mail to "roy@garbage", sendmail claims to deliver it OK and the mail gets filed in that great mailbox in the sky (needless to say, there is no host named "garbage").

After some dbxing, I tracked the problem down to the following call near the top of getmxrr() in domain.c of the sendmail source:

    n = res_search(host, C_IN, T_MX, (char *)&answer, sizeof(answer));
    if (n < 0) {
    #ifdef DEBUG
        if (tTd(8, 1))
            printf("getmxrr: res_search failed (errno=%d, h_errno=%d)\n",
                errno, h_errno);
    #endif
        switch(h_errno) {
        case NO_DATA:
        case NO_RECOVERY:
            goto punt;
        case HOST_NOT_FOUND:
            *rcode = EX_NOHOST;
            break;
        case TRY_AGAIN:
            *rcode = EX_TEMPFAIL;
            break;
        }
        return(-1);
    }

The problem is that res_search, when handed a non-existent name (perhaps only a non-existent unqualified name?), returns -1, like it should, but also leaves h_errno set to 0, like it shouldn't. Since there is no default: clause in the switch, rcode never gets set. This in turn causes deliver() (which called getmxrr()) to mess up: it sees that getmxrr() returned -1, so it doesn't deliver the mail, but later it sees that rcode is 0, so it claims to have delivered it properly. Bad coding in both deliver and getmxrr, in my opinion.

Of course, res_search is not documented on the resolver(3) man page, so a good case could be made that it should never have been called directly, and that sendmail should instead have called some other, documented resolver interface.

Anyway, the problem in res_search seems to be as follows. Near the top of res_search, h_errno is set to HOST_NOT_FOUND. A few lines later it does a res_querydomain(), which fails (returning -1). It tests to see if it simply couldn't connect to the name server; then, if it could, it tests to see what error the name server returned in h_errno.
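As an aside, the effect of the missing default: clause can be reproduced in isolation. What follows is only a minimal sketch, not the sendmail code: the DEMO_ constants and both function names are hypothetical stand-ins I made up so that it compiles without the resolver headers, and "goto punt" is replaced by a plain return.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the h_errno and exit-status constants,
 * redefined locally so the sketch compiles anywhere. */
enum { DEMO_HOST_NOT_FOUND = 1, DEMO_TRY_AGAIN = 2,
       DEMO_NO_RECOVERY = 3, DEMO_NO_DATA = 4 };
enum { DEMO_EX_OK = 0, DEMO_EX_NOHOST = 68, DEMO_EX_TEMPFAIL = 75 };

/* Same shape as the switch in getmxrr(): with no default case, an
 * unexpected h_errno value (such as 0) leaves *rcode untouched. */
int classify_without_default(int herr, int *rcode)
{
    switch (herr) {
    case DEMO_NO_DATA:
    case DEMO_NO_RECOVERY:
        return 0;                   /* stands in for "goto punt" */
    case DEMO_HOST_NOT_FOUND:
        *rcode = DEMO_EX_NOHOST;
        break;
    case DEMO_TRY_AGAIN:
        *rcode = DEMO_EX_TEMPFAIL;
        break;
    }
    return -1;
}

/* One possible repair: fold unknown values into "no such host" and
 * complain about them (stderr here; syslog would be the real choice). */
int classify_with_default(int herr, int *rcode)
{
    switch (herr) {
    case DEMO_NO_DATA:
    case DEMO_NO_RECOVERY:
        return 0;
    case DEMO_HOST_NOT_FOUND:
        *rcode = DEMO_EX_NOHOST;
        break;
    case DEMO_TRY_AGAIN:
        *rcode = DEMO_EX_TEMPFAIL;
        break;
    default:
        fprintf(stderr, "unexpected h_errno %d\n", herr);
        *rcode = DEMO_EX_NOHOST;
        break;
    }
    return -1;
}
```

Called with h_errno 0 and rcode preset to 0, the first version returns -1 but leaves rcode at 0, which is the combination that lets the caller report success; the second returns -1 with rcode set to the no-host status. Back in res_search itself, the interesting part is the check made on h_errno after res_querydomain() fails.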
The test is a bit complex:

    if ((h_errno != HOST_NOT_FOUND && h_errno != NO_DATA) ||
        (_res.options & RES_DNSRCH) == 0)
            break;
    h_errno = 0;

At this point h_errno is 1 (HOST_NOT_FOUND), so the test becomes:

    if ((0 && xxx) || (_res.options & RES_DNSRCH) == 0)
            break;
    h_errno = 0;

which in turn reduces to:

    if ((_res.options & RES_DNSRCH) == 0)
            break;
    h_errno = 0;

At this point, _res.options does have the RES_DNSRCH bit set (it is actually 0x2C1 and RES_DNSRCH is 0x200), so the break is not taken, h_errno gets cleared, and it stays that way until eventually res_search returns -1.

That's as far as I go. It's pretty clear to me that this has to be a bug in the resolver. The behaviour just doesn't make any sense, but since I'm far from an expert on resolver/name-server issues (a rank novice is more like it), I won't suggest any specific fixes. I also don't really understand what all the _res.options do.

It's also clear that even after the resolver problem gets solved, the sendmail code really should be fixed. At two different levels in the code, a problem at a lower level wasn't caught simply because a switch() didn't have a default case. Sloppy, sloppy, sloppy. Either just fold all unknown error returns into HOST_NOT_FOUND, or at least issue a syslog warning about it. But for chrissake, don't just assume it can't happen!
-- 
Roy Smith, System Administrator
Public Health Research Institute
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net
"The connector is the network"