markw@airgun.wg.waii.com (Mark Whetzel) (04/16/91)
I am working with another programmer on porting the IDA sendmail to the IBM RT running AIX 2.2.1. (Yes, it's yucky IBM, but it works and it's paid for :-) So far so good on making it work, but we may have found a bug in AIX at the latest maint level: code that works on one RT (2705 level) won't work on another RT (1773 level) at a higher maint level.

The problem code deals with domain server queries in domain.c; in particular, it uses the res_search system subroutine. I can't find any documentation for this routine. I checked AIX V3 (RS6000), SunOS (4.0.1), and ConvexOS, and none of these systems documents it either. I can find res_init, res_mkquery, and res_send, but not res_search.

What is happening is that res_search, called to look up MX records, returns -1 with h_errno set to TRY_AGAIN (value 2) rather than NO_DATA (value 4). NO_DATA indicates that the host name is valid but no records of the requested type could be found. The nameserver is reachable, and a piece of test code that queries with type = T_A works properly and returns a valid answer, but queries of type T_MX fail with this TRY_AGAIN error. This causes sendmail to defer the mail, waiting for a positive response from the nameserver. We currently do not have many MX records in our nameserver, and the system name being queried does not have any MX records on file.
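[Editor's sketch] The distinction described above, where NO_DATA lets delivery proceed without MX records while TRY_AGAIN causes the mail to be deferred, can be written as a small standalone function. This is only an illustration and not sendmail's code. The names classify_mx_failure, mx_action, and the H_* constants are hypothetical. The constants are defined locally with the values quoted in the post (TRY_AGAIN = 2, NO_DATA = 4) and the customary BSD values for the other two; a real program would take them from <netdb.h>. The default case is an assumption of this sketch.

```c
#include <errno.h>   /* ECONNREFUSED */

/* Local copies of the resolver error codes (hypothetical H_ prefix).
 * Values 2 and 4 are the ones quoted in the post. */
enum {
    H_HOST_NOT_FOUND = 1,
    H_TRY_AGAIN      = 2,
    H_NO_RECOVERY    = 3,
    H_NO_DATA        = 4
};

/* Hypothetical result type: what the mailer should do after a failed
 * MX lookup. */
enum mx_action {
    MX_USE_HOST,  /* no MX data: deliver to the host itself */
    MX_NO_HOST,   /* the host does not exist: permanent failure */
    MX_DEFER      /* temporary failure: queue the mail and retry */
};

/* Map the resolver error (herr) and errno (err) from a failed MX query
 * to an action. use_name_server is nonzero when the mailer insists on
 * using the name server. */
enum mx_action classify_mx_failure(int herr, int err, int use_name_server)
{
    switch (herr) {
    case H_NO_DATA:
    case H_NO_RECOVERY:
        return MX_USE_HOST;
    case H_HOST_NOT_FOUND:
        return MX_NO_HOST;
    case H_TRY_AGAIN:
        /* No name server running and none required: carry on. */
        if (!use_name_server && err == ECONNREFUSED)
            return MX_USE_HOST;
        return MX_DEFER;
    default:
        /* Assumption of this sketch: treat unknown codes as temporary. */
        return MX_DEFER;
    }
}
```

With these definitions, classify_mx_failure(H_TRY_AGAIN, 0, 1) yields MX_DEFER, which matches the deferral reported on the 1773 level RT, while classify_mx_failure(H_NO_DATA, 0, 1) yields MX_USE_HOST, the behavior seen on the machines that return h_errno=4.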
Here is the code fragment from domain.c from the IDA sendmail:

[some code deleted]

    typedef union {
            HEADER qb1;
            char qb2[PACKETSZ];
    } querybuf;

    extern int h_errno;
    querybuf answer;

[some code deleted]

            errno = 0;
            n = res_search(host, C_IN, T_MX, (char *)&answer, sizeof(answer));
            if (n < 0)
            {
                    if (tTd(8, 1))
                            printf("getmxrr: res_search failed (errno=%d, h_errno=%d)\n", errno, h_errno);
                    switch (h_errno)
                    {
    # ifndef NO_DATA
    # define NO_DATA NO_ADDRESS
    # endif /* NO_DATA */
                      case NO_DATA:
                      case NO_RECOVERY:
                            /* no MX data on this host */
                            goto punt;

                      case HOST_NOT_FOUND:
                            /* the host just doesn't exist */
                            *rcode = EX_NOHOST;
                            break;

                      case TRY_AGAIN:
                            /* couldn't connect to the name server */
                            if (!UseNameServer && errno == ECONNREFUSED)
                                    goto punt;
                            /* it might come up later; better queue it up */
                            *rcode = EX_TEMPFAIL;
                            break;
                    }

Any pointers on what may be wrong? Where is this routine discussed? Is the documentation on all of these systems lacking? I am going to report this to IBM, but with an undocumented routine, it may be tricky. As I indicated, on a different system all works OK.

PS. I have checked the /etc/resolv.conf file for proper contents; it is identical to those on other systems at our site, and other hostname lookups work correctly (telnet, rlogin, host, etc.). I have tested this on the RS6000 and also get h_errno=4, just like the 2705 level RT.

Thanks for any light you can shed on this funny routine and its origins.

Markw
--
Mark Whetzel                    My comments are my own, not my company's.
Western Geophysical - A division of Western Atlas International,
A Litton/Dresser Company
DOMAIN addr: markw@airgun.wg.waii.com
UUNET address: uunet!airgun!markw
jackv@turnkey.tcc.com (Jack F. Vogel) (04/17/91)
In article <934@airgun.wg.waii.com> markw@airgun.wg.waii.com (Mark Whetzel) writes:
>I am working with another programmer on porting the IDA sendmail
>to the IBM RT running AIX 2.2.1. (yes its yucky IBM, but it works and
>it's paid for :-)

>The problem area of code is dealing with domain server queries in the
>routine domain.c, in particular it is using the res_search system
>subroutine. I can't find any documentation about this routine, and referencing
>both AIX V3 (RS6000) and SUNOS (4.0.1) and CONVEXOS, none of these systems
>also have documentation about this system subroutine.

Hmmm, this is interesting. I didn't think AIX on the RT or RS6000 even had the res_search() code. res_search() was introduced in BIND 4.8, if memory serves, and the code on the RT or 6000 as shipped is much earlier than that. And, by the way, the simplest way to get the information you want is to get the BIND distribution; it's available from Berkeley or most any other of the Internet archive sites.

AIX 1.2 on the PS/2 or 370 has this code, since I ported it, but because it was done in the service stream it is not in the documentation. This will be remedied in the AIX 1.2.1 docs.

Is it possible that someone has hacked your system and added the later version of BIND (and, from the sound of it, got it wrong :-})?? You might try running sendmail with debug on and taking a look at the info from the nameserver (that assumes, of course, that your resolver was built with DEBUG defined).

Disclaimer: I don't speak for the company.
--
Jack F. Vogel                   jackv@locus.com
AIX370 Technical Support        - or -
Locus Computing Corp.           jackv@turnkey.TCC.COM