[comp.mail.sendmail] Proper handling of MX records

rickert@mp.cs.niu.edu (Neil Rickert) (08/11/90)

 There seem to be some problems with the handling of MX records in sendmail.
My references are to sendmail-5.64.  Both the standard version and the IDA
version have this problem.  But the problem is really with the correct
interpretation of MX records.

 Suppose my domain name is:  MY.DOMAIN and my machine name is ME.MY.DOMAIN.
Further, suppose there is a wild card MX record in the domain database -
	*	IN	MX	10	ME.MY.DOMAIN

 Suppose now that I wish to send mail to person@YOU.YOUR.DOMAIN.

 Sendmail looks up the domain database.  It discovers the MX record, and
returns ME.MY.DOMAIN for the FQDN of YOU.YOUR.DOMAIN.MY.DOMAIN.

 Sendmail then discards the record, since the preference is for the local
host, and then instead searches for an A record.  (Or more accurately,
that is the current design intent.  Due to a bug we are investigating it
does not always do this).

 -------------------- 

 The effect is this:

  I can send mail to YOU.YOUR.DOMAIN provided you are directly on the Internet.
But if your address is an MX only address, sendmail is incapable of sending
you mail - at least in the presence of wild card MX records in my domain.
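
  To make the failure concrete, here is a minimal C sketch of the two steps
involved: the local domain being appended to the name, and the wildcard then
covering the result.  The function names qualify() and wildcard_covers() are
made up for illustration; they are not part of sendmail or of the resolver
library.  The matching is deliberately simplified: real wildcard processing
has further rules (for example, a wildcard does not apply to a name that
actually exists in the zone).

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: build "name.domain" in out, the way a resolver
 * search list would when it tries the local domain.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int qualify(const char *name, const char *domain, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s.%s", name, domain);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if qname lies strictly below zone,
 * i.e. it would be covered by a "*" record in that zone under the
 * simplified rule used here, and 0 otherwise.
 */
int wildcard_covers(const char *zone, const char *qname)
{
    size_t zl = strlen(zone);
    size_t ql = strlen(qname);

    if (ql <= zl + 1)
        return 0;               /* no room for "label." in front of zone */
    if (qname[ql - zl - 1] != '.')
        return 0;               /* suffix does not start on a label boundary */
    return strcasecmp(qname + (ql - zl), zone) == 0;
}
```

  With these, qualifying YOU.YOUR.DOMAIN in MY.DOMAIN gives
YOU.YOUR.DOMAIN.MY.DOMAIN, which the wildcard covers, while the unqualified
YOU.YOUR.DOMAIN is not covered.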

  Here are some possible solutions:

	(a) Don't use MX records.  (Clearly not acceptable).

	(b) If the domain contains periods (as in YOU.YOUR.DOMAIN),
	    sendmail should treat the name as fully qualified, and not
	    allow qualification in the local domain.  This means it is
	    up to sendmail.cf to ensure that the domain is fully qualified
	    before the TCP (or ether or ddn mailer, depending on your version)
	    is selected.

	    I suspect this would break some existing versions of sendmail.cf.

	(c) If the MX lookup fails (due to best preference being local),
	    then before looking up the A type address, try a second time
	    to find an MX record, but this time don't allow qualification in
	    the local domain.

	    This may be the most satisfactory.  But it does mean that another
	    local machine ANOTHER.MY.DOMAIN, in sending mail to YOU.YOUR.DOMAIN
	    will qualify the name as local, matching the wildcard, and forward
	    it to ME.MY.DOMAIN.  Then ME.MY.DOMAIN having the best preference
	    for the wildcard will fail to find the MX record, and do the second
	    MX search assuming fully qualified.  In other words there will be
	    additional hops in the mail delivery which should not have been
	    necessary.
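
  Options (b) and (c) can be sketched roughly as follows.  This is only an
outline under stated assumptions: mx_lookup_fn is a hypothetical callback
standing in for whatever resolver call is used, its 'search' flag is an
invented way of saying "allow qualification in the local domain", and
stub_lookup() merely imitates what ME.MY.DOMAIN would see in the example
above.  None of these names exist in sendmail.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup callback.  Returns the number of usable MX hosts
 * for name (after records for the local host have been discarded).
 * If search is nonzero the local domain may be appended to name.
 */
typedef int (*mx_lookup_fn)(const char *name, int search);

enum route { ROUTE_MX_QUALIFIED, ROUTE_MX_ABSOLUTE, ROUTE_A_RECORD };

/* Option (b): a name containing a period is taken as fully qualified. */
int looks_qualified(const char *name)
{
    return strchr(name, '.') != NULL;
}

/*
 * Option (c): try MX with local qualification first; if nothing usable
 * comes back, try MX again without qualification; only then fall back
 * to the A record.
 */
enum route choose_route(const char *name, mx_lookup_fn lookup)
{
    if (lookup(name, 1) > 0)
        return ROUTE_MX_QUALIFIED;
    if (lookup(name, 0) > 0)
        return ROUTE_MX_ABSOLUTE;
    return ROUTE_A_RECORD;
}

/*
 * Stub for illustration only, as seen from ME.MY.DOMAIN: the qualified
 * lookup hits the wildcard, whose only MX is the local host, so nothing
 * usable remains; the unqualified lookup finds one real MX.
 */
int stub_lookup(const char *name, int search)
{
    (void)name;
    return search ? 0 : 1;
}
```

  Here choose_route("YOU.YOUR.DOMAIN", stub_lookup) yields ROUTE_MX_ABSOLUTE,
which is the outcome (c) is after.  Under (b) the first, qualified attempt
would simply be skipped whenever looks_qualified() is true.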
-- 

=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
  Neil W. Rickert, Computer Sci Dept, Northern Illinois U., DeKalb IL 60115
  InterNet, unix: rickert@cs.niu.edu              Bitnet, VM: T90NWR1@NIUCS

rickert@mp.cs.niu.edu (Neil Rickert) (08/12/90)

In article <1990Aug10.192705.8072@mp.cs.niu.edu> I wrote:
> There seem to be some problems with the handling of MX records in sendmail.
>My references are to sendmail-5.64.  Both the standard version and the IDA

 Here is some more information on the original problem which led to my earlier
comments.  At the end is the patch used.

  There is an address for HOST2, with two MX records.  Preference 10
  selects HOST1, while preference 100 selects HOST3.  HOST1 is trying
  to send mail to HOST2.  deliver.c calls getmxrr() to find the MX
  records, if any.  Consider the following two possible cases:

    Case 1:  The DNS resolver library returns the two MX records in the
	     order HOST1, HOST3.  getmxrr discovers that HOST1 is itself,
	     discards the record after setting 'localpref' to the preference.

	     getmxrr now examines the record for HOST3.  Since the preference
	     is >= localpref, it discards this record also.

	     getmxrr now decides that there are no valid MX records.  It then
	     returns HOST2, with a count of 1.

	     deliver.c now finds the Internet address of HOST2, and correctly
	     sends the mail.

	     NOTE: If there is only one MX record the same behavior occurs.

     Case 2: The DNS returns the records in the order HOST3, HOST1.  Here
	     getmxrr first sees HOST3, and places it in its response array.
	     Next it sees HOST1, recognizes it as itself, discards it after
	     setting 'localpref' to 10.

	     Now getmxrr() sorts the response array, and rejects any host
	     (HOST3) whose preference is >= localpref.  Since the response
	     array is now empty it returns an error indicator, and deliver.c
	     aborts with a 'configuration error' message.

 Clearly something is wrong here.  The behavior of sendmail should not depend
on the order in which the DNS resolver happens to return MX records.  The
enclosed patch changes the behavior so that the result of Case 2 is the same
as that of Case 1.  It is my understanding that this is what the RFCs call for.
Namely: Only MX records with preference < local host preference are considered;
	If there are no MX records, an A record is sought.
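
 The order dependence can be reproduced with a small model.  The code below
is NOT the sendmail source; it is a simplified sketch written from the
description of the two cases above, and the names model_getmxrr, struct mx,
MAXMX, MX_CONFIG_ERROR and the punt_when_empty switch are all invented for
the illustration.  With punt_when_empty set to 0 it imitates the behavior
described for 5.64; with 1 it imitates the patched behavior.

```c
#include <string.h>

#define MAXMX 8
#define MX_CONFIG_ERROR (-1)

struct mx { const char *host; int pref; };

/*
 * Simplified model of the MX selection described in Case 1 and Case 2.
 * rr[0..n-1] are the MX records in the order the resolver returned them.
 * On success, host names are stored in out[] and their count is returned.
 */
int model_getmxrr(const struct mx *rr, int n, const char *target,
                  const char *localhost, int punt_when_empty,
                  const char *out[MAXMX])
{
    int prefer[MAXMX];
    int nmx = 0, seenlocal = 0, localpref = 0;
    int i, j;

    for (i = 0; i < n && nmx < MAXMX; i++) {
        if (strcmp(rr[i].host, localhost) == 0) {
            /* our own record: remember its preference, then drop it */
            if (!seenlocal || rr[i].pref < localpref)
                localpref = rr[i].pref;
            seenlocal = 1;
            continue;
        }
        /* only effective if our own record has already been seen */
        if (seenlocal && rr[i].pref >= localpref)
            continue;
        prefer[nmx] = rr[i].pref;
        out[nmx++] = rr[i].host;
    }
    if (nmx == 0)
        goto punt;

    /* insertion sort by ascending preference */
    for (i = 1; i < nmx; i++) {
        for (j = i; j > 0 && prefer[j - 1] > prefer[j]; j--) {
            int tp = prefer[j];
            const char *th = out[j];
            prefer[j] = prefer[j - 1];
            out[j] = out[j - 1];
            prefer[j - 1] = tp;
            out[j - 1] = th;
        }
    }

    /* drop everything that is not strictly better than the local host */
    if (seenlocal) {
        for (i = 0; i < nmx; i++) {
            if (prefer[i] >= localpref) {
                if (i == 0) {
                    if (punt_when_empty)
                        goto punt;          /* patched behavior */
                    return MX_CONFIG_ERROR; /* behavior before the patch */
                }
                nmx = i;
                break;
            }
        }
    }
    return nmx;

punt:
    /* no usable MX: hand back the target itself for an A lookup */
    out[0] = target;
    return 1;
}
```

 Feeding it HOST1 (10), HOST3 (100) with HOST1 as the local host returns the
target HOST2 with a count of 1.  Feeding it the same records in the opposite
order returns the error when punt_when_empty is 0, and HOST2 with a count of
1 when it is 1.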

 In the actual case where the problem arose, HOST2 was HOST3, but the MX record
returned was a bogus MX record due to qualifying HOST2 in the local domain
and matching a wildcard MX record.

 This solved the immediate problem.  But actually it is still technically wrong
since HOST2 does have its own MX records which are not in the local domain,
but they can never be found by the current code in getmxrr() if a wildcard
MX record exists.

 I suspect the correct behavior is that when no MX records are found, a second
attempt should be made, this time inhibiting qualification in the local domain.
Only after the second attempt should getmxrr() indicate no MX records found.

 Here is the patch which eliminates case 2 above.
           ------
*** /tmp/,RCSt1031327	Sat Aug 11 14:34:01 1990
--- domain.c	Sat Aug 11 14:33:03 1990
***************
*** 158,167 ****
  			 * the best choice left, we should have realized
  			 * awhile ago that this was a local delivery.
  			 */
! 			if (i == 0) {
! 				*rcode = EX_CONFIG;
! 				return(-1);
! 			}
  			nmx = i;
  			break;
  		}
--- 158,165 ----
  			 * the best choice left, we should have realized
  			 * awhile ago that this was a local delivery.
  			 */
! 			if (i == 0)
! 				goto punt;
  			nmx = i;
  			break;
  		}

del@thrush.mlb.semi.harris.com (Don Lewis) (08/17/90)

In article <1990Aug11.200440.27899@mp.cs.niu.edu> rickert@mp.cs.niu.edu (Neil Rickert) writes:
>In article <1990Aug10.192705.8072@mp.cs.niu.edu> I wrote:
>> There seem to be some problems with the handling of MX records in sendmail.
>>My references are to sendmail-5.64.  Both the standard version and the IDA
>
> Here is some more information on the original problem which led to my earlier
>comments.  At the end is the patch used.
>
>  There is an address for HOST2, with two MX records.  Preference 10
>  selects HOST1, while preference 100 selects HOST3.  HOST1 is trying
>  to send mail to HOST2.  deliver.c calls getmxrr() to find the MX
>  records, if any.  Consider the following two possible cases:
>
>    Case 1:  The DNS resolver library returns the two MX records in the
>	     order HOST1, HOST3.  getmxrr discovers that HOST1 is itself,
>	     discards the record after setting 'localpref' to the preference.
>
>	     getmxrr now examines the record for HOST3.  Since the preference
>	     is >= localpref, it discards this record also.
>
>	     getmxrr now decides that there are no valid MX records.  It then
>	     returns HOST2, with a count of 1.
>
>	     deliver.c now finds the Internet address of HOST2, and correctly
>	     sends the mail.
>
>	     NOTE: If there is only one MX record the same behavior occurs.
>
>     Case 2: The DNS returns the records in the order HOST3, HOST1.  Here
>	     getmxrr first sees HOST3, and places it in its response array.
>	     Next it sees HOST1, recognizes it as itself, discards it after
>	     setting 'localpref' to 10.
>
>	     Now getmxrr() sorts the response array, and rejects any host
>	     (HOST3) whose preference is >= localpref.  Since the response
>	     array is now empty it returns an error indicator, and deliver.c
>	     aborts with a 'configuration error' message.
>
> Clearly something is wrong here.  The behavior of sendmail should not depend
>on the order in which the DNS resolver happens to return MX records.  The
>enclosed patch changes the behavior so that the result of Case 2 is the same
>as that of Case 1.  It is my understanding that this is what the RFCs call for.
>Namely: Only MX records with preference < local host preference are considered;
>	If there are no MX records, an A record is sought.

Well, the RFCs (specifically RFC974) anticipate this problem, but leave the
actual behavior up to the implementor.  Either action mentioned above is
technically legal, but it is probably a bug that the action taken depends
on the order of the MX records.  Here is the relevant section from RFC974
(note that the use of WKS is currently not recommended by RFC1123):

]Interpreting the List of MX RRs
]
]   [Note deleted]
]
]   It is possible that the list of MXs in the response to the query will
]   be empty.  This is a special case.  If the list is empty, mailers
]   should treat it as if it contained one RR, an MX RR with a preference
]   value of 0, and a host name of REMOTE.  (I.e., REMOTE is its only
]   MX).  In addition, the mailer should do no further processing on the
]   list, but should attempt to deliver the message to REMOTE.  The idea
]   here is that if a domain fails to advertise any information about a
]   particular name we will give it the benefit of the doubt and attempt
]   delivery.
]
]   If the list is not empty, the mailer should remove irrelevant RR's
]   from the list according to the following steps.  Note that the order
]   is significant.
]
]      For each MX, a WKS query should be issued to see if the domain
]      name listed actually supports the mail service desired.  MX RRs
]      which list domain names which do not support the service should be
]      discarded.  This step is optional, but strongly encouraged.
]
]      If the domain name LOCAL is listed as an MX RR, all MX RRs with a
]      preference value greater than or equal to that of LOCAL's must be
]      discarded.
]
]   After removing irrelevant RRs, the list can again be empty.  This is
]   now an error condition and can occur in several ways.  The simplest
]   case is that the WKS queries have discovered that none of the hosts
]   listed supports the mail service desired.  The message is thus deemed
]   undeliverable, though extremely persistent mail systems might want to
]   try a delivery to REMOTE's address (if it exists) before returning
]   the message. Another, more dangerous, possibility is that the domain
]   system believes that LOCAL is handling message for REMOTE, but the
]   mailer on LOCAL is not set up to handle mail for REMOTE.  For
]   example, if the domain system lists LOCAL as the only MX for REMOTE,
]   LOCAL will delete all the entries in the list.  But LOCAL is
]   presumably querying the domain system because it didn't know what to
]   do with a message addressed to REMOTE. Clearly something is wrong.
]   How a mailer chooses to handle these situations is to some extent
]   implementation dependent, and is thus left to the implementor's
]   discretion.
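
One way to read the steps quoted above so that the result cannot depend on
record order is to find LOCAL's preference in a first pass and filter in a
second.  The following C sketch does that.  It is an illustration only: the
names rfc974_prune and struct mx_rr are invented here, the optional WKS step
is left out, and what to do about the error return is exactly the part the
RFC leaves to the implementor.

```c
#include <strings.h>

struct mx_rr { const char *host; int pref; };

/*
 * Prune an MX list along the lines of the quoted RFC974 text.
 * in[0..n-1] is the list as received; out must have room for n entries
 * (and at least one).  Returns the number of entries kept, or -1 when a
 * non-empty list becomes empty after pruning (the RFC's error condition).
 */
int rfc974_prune(const struct mx_rr *in, int n, const char *local,
                 const char *remote, struct mx_rr *out)
{
    int i, kept = 0, have_local = 0, local_pref = 0;

    /* empty list: act as if REMOTE were its own MX with preference 0 */
    if (n == 0) {
        out[0].host = remote;
        out[0].pref = 0;
        return 1;
    }

    /* pass 1: find LOCAL's (lowest) preference, wherever it appears */
    for (i = 0; i < n; i++) {
        if (strcasecmp(in[i].host, local) == 0 &&
            (!have_local || in[i].pref < local_pref)) {
            local_pref = in[i].pref;
            have_local = 1;
        }
    }

    /* pass 2: discard everything with preference >= LOCAL's */
    for (i = 0; i < n; i++) {
        if (have_local && in[i].pref >= local_pref)
            continue;           /* this also discards LOCAL itself */
        out[kept++] = in[i];
    }

    return kept > 0 ? kept : -1;
}
```

With HOST1 (10) and HOST3 (100) as the MX list and HOST1 as LOCAL, this
returns -1 for either ordering of the records, so the caller makes the same
decision (error, or an attempt on REMOTE's address) in both cases.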

> In the actual case where the problem arose, HOST2 was HOST3, but the MX record
>returned was a bogus MX record due to qualifying HOST2 in the local domain
>and matching a wildcard MX record.
>
> This solved the immediate problem.  But actually it is still technically wrong
>since HOST2 does have its own MX records which are not in the local domain,
>but they can never be found by the current code in getmxrr() if a wildcard
>MX record exists.
>
> I suspect the correct behavior is that when no MX records are found, a second
>attempt should be made, this time inhibiting qualification in the local domain.
>Only after the second attempt should getmxrr() indicate no MX records found.

My reading of RFC1123 suggests that the name used to query for MX records
should already be canonicalized (fully qualified, not a nickname or
abbreviation).   In that case the name should never be qualified with the
local domain.  If the implementation tacks on the local domain when
searching for MX records, the following weirdness can happen:

    Assume that there is a wildcard MX record for *.DOMAIN1.COM that
    points to HOST1.DOMAIN1.COM.  If HOST2.DOMAIN1.COM wants to send
    mail to DOMAIN2.EDU, its query for MX records will find an MX
    record for DOMAIN2.EDU.DOMAIN1.COM; therefore, HOST2 will
    forward the mail to HOST1.

My personal opinion is that wildcard MX records are not appropriate
when they are visible to hosts within the domain.  It just seems
to cause problems and makes automatic canonicalization of host names
much more difficult.  The only appropriate use for wildcard MX records
is for destinations off the internet where it is too cumbersome
to keep the zone file up to date with the current set of hosts in
the domain.  Although it is more cumbersome, explicit MX records for
each host listed in the zone file are the appropriate approach.  I
believe that this would be a lot more palatable (and less error prone)
if there was a means of indicating this with a wildcard-like notation
in the zone file (it would act like a wildcard MX, but only for actual
hosts listed in the zone file).  And since there should also be MX
records with lower preference values pointing to the hosts themselves,
it would be nice to have a wildcard-like way of doing this.  I realize
that both of these goals may be realized more or less automatically
with commonly available tools, but it would still be nice if this were
built in, and if the name server were smart it would have less to store.
--
Don "Truck" Lewis                      Harris Semiconductor
Internet:  del@mlb.semi.harris.com     PO Box 883   MS 62A-028
Phone:     (407) 729-5205              Melbourne, FL  32901