[comp.mail.headers] Xerox experience with broken mailer behaviour

JLarson.pa@Xerox.COM (02/15/88)

We at Xerox were badly burned recently by sites which:

1)  Do not use the domain name system to look up IP addresses

2)  Pick the "best" address (usually net 10) from the host table, 
    and never try any other address in the list.  

Our story should explain why I now consider 2) to be severely broken mailer
behaviour.  We installed an IP gateway at our PSN where the Xerox.Com mail
gateway used to be, so the address for Xerox.Com changed from a net 10 address
to a net 13 subnet address.  The Xerox.Com domain name servers had the correct
information, so those sites properly using the domain name system did not have a
problem.  But our update to the host table (to remove the net 10 address from
Xerox.Com) got stalled in DDN red tape for several weeks (even though that same
net 10 address was already in the host table in the GATEWAY entry!).  So,
although we were allowed to place the correct net 13 subnet address as the first
address in the host table, the bogus net 10 address (now the IP gateway) could
not be removed from the host table for those several weeks.  

You guessed it:  those mailers in 2) picking the "best" address from the host
table were picking a BOGUS address, so mail from these sites failed to go through
for several weeks, and the result was many irate mail users and flurries of mail
between postmasters.  Note that the same problem would have occurred if that net
10 interface had gone down for an extended period, even with an alternate route
to our net 13 subnet available.  (Yes folks, network interfaces do go down,
sometimes even for extended periods.)

There is little excuse for this broken mailer behaviour since it should be easy
to fix, even for mailers which are very heavily loaded.  

The solution is quite simple:  using that "best" address is fine for the FIRST
attempt, but on later retries the BEST address to use is a DIFFERENT address in
the address list.  Simply rotate the previously used address to the end of the
list, and use the next address in the address list on the next retry attempt.
Note that you need only try one address each attempt, so there is no detrimental
impact on a busy mailer. (The Xerox.com mailer implementation in fact knows how
busy it is and adjusts timeout values and multiple address attempts accordingly
to keep sick sites from affecting throughput when busy.  When not busy, it
tries all the addresses, and uses longer timeout values.)
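The rotation scheme above can be sketched in a few lines of Python.  This is a
minimal sketch under stated assumptions, not the Xerox.com mailer's actual code:
the "best address" heuristic (prefer net 10), the timeout values, the names
AddressRotor, deliver and try_send, and the example addresses are all
hypothetical.  It also folds in the busy/idle behaviour described above: one
address per attempt with a short timeout when busy, every address with a longer
timeout when idle.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

# Hypothetical timeout values; the post gives no numbers.
SHORT_TIMEOUT_SECS = 30
LONG_TIMEOUT_SECS = 300


def order_best_first(addresses: List[str], preferred_net: str = "10") -> List[str]:
    """Hypothetical stand-in for a mailer's "best address" heuristic: put
    addresses on the preferred class A net first, otherwise keep host-table
    order (sorted() is stable)."""
    return sorted(addresses,
                  key=lambda addr: 0 if addr.split(".")[0] == preferred_net else 1)


class AddressRotor:
    """Per-destination address list that rotates a failed address to the end."""

    def __init__(self, addresses: List[str]) -> None:
        self._queue = deque(order_best_first(addresses))

    def current(self) -> str:
        return self._queue[0]

    def record_failure(self) -> None:
        # Move the address just tried to the end of the list, so the next
        # attempt uses a DIFFERENT address instead of the same "best" one.
        self._queue.rotate(-1)

    def attempt_plan(self, busy: bool) -> Tuple[List[str], int]:
        # Busy: try only the current address, with a short timeout.
        # Idle: try every address in turn, with a longer timeout.
        if busy:
            return [self._queue[0]], SHORT_TIMEOUT_SECS
        return list(self._queue), LONG_TIMEOUT_SECS


def deliver(rotor: AddressRotor,
            try_send: Callable[[str, int], bool],
            busy: bool) -> Optional[str]:
    """One delivery attempt.  Returns the address that worked, or None to
    leave the message queued for a later retry."""
    addresses, timeout = rotor.attempt_plan(busy)
    for addr in addresses:
        if try_send(addr, timeout):
            return addr
        rotor.record_failure()
    return None


if __name__ == "__main__":
    # Hypothetical addresses: the net 10 one is stale, the net 13 one works.
    rotor = AddressRotor(["13.0.0.1", "10.0.0.1"])
    reachable = {"13.0.0.1"}
    send = lambda addr, timeout: addr in reachable
    print(deliver(rotor, send, busy=True))   # None: the "best" net 10 address fails
    print(deliver(rotor, send, busy=True))   # 13.0.0.1 on the next retry
```

Because a failed address is only rotated, not discarded, a net 10 interface that
comes back up will be tried again on a later retry.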

Hopefully this message will prompt certain mail implementers to fix their broken
mailers, and prompt sites to start using the domain name system so others can
avoid our painful transition experience.  


Cheers,

John Larson
Xerox PARC