[mod.protocols.tcp-ip] Domain host TTL fields

tcs@USNA.ARPA.UUCP (02/26/87)

Something that I've not seen discussed on this list is the optimum TTL
values for host entries, etc. in the domain database.  Anyone have any
numbers on the amount of traffic this stuff generates and the effect the
TTL has on network traffic?

For most large hosts, a minimum of 24 hours seems reasonable and 7
days would not be out of line.  Small workstations that move around a
lot could have smaller values.  Some hosts have a TTL value of 4 hours.
Is this really necessary?  Administrators of zones should be able to
identify major hosts which don't change very often and increase the TTL
accordingly.  The value could be reduced to a few hours prior to a change.
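
As a rough illustration of the traffic side of this question, the sketch
below computes an upper bound on how often a single caching server would
have to re-fetch one record per day for the TTLs mentioned above.  It
assumes the record is in constant demand and is re-fetched the moment it
expires; the function name is made up for this example, and no measured
numbers are implied.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_refetches_per_day(ttl_seconds):
    """Upper bound on daily re-queries for one record from one cache.

    Assumption: the record is always wanted, so the cache re-fetches it
    as soon as the TTL runs out.
    """
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return SECONDS_PER_DAY / ttl_seconds


if __name__ == "__main__":
    # TTL values taken from the discussion: 4 hours, 24 hours, 7 days.
    for label, ttl in [("4 hours", 4 * 3600),
                       ("24 hours", 24 * 3600),
                       ("7 days", 7 * 24 * 3600)]:
        rate = max_refetches_per_day(ttl)
        print(f"TTL {label:>8}: at most {rate:.2f} re-queries/day per cache")
```

Under these assumptions, going from a 4 hour TTL to a 24 hour TTL cuts the
worst-case refresh traffic for that record by a factor of six.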

Also, what would be a good value for the nameserver timeout when waiting
for a reply?  I've found that we typically time out two or three times
before receiving a reply.  This means that several extraneous packets
are injected into the network each time we attempt to resolve an
address.

Am I missing some issues regarding these values?

	-tcs
	Terry Slattery	  U.S. Naval Academy	301-267-4413

jb@cs.brown.edu.UUCP (02/26/87)

Over time, my idea of what the optimum time should be has been increasing.
In general, I feel that 24 hours is about the correct value.  One major
issue is how long various other software will wait for a change.  Sendmail
will attempt to deliver a message for 3 days (as distributed).  One would
like to have any changes seen in less than 3 days.

There are a couple of reasons for data to change.  The first is a planned
change to the network configuration.  This can be prepared for in advance
by reducing the TTL.  Don't forget that the reduction must be made at least
one full (old) TTL before the change, so that every cached copy carrying the
old TTL has expired by then.  Consider how far in advance you would be
planning a move.  The other reason is an unanticipated failure.
If one of your primary machines (such as a mail forwarder) goes down
for a few days, any change made to bypass the failure takes a full TTL
to be seen everywhere.
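
The timing argument in this paragraph can be written down as simple date
arithmetic.  This is only a sketch: the function names and the example
dates are invented for illustration, and it assumes caches honor the TTL
exactly.

```python
from datetime import datetime, timedelta


def latest_time_to_lower_ttl(change_time, old_ttl):
    """Latest moment to publish the reduced TTL before a planned change.

    A cache may have fetched the record (with the old, long TTL) just
    before the reduction was published, so the reduction has to be out
    at least one old TTL ahead of the change.
    """
    return change_time - old_ttl


def worst_case_stale_until(update_time, ttl):
    """Latest moment a cache may still hand out the pre-update data.

    Covers the unplanned case: a record is changed at update_time to
    route around a failure, but a cache that fetched the old data just
    beforehand keeps it for a full TTL.
    """
    return update_time + ttl


if __name__ == "__main__":
    move = datetime(1987, 3, 10, 9, 0)  # hypothetical planned move
    print("lower the TTL no later than:",
          latest_time_to_lower_ttl(move, timedelta(days=7)))

    fix = datetime(1987, 3, 1, 12, 0)   # hypothetical emergency update
    print("old data may persist until:",
          worst_case_stale_until(fix, timedelta(hours=24)))
```

With a 24 hour TTL the emergency case stays well inside sendmail's 3 day
delivery window; with a 7 day TTL it would not.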

Coming from Berkeley and being involved with some of the early distributions
of BIND, I'll admit we made a mistake in what we had in the sample files.
Many people just copied our samples and did not analyze the situation.  Our
samples should have had TTLs that were longer than 1 hour.  We did not
realize this originally ourselves and were guilty of using too short
a TTL for a long time.  These problems take time to work out.

As for the question of what should be used as the timeout waiting for
a reply, I'm not sure what the correct answer is.  There are 3 timeouts
to consider in this case.  First, the total time to wait for any response
before indicating a failure.  Second, the time between trying different
servers for the domain.  And third, the time between tries to the same server.

The first of these is a user interface question on one hand, and a performance
issue on the other.  How long should a user who tries to telnet to some host
have to wait before being told that the host is unknown (possibly only
temporarily)?  I don't like to wait a long time, but on the other hand,
the longer the wait, the more likely the lookup is to succeed.  BIND is
currently using about one minute for this.

The other two are intertwined and also are a part of the first one.  UDP,
which is used primarily for queries, is not reliable.  If one knows that
the original packet was lost, then a retry to one of the servers is in
order.  If the delay is instead due to a long network round trip time
(RTT), then the time between the retries should be lengthened.
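
One way to lengthen the retry interval when the round trip time is long
is to borrow the smoothed-RTT calculation that the TCP specification
(RFC 793) describes for its retransmission timeout.  This is a swapped-in
technique for illustration, not something BIND is known to do; the
function names, the constants chosen, and the 1 and 60 second bounds are
all assumptions.

```python
def update_srtt(srtt, sample, alpha=0.875):
    """Blend a new RTT measurement into the smoothed RTT.

    RFC 793 form: SRTT = alpha * SRTT + (1 - alpha) * RTT, with alpha
    suggested in the range 0.8 to 0.9.
    """
    return alpha * srtt + (1 - alpha) * sample


def retry_timeout(srtt, beta=2.0, lower=1.0, upper=60.0):
    """Retry interval derived from the smoothed RTT, clamped to bounds.

    RFC 793 form: RTO = min(UBOUND, max(LBOUND, beta * SRTT)).
    The bounds used here are illustrative assumptions.
    """
    return min(upper, max(lower, beta * srtt))


if __name__ == "__main__":
    srtt = 1.0  # seconds; assumed starting estimate
    for sample in [5.0, 5.0, 5.0]:  # replies arriving slowly
        srtt = update_srtt(srtt, sample)
        print(f"srtt={srtt:.3f}s  next retry after {retry_timeout(srtt):.3f}s")
```

A server that answers slowly but reliably then gets progressively longer
waits instead of a burst of duplicate queries.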

To decide what these times should be, several questions need to be answered.
How long should the user wait for a response?  How many queries total
should be sent out in trying to resolve the name?  How many queries should
be made to each server for the domain?  What should the retry algorithm
be (linear, exponential, something else)?  If recursion is being done
by another process, how does that affect these values?
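
To make these questions concrete, here is a minimal sketch of a retry
schedule built from three knobs: a total time budget, a per-try timeout,
and a growth rule (linear or exponential) applied each time the list of
servers has been tried once.  The function, its parameters, and the server
names are hypothetical, and this is not a description of what BIND does.

```python
def retry_schedule(servers, total_budget, first_timeout,
                   max_tries_per_server, backoff="exponential"):
    """Return a list of (server, wait_seconds) attempts.

    Each round tries every server once with the same timeout; the
    timeout grows between rounds.  The schedule stops when the total
    budget is used up or every server has had max_tries_per_server tries.
    """
    schedule = []
    elapsed = 0.0
    for attempt in range(max_tries_per_server):
        if backoff == "exponential":
            timeout = first_timeout * 2 ** attempt
        elif backoff == "linear":
            timeout = first_timeout * (attempt + 1)
        else:
            raise ValueError("backoff must be 'linear' or 'exponential'")
        for server in servers:
            remaining = total_budget - elapsed
            if remaining <= 0:
                return schedule
            wait = min(timeout, remaining)  # never exceed the budget
            schedule.append((server, wait))
            elapsed += wait
    return schedule


if __name__ == "__main__":
    # Two hypothetical servers, a one minute budget, 4 second first timeout.
    for server, wait in retry_schedule(["ns1", "ns2"], 60, 4, 3):
        print(f"query {server}, wait up to {wait} s")
```

Changing the budget, the first timeout, or the growth rule shows directly
how many packets are sent in the worst case and how long the user waits.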

I'm not sure what is being used in BIND at the moment.  It actually uses
two different algorithms: one for talking to the local server, and another
for dealing with recursion.  Some work on the algorithms has been done for
the most recent release, and I haven't had a chance to look at the code.

					Jim Bloom