tcs@USNA.ARPA.UUCP (02/26/87)
Something that I've not seen discussed on this list is the optimum TTL values for host entries, etc. in the domain database. Does anyone have numbers on the amount of traffic this stuff generates and the effect the TTL has on network traffic?

For most large hosts, a minimum of 24 hours seems reasonable, and 7 days would not be out of line. Small workstations that move around a lot could have smaller values. Some hosts have a TTL value of 4 hours. Is this really necessary? Administrators of zones should be able to identify major hosts that don't change very often and increase the TTL accordingly. The value could be reduced to a few hours prior to a change.

Also, what would be a good value for the nameserver timeout when waiting for a reply? I've found that we typically time out two or three times before receiving a reply. This means that several extraneous packets are injected into the network each time we attempt to resolve an address.

Am I missing some issues regarding these values?

-tcs
Terry Slattery
U.S. Naval Academy
301-267-4413
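The idea of reducing the TTL shortly before a change has a timing constraint worth spelling out: a cache that fetched the record just before the reduction may keep it for the full old TTL. The Python sketch below works through that arithmetic. The function names and the `margin` parameter (extra slack, for example for the new zone data to reach secondary servers) are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def latest_ttl_reduction(change_time, old_ttl, margin=timedelta(0)):
    """Latest moment to publish the lowered TTL.

    A cache that fetched the record just before the reduction may
    hold it for old_ttl, so the reduction must be published at least
    old_ttl (plus any extra margin) before the planned change.
    """
    return change_time - old_ttl - margin


def worst_case_stale_until(change_time, ttl_in_effect):
    """Latest moment a cache may still serve the pre-change record,
    given the TTL that was in effect when the change was made."""
    return change_time + ttl_in_effect


# Example dates are made up for illustration.
change = datetime(1987, 3, 10, 9, 0)
lower_by = latest_ttl_reduction(change, old_ttl=timedelta(days=7))
stale_until = worst_case_stale_until(change, ttl_in_effect=timedelta(hours=4))
```

With a 7-day TTL, the lowered value has to be published a full week before the move; once a 4-hour TTL is in effect, stale answers can persist for at most 4 hours after the change.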
jb@cs.brown.edu.UUCP (02/26/87)
Over time, my idea of what the optimum time should be has been increasing. In general, I feel that 24 hours is about the correct value. One major issue is how long various other software will wait for a change. Sendmail will attempt to deliver a message for 3 days (as distributed). One would like to have any changes seen in less than 3 days.

There are a couple of reasons for data to change. The first is a planned change to the network configuration. This can be prepared for in advance by reducing the TTL. Don't forget that the reduction must be made further ahead of the change than the length of the old TTL, since caches may hold the old record for that long. Consider how long in advance you would be planning a move.

The other reason for a change is an unanticipated failure. If one of your primary machines (such as a mail forwarder) goes down for a few days, any attempt to bypass the failure takes up to the length of the TTL to be fully realized.

Coming from Berkeley and being involved with some of the early distributions of BIND, I'll admit we made a mistake in what we had in the sample files. Many people just copied our samples and did not analyze the situation. Our samples should have had TTLs that were longer than 1 hour. We did not realize this originally ourselves and were guilty of using too short a TTL for a long time. These problems take time to work out.

As for the question of what should be used as the timeout when waiting for a reply, I'm not sure what the correct answer is. There are 3 timeouts to consider in this case. First, the total time to wait for any response before indicating a failure. Second, the time between trying different servers for the domain. And third, the time between tries to the same server.

The first of these is a user interface question on one hand, and a performance issue on the other. How long should a user who tries to telnet to some host have to wait before being told that the host is unknown (possibly only temporarily)? I don't like to wait a long time, but on the other hand, the longer the wait, the more likely the lookup is to succeed. BIND is currently using about one minute for this.

The other two are intertwined and are also a part of the first one. UDP, which is used primarily for queries, is not reliable. If one knows that the original packet was lost, then a retry to one of the servers is in order. If the delay is in network round trip time (RTT), then the time between the retries should be lengthened.

To decide what these times should be, several questions need to be answered. How long should the user wait for a response? How many queries in total should be sent out in trying to resolve the name? How many queries should be made to each server for the domain? What should the retry algorithm be (linear, exponential, something else)? If recursion is being done by another process, how does that affect these values?

I'm not sure what is being used in BIND at the moment. It actually uses two different algorithms: one for talking to the local server, and another for dealing with recursion. Some work on the algorithms has been done for the most recent release and I haven't had a chance to look at the code.

Jim Bloom
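To make the three timeouts concrete, here is a minimal Python sketch of one possible retry schedule. It is not BIND's algorithm, which the post above does not describe. It assumes round-robin over the servers, a per-query timeout that doubles after each full pass, and a hard cap on the total wait. The function name and every default (5-second initial timeout, factor of 2, 3 tries per server) are assumptions for illustration; only the 60-second total budget echoes the "about one minute" mentioned above.

```python
def retry_schedule(servers, initial_timeout=5.0, backoff=2.0,
                   total_budget=60.0, max_per_server=3):
    """Return a list of (send_time, server, timeout) tuples.

    total_budget   -- total time to wait before reporting failure
    round-robin    -- each timeout moves on to the next server
    backoff        -- the timeout grows after every full pass, so
                      repeat queries to the same server are spaced
                      further apart (exponential retry)
    """
    schedule = []
    elapsed = 0.0
    timeout = initial_timeout
    for _ in range(max_per_server):
        for server in servers:
            remaining = total_budget - elapsed
            if remaining <= 0:
                return schedule          # budget exhausted: give up
            wait = min(timeout, remaining)
            schedule.append((elapsed, server, wait))
            elapsed += wait
        timeout *= backoff               # lengthen timeout for next pass
    return schedule


plan = retry_schedule(["ns1", "ns2"])
```

With two servers and these defaults, the plan contains six queries sent at 0, 5, 10, 20, 30, and 50 seconds, with the last timeout cut to 10 seconds so the total stays within one minute. Changing `backoff` to 1.0 gives a fixed retry interval, which makes it easy to compare how many packets each policy sends within the same budget.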