hagan@DCCS.UPENN.EDU (John Dotts Hagan) (05/30/90)
This is not directly about bind itself, and is not actually a bug. So right
off, my subject is bad - sorry!

Anyways, I think it would be real neat if the resolver did some kind of
performance/reliability remembering when going at its list of possible name
servers to use.

As it is now, we have three name servers for our campus (one is primary, and
two secondaries). One of the secondaries is listed first in everyone's
resolv.conf (or equivalent list of servers), and then the primary, and then
the second secondary.

When the first listed secondary dies (either named dumps core and leaves, or
the system is toasted), everyone's resolver gets slow since it patiently tries
to query the first listed name server, then after a timeout moves on to the
next one, and so forth. However, it does not remember that it just had some
trouble with the first server, and tries it again for the next request.

It would be great if the first user who tries a telnet (or whatever) suffered
the hit once for that host, then other tries would quickly just go at a working
name server. Perhaps dead name servers could be routinely retried and some
stats kept on them (I think bind already does this sort of thing when dealing
with the list of root servers, so at least there is some precedent for this
kind of behavior).

Any thoughts?

--Kid.
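The "remembering" asked for above could look roughly like the Python sketch
below. It is not taken from any resolver library: the class name
`ServerTracker`, the `retry_after` interval, and the method names are all
hypothetical, chosen only to illustrate the idea of pushing a recently failed
server to the back of the list and giving it another chance after a while.

```python
import time


class ServerTracker:
    """Remember which name servers recently timed out (hypothetical helper)."""

    def __init__(self, servers, retry_after=60.0, clock=time.monotonic):
        self.servers = list(servers)    # configured order, as in resolv.conf
        self.retry_after = retry_after  # seconds before a failed server is retried first again
        self.clock = clock              # injectable so the behavior can be tested
        self.failed_at = {}             # server -> time of its last timeout

    def order(self):
        """Return servers to try: healthy ones first, recently failed ones last."""
        now = self.clock()
        healthy, suspect = [], []
        for server in self.servers:
            failed = self.failed_at.get(server)
            if failed is not None and now - failed < self.retry_after:
                suspect.append(server)
            else:
                healthy.append(server)
        return healthy + suspect

    def mark_failed(self, server):
        """Record that a query to this server timed out."""
        self.failed_at[server] = self.clock()

    def mark_ok(self, server):
        """Record that this server answered, clearing any earlier failure."""
        self.failed_at.pop(server, None)
```

A failed server is never dropped, only tried last, so a lookup can still
succeed if every server has been marked bad. Note that this state lives inside
one process, so it helps only a program that makes many lookups.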
del@thrush.mlb.semi.harris.com (Don Lewis) (05/30/90)
In article <25358@netnews.upenn.edu> hagan@DCCS.UPENN.EDU (John Dotts Hagan) writes:
>
>Anyways, I think it would be real neat if the resolver did some kind of
>performance/reliability remembering when going at its list of possible name
>servers to use.
>
>As it is now, we have three name servers for our campus (one is primary, and
>two secondaries). One of the secondaries is listed first in everyone's
>resolv.conf (or equivalent list of servers), and then the primary, and then
>the second secondary.
>
>When the first listed secondary dies (either named dumps core and leaves, or
>the system is toasted), everyone's resolver gets slow since it patiently tries
>to query the first listed name server, then after a timeout moves on to the
>next one, and so forth. However, it does not remember that it just had some
>trouble with the first server, and tries it again for the next request.

You might want to list each of these first on one third of the hosts in
order to better distribute the load. This way, only one third of the hosts
will slow down when one of the servers dies (but this will happen three
times as often).

>
>It would be great if the first user who tries a telnet (or whatever) suffered
>the hit once for that host, then other tries would quickly just go at a working
>name server. Perhaps dead name servers could be routinely retried and some
>stats kept on them (I think bind already does this sort of thing when dealing
>with the list of root servers, so at least there is some precedent for this
>kind of behavior).
>

Well, there is sort of a problem here. You probably have no such thing as
*the* resolver. Everything that you run that wants to do host<->address
translation uses the resolver library routines and is a separate instance
of a resolver. Each time you fire up telnet, it starts up from scratch and
has no history available concerning the status of the various servers.

If a particular process does a lot of host<->address translations, then it
probably could figure out what was going on and make use of this
information, but if it only does one translation, by the time it figures
out which server is the hot one to use, it has no further need to use it.
I suppose that you could read this information from a file and update it,
but then you have to be able to handle multiple simultaneous accesses and
updates to this file 8-(

If you are running a somewhat modern BIND (>4.8?), then you can run it on
each host and configure it to forward all its queries to the campus
servers. BIND appears not to keep track of the performance of its
forwarders, so I suppose it would be better if it did something like what
it does for the root servers. Running BIND on each host also has the
advantage that the answers to frequently asked questions are cached locally
on the host, which will reduce the load on the campus servers.

Be forewarned that the forwarding stuff doesn't quite work right even in
4.8.1. Hopefully there will be a cleaner release soon.

--
Don "Truck" Lewis                      Harris Semiconductor
Internet:  del@mlb.semi.harris.com     PO Box 883   MS 62A-028
Phone:     (407) 729-5205              Melbourne, FL  32901
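The suggestion above of listing a different server first on each third of the
hosts can be sketched in Python as follows. This is only an illustration: the
function names are made up, the addresses in the usage note are documentation
placeholders rather than real campus servers, and hashing the host name gives
a roughly even split, not an exact one third each.

```python
import zlib


def rotate_servers(hostname, servers):
    """Rotate the server list by a stable per-host offset (illustrative only)."""
    if not servers:
        return []
    # crc32 gives the same value on every run, unlike Python's built-in
    # hash() for strings, so a host always gets the same ordering.
    offset = zlib.crc32(hostname.encode("utf-8")) % len(servers)
    return servers[offset:] + servers[:offset]


def resolv_conf_lines(hostname, servers):
    """Build the 'nameserver' lines for one host's resolv.conf."""
    return ["nameserver " + s for s in rotate_servers(hostname, servers)]
```

For example, `resolv_conf_lines("somehost", ["192.0.2.1", "192.0.2.2",
"192.0.2.3"])` returns the three `nameserver` lines in an order that depends
only on the host name, so regenerating the file never reshuffles a host.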
philipp@GIPSI.GIPSI.FR (Philippe Prindeville) (06/01/90)
The resolver shouldn't be a system call (since it is session layer),
but at the same time it shouldn't be linked into every application,
since you might want to upgrade it or add functionality. Really,
the best thing to do is to RPC to a local resolver daemon that can
do things like measure which servers give timely responses, and
which ones are south. Things like address sorting based on
policy constraints could be done there also.

If this sounds like ypbind to you, Don't Panic. I'm not advocating
yellow pages. Just saying there should be more thread/RPC type
design in UNIX.

It would be nice if the IAB would say type-X RPC (XDR, etc) will be
the official protocol of the Internet, so we could get on with the
design of system servers...

-Philip
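One way a local daemon might "measure which servers give timely responses" is
to keep a smoothed response time per server and try the fastest first. The
Python sketch below is an assumption-laden illustration, not anything BIND or
an existing daemon does: the class `RttRanker`, the smoothing factor `alpha`,
and the `timeout_penalty` charged for a query that never returned are all
hypothetical.

```python
class RttRanker:
    """Rank name servers by smoothed response time (hypothetical sketch)."""

    def __init__(self, servers, alpha=0.3, timeout_penalty=5.0):
        self.alpha = alpha                      # weight given to the newest sample
        self.timeout_penalty = timeout_penalty  # seconds charged for a timeout
        self.srtt = {s: None for s in servers}  # None means "never tried yet"

    def record(self, server, rtt=None):
        """Record one query: rtt in seconds, or None if the query timed out."""
        sample = self.timeout_penalty if rtt is None else rtt
        old = self.srtt.get(server)
        if old is None:
            self.srtt[server] = sample
        else:
            # Exponentially weighted moving average of the response time.
            self.srtt[server] = (1 - self.alpha) * old + self.alpha * sample

    def ranked(self):
        """Return servers, untried ones first, then fastest to slowest."""
        def key(server):
            value = self.srtt[server]
            return -1.0 if value is None else value
        return sorted(self.srtt, key=key)
```

Putting untried servers first means every server gets measured at least once.
A server that timed out is only pushed down the ranking, so later good
responses let it work its way back up.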
del@mlb.semi.harris.com (Don Lewis) (06/01/90)
On May 31, 4:16pm, Philippe Prindeville wrote:
} Subject: Re: BIND bug list
> The resolver shouldn't be a system call (since it is session layer),
> but at the same time it shouldn't be linked into every application,
> since you might want to upgrade it or add functionality.

If your machine has shared libraries (a la Sun), you can put the resolver
code there. You can then upgrade the resolver code and all your
applications will see the changes.

> Really,
> the best thing to do is to RPC to a local resolver daemon that can
> do things like measure which servers give timely responses, and
> which ones are south. Also, things like address sorting based on
> policy constraints could be done there also.

If you get a BIND version >= 4.8, you can do this now if you don't list any
name server addresses in /etc/resolv.conf, and list the real name servers
as forwarders in /etc/named.boot. BIND is also kind enough to do caching
for you, which is a definite win. The only thing lacking is that BIND
doesn't keep track of performance data, but that should probably not be too
hard to add.

The problems with this approach are:

* If the local BIND process dies, your host will be very unhappy

* The local BIND process is not able to distinguish between a forwarder
  being down and a query that the forwarder is having trouble resolving
  (due to the nameservers it is querying not responding).

If you list only one forwarder, you are killed by the first case. If you
list multiple forwarders, you may time out the first query and try the next
forwarder, which will not have any better luck. This is a good way to
thrash the internet, and it will also muck up your performance data.

Basically recursive queries and UDP don't mix very well. Recursive queries
work fine with TCP, but then you are limited as far as the number of
simultaneous queries that you can support.

>
> If this sounds like ypbind to you, Don't Panic. I'm not advocating
> yellow pages. Just saying there should be more thread/RPC type
> design in UNIX.
>
> It would be nice if the IAB would say type-X RPC (XDR, etc) will be
> the official protocol of the Internet, so we could get on with the
> design of system servers...

I don't think there needs to be a new protocol, maybe just some
enhancements to the existing protocol to better support recursive queries.
philipp@GIPSI.GIPSI.FR (Philippe Prindeville) (06/02/90)
    From: del@mlb.semi.harris.com (Don Lewis)
    Date: Thu, 31 May 1990 16:13:20 EDT
    Subject: Re: BIND bug list

    If your machine has shared libraries (a la Sun), you can put the
    resolver code there. You can then upgrade the resolver code and all
    your applications will see the changes.

Yes, but there are more than just Suns out there. A more portable
solution is needed.

    > Really,
    > the best thing to do is to RPC to a local resolver daemon that can
    > do things like measure which servers give timely responses, and
    > which ones are south. Also, things like address sorting based on
    > policy constraints could be done there also.

    If you get a BIND version >= 4.8, you can do this now if you don't
    list any name server addresses in /etc/resolv.conf, and list the real
    name servers as forwarders in /etc/named.boot. BIND is also kind
    enough to do caching for you, which is a definite win. The only thing
    lacking is that BIND doesn't keep track of performance data, but that
    should probably not be too hard to add.

One of us is confused here: I thought we were talking solely about
the resolver, not the nameserver as well. I don't think installing
the nameserver on every workstation is a solution. It would give
a rich local cache, but it would be better to have one or two
forwarding nameservers for the local site, and let them accumulate
the cache.

    The problems with this approach are:

    * If the local BIND process dies, your host will be very unhappy

    * The local BIND process is not able to distinguish between a
      forwarder being down and a query that the forwarder is having
      trouble resolving (due to the nameservers it is querying not
      responding).

    If you list only one forwarder, you are killed by the first case. If
    you list multiple forwarders, you may time out the first query and try
    the next forwarder, which will not have any better luck. This is a
    good way to thrash the internet, and it will also muck up your
    performance data.

Yes, another reason not to run a nameserver on every host.

    Basically recursive queries and UDP don't mix very well. Recursive
    queries work fine with TCP, but then you are limited as far as the
    number of simultaneous queries that you can support.

Yes, so? Apropos of what?

    > If this sounds like ypbind to you, Don't Panic. I'm not advocating
    > yellow pages. Just saying there should be more thread/RPC type
    > design in UNIX.
    >
    > It would be nice if the IAB would say type-X RPC (XDR, etc) will be
    > the official protocol of the Internet, so we could get on with the
    > design of system servers...

    I don't think there needs to be a new protocol, maybe just some
    enhancements to the existing protocol to better support recursive
    queries.

Eh, no: I meant a protocol for RPC between processes, such as
a client program (maybe telnet) wanting to talk to a server (such
as a local resolver daemon).

-Philip
david@twg.com (David S. Herron) (06/06/90)
In article <25358@netnews.upenn.edu> hagan@DCCS.UPENN.EDU (John Dotts Hagan) writes:

[Deleted tale of woe involving having only 3 nameservers on campus
and the slowness that results in client resolvers when one of the
nameservers dies..]

>It would be great if the first user who tries a telnet (or whatever) suffered
>the hit once for that host, then other tries would quickly just go at a working
>name server.

Why not go ahead and run nameservers on every machine capable of it?
You'd (of course) set up the nameservers so they're slaves and forward
to the primaries for your campus. Then set up the resolv.conf so that
it first queries the local server then goes to others on campus.

This way if one of the busy nameservers dies the answers will more than
likely be cached in the local nameserver. Your local nameserver is
less likely to die since it's less busy (it doesn't have to service
everybody y'see).

> Perhaps dead name servers could be routinely retried and some
>stats kept on them (I think bind already does this sort of thing when dealing
>with the list of root servers, so at least there is some precedent for this
>kind of behavior).

You already have the answer... BIND does what you want already so use it.

--
<- David Herron, an MMDF weenie, <david@twg.com>
<- Formerly: David Herron -- NonResident E-Mail Hack <david@ms.uky.edu>
<-
<- Sign me up for one "I survived Jaka's Story" T-shirt!