[comp.protocols.tcp-ip.domains] BIND bug list

hagan@DCCS.UPENN.EDU (John Dotts Hagan) (05/30/90)

This is not directly about bind itself, and is not actually a bug.  So right
off, my subject is bad - sorry!

Anyways, I think it would be real neat if the resolver did some kind of
performance/reliability remembering when going at its list of possible name
servers to use.

As it is now, we have three name servers for our campus (one is primary, and
two secondaries).  One of the secondaries is listed first in everyone's
resolv.conf (or equivalent list of servers), and then the primary, and then
the second secondary.

When the first listed secondary dies (either named dumps core and leaves, or
the system is toasted), everyone's resolver gets slow since it patiently tries
to query the first listed name server, then after a timeout moves on to the
next one, and so forth.  However, it does not remember that it just had some
trouble with the first server, and tries it again for the next request.

It would be great if the first user who tries a telnet (or whatever) suffered
the hit once for that host, then other tries would quickly just go at a working
name server.  Perhaps dead name servers could be routinely retried and some
stats kept on them (I think bind already does this sort of thing when dealing
with the list of root servers, so at least there is some precedent for this
kind of behavior).
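A minimal sketch of the "remembering" described above, assuming a single long-lived process that keeps this state between lookups: a server that times out is moved to the back of the list for a while, and is tried in its normal position again once a hold-off period expires. The class name, the 60-second `retry_after`, and the server names are illustrative assumptions, not anything in BIND.

```python
import time


class ServerTracker:
    """Remember which name servers recently failed to answer.

    A server that times out is pushed behind the healthy ones for
    `retry_after` seconds; after that it is tried normally again, so a
    server that comes back up is rediscovered.
    """

    def __init__(self, servers, retry_after=60.0, clock=time.monotonic):
        self.servers = list(servers)      # preference order, as in resolv.conf
        self.retry_after = retry_after
        self.clock = clock
        self.failed_at = {}               # server -> time of its last timeout
        self.failures = {s: 0 for s in self.servers}  # simple per-server stats

    def order(self):
        """Servers to try: healthy ones first, each group in configured order."""
        now = self.clock()
        healthy = [s for s in self.servers
                   if now - self.failed_at.get(s, float("-inf")) >= self.retry_after]
        suspect = [s for s in self.servers if s not in healthy]
        return healthy + suspect          # suspects remain a last resort

    def record_timeout(self, server):
        self.failed_at[server] = self.clock()
        self.failures[server] += 1

    def record_success(self, server):
        self.failed_at.pop(server, None)
```

With this, only the first lookup after the secondary dies pays the timeout; later lookups go straight to the primary until the hold-off runs out and the secondary is probed again.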

Any thoughts?

--Kid.

del@thrush.mlb.semi.harris.com (Don Lewis) (05/30/90)

In article <25358@netnews.upenn.edu> hagan@DCCS.UPENN.EDU (John Dotts Hagan) writes:
>
>Anyways, I think it would be real neat if the resolver did some kind of
>performance/reliability remembering when going at its list of possible name
>servers to use.
>
>As it is now, we have three name servers for our campus (one is primary, and
>two secondaries).  One of the secondaries is listed first in everyone's
>resolv.conf (or equivalent list of servers), and then the primary, and then
>the second secondary.
>
>When the first listed secondary dies (either named dumps core and leaves, or
>the system is toasted), everyone's resolver gets slow since it patiently tries
>to query the first listed name server, then after a timeout moves on to the
>next one, and so forth.  However, it does not remember that it just had some
>trouble with the first server, and tries it again for the next request.

You might want to list each of these first in one third of the hosts in
order to better distribute the load.  This way, only 1/3rd of the hosts
will slow down when one of the servers dies (but this will happen three
times as often).
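One way to do the one-third split is to rotate the same campus list per host, so each server is first for an equal share of hosts and the others remain as backups. This is a sketch; the addresses are placeholders from the RFC 5737 documentation range, and `host_index` (for example, something derived from each host's own address) is an assumption about how hosts would be numbered.

```python
def rotated_servers(servers, host_index):
    """Rotate the campus server list so host number `host_index` tries
    servers[host_index % len(servers)] first and keeps the rest as backups."""
    k = host_index % len(servers)
    return servers[k:] + servers[:k]


def resolv_conf(servers, host_index):
    """Render the rotated list as resolv.conf `nameserver` lines."""
    return "".join("nameserver %s\n" % s for s in rotated_servers(servers, host_index))
```

Because the backups keep the original cyclic order, each host still has every campus server available when its first choice dies.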

>
>It would be great if the first user who tries a telnet (or whatever) suffered
>the hit once for that host, then other tries would quickly just go at a working
>name server.  Perhaps dead name servers could be routinely retried and some
>stats kept on them (I think bind already does this sort of thing when dealing
>with the list of root servers, so at least there is some precedent for this
>kind of behavior).
>
Well, there is sort of a problem here.  You probably have no such thing
as *the* resolver.  Everything that you run that wants to do host<->address
translation uses the resolver library routines and is a separate instance
of a resolver.  Each time you fire up telnet, it starts up from scratch
and has no history available concerning the status of the various servers.
If a particular process does a lot of host<->address translations, then it
probably could figure out what was going on and make use of this
information, but if it only does one translation, by the time it figures
out which server is the hot one to use, it has no further need to use it.
I suppose that you could read this information from a file and update it,
but then you have to be able to handle multiple simultaneous accesses and
updates to this file 8-(
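To show what that shared-file approach would involve, here is a minimal sketch assuming a hypothetical JSON status file that every resolver process on the host reads and updates. An exclusive `flock()` (advisory and Unix-only) serializes the read-modify-write so two processes recording failures at the same moment do not lose each other's entries. The file path and format are illustrative only.

```python
import fcntl
import json
import os
import time


def record_server_failure(path, server, now=None):
    """Note in the shared status file that `server` just timed out."""
    now = time.time() if now is None else now
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)     # held until the file is closed
        text = f.read()
        status = json.loads(text) if text else {}
        status[server] = now
        f.seek(0)
        f.truncate()                      # rewrite the whole file in place
        json.dump(status, f)
        f.flush()
    return status


def recently_failed(path, retry_after, now=None):
    """Return the servers that failed less than `retry_after` seconds ago."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            fcntl.flock(f, fcntl.LOCK_SH)  # readers may share the lock
            text = f.read()
    except FileNotFoundError:
        return set()
    status = json.loads(text) if text else {}
    return {s for s, t in status.items() if now - t < retry_after}
```

Every lookup now costs an extra open, lock, and parse, which is part of why the thread turns to a long-lived local process instead.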

If you are running a somewhat modern BIND (>4.8?), then you can run it
on each host and configure it to forward all its queries to the campus
servers.  BIND appears not to keep track of the performance of its
forwarders; I suppose it would be better if it did something like
what it does for the root servers.  Running BIND on each host also has
the advantage that the answers to frequently asked questions are cached
locally on the host which will reduce the load on the campus servers.
Be forewarned that the forwarding stuff doesn't quite work right even in
4.8.1.  Hopefully there will be a cleaner release soon.
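As a rough sketch of the forwarder bookkeeping being asked for: keep a smoothed round-trip time per forwarder and prefer the fastest. The smoothing weight, the timeout penalty, and the aging rule that lets a penalized forwarder be retried are arbitrary assumptions; this is not BIND's actual code for root servers, and the forwarder names are hypothetical.

```python
class ForwarderStats:
    """Rank forwarders by a smoothed round-trip time (SRTT), lowest first."""

    ALPHA = 0.3            # weight of the newest RTT sample (assumed)
    TIMEOUT_PENALTY = 2.0  # how hard a timeout is punished (assumed)

    def __init__(self, forwarders):
        self.forwarders = list(forwarders)
        # Untried forwarders start at zero, so each looks fastest until used.
        self.srtt = {f: 0.0 for f in self.forwarders}

    def best(self):
        # min() keeps the first of equal values, so ties follow config order.
        return min(self.forwarders, key=lambda f: self.srtt[f])

    def record_reply(self, forwarder, rtt):
        # Exponentially weighted moving average of observed RTTs.
        self.srtt[forwarder] = (1 - self.ALPHA) * self.srtt[forwarder] + self.ALPHA * rtt

    def record_timeout(self, forwarder, timeout):
        # Treat a timeout as worse than the longest wait we put up with.
        self.srtt[forwarder] = max(self.srtt[forwarder], timeout) * self.TIMEOUT_PENALTY

    def age_unused(self, used, factor=0.9):
        # Slowly forgive forwarders we did not use, so a penalized one is
        # eventually tried again instead of being shunned forever.
        for f in self.forwarders:
            if f != used:
                self.srtt[f] *= factor
```

The aging step plays the role of "routinely retrying" a dead server: its penalty decays until it again looks competitive and gets one more chance.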
--
Don "Truck" Lewis                      Harris Semiconductor
Internet:  del@mlb.semi.harris.com     PO Box 883   MS 62A-028
Phone:     (407) 729-5205              Melbourne, FL  32901

philipp@GIPSI.GIPSI.FR (Philippe Prindeville) (06/01/90)

The resolver shouldn't be a system call (since it is session layer),
but at the same time it shouldn't be linked into every application,
since you might want to upgrade it or add functionality.  Really,
the best thing to do is to RPC to a local resolver daemon that can
do things like measure which servers give timely responses, and
which ones have gone south.  Also, things like address sorting based on
policy constraints could be done there also.
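Address sorting by policy in such a daemon could look something like this, assuming the policy is simply an ordered list of preferred networks (for example, the local subnet first). The networks and addresses are placeholders, and the function name is hypothetical.

```python
import ipaddress


def sort_addresses(addresses, preferred_networks):
    """Order `addresses` so those inside earlier entries of
    `preferred_networks` come first; addresses matching no network go last.
    sorted() is stable, so ties keep the order the name server returned."""
    nets = [ipaddress.ip_network(n) for n in preferred_networks]

    def rank(addr):
        ip = ipaddress.ip_address(addr)
        for i, net in enumerate(nets):
            if ip in net:
                return i
        return len(nets)

    return sorted(addresses, key=rank)
```

Doing this once in a daemon means the policy can change without relinking every application, which is the point being made here.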

If this sounds like ypbind to you, Don't Panic.  I'm not advocating
yellow pages.  Just saying there should be more thread/RPC type
design in UNIX.

It would be nice if the IAB would say type-X RPC (XDR, etc) will be
the official protocol of the Internet, so we could get on with the
design of system servers...
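To make the XDR part concrete: if a client such as telnet handed a host name to a local resolver daemon as an RPC argument, XDR (RFC 1014) would encode it as a counted string, meaning a 4-byte big-endian length, the bytes, then zero padding to a multiple of four. The "host name as RPC argument" request is hypothetical; only the string encoding rule comes from XDR.

```python
import struct


def xdr_string(data: bytes) -> bytes:
    """XDR encoding of a variable-length string: 4-byte big-endian
    length, the bytes, then zero padding to a multiple of 4."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def xdr_unpack_string(buf: bytes, offset: int = 0):
    """Inverse of xdr_string(); returns (data, offset just past the padding)."""
    (n,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    data = buf[start:start + n]
    end = start + n + (4 - n % 4) % 4
    return data, end
```

For example, `xdr_string(b"telnet")` is 12 bytes: the length 6, the six characters, and two zero bytes of padding.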

-Philip

del@mlb.semi.harris.com (Don Lewis) (06/01/90)

On May 31,  4:16pm, Philippe Prindeville wrote:
} Subject: Re: BIND bug list
> The resolver shouldn't be a system call (since it is session layer),
> but at the same time it shouldn't be linked into every application,
> since you might want to upgrade it or add functionality.
If your machine has shared libraries (a la Sun), you can put the
resolver code there.  You can then upgrade the resolver code and
all your applications will see the changes.

> Really,
> the best thing to do is to RPC to a local resolver daemon that can
> do things like measure which servers give timely responses, and
> which ones have gone south.  Also, things like address sorting based on
> policy constraints could be done there also.
If you get a BIND version >= 4.8, you can do this now if you don't list
any name server addresses in /etc/resolv.conf, and list the real name
servers as forwarders in /etc/named.boot.  BIND is also kind enough
to do caching for you, which is a definite win.  The only thing lacking
is that BIND doesn't keep track of performance data, but that should
probably not be too hard to add.
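The caching win comes from keeping each answer until its DNS time-to-live (TTL) expires, so repeated questions never leave the host. A minimal sketch of that idea follows; it is not BIND's cache, the class name is hypothetical, and the name and address in the usage are placeholders.

```python
import time


class AnswerCache:
    """Keep answers until their TTL runs out, so a repeated question
    does not have to be sent to a forwarder again."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}   # (name, qtype) -> (expiry time, answer)

    def put(self, name, qtype, answer, ttl):
        # DNS names compare case-insensitively, so normalize the key.
        self.entries[(name.lower(), qtype)] = (self.clock() + ttl, answer)

    def get(self, name, qtype):
        key = (name.lower(), qtype)
        hit = self.entries.get(key)
        if hit is None:
            return None
        expires, answer = hit
        if self.clock() >= expires:
            del self.entries[key]      # stale: must ask a forwarder again
            return None
        return answer
```

For example, after `put("host.example.com", "A", "192.0.2.10", ttl=300)`, lookups of that name are answered locally for the next five minutes.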

The problems with this approach are:
  * If the local BIND process dies, your host will be very unhappy
  * The local BIND process is not able to distinguish between a
    forwarder being down and a query that the forwarder is having
    trouble resolving (due to the nameservers it is querying not
    responding).  If you list only one forwarder, you are killed
    by the first case.  If you list multiple forwarders, you may
    time out the first query and try the next forwarder, which will
    not have any better luck.  This is a good way to thrash the
    internet, and it will also muck up your performance data.
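The second problem can be illustrated with a toy model, a deliberate simplification rather than how BIND actually schedules retransmissions. Every forwarder is healthy but needs `resolve_time` seconds because the servers it must ask are slow, the local server gives each forwarder `timeout` seconds, and, as an assumption, replies that arrive after the local server has moved on are thrown away.

```python
def forward_with_failover(forwarders, resolve_time, timeout):
    """Simulate a local server trying each forwarder in turn.

    Returns (answered_by, upstream_lookups, wrongly_blamed): which forwarder
    answered (None if none did in time), how many forwarders started their
    own recursive lookup for the same name, and which healthy forwarders
    were recorded as failures.
    """
    upstream_lookups = 0
    wrongly_blamed = []
    for fw in forwarders:
        upstream_lookups += 1           # fw starts its own recursive lookup
        if resolve_time <= timeout:
            return fw, upstream_lookups, wrongly_blamed
        wrongly_blamed.append(fw)       # looks "down" but is merely busy
    return None, upstream_lookups, wrongly_blamed
```

With three forwarders, `resolve_time=8` and `timeout=5`, all three start a recursive lookup for the same name and all three get blamed, which is both the extra load on the internet and the corrupted performance data described above.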

Basically recursive queries and UDP don't mix very well.  Recursive
queries work fine with TCP, but then you are limited as far as the
number of simultaneous queries that you can support.

> 
> If this sounds like ypbind to you, Don't Panic.  I'm not advocating
> yellow pages.  Just saying there should be more thread/RPC type
> design in UNIX.
> 
> It would be nice if the IAB would say type-X RPC (XDR, etc) will be
> the official protocol of the Internet, so we could get on with the
> design of system servers...
I don't think there needs to be a new protocol, maybe just some
enhancements to the existing protocol to better support recursive
queries.

philipp@GIPSI.GIPSI.FR (Philippe Prindeville) (06/02/90)

	From: del@mlb.semi.harris.com (Don Lewis)
	Date: Thu, 31 May 1990 16:13:20 EDT
	Subject: Re: BIND bug list
	
	If your machine has shared libraries (a la Sun), you can put the
	resolver code there.  You can then upgrade the resolver code and
	all your applications will see the changes.

Yes, but there is more than just Suns out there.  A more portable
solution is needed.
	
	> Really,
	> the best thing to do is to RPC to a local resolver daemon that can
	> do things like measure which servers give timely responses, and
	> which ones have gone south.  Also, things like address sorting based on
	> policy constraints could be done there also.
	If you get a BIND version >= 4.8, you can do this now if you don't list
	any name server addresses in /etc/resolv.conf, and list the real name
	servers as forwarders in /etc/named.boot.  BIND is also kind enough
	to do caching for you, which is a definite win.  The only thing lacking
	is that BIND doesn't keep track of performance data, but that should
	probably not be too hard to add.

One of us is confused here:  I thought we were talking solely about
the resolver, not the nameserver as well.  I don't think installing
the nameserver on every workstation is a solution.  It would give
a rich local cache, but it would be better to have one or two forwarding
nameservers for the local site, and let them accumulate the cache.
	
	The problems with this approach are:
	  * If the local BIND process dies, your host will be very unhappy
	  * The local BIND process is not able to distinguish between a
	    forwarder being down and a query that the forwarder is having
	    trouble resolving (due to the nameservers it is querying not
	    responding).  If you list only one forwarder, you are killed
	    by the first case.  If you list multiple forwarders, you may
	    time out the first query and try the next forwarder, which will
	    not have any better luck.  This is a good way to thrash the
	    internet, and it will also muck up your performance data.

Yes, another reason not to run a nameserver on every host.

	Basically recursive queries and UDP don't mix very well.  Recursive
	queries work fine with TCP, but then you are limited as far as the
	number of simultaneous queries that you can support.

Yes, so?  Apropos of what?

	> If this sounds like ypbind to you, Don't Panic.  I'm not advocating
	> yellow pages.  Just saying there should be more thread/RPC type
	> design in UNIX.
	> 
	> It would be nice if the IAB would say type-X RPC (XDR, etc) will be
	> the official protocol of the Internet, so we could get on with the
	> design of system servers...
	I don't think there needs to be a new protocol, maybe just some
	enhancements to the existing protocol to better support recursive
	queries.

Eh, no:  I meant a protocol for RPC between processes, such as a
client program (maybe telnet) wanting to talk to a server (such as
a local resolver daemon).

-Philip

david@twg.com (David S. Herron) (06/06/90)

In article <25358@netnews.upenn.edu> hagan@DCCS.UPENN.EDU (John Dotts Hagan) writes:
[Deleted tale of woe involving having only 3 nameservers on campus and the
 slowness that results in client resolvers when one of the nameservers dies..]

>It would be great if the first user who tries a telnet (or whatever) suffered
>the hit once for that host, then other tries would quickly just go at a working
>name server.


Why not go ahead and run nameservers on every machine capable of it?
You'd (of course) set up the nameservers so they're slaves and forward
to the primaries for your campus.  Then set up the resolv.conf so that
it first queries the local server then goes to others on campus.

This way if one of the busy nameservers dies the answers will more
than likely be cached in the local nameserver.  Your local nameserver
is less likely to die since it's less busy (it doesn't have to service
everybody y'see).

>  Perhaps dead name servers could be routinely retried and some
>stats kept on them (I think bind already does this sort of thing when dealing
>with the list of root servers, so at least there is some precedent for this
>kind of behavior).

You already have the answer... BIND does what you want already
so use it.


-- 
<- David Herron, an MMDF weenie, <david@twg.com>
<- Formerly: David Herron -- NonResident E-Mail Hack <david@ms.uky.edu>
<-
<- Sign me up for one "I survived Jaka's Story" T-shirt!