[comp.unix.aix] Caching only nameserver breaking

ced@bcstec.uucp (Charles Derykus) (05/16/91)

Our caching-only nameserver, running on an RS/6000 at rev 3005, has started
to break intermittently. The same machine also acts as a resolver client;
its /etc/resolv.conf is listed below:
nameserver	127.0.0.1
nameserver	128.207.254.223
nameserver	128.207.254.44
nameserver	136.240.1.21
domain	ca.boeing.com
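
For readers following along, here is a minimal Python sketch of how a
BIND-style resolver reads the file above: nameserver lines are collected in
order, and queries go to the first entry, falling through to the next one on
timeout. The max_ns cutoff of 3 mirrors the MAXNS constant in the Berkeley
BIND resolver; whether AIX's resolver uses the same limit is an assumption.
If it does, the fourth entry (136.240.1.21) is never consulted.

```python
def parse_resolv_conf(text, max_ns=3):
    """Return (nameservers, domain) from resolv.conf-style text.

    Only the 'nameserver' and 'domain' keywords used in the posted file are
    handled. max_ns=3 ASSUMES the BIND MAXNS limit; extra entries are dropped.
    """
    nameservers = []
    domain = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # blank line or comment
        keyword, args = fields[0], fields[1:]
        if keyword == "nameserver" and args and len(nameservers) < max_ns:
            nameservers.append(args[0])  # order matters: tried first to last
        elif keyword == "domain" and args:
            domain = args[0]
    return nameservers, domain


conf = """\
nameserver 127.0.0.1
nameserver 128.207.254.223
nameserver 128.207.254.44
nameserver 136.240.1.21
domain ca.boeing.com
"""
servers, domain = parse_resolv_conf(conf)
print(servers)  # ['127.0.0.1', '128.207.254.223', '128.207.254.44']
print(domain)   # ca.boeing.com
```

With 127.0.0.1 listed first, every lookup goes to the local named before any
of the remote servers are tried.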

When the nameserver breaks, it stops resolving from the local nameserver
cache and looks only at /etc/hosts.  It also refuses to query the other
nameservers.

The really weird thing is that it seems to recover on its own after an
unpredictable amount of time.  Suddenly, the local nameserver starts
resolving again, apparently none the worse for wear.

Sometimes stopping and restarting named makes it well, but this doesn't
always work.

Has anyone seen this or have any theories?

Any help greatly appreciated.

Charles DeRykus				Internet:   ced@bcstec.boeing.com
Boeing Computer Services		UUCP:	    ...!uunet!bcstec!ced
Renton, WA.  M/S 6R-37			(206) 234-9223

jackv@turnkey.tcc.com (Jack F. Vogel) (05/16/91)

In article <856@bcstec.boeing.com> ced@bcstec.uucp (Charles Derykus) writes:
|
|Our caching only nameserver running on an RS6000, rev 3005 has started to
|break intermittently. It is also a resolver. /etc/resolv.conf is listed
|below:  
|nameserver	127.0.0.1
|nameserver	128.207.254.223
|nameserver	128.207.254.44
|nameserver	136.240.1.21
|domain	ca.boeing.com
|
|When the nameserver breaks, it stops resolving from the local nameserver
|cache and looks only to /etc/hosts.  It also refuses to query other
|nameservers.
 
You're a bit confused here, at least terminology-wise: the nameserver
NEVER looks at /etc/hosts, and neither does the resolver. Only the
routine gethostbyname() does that, and normally only when a query
to the nameserver fails to resolve an address.

You also don't say WHY you believe it is "broken". How do you know it has
quit resolving names? What are the visible symptoms? Knowing that may
help in figuring out what is really wrong.

|Has anyone seen this or have any theories?
 
There just isn't enough info here for theories. What you might find useful
is to turn on the nameserver's debugging log (read the man page on named
for details) when you believe it is in this non-functional state, try some
queries, and then look at the log to see what it is seeing. I suppose it's
possible that the named process has gone off to sleep() somewhere below
PZERO and simply isn't responding, in which case you won't get a log file
either, but that seems unlikely. Even better would be to use nslookup
(which IBM doesn't ship, so you would have to 'roll your own') to
interrogate the server directly while it's in this state. For those
unfamiliar with it, 'nslookup' is a utility provided in the Berkeley BIND
distribution; anyone administering a network that uses a nameserver should
have it around for debugging problems.
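
Short of building nslookup, the core of what it does here (sending one query
straight to a chosen server and reading the response code) can be sketched
in Python using the RFC 1035 message format. This is an illustrative
stand-in, not nslookup itself: build_query and query_server are hypothetical
names, and the sketch only reports the RCODE and answer count rather than
decoding any records.

```python
import socket
import struct


def build_query(name, qid=0x1234, qtype=1):
    """Build a standard DNS query (RFC 1035) for name; qtype 1 = A record."""
    # Header: ID, flags (only RD, recursion desired, set), QDCOUNT=1, others 0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("bad label in %r" % name)
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def query_server(server, name, timeout=5.0):
    """Send one UDP query to server port 53; return (rcode, answer_count).

    Raises socket.timeout (an OSError) if the server never replies.
    """
    qid = 0x1234
    packet = build_query(name, qid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(512)  # classic UDP DNS message limit
    if len(reply) < 12:
        raise ValueError("reply too short to be a DNS message")
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if rid != qid:
        raise ValueError("reply ID does not match query ID")
    # RCODE is the low 4 bits: 0 = no error, 2 = SERVFAIL, 3 = NXDOMAIN.
    return flags & 0x000F, ancount
```

Running query_server("127.0.0.1", "ca.boeing.com") while the problem is
happening, and again against 128.207.254.223, separates the cases: a timeout
means the local named isn't answering at all, while an RCODE of 2 means it
answered but could not resolve the name.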

Good Luck!

Disclaimer: I don't speak for my employer or IBM.

-- 
Jack F. Vogel			jackv@locus.com
AIX370 Technical Support	       - or -
Locus Computing Corp.		jackv@turnkey.TCC.COM

reilly@scotty.dccs.upenn.edu (G. Brendan Reilly) (06/02/91)

Actually, I just found that the 3005 nameserver will do a
partial zone load and still use the data.  When I teach the
DNS course, it is emphasized over and over that a nameserver
must never serve data from a partially loaded zone.

I'm in the process of documenting this bug and submitting it
to IBM.  If you want to see the proof, please send mail to reilly@sec.com.