[comp.sys.pyramid] Strange behavior of named

hassler@asd.wpafb.af.MIL (Barry D. Hassler) (08/25/88)

First of all, let's get the versions out of the way: I am running
BIND 4.8 on Pyramid 9010's (98Xe's in reality). I have two
identically configured systems operating as redundant network
gateways between MILNET and 8 local networks, and also acting as
primary domain servers for the WPAFB.AF.MIL domain. These two
hosts are NAP1.ARPA (26.4.0.176) and NAP2.ARPA (26.18.0.124).

For the past two days, I have been having very regular crashes of
named on one of these systems (NAP1.ARPA; NAP2 remains
operational just fine). The curious thing is that it ALWAYS
crashes after receiving and updating the database with
information from NS.NASA.GOV. Until yesterday, I had NS.NASA.GOV
listed in my cache as a server for the root domain. After seeing
this error and examining the named.run file, I didn't see
NS.NASA.GOV listed in the information being returned from
SRI-NIC.ARPA concerning the root servers, so I took it out of my
cache. I restarted named and everything ran fine and dandy until
today. This morning when I checked, named had crashed again.
After watching the named.run file again, I noticed that I was now
receiving NS.NASA.GOV as a root server from the NIC, so I added
it to my cache again and started things up, but to no avail.

Anyway, I am unable to keep named running on  NAP1.ARPA  at  all.
Since  the  named.run  file is so large, I haven't included it in
this message, but if anyone can help me on this, and needs to see
it,  it  is  available  via  anonymous  FTP  on  ASD.WPAFB.AF.MIL
(129.48.1.13) in pub/nap1.named.run.

I'd appreciate any ideas anyone might have.

Relatedly, about three weeks ago, there were messages sent to
INFO-PYRAMID concerning the fact that Pyramid was distributing an
OLD version of BIND, which was causing several people to have
difficulty receiving information from the WPAFB.AF.MIL domain. I
have since upgraded to 4.8, and with the exception of this
problem, it has been running fine on the Pyramids.

Many thanks in advance,

Barry D. Hassler
Control Data Corporation
Integration Services Division

hassler@ASD.WPAFB.AF.MIL (Barry D. Hassler) (08/28/88)

[Charles Hedrick types:]
>  I just asked sri-nic.arpa who the root servers are.  The list
>  includes ns.nasa.gov.  We got tired of losing track of the
>  root servers (a common problem under previous versions of
>  named, though it may have been fixed in 4.8).  So we just
>  hardcode the list.  I'm fairly sure that the way we do it
>  causes us to ignore anything we hear from the net.  This might
>  provide a workaround for you.  We have a file /etc/named.root
>  that lists the root servers.  We then put
>    primary . /etc/named.root
>  in named.boot.  The problem with this is that if your list is
>  wrong, you can end up giving people wrong information.  Make
>  very sure that your named.root doesn't have an SOA record in it,
>  or you'll be claiming to be authoritative.  (We retrieve the
>  current list of servers nightly from SRI-NIC, just to make sure
>  our information is always up to date.)

I too have had the entire list hardcoded in my named.ca file, like so:

; $Header: named.ca,v 1.3 88/08/17 13:09:33 root Exp $
;
;
; Initial cache data for root domain servers
.			99999999	IN	NS	brl-aos.arpa.
			99999999	IN	NS	sri-nic.arpa.
			99999999	IN	NS	a.isi.edu.
			99999999	IN	NS	gunter-adam.arpa.
			99999999	IN	NS	c.nyser.net.
			99999999	IN	NS	terp.umd.edu.

; Prep the cache
sri-nic.arpa.		99999999	IN	A	26.0.0.73
					IN	A	10.0.0.51
a.isi.edu.		99999999	IN	A	26.3.0.103
brl-aos.arpa.		99999999	IN	A	128.20.1.2
					IN	A	192.5.25.82
gunter-adam.arpa.	99999999	IN	A	26.1.0.13
c.nyser.net.		99999999	IN	A	192.33.4.12
terp.umd.edu.		99999999	IN	A	10.1.0.17
					IN	A	128.8.10.90	

nap2.arpa.		99999999	IN	A	26.18.0.124
nap1.arpa.		99999999	IN	A	26.4.0.176

I have taken ns.nasa.gov out of here because of my problems (note that I am
NOT implying there is anything wrong with NS.NASA.GOV). I did notice, though,
that one of the first things named does after loading the initial
configuration is to send a query to one of the listed root servers asking
about all the root servers. I see this in the named.run file, like so:

	sysquery: send -> 128.20.1.2 5 (53), nsid=1 id=0 0ms

	datagram from 128.20.1.2 port 53, fd 5, len 327
	ns_req()
	HEADER:
		opcode = QUERY, id = 1, rcode = NOERROR
		header flags:  qr aa ra
		qdcount = 1, ancount = 7, nscount = 0, arcount = 9

	QUESTIONS:
		., type = NS, class = IN

	ANSWERS:
		.
		type = NS, class = IN, ttl = 518400, dlen = 14
		domain name = SRI-NIC.ARPA

		.
		type = NS, class = IN, ttl = 518400, dlen = 13
		domain name = AOS.BRL.MIL

		.
		type = NS, class = IN, ttl = 518400, dlen = 11
		domain name = A.ISI.EDU

		.
		type = NS, class = IN, ttl = 518400, dlen = 14
		domain name = GUNTER-ADAM.ARPA

		.
		type = NS, class = IN, ttl = 518400, dlen = 13
		domain name = C.NYSER.NET

		.
		type = NS, class = IN, ttl = 518400, dlen = 11
		domain name = TERP.UMD.EDU

		.
		type = NS, class = IN, ttl = 518400, dlen = 13
		domain name = NS.NASA.GOV

	ADDITIONAL RECORDS:
		SRI-NIC.ARPA
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 26.0.0.73

		SRI-NIC.ARPA
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 10.0.0.51

		AOS.BRL.MIL
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 128.20.1.2

		AOS.BRL.MIL
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 192.5.25.82

		A.ISI.EDU
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 26.3.0.103

		GUNTER-ADAM.ARPA
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 26.1.0.13

		C.NYSER.NET
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 192.33.4.12

		TERP.UMD.EDU
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 10.1.0.17

		TERP.UMD.EDU
		type = A, class = IN, ttl = 518400, dlen = 4
		internet address = 128.8.10.90
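
One detail worth pulling out of the dump above: the answer section lists
seven root servers, but the nine additional records cover only six of them.
The following is a minimal Python sketch (not part of named; the function
name is hypothetical, and the data is simply copied from the dump) that
reports which NS names arrived without an address record:

```python
# Names copied from the ANSWERS section of the dump above.
ns_names = [
    "SRI-NIC.ARPA", "AOS.BRL.MIL", "A.ISI.EDU", "GUNTER-ADAM.ARPA",
    "C.NYSER.NET", "TERP.UMD.EDU", "NS.NASA.GOV",
]

# A records copied from the ADDITIONAL RECORDS section of the dump above.
additional_a = {
    "SRI-NIC.ARPA": ["26.0.0.73", "10.0.0.51"],
    "AOS.BRL.MIL": ["128.20.1.2", "192.5.25.82"],
    "A.ISI.EDU": ["26.3.0.103"],
    "GUNTER-ADAM.ARPA": ["26.1.0.13"],
    "C.NYSER.NET": ["192.33.4.12"],
    "TERP.UMD.EDU": ["10.1.0.17", "128.8.10.90"],
}


def servers_without_address(names, additional):
    """Return the NS names that have no A record in the additional section.

    Domain names compare case-insensitively, so both sides are upper-cased.
    """
    known = {name.upper() for name in additional}
    return [name for name in names if name.upper() not in known]


missing = servers_without_address(ns_names, additional_a)
print(missing)
```

With the data above, the only name reported is NS.NASA.GOV, so its address
has to be obtained separately. That would be consistent with the type=1 (A)
lookup for NS.NASA.GOV that shows up later in named.run, though the log
alone does not prove the connection.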

What I think is actually causing the crash happens shortly after the
following lines are logged in named.run:

	resp: nlookup(NS.NASA.GOV) type=1
	resp: found 'NS.NASA.GOV' as 'NS.NASA.GOV' (cname=0)
	wanted(40af8, 1, 1) 1, 1
	stale: ttl 589122321 518400 (x0)
	make_rr(NS.NASA.GOV, 40af8, 111, -1072910097, 1) 4 zone 0 ttl 589122321

That -1072910097 is a length?! Obviously, I haven't spent any time trying
to track this down further. If I have to, I will, although I'm hoping someone
else has at least seen this problem.
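
For anyone who wants to decode the numbers in that make_rr line, here is a
small Python sketch. It rests on two assumptions that the log does not
confirm: that the "length" was printed as a signed 32-bit integer, and that
the large ttl is an absolute expiry time in seconds since the Unix epoch,
with 518400 being the time remaining. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


def as_utc(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


length = -1072910097   # fourth argument in the make_rr line
ttl = 589122321        # ttl in the stale: and make_rr lines
remaining = 518400     # second number in the stale: line (ASSUMED: time left)

print(hex(as_unsigned32(length)))         # 0xc00cb0ef
print(as_utc(ttl).isoformat())            # 1988-09-01T13:05:21+00:00
print(as_utc(ttl - remaining).isoformat())  # 1988-08-26T13:05:21+00:00
```

Seen as 0xc00cb0ef, the "length" looks more like a stray pointer or
uninitialized word than a byte count, which fits the suspicion that the
value is garbage. The ttl, on the other hand, decodes to a plausible
expiry date six days after the record would have been cached, so it is
probably not the corrupted field.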

-BDH