hassler@asd.wpafb.af.MIL (Barry D. Hassler) (08/25/88)
First of all, let's get the versions out of the way: I am running BIND 4.8 on Pyramid 9010's (98Xe's in reality). I have two identically configured systems operating as redundant network gateways between MILNET and 8 local networks, and also acting as primary domain servers for the WPAFB.AF.MIL domain. These two hosts are NAP1.ARPA (26.4.0.176) and NAP2.ARPA (26.18.0.124).

For the past two days, I have been having very regular crashes of named on one of these systems (NAP1.ARPA; NAP2 remains operational). The curious thing is that it ALWAYS crashes after receiving and updating the database with information from NS.NASA.GOV.

Before yesterday, I had NS.NASA.GOV listed in my cache as a server for the root domain. After seeing this error and examining the named.run file, I didn't see NS.NASA.GOV listed in the information being returned from SRI-NIC.ARPA concerning the root servers, so I took it out of my cache. I restarted named and everything ran fine until today. When I checked this morning, named had crashed again. After watching the named.run file again, I noticed that I was now receiving NS.NASA.GOV as a root server from the NIC, so I added it to my cache again and started things up, but to no avail.

In short, I am unable to keep named running on NAP1.ARPA at all. Since the named.run file is so large, I haven't included it in this message, but if anyone can help me with this and needs to see it, it is available via anonymous FTP on ASD.WPAFB.AF.MIL (129.48.1.13) in pub/nap1.named.run. I'd appreciate any ideas anyone might have.

On a related note, about three weeks ago there were messages sent to INFO-PYRAMID concerning the fact that Pyramid was distributing an OLD version of BIND, and that it was causing several people to have difficulties in receiving information from the WPAFB.AF.MIL domain. I have since upgraded to 4.8, and with the exception of this problem, it has been running fine on the Pyramids.

Many thanks in advance,

Barry D. Hassler
Control Data Corporation
Integration Services Division
hassler@ASD.WPAFB.AF.MIL (Barry D. Hassler) (08/28/88)
[Charles Hedrick types:]
> I just asked sri-nic.arpa who the root servers are. The list
> includes ns.nasa.gov. We got tired of losing track of the
> root servers (a common problem under previous versions of
> named, though it may have been fixed in 4.8). So we just
> hardcode the list. I'm fairly sure that the way we do it
> causes us to ignore anything we hear from the net. This might
> provide a workaround for you. We have a file /etc/named.root
> that lists the root servers. We then put
>     primary . /etc/named.root
> in named.boot. The problem with this is that if your list is
> wrong, you can end up giving people wrong information. Make
> very sure that your named.root doesn't have an SOA record in it,
> or you'll be claiming to be authoritative. (We retrieve the
> current list of servers nightly from SRI-NIC, just to make sure
> our information is always up to date.)

I too have had the entire list hardcoded, in my named.ca file, like this:

    ; $Header: named.ca,v 1.3 88/08/17 13:09:33 root Exp $
    ;
    ;
    ; Initial cache data for root domain servers
    .                   99999999  IN  NS  brl-aos.arpa.
                        99999999  IN  NS  sri-nic.arpa.
                        99999999  IN  NS  a.isi.edu.
                        99999999  IN  NS  gunter-adam.arpa.
                        99999999  IN  NS  c.nyser.net.
                        99999999  IN  NS  terp.umd.edu.
    ; Prep the cache
    sri-nic.arpa.       99999999  IN  A   26.0.0.73
                                  IN  A   10.0.0.51
    a.isi.edu.          99999999  IN  A   26.3.0.103
    brl-aos.arpa.       99999999  IN  A   128.20.1.2
                                  IN  A   192.5.25.82
    gunter-adam.arpa.   99999999  IN  A   26.1.0.13
    c.nyser.net.        99999999  IN  A   192.33.4.12
    terp.umd.edu.       99999999  IN  A   10.1.0.17
                                  IN  A   128.8.10.90
    nap2.arpa.          99999999  IN  A   26.18.0.124
    nap1.arpa.          99999999  IN  A   26.4.0.176

I have taken ns.nasa.gov out of here because of my problems (note that I am NOT implying there is anything wrong with NS.NASA.GOV). I did notice, though, that one of the first things named does after loading the initial configuration is to send a query to one of the listed root servers asking about all the root servers.
I see this in the named.run file:

    sysquery: send -> 128.20.1.2 5 (53), nsid=1 id=0 0ms
    datagram from 128.20.1.2 port 53, fd 5, len 327
    ns_req()
    HEADER:
            opcode = QUERY, id = 1, rcode = NOERROR
            header flags: qr aa ra
            qdcount = 1, ancount = 7, nscount = 0, arcount = 9
    QUESTIONS:
            ., type = NS, class = IN
    ANSWERS:
            .
              type = NS, class = IN, ttl = 518400, dlen = 14
              domain name = SRI-NIC.ARPA
            .
              type = NS, class = IN, ttl = 518400, dlen = 13
              domain name = AOS.BRL.MIL
            .
              type = NS, class = IN, ttl = 518400, dlen = 11
              domain name = A.ISI.EDU
            .
              type = NS, class = IN, ttl = 518400, dlen = 14
              domain name = GUNTER-ADAM.ARPA
            .
              type = NS, class = IN, ttl = 518400, dlen = 13
              domain name = C.NYSER.NET
            .
              type = NS, class = IN, ttl = 518400, dlen = 11
              domain name = TERP.UMD.EDU
            .
              type = NS, class = IN, ttl = 518400, dlen = 13
              domain name = NS.NASA.GOV
    ADDITIONAL RECORDS:
            SRI-NIC.ARPA
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 26.0.0.73
            SRI-NIC.ARPA
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 10.0.0.51
            AOS.BRL.MIL
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 128.20.1.2
            AOS.BRL.MIL
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 192.5.25.82
            A.ISI.EDU
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 26.3.0.103
            GUNTER-ADAM.ARPA
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 26.1.0.13
            C.NYSER.NET
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 192.33.4.12
            TERP.UMD.EDU
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 10.1.0.17
            TERP.UMD.EDU
              type = A, class = IN, ttl = 518400, dlen = 4
              internet address = 128.8.10.90

What I think is actually causing the crash comes shortly after the following lines are logged in named.run:

    resp: nlookup(NS.NASA.GOV) type=1
    resp: found 'NS.NASA.GOV' as 'NS.NASA.GOV' (cname=0)
    wanted(40af8, 1, 1) 1, 1
    stale: ttl 589122321 518400 (x0)
    make_rr(NS.NASA.GOV, 40af8, 111, -1072910097, 1) 4 zone 0 ttl 589122321

That -1072910097 is a length?! Obviously, I haven't spent any time trying to track this down further. If I have to, I will, although I'm hoping someone else has at least seen this problem.

-BDH
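
The numbers in the last trace lines are easier to judge once decoded. The sketch below is only an aid to reading them, and it rests on two assumptions that the trace itself does not confirm: that the suspicious make_rr argument is a 32-bit signed integer printed in decimal, and that the large "ttl" value is an absolute expiry time in seconds since the Unix epoch. The helper names as_unsigned32 and epoch_to_utc are made up for this illustration.

```python
from datetime import datetime, timezone


def as_unsigned32(value: int) -> int:
    """Return the unsigned 32-bit bit pattern of a signed 32-bit integer."""
    return value & 0xFFFFFFFF


def epoch_to_utc(seconds: int) -> datetime:
    """Interpret a count of seconds since 1970-01-01 as a UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Fourth argument of make_rr() in the trace (assumed to be a 32-bit int).
length = -1072910097
pattern = as_unsigned32(length)
print(f"length bit pattern: {pattern} ({pattern:#010x})")

# "stale: ttl 589122321 518400": assumed to be expiry time and seconds left.
expiry = 589122321
remaining = 518400
print("expiry (UTC):          ", epoch_to_utc(expiry).isoformat())
print("expiry minus remaining:", epoch_to_utc(expiry - remaining).isoformat())
print("remaining in days:     ", remaining / 86400)
```

Under those assumptions the length is the bit pattern 0xc00cb0ef, which looks more like an address or uninitialized memory than a buffer size. The second number on the "stale" line is exactly the 518400-second (six-day) TTL from the root server response, and subtracting it from the first gives a time in late August 1988, which fits the reading of the first number as an absolute expiry time.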