srp@babar.mmwb.ucsf.edu (Scott R. Presnell) (11/16/90)
Hi folks.

I've got a curious problem with the name service that I can't seem to pin down, so I thought I'd ask whether anyone else is having this problem.

We run named out of the box on IRIX 3.3.1; our resolver is set to consult the /etc/hosts file first, then named. We upgraded to 3.3.1 about 6 weeks ago.

Over the last week, I've had several cases of the named process "running away": gaining inordinate amounts of CPU time (in the thousands of minutes as opposed to the normal one or two minutes), and essentially becoming useless for resolving remote hosts (ps and top show it to be in the "run" state constantly). None of the configuration files have changed recently. After I kill and restart named, there is no problem, at least for a while. This has now happened on two different machines (4D/2[05]G).

I don't know if this is connected, but I've also noted a lot of resolution failures with MAXQUERIES exceeded recently:

    Nov 11 15:50:10 babar named[98]: MAXQUERIES exceeded, possible data loop in resolving (2.246.70.192.in-addr.arpa)
    Nov 11 15:50:16 babar named[98]: MAXQUERIES exceeded, possible data loop in resolving (130.185.65.192.in-addr.arpa)
    Nov 12 20:43:19 babar named[98]: MAXQUERIES exceeded, possible data loop in resolving (ifi.ethz.ch)
    Nov 14 23:32:03 babar named[98]: MAXQUERIES exceeded, possible data loop in resolving (15.3.7.129.in-addr.arpa)

Anyone seen this sort of stuff? Any clues? Is named in an infinite loop?

Thanks for your help.

- Scott Presnell

--
Scott Presnell                  +1 (415) 476-9890
Pharm. Chem., S-926             Internet: srp@cgl.ucsf.edu
University of California        UUCP: ...ucbvax!ucsfcgl!srp
San Francisco, CA. 94143-0446   Bitnet: srp@ucsfcgl.bitnet
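For anyone wanting to see which names keep tripping MAXQUERIES, the log lines quoted in the post above can be tallied with a few lines of Python. This is only a sketch: the regular expression assumes the exact syslog wording shown in the post, and the function name `tally_maxqueries` and the `SAMPLE` list are made up for illustration. Other named versions may log a different message.

```python
import re
from collections import Counter

# Assumption: the message format is exactly the one quoted in the post.
MAXQ = re.compile(
    r"named\[\d+\]: MAXQUERIES exceeded, "
    r"possible data loop in resolving \(([^)]+)\)"
)

def tally_maxqueries(lines):
    """Count how often each name shows up in MAXQUERIES messages."""
    counts = Counter()
    for line in lines:
        match = MAXQ.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts

# The four log lines from the post, used here as sample input.
SAMPLE = [
    "Nov 11 15:50:10 babar named[98]: MAXQUERIES exceeded, possible data loop in resolving (2.246.70.192.in-addr.arpa)",
    "Nov 11 15:50:16 babar named[98]: MAXQUERIES exceeded, possible data loop in resolving (130.185.65.192.in-addr.arpa)",
    "Nov 12 20:43:19 babar named[98]: MAXQUERIES exceeded, possible data loop in resolving (ifi.ethz.ch)",
    "Nov 14 23:32:03 babar named[98]: MAXQUERIES exceeded, possible data loop in resolving (15.3.7.129.in-addr.arpa)",
]

if __name__ == "__main__":
    for name, count in tally_maxqueries(SAMPLE).most_common():
        print(count, name)
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the path of the syslog file depends on the local syslog configuration.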
karron@KARRON.MED.NYU.EDU (11/16/90)
I just (re)tested my resolv.conf setup, and again, nslookup reports the failure of my nameserver. If it were using /etc/hosts, it would get an answer back.

Here is my resolv.conf:

    domain med.nyu.edu
    hostresorder local bind
    nameserver 0.0.0.0
    nameserver 128.122.135.4 #med.nyu.edu
    nameserver 128.122.128.2 #nyu.edu

Here are the results with the above resolv.conf:

    karron:~:102nslookup
    Default Server:  karron
    Address:  0.0.0.0

    > ls karron
    *** Can't list domain karron: No response from server
    > ls med.nyu.edu
    *** Can't list domain med.nyu.edu: No response from server
    > exit
    karron:~:103

Here are the results if I comment out the line nameserver 0.0.0.0:

    karron:~:101nslookup
    Default Server:  mcclb0.med.nyu.edu
    Address:  128.122.135.4

    > ls med.nyu.edu
    [mcclb0.med.nyu.edu]
     med.nyu.edu                    server = cmcl2.nyu.edu
     med.nyu.edu                    server = acf5.nyu.edu
     med.nyu.edu                    server = egress.nyu.edu
     med.nyu.edu                    server = mcclb0.med.nyu.edu
     localhost                      127.0.0.1
     mcclb0                         128.122.135.4
     free-135-1                     128.122.135.1
     mcmnc1                         128.122.135.2
     karron                         128.122.135.3
     mcmrm47                        128.122.139.47
     .lots of stuff deleted...
     mcmrm48                        128.122.139.48
    > exit
    karron:~:102

It is the above behavior that leads me to believe that /etc/hosts is not queried, and that a local named (BIND) is required to get service from a local resolver.

+-----------------------------------------------------------------------------+
| karron@nyu.edu (mail alias that will always find me)                       |
| Dan Karron                                                                 |
| . . . . . . . . . . . . . . New York University Medical Center             |
| 560 First Avenue          \ \ Pager <1> (212) 397 9330                     |
| New York, New York 10016   \**\ <2> 10896 <3> <your-number-here>           |
| (212) 340 5210              \**\__________________________________________ |
| Please Note : Soon to move to dan@karron.med.nyu.edu 128.122.135.3 (Nov 1 )|
+-----------------------------------------------------------------------------+
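To make the ordering in the resolv.conf above explicit, here is a small Python sketch that pulls out the domain, the hostresorder entries, and the nameserver list in the order written. It is not how the IRIX resolver parses the file; the function name `parse_resolv_conf` is invented for illustration, and treating `#` as the start of a comment is an assumption based on how the file in the post is annotated.

```python
def parse_resolv_conf(text):
    """Return (domain, hostresorder entries, nameservers) in file order."""
    domain, order, servers = None, [], []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment, as in the file quoted in the post.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "domain" and args:
            domain = args[0]
        elif keyword == "hostresorder":
            order = args
        elif keyword == "nameserver" and args:
            servers.append(args[0])
    return domain, order, servers

# The resolv.conf from the post.
SAMPLE = """\
domain med.nyu.edu
hostresorder local bind
nameserver 0.0.0.0
nameserver 128.122.135.4 #med.nyu.edu
nameserver 128.122.128.2 #nyu.edu
"""

if __name__ == "__main__":
    domain, order, servers = parse_resolv_conf(SAMPLE)
    print("domain:", domain)
    print("hostresorder:", order)
    print("first nameserver:", servers[0])
```

With the file as posted, the first nameserver entry is 0.0.0.0, which matches the "Default Server" address nslookup printed in the first transcript.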
srp@babar.mmwb.ucsf.edu (Scott R. Presnell) (11/20/90)
srp@babar.mmwb.ucsf.edu (I) write:

>Hi folks.
> I've got a curious problem with the name service that I can't seem
>to pin down. So I thought I'd ask to see if anyone else is having this
>problem.

>Over the last week, I've had several cases of the named process "running
>away:" gaining inordinate amounts of CPU time (in the thousands of minutes

Just in case someone else runs into this, I'll answer my own question.

It turns out that I got hit by the bogus root nameservers that are making the rounds. If you see these guys in a named_dump.db from named, you've been hit too:

    ; Dumped at Fri Nov 16 08:58:43 1990
    ; --- Cache & Data ---
    $ORIGIN .
    .       602116  IN      NS      NS.NIC.DDN.MIL.
    [...]
            18376   IN      NS      TELECOM.        ; bad - does not exist
            18352   IN      NS      NEXTSVR.        ; bad
            18352   IN      NS      MTECV1.         ; bad
    ;
    ;

The affected hosts were secondary servers that forwarded requests. My fix was twofold:

1) Don't run a named that forwards requests to a specific host (forwarding, in my case, is what caused the cache to become contaminated).

2) You may also want to get bind4.8.3 from ucbarpa.Berkeley.EDU (ha, ha, they lost!) and install the named part. It takes no effort to get it up on the SGI, and because you have the source, you can insert code to warn you of cache changes and zone updates. It's also a more recent version than the one SGI ships.

I'd be glad to help if anyone else bumps into this problem.

- Scott Presnell

--
Scott Presnell                  +1 (415) 476-9890
Pharm. Chem., S-926             Internet: srp@cgl.ucsf.edu
University of California        UUCP: ...ucbvax!ucsfcgl!srp
San Francisco, CA. 94143-0446   Bitnet: srp@ucsfcgl.bitnet
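The check described in the post above (looking through named_dump.db for bogus root NS records) can be roughly automated. The Python sketch below is a heuristic, not a definitive test: it assumes the dump layout shown in the excerpt (a `$ORIGIN .` line, an owner in column one, and indented continuation lines that inherit the owner), and it flags root NS targets that are a single label such as TELECOM., which is simply the pattern the three bad entries in the excerpt share. The function name `suspicious_root_ns` is hypothetical, and the `[...]` elision from the excerpt is left out of the sample.

```python
def suspicious_root_ns(dump_text):
    """Return root-zone NS targets that consist of a single label."""
    origin = None
    owner = None
    flagged = []
    for raw in dump_text.splitlines():
        line = raw.split(";", 1)[0].rstrip()   # drop ';' comments
        if not line.strip():
            continue
        fields = line.split()
        if fields[0] == "$ORIGIN":
            origin = fields[1] if len(fields) > 1 else None
            owner = None
            continue
        if not raw[0].isspace():
            # A name in column one starts a new owner; indented
            # lines reuse the previous owner.
            owner = fields[0]
            fields = fields[1:]
        if origin == "." and owner == "." and "NS" in fields:
            idx = fields.index("NS") + 1
            if idx < len(fields):
                target = fields[idx]
                # Heuristic (assumption): a one-label target is suspect.
                if "." not in target.rstrip("."):
                    flagged.append(target)
    return flagged

# Records taken from the excerpt in the post (elided part omitted).
SAMPLE = """\
; Dumped at Fri Nov 16 08:58:43 1990
; --- Cache & Data ---
$ORIGIN .
.       602116  IN      NS      NS.NIC.DDN.MIL.
        18376   IN      NS      TELECOM.        ; bad - does not exist
        18352   IN      NS      NEXTSVR.        ; bad
        18352   IN      NS      MTECV1.         ; bad
"""

if __name__ == "__main__":
    for target in suspicious_root_ns(SAMPLE):
        print("suspicious root NS:", target)
```

A bogus server with a fully qualified, multi-label name would slip past this heuristic, so a clean result does not prove the cache is uncontaminated.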