mf@ircam.fr (Michel Fingerhut) (10/22/90)
Machine: DECsystem 5820 (RISC)
OS: Ultrix 4.0 (Rev. 179)

Every once in a while (every 3-4 days), the name daemon starts eating CPU
time, goes to the top of the queue, and fills the syslog error message
table with messages of the form

    Oct 22 10:23:37 localhost: 93 named: accept: Too many open files

(one a second, approximately) until it is killed and/or chokes /usr/spool.
Upon restart, it works fine. There is no apparent flood of requests prior
to that.

Does anyone have a suggestion on how to approach the problem?

Thanks,
Michael Fingerhut
tinguely@plains.NoDak.edu (Mark Tinguely) (10/24/90)
In article <1990Oct22.105209.28006@ircam.ircam.fr> mf@ircam.fr (Michel Fingerhut) writes:
>Machine: DECsystem 5820 (RISC)
>OS: Ultrix 4.0 (Rev. 179)
>Every once in a while (every 3-4 days), the name daemon starts eating CPU
>time, goes to the top of the queue, and fills the syslog error message
>table with messages of the form
> Oct 22 10:23:37 localhost: 93 named: accept: Too many open files

Do you have machines that query the name server by TCP rather than UDP?
This can be found by using `netstat'. We had the same problem with an
IBM 3090 querying our BIND 4.8.1 (and earlier releases) nameserver. I am
sure the Ultrix server is based upon BIND 4.8.

About 7 months ago I posted the fix to this problem, and (though I did not
check) I think a similar fix went into BIND 4.8.2. There are two problems,
but both are based on the fact that TCP queries are queued. It is possible
with the original BIND code that these queries are not properly released
as they sit waiting on a time queue. UDP resolutions are just discarded if
they cannot be resolved right away, and do not cause this problem.

If you do not want to update your nameserver to BIND (boy did I find out
this week how many people think I am a radical for running public-domain
software [that works correctly]), then ask DEC to update the server.

Last week I removed my "diff" files for the BIND error (assuming these
were picked up in BIND 4.8.3, located at ucbarpa.berkeley.edu in the 4.3
directory). I just quickly scanned the areas that I modified in the
BIND 4.8.3 files and did not see the removal of queued TCP entries, but
since I don't follow the BIND mailing list, they may have implemented the
solution in a different fashion than I did (or did not pick up the changes
at all).

If there is a need for the TCP BIND fixes, I can restore them to our
anonymous ftp partition.
--
Mark Tinguely    North Dakota State University, Fargo, ND 58105
UUCP: ...!uunet!plains!tinguely    BITNET: tinguely@plains.bitnet
INTERNET: tinguely@plains.NoDak.edu
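
A minimal sketch of the netstat check suggested above, assuming a BSD-style
`netstat -n` listing with the columns proto, recv-q, send-q, local address,
foreign address, state. The function name tcp_dns_states and the SAMPLE
listing are hypothetical and were made up for illustration; real netstat
output differs between systems. A count of TCP connections on the domain
port that keeps growing would be consistent with the queued-TCP-query
problem described in the post, though it does not prove it.

```python
from collections import Counter


def tcp_dns_states(netstat_text, ports=("53", "domain")):
    """Count TCP connection states for sockets whose local port is in `ports`."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state.
        # Header lines and UDP lines (which have no state) are skipped.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local = fields[3]
        # BSD-style netstat prints host.port; other variants print host:port.
        local_port = local.replace(":", ".").rsplit(".", 1)[-1]
        if local_port in ports:
            states[fields[5]] += 1
    return states


# Hypothetical listing, invented for this example (addresses are from the
# documentation range 192.0.2.0/24).
SAMPLE = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
tcp        0      0  192.0.2.1.53       192.0.2.7.1042     ESTABLISHED
tcp        0      0  192.0.2.1.53       192.0.2.7.1043     CLOSE_WAIT
tcp        0      0  192.0.2.1.53       192.0.2.7.1044     CLOSE_WAIT
tcp        0      0  192.0.2.1.23       192.0.2.9.2001     ESTABLISHED
udp        0      0  192.0.2.1.53       *.*
"""

if __name__ == "__main__":
    for state, count in sorted(tcp_dns_states(SAMPLE).items()):
        print(state, count)
```

To use it on a live system, the text of `netstat -n` would be passed in
place of SAMPLE; comparing the counts over a few hours shows whether
connections on the name server port are accumulating.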
HAROLD@UGA.CC.UGA.EDU (Harold Pritchett) (10/25/90)
On Mon, 22 Oct 90 10:52:09 GMT Michel Fingerhut said:
>Machine: DECsystem 5820 (RISC)
>OS: Ultrix 4.0 (Rev. 179)
>
>Every once in a while (every 3-4 days), the name daemon starts eating CPU
>time, goes to the top of the queue, and fills the syslog error message
>table with messages of the form
>
> Oct 22 10:23:37 localhost: 93 named: accept: Too many open files
>
>(one a second, approximately) until it is killed and/or chokes /usr/spool.
>Upon restart, it works fine. There is no apparent flood of requests prior
>to that.

Boy, do I have news for you. We had that same problem here for approx a
month!! DEC looked at it, we sent them dumps, they remotely logged onto
our machine, and finally they told us what was wrong!

The "/etc/resolv.conf" file was mis-configured. Make SURE that the first
nameserver entry in the file points to the loopback address. It should
look something like this:

    domain your.domain.edu
    nameserver 127.0.0.1

We fixed ours and have not had the problem since, and that has been over
two weeks. We also found that before we fixed the file, named would not
dump cache or stats in response to a kill -INT or kill -IOT command, and
this seems to have fixed that also.

For more information, you may want to contact Therese Grise in the DEC
Nashua, NH office, or Larry Pruitt in Atlanta, GA at (404) 772 2665.

Harold C Pritchett        | BITNET: HAROLD@UGA
BITNET TechRep            | ARPA: harold@uga.cc.uga.edu
The University of Georgia |
Athens, GA 30602          | fido: 1:370/60
(404) 542-3135            | Bbs: SYSOP at (404) 354-0817
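
A minimal sketch of the resolv.conf check described above, assuming the
usual "keyword value" line format shown in the post. The helper names
nameservers and first_is_loopback are hypothetical, and the script only
inspects the file; it does not change anything.

```python
import os


def nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they appear."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Only lines whose first word is exactly "nameserver" count, so
        # blank lines and lines starting with '#' or ';' are ignored.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def first_is_loopback(resolv_conf_text):
    """True if the first nameserver entry is the loopback address."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and servers[0] == "127.0.0.1"


if __name__ == "__main__":
    path = "/etc/resolv.conf"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
        print("nameservers:", nameservers(text))
        print("first entry is loopback:", first_is_loopback(text))
    else:
        print(path, "not found")
```

The check applies to a host that runs its own named, as in the posts above;
on a machine that does not run a name server, a non-loopback first entry is
expected.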