[comp.unix.ultrix] named going into an infinite loop ...

mf@ircam.fr (Michel Fingerhut) (10/22/90)

Machine: DECsystem 5820 (RISC)
OS:      Ultrix 4.0 (Rev. 179)

Every once in a while (every 3-4 days), the name daemon starts eating CPU
time, goes to the top of the queue, and fills the syslog error message
table with messages of the form

	Oct 22 10:23:37 localhost: 93 named: accept: Too many open files

(one a second, approximately) until it is killed and/or chokes /usr/spool.
Upon restart, it works fine.  There is no apparent flood of requests prior
to that.

Does anyone have a suggestion on how to approach the problem?

Thanks,
Michael Fingerhut

tinguely@plains.NoDak.edu (Mark Tinguely) (10/24/90)

In article <1990Oct22.105209.28006@ircam.ircam.fr> mf@ircam.fr (Michel Fingerhut) writes:
>Machine: DECsystem 5820 (RISC)
>OS:      Ultrix 4.0 (Rev. 179)
>Every once in a while (every 3-4 days), the name daemon starts eating CPU
>time, goes to the top of the queue, and fills the syslog error message
>table with messages of the form
>	Oct 22 10:23:37 localhost: 93 named: accept: Too many open files


 Do you have machines that query the name server over TCP rather than
 UDP? You can check this with `netstat'. We had the same problem with
 an IBM 3090 querying our BIND 4.8.1 (and earlier releases) name server.
 I am sure the Ultrix server is based on BIND 4.8.

 About 7 months ago I posted the fix to this problem, and (though I did
 not check) I think a similar fix went into BIND 4.8.2. There are two
 problems, but both stem from the fact that TCP queries are queued.
 With the original BIND code, these queries may not be properly
 released while they sit waiting on a time queue. UDP queries that
 cannot be resolved right away are simply discarded, so they do not
 cause this problem.

 If you do not want to update your name server to BIND (boy, did I find out
 this week how many people think I am a radical for running public-domain
 software [that works correctly]), then ask DEC to update the server.

 Last week I removed my "diff" files for the BIND error (assuming these
 were picked up in BIND 4.8.3, located at ucbarpa.berkeley.edu in the
 4.3 directory). I just quickly scanned the areas I had modified in
 the BIND 4.8.3 files and did not see the removal of queued TCP entries,
 but since I don't follow the BIND mailing list, they may have implemented
 the solution differently than I did (or not picked up the changes
 at all). If there is a need for the TCP BIND fixes, I can restore them
 to our anonymous ftp partition.
-- 
Mark Tinguely           North Dakota State University,  Fargo, ND  58105
  UUCP:       		...!uunet!plains!tinguely
  BITNET:      		tinguely@plains.bitnet
  INTERNET:   		tinguely@plains.NoDak.edu