jeffe@sandino.austin.ibm.com (Peter Jeffe 512.823.4091) (08/02/90)
I'm not sure if this is a bug, but in BSD tahoe res_send.c (6.21), the following code looks fishy: | for (retry = _res.retry; retry > 0; retry--) { | for (ns = 0; ns < _res.nscount; ns++) { |... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... |#if BSD >= 43 | if (_res.nscount == 1 || retry == _res.retry) { !!!^^^^^^^^^^^^^^^^!!! | /* | * Don't use connect if we might | * still receive a response | * from another server. | */ | if (connected == 0) { | if (connect(s, &_res.nsaddr_list[ns], | sizeof(struct sockaddr)) < 0) { | [ #ifdef DEBUG ] | continue; | } | connected = 1; | } | if (send(s, buf, buflen, 0) != buflen) { | [ #ifdef DEBUG ] | continue; | } | } else { | [ connect() to no_addr (0.0.0.0) and sendto(_res.nsaddr_list[ns]) ] The problem (?) is that while (retry == _res.retry) we only query the first host, since we connect() the first time through, and then keep send()ing to whomever we're connected to, until we decrement retry and fall through to the sendto() call. So if we've got 3 nameservers defined in resolv.conf, and all of them are down, we query in the following order: 1, 1, 1, 1, 2, 3, 1, 2, 3 Now, I don't know if this was intended, but it jes' don't seem right to me. Especially since the virtual-circuit code doesn't act the same. Personally, I'd chuck the connect()/send() stuff and just do the sendto() (or at least do it only on (_res.nscount == 1) if you really think it's more efficient). What say ye all? ------------------------------------------------------------------------------- Peter Jeffe ...uunet!cs.utexas.edu!ibmchs!auschs!sandino.austin.ibm.com!jeffe first they want a disclaimer, then they make you pee in a jar, then they come for you in the night
skl@van-bc.wimsey.bc.ca (Samuel Lam) (08/03/90)
In article <3010@awdprime.UUCP>, jeffe@sandino.austin.ibm.com (Peter Jeffe 512.823.4091) wrote: >I'm not sure if this is a bug, but in BSD tahoe res_send.c (6.21), the >following code looks fishy: ... >| if (_res.nscount == 1 || retry == _res.retry) { > !!!^^^^^^^^^^^^^^^^!!! (What a coincident!) I also nailed this one within the last month, as it had caused us grief around here for a long time and I finally gotten around to it. I think this is definitely a bug, but my analysis is different from yours. I fixed it by changing the problem line from if (_res.nscount == 1 || retry == _res.retry) { ** to if (_res.nscount == 1 && retry == _res.retry) { ** as this fix made sense to me at the time. i.e. The condition become "if we only have one name server, *then* if this is also our first attempt to use it, then <do ...>". I have talked to the Berkeley person in charge of 4.4 about this (Mike) at the Vancouver IETF this week, and he commented that the 4.3-tahoe BIND is a couple years old and suggested that I take a look at the 4.4-reno BIND (to be available shortly) to see if the bug is still there before I post a note to the BIND mailing list about it. ...Sam -- Internet: <skl@wimsey.bc.ca> UUCP: {van-bc,ubc-cs,uunet}!wimsey.bc.ca!skl
earle@poseur.JPL.NASA.GOV (Greg Earle - Sun JPL on-site Software Support) (08/04/90)
In article <1199@van-bc.wimsey.bc.ca>, skl@van-bc.wimsey.bc.ca (Samuel Lam) writes: >>I'm not sure if this is a bug, but in BSD tahoe res_send.c (6.21), the >>following code looks fishy: >... >>| if (_res.nscount == 1 || retry == _res.retry) { >> !!!^^^^^^^^^^^^^^^^!!! > >(What a coincidence!) I also nailed this one within the last month, >as it had caused us grief around here for a long time and I finally >gotten around to it. I think this is definitely a bug, but my >analysis is different from yours. > >I fixed it by changing the problem line from > > if (_res.nscount == 1 || retry == _res.retry) { > ** >to > if (_res.nscount == 1 && retry == _res.retry) { > ** > >as this fix made sense to me at the time. i.e. The condition become >"if we only have one name server, *then* if this is also our first >attempt to use it, then <do ...>". Yes, this is a bug. But I do not think that your fix is the correct one; for a better fix, use the res_send.c that comes with any BIND 4.8.2 version that you can find (I believe several versions of BIND that purport to being `BIND 4.8.2.x' abound; perhaps someone could make a definitive statement on the situation?). I chose the one from University of Toronto; you can get it via anonymous FTP from neat.AI.Toronto.EDU, in `utbind.tar.Z'. Examine the code and you will see where this problem is dealt with directly. The connect()-ed socket is dis-connect()-ed if the resolver gets ECONNREFUSED from the first round of nameserver queries. In any case, this topic is fodder for the BIND mailing list, not the comp.protocols.tcp-ip.domains newsgroup. Use FTP to ucbarpa.Berkeley.EDU to retrieve the archives of the BIND mailing list to find the original discussion of this problem, and the solution proposed by the person whose fix is in the U. of Toronto version. -- - Greg Earle Sun Microsystems, Inc. - JPL on-site Software Support earle@poseur.JPL.NASA.GOV (Direct) earle@Sun.COM (Indirect)