hedrick@ATHOS.RUTGERS.EDU (Charles Hedrick) (11/14/87)
Rutgers has been running the beta test of named 4.7. One very neat new feature of 4.7 is that it implements a two-level caching scheme. That is, you can designate 2 or 3 machines as your primary servers. All other named's are set up to send requests through them. This lets you have a small number of well-populated caches, and avoids having every machine on your network interact with the Arpanet. This should be a great performance win for the net as a whole.

Unfortunately, this causes a bug that was present as early as 4.4 (the earliest code we have around now) to have very serious effects. In ns_resp.c, rd is set instead of qr in three places where responses are generated. (For those of you who don't have the bits memorized, rd is used in a query to mean that you are requesting the name server to handle the request recursively for you. qr must be set in all responses; it indicates that the message is a response rather than a query. There is no obvious meaning to rd in a response, though the spec does call for it to be copied from the original query.) The effect is that qr is not set in certain responses.

With the two-level cache, interesting things result. The original host sends a query to the primary server. The primary eventually gives up and sends a response back. But qr is not set, so the original server thinks that this is a query. It dutifully handles the query by sending it back to the primary server. If you have more than one primary server, each request generates N more. The net effect is obvious. Apparently we have attacked various more or less innocent servers because of this bug. I believe it is what caused the high packet rates that Dave Mills saw to some otherwise innocent site at NASA.

I just fixed the problem. Sites running 4.7 should be very careful about this. I would think there might also be situations where this bug would cause trouble even in earlier releases, but I can't be sure.
It is of course possible that I am misunderstanding what this code is supposed to do, and that I will end up with a very red face. (This has happened recently in other contexts.) But we have definitely seen the infinite packets, and the seriousness of the problem seemed to justify broadcasting a warning quickly.

*** ns_resp.c.BAK  Fri Nov 13 23:34:35 1987
--- ns_resp.c      Sat Nov 14 01:09:50 1987
***************
*** 523,529 ****
          if (debug >= 3)
              fprintf(ddt,"resp: leaving, MAXCNAMES exceeded\n");
  #endif
!         hp->id = qp->q_id; hp->rd = hp->ra = 1;
          (void) send_msg(qp->q_msg, qp->q_msglen, qp);
          qremove(qp);
          return;
--- 523,529 ----
          if (debug >= 3)
              fprintf(ddt,"resp: leaving, MAXCNAMES exceeded\n");
  #endif
!         hp->id = qp->q_id; hp->qr = hp->ra = 1;
          (void) send_msg(qp->q_msg, qp->q_msglen, qp);
          qremove(qp);
          return;
***************
*** 595,601 ****
      stats[S_RESPOK].cnt++;
  #endif
      /* The "standard" return code */
!     hp->id = qp->q_id; hp->rd = hp->ra = 1;
      (void) send_msg(msg, msglen, qp);
      qremove(qp);
      return;
--- 595,601 ----
      stats[S_RESPOK].cnt++;
  #endif
      /* The "standard" return code */
!     hp->id = qp->q_id; hp->qr = hp->ra = 1;
      (void) send_msg(msg, msglen, qp);
      qremove(qp);
      return;
***************
*** 617,623 ****
  #endif
      hp = (HEADER *)(cname ? qp->q_cmsg : qp->q_msg);
      hp->rcode = SERVFAIL;
!     hp->id = qp->q_id; hp->rd = hp->ra = 1;
      (void) send_msg((char *)hp, (cname ? qp->q_cmsglen : qp->q_msglen), qp);
      qremove(qp);
      return;
--- 617,623 ----
  #endif
      hp = (HEADER *)(cname ? qp->q_cmsg : qp->q_msg);
      hp->rcode = SERVFAIL;
!     hp->id = qp->q_id; hp->qr = hp->ra = 1;
      (void) send_msg((char *)hp, (cname ? qp->q_cmsglen : qp->q_msglen), qp);
      qremove(qp);
      return;