[comp.protocols.tcp-ip] why Rutgers named has been attacking randomly selected net sites

hedrick@ATHOS.RUTGERS.EDU (Charles Hedrick) (11/14/87)

Rutgers has been running the beta test named 4.7.  One very neat new
feature of 4.7 is that it implements a two-level caching scheme.
That is, you can designate 2 or 3 machines as your primary servers.
All other named's are set up to send requests through them.  This lets
you have a small number of well-populated caches, and avoids having
every machine on your network have to interact with the Arpanet.  This
should be a great performance win for the net as a whole.

Unfortunately, this causes a bug that was present as early as 4.4 (the
earliest code we have around now) to have very serious effects.  In
ns_resp, rd is set instead of qr in a couple of places where responses
are generated.  (For those of you who don't have the bits memorized,
rd is used in a query to mean that you are requesting the name server
to handle the request recursively for you.  qr must be set in all
responses.  It indicates that it is a response, rather than a query.
There is no obvious meaning to rd in a response, though the spec does
call for it to be copied from the original query.)  The effect is that
qr is not set in certain responses.  With the two-level cache,
interesting things result.  The original host sends a query to the
primary server.  It eventually gives up, and sends a response back.
But qr is not set.  So the original server thinks that this is a
query.  It dutifully handles the query, by sending it back to the
primary server.  If you have more than one primary server, each
request generates N more.  The net effect is obvious.  Apparently we
have attacked various more or less innocent servers because of this
bug.  I believe it is what caused the high packet rates that Dave
Mills saw to some otherwise innocent site at NASA.  I just fixed the
problem.  Sites running 4.7 should be very careful about this.  I would
think there might also be situations where this bug would cause
trouble even in earlier releases, but I can't be sure.
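
For readers who want the bits spelled out, here is a minimal C sketch of
the two flag assignments in the patch below, working on a raw header
rather than on named's HEADER struct.  The bit positions follow the
standard DNS header layout (id in bytes 0-1, flags in bytes 2-3).  The
helper names is_response, mark_reply_buggy and mark_reply_fixed are made
up for illustration and do not appear in the named sources.

```c
#include <stdint.h>

/* Flag bits in the DNS message header; bytes 0-1 hold the id. */
#define DNS_QR 0x80  /* byte 2: must be set in every response */
#define DNS_RD 0x01  /* byte 2: recursion desired (set by the querier) */
#define DNS_RA 0x80  /* byte 3: recursion available */

/* Hypothetical helper: how a receiver tells a response from a query. */
static int is_response(const uint8_t *msg)
{
    return (msg[2] & DNS_QR) != 0;
}

/* Flag assignment before the fix: rd and ra are set, qr is left alone. */
static void mark_reply_buggy(uint8_t *msg)
{
    msg[2] |= DNS_RD;
    msg[3] |= DNS_RA;
}

/* Flag assignment after the fix: qr and ra are set. */
static void mark_reply_fixed(uint8_t *msg)
{
    msg[2] |= DNS_QR;
    msg[3] |= DNS_RA;
}
```

Starting from a 12-byte header whose byte 2 holds only rd (0x01),
mark_reply_buggy leaves is_response false, so the receiver treats the
packet as a fresh query.  mark_reply_fixed turns byte 2 into 0x81 and
is_response becomes true.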

It is of course possible that I am misunderstanding what this code is
supposed to do, and that I will end up with a very red face.  (This
has happened recently in other contexts.)  But we have definitely seen
the infinite packets, and the seriousness of the problem seemed to
justify broadcasting a warning quickly.
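
To put a rough number on "each request generates N more", here is a
deliberately simplified C model.  It assumes that every misread response
is re-sent as a query to all N primary servers, that each of those comes
back misread again, and that nothing is dropped or times out.  Real
traffic will not be this clean, and the function name packets_in_round is
hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model, not a measurement: the number of packets in flight
 * after `round` bounces, given n_primaries primary servers.
 * Overflows uint64_t for large inputs; meant for small illustrations only.
 */
static uint64_t packets_in_round(unsigned n_primaries, unsigned round)
{
    uint64_t count = 1;            /* round 0: the single original query */
    for (unsigned i = 0; i < round; i++)
        count *= n_primaries;      /* each packet spawns n_primaries more */
    return count;
}
```

Under these assumptions a single primary server gives a loop at a
constant rate that never ends, and two or more give geometric growth.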

*** ns_resp.c.BAK	Fri Nov 13 23:34:35 1987
--- ns_resp.c	Sat Nov 14 01:09:50 1987
***************
*** 523,529 ****
  			if (debug >= 3)
  				fprintf(ddt,"resp: leaving, MAXCNAMES exceeded\n");
  #endif
!  			hp->id = qp->q_id; hp->rd = hp->ra = 1;		
  	 		(void) send_msg(qp->q_msg, qp->q_msglen, qp);
  		 	qremove(qp);
  	 		return;
--- 523,529 ----
  			if (debug >= 3)
  				fprintf(ddt,"resp: leaving, MAXCNAMES exceeded\n");
  #endif
!  			hp->id = qp->q_id; hp->qr = hp->ra = 1;		
  	 		(void) send_msg(qp->q_msg, qp->q_msglen, qp);
  		 	qremove(qp);
  	 		return;
***************
*** 595,601 ****
  	stats[S_RESPOK].cnt++;
  #endif
  	/* The "standard" return code */
! 	hp->id = qp->q_id; hp->rd = hp->ra = 1;
  	(void) send_msg(msg, msglen, qp);
  	qremove(qp);
  	return;
--- 595,601 ----
  	stats[S_RESPOK].cnt++;
  #endif
  	/* The "standard" return code */
! 	hp->id = qp->q_id; hp->qr = hp->ra = 1;
  	(void) send_msg(msg, msglen, qp);
  	qremove(qp);
  	return;
***************
*** 617,623 ****
  #endif
  	hp = (HEADER *)(cname ? qp->q_cmsg : qp->q_msg);
  	hp->rcode = SERVFAIL;
! 	hp->id = qp->q_id; hp->rd = hp->ra = 1;
  	(void) send_msg((char *)hp, (cname ? qp->q_cmsglen : qp->q_msglen), qp);
  	qremove(qp);
  	return;
--- 617,623 ----
  #endif
  	hp = (HEADER *)(cname ? qp->q_cmsg : qp->q_msg);
  	hp->rcode = SERVFAIL;
! 	hp->id = qp->q_id; hp->qr = hp->ra = 1;
  	(void) send_msg((char *)hp, (cname ? qp->q_cmsglen : qp->q_msglen), qp);
  	qremove(qp);
  	return;