[comp.protocols.tcp-ip] Van's algorithms in Streams

dcrocker@TWG.COM (Dave Crocker) (08/05/88)

This is in response to Kwang Sung's query:

Our newest version of Streams TCP contains Van's and Nagle's algorithms
for congestion control, etc.  The header-prediction performance code is
scheduled for addition shortly, although we are still trying to understand
its impact under mixed traffic conditions.

Dave Crocker
VP, Engineering
The Wollongong Group

van@HELIOS.EE.LBL.GOV (Van Jacobson) (08/09/88)

> The header-prediction performance code is scheduled for
> addition shortly, although we are still trying to understand
> its impact under mixed traffic conditions.

Dave -

Are you working on header prediction in-house or is the above
management-speak for "Berkeley hasn't given us the code yet"?
(We haven't given it to anybody yet -- but "real soon now".)

We've been running the header prediction tcp for 3 months now
and the only "impact" I've observed is that things get faster --
under any traffic conditions.  That shouldn't be too surprising:
The "prediction" code adds about 6us of compares to the tcp
input processing.  If the prediction loses, you fall through to
the old input code (about 220us of processing, on the average).
If the prediction wins, you do about 25us of processing, on the
average.  For bidirectional dataflows (e.g., telnet, smtp, etc.)
the prediction wins slightly more than half the time (there are
counters in our implementation to measure this -- I was
surprised the hit rate was so high).  [On unidirectional flows,
the prediction wins all the time.]  So, the impact for, say, N
packets is:  old = N * 220;  new = N/2 * 226 + N/2 * 31.  I.e.,
you pick up about a factor of two on bidirectional traffic and a
factor of seven on unidirectional traffic.  (Times will vary --
these were on a Sun-3/60.  But the new/old ratios should stay
about the same on different architectures.)
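
To make that arithmetic concrete, here is a back-of-the-envelope
sketch -- not part of the kernel code, just the Sun-3/60 numbers
above plugged into a few lines of C.  The only assumptions are the
per-packet costs quoted earlier and hit rates of roughly 50%
(bidirectional) and 100% (unidirectional):

    #include <stdio.h>

    /* rough per-packet costs (microseconds) from the figures above */
    #define OLD_PATH   220.0    /* full tcp input processing */
    #define PREDICT      6.0    /* header-prediction compares */
    #define FAST_PATH   25.0    /* processing when the prediction wins */

    /* expected per-packet cost of the new code for a given hit rate */
    static double
    new_cost(double hit_rate)
    {
            double win  = PREDICT + FAST_PATH;  /*  31 us */
            double lose = PREDICT + OLD_PATH;   /* 226 us */

            return hit_rate * win + (1.0 - hit_rate) * lose;
    }

    int
    main(void)
    {
            /* bidirectional: prediction wins about half the time */
            printf("bidirectional:  %.1fx\n", OLD_PATH / new_cost(0.5));
            /* unidirectional: prediction wins essentially always */
            printf("unidirectional: %.1fx\n", OLD_PATH / new_cost(1.0));
            return 0;
    }

This prints a speedup of about 1.7x for the bidirectional case and
about 7x for the unidirectional case, which is where the "factor of
two" and "factor of seven" above come from.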

Of course, I should point out again that the prediction didn't
make much difference in overall performance -- it just made a
small part of the work smaller.  Given halfway reasonable
protocols (as opposed to, say, the ISO protocols) and a half
decent implementation, protocol processing seems to be a small
fraction of the cost of networking.  So far, my data makes me
suspect that people working on hardware assists for protocol
processing are solving the wrong problem.

 - Van

ps- Just to ground the above timings in reality, here's our
    header prediction code from the 4bsd netinet/tcp_input.c.
    This code goes shortly after the pcb lookup (which is
    cached so it usually costs 3 32-bit compares).  Other
    than checksumming the packet & making sure the header
    length makes sense, this is the complete tcp input processing.

    /*
     * Header prediction: check for the two common cases
     * of a uni-directional data xfer.  If the packet has
     * no control flags, is in-sequence, the window didn't
     * change and we're not retransmitting, it's a
     * candidate.  If the length is zero and the ack moved
     * forward, we're the sender side of the xfer.  Just
     * free the data acked & wake any higher level process
     * that was blocked waiting for space.  If the length
     * is non-zero and the ack didn't move, we're the
     * receiver side.  If we're getting packets in-order
     * (the reassembly queue is empty), add the data to
     * the socket buffer and note that we need a delayed ack.
     */
    if (tp->t_state == TCPS_ESTABLISHED &&
	(tiflags & (TH_SYN|TH_FIN|TH_RST|TH_URG|TH_ACK)) == TH_ACK &&
	ti->ti_seq == tp->rcv_nxt && ti->ti_win == tp->snd_wnd &&
	tp->snd_nxt == tp->snd_max) {
	    if (ti->ti_len == 0) {
		    if (SEQ_GT(ti->ti_ack, tp->snd_una) &&
			SEQ_LEQ(ti->ti_ack, tp->snd_max) &&
			tp->snd_cwnd >= tp->snd_wnd) {
			    /*
			     * this is a pure ack for outstanding data
			     * and we're not in the middle of slow-start
			     * or congestion avoidance.
			     */
			    ++tcppredack;
			    if (tp->t_rtt && SEQ_GT(ti->ti_ack,tp->t_rtseq))
				    tcp_xmit_timer (tp);
			    sbdrop (&so->so_snd, ti->ti_ack - tp->snd_una);
			    tp->snd_una = ti->ti_ack;
			    m_freem (m);
			    /*
			     * If all outstanding data is acked, stop the
			     * retransmit timer.  If there's no one
			     * waiting to output, let tcp_output decide
			     * between more output or persist.  Otherwise
			     * give the user a crack at the new space,
			     * assuming he'll call tcp_output when it's
			     * filled.  If there is more data to be acked,
			     * restart retransmit timer, using current
			     * (possibly backed-off) value.
			     */
			    if (tp->snd_una == tp->snd_max)
				    tp->t_timer[TCPT_REXMT] = 0;
			    else if (tp->t_timer[TCPT_PERSIST] == 0)
				    tp->t_timer[TCPT_REXMT] = tp->t_rxtcur;

			    if ((so->so_snd.sb_flags & SB_WAIT) ||
				so->so_snd.sb_sel)
				    sowwakeup(so);
			    else if (so->so_snd.sb_cc)
				    (void) tcp_output (tp);
			    return;
		    }
	    }
	    else if (ti->ti_ack == tp->snd_una &&
		     tp->seg_next == (struct tcpiphdr *)tp &&
		     ti->ti_len <= sbspace(&so->so_rcv)) {
		    /*
		     * this is a pure, in-sequence data packet
		     * with nothing on the reassembly queue and
		     * we have enough buffer space to take it.
		     */
		    ++tcppreddat;
		    tp->rcv_nxt += ti->ti_len;
		    /*
		     * Drop TCP and IP headers then add data
		     * to socket buffer
		     */
		    m->m_off += sizeof(struct tcpiphdr);
		    m->m_len -= sizeof(struct tcpiphdr);
		    sbappend(&so->so_rcv, m);
		    sorwakeup(so);
		    tp->t_flags |= TF_DELACK;
		    return;
	    }
    }

    /*
     * Prediction failed:  do things one step at a time.
     ...