[comp.protocols.tcp-ip] BSD tcp: multiple acks

dc@lupine.NCD.COM (Dave Cornelius) (08/21/90)

A sniffer trace of the MIT X windows demo 'maze', run on a lightly loaded
host and connected via TCP to an X server on a heavily loaded (or slow)
host, reveals a bug in the TCP ack generation in 4.2/4.3-Tahoe-derived
TCP implementations.  The result of the bug is that a TCP which
receives many small packets can appear to send an ACK for each
incoming packet.  These acks have the property that the <ack> field
_advances_ and the <window> field _declines_ by the same amount:
the length of the last incoming segment.

The problem is that the available window in the receiver can
be less than the window that was advertised to the sender.
The code in tcp_output.c computes the difference between
these two quantities and leaves the result in a C-language int,
which can be negative in the case cited above.
The code then uses this negative int in a comparison with
an unsigned short (t_maxseg), and also in an expression
involving division by an unsigned long (sb_hiwat).  The latter usage
converts the negative int to an unsigned long, i.e. a huge positive
value, which steers the comparison against 35% of the max window
the wrong way, resulting in an outgoing ack for each segment received
(at least until the receiving socket buffer is drained).

Granted, the 'maze' demo makes poor use of the TCP connection
by causing the client host to emit enough 20-byte TCP packets to fill
the X server's TCP window.  The BSD TCP code then generates a few more
packets than necessary in reply to the client's
bombardment :-)

Repeat by:
(1) On host A:
	run the MIT X server, and
	5-10 processes each running 'main(){while (1) ;}'

(2) On host B:
	run 'maze -display hosta:0'

(3) With an Ethernet analyzer, watch the traffic.  Watch the acks returning
	from host A after the bombardment of 20-byte X packets from host B.

This problem has been observed on:
Sun 3/50 (SunOS 3.5) and SPARCstation 1 (SunOS 4.1)



Old code from tcp_output.c (near line 158 in 4.3-Tahoe):

<	win = sbspace(&so->so_rcv);
<
<[ .... several state checks omitted... ]
<
<	/*
<	 * Compare available window to amount of window
<	 * known to peer (as advertised window less
<	 * next expected input).  If the difference is at least two
<	 * max size segments or at least 35% of the maximum possible
<	 * window, then want to send a window update to peer.
<	 */
<	if (win > 0) {
<		int adv = win - (tp->rcv_adv - tp->rcv_nxt);
<
<		if (so->so_rcv.sb_cc == 0 && adv >= 2 * tp->t_maxseg)
<			goto send;
<		if (100 * adv / so->so_rcv.sb_hiwat >= 35)
<			goto send;
<	}


Modified code (changed lines marked with -->):
>	win = sbspace(&so->so_rcv);
>
>[ .... several state checks omitted... ]
>
>	/*
>	 * Compare available window to amount of window
>	 * known to peer (as advertised window less
>	 * next expected input).  If the peer could make some use
>	 * of the window update, and the difference is at least two
>	 * max size segments or at least 35% of the maximum possible
>	 * window, then want to send a window update to peer.
>	 */
>	if (win > 0) {
>		int adv = win - (tp->rcv_adv - tp->rcv_nxt);
>
>-->		if (adv >= 0) {
>			if (so->so_rcv.sb_cc == 0 && adv >= 2 * tp->t_maxseg)
>				goto send;
>			if (100 * adv / so->so_rcv.sb_hiwat >= 35)
>				goto send;
>-->		}
>	}

-----------
Dave Cornelius                          Network Computing Devices
                                        350 North Bernardo Ave
dc@ncd.com   -or-                       Mountain View, CA, 94043
{uunet,ardent,mips}!lupine!dc           415-694-0675