dc@lupine.NCD.COM (Dave Cornelius) (08/21/90)
A sniffer trace of the MIT X windows demo 'maze', run on a lightly-loaded
host and connected via TCP to an X server on a heavily loaded (or slow)
host, reveals a bug in the TCP ack generation in 4.2/4.3-tahoe derived
TCP implementations.

The result of the bug is that a TCP which receives many small packets
can appear to send an ACK for each incoming packet. These acks have the
property that the <ack> field _advances_ and the <window> field _declines_
by the same amount: the length of the last incoming segment.

The problem is that the available window in the receiver can be less
than the window which was advertised to the sender. The code in
tcp_output.c computes the difference between these two quantities and
leaves the result in a C-language int. This value can be negative in
the case cited above. The code then uses this negative int in a
comparison with an unsigned short (t_maxseg), and also in an expression
involving division by an unsigned long (sb_hiwat). The latter usage
coerces the negative int to an unsigned, which steers the comparison
against 35% of the max window the wrong way, resulting in an outgoing
ack for each segment received (at least until the receiving socket
buffer is drained).

Granted, the 'maze' demo makes poor use of the TCP connection by causing
the host to emit enough 20-byte TCP packets to fill the X server's TCP
window. The BSD TCP code causes a few more packets than necessary to be
generated in reply to the client's bombardment :-)

Repeat by:
(1) On host A: run the MIT X server, and 5-10 processes each
    running 'main(){while (1) ;}'
(2) On host B: run 'maze -display hosta:0'
(3) With an ethernet analyzer, watch the traffic. Watch the acks
    returning from host A after the bombardment of 20-byte X packets
    from host B.

This problem has been observed on:
    Sun 3/50 (sunos 3.5) and SparcStation1 (sunos 4.1)

Old code from tcp_output.c (near line 158 in 4.3.tahoe)

<       win = sbspace(&so->so_rcv);
<
<[ .... several state checks omitted... ]
<
<       /*
<        * Compare available window to amount of window
<        * known to peer (as advertised window less
<        * next expected input).  If the difference is at least two
<        * max size segments or at least 35% of the maximum possible
<        * window, then want to send a window update to peer.
<        */
<       if (win > 0) {
<               int adv = win - (tp->rcv_adv - tp->rcv_nxt);
<
<               if (so->so_rcv.sb_cc == 0 && adv >= 2 * tp->t_maxseg)
<                       goto send;
<               if (100 * adv / so->so_rcv.sb_hiwat >= 35)
<                       goto send;
<       }

Modified code:

>       win = sbspace(&so->so_rcv);
>
>[ .... several state checks omitted... ]
>
>       /*
>        * Compare available window to amount of window
>        * known to peer (as advertised window less
>        * next expected input).  If the peer could make some use
>        * of the window update, and the difference is at least two
>        * max size segments or at least 35% of the maximum possible
>        * window, then want to send a window update to peer.
>        */
>       if (win > 0) {
>               int adv = win - (tp->rcv_adv - tp->rcv_nxt);
>
>-->            if (adv >= 0) {
>                       if (so->so_rcv.sb_cc == 0 && adv >= 2 * tp->t_maxseg)
>                               goto send;
>                       if (100 * adv / so->so_rcv.sb_hiwat >= 35)
>                               goto send;
>-->            }
>       }

-----------
Dave Cornelius
Network Computing Devices
350 North Bernardo Ave          dc@ncd.com -or-
Mountain View, CA, 94043        {uunet,ardent,mips}!lupine!dc
415-694-0675