craig@LOKI.BBN.COM.UUCP (12/14/86)
This weekend I had time to start processing Van Jacobson's suggested fixes/modifications. Things started working very well after the first fix, which made TCP choose better fragment sizes and increased the time to live for IP fragments. The subsequent testing also revealed some interesting results. (These are preliminary and subject to reappraisal.)
(1) EACKs appear to make a huge difference in use of the network. After seeing signs this was the case, I ran the simple test of pushing 50,000 data packets through a software loopback that dropped 4% of the packets. With EACKs there were 1,930 retransmissions, of which 1 received packet was a duplicate (note that some of the retransmissions were also dropped). Without EACKs there were 12,462 retransmissions, of which 9,344 received packets were duplicates. 12,462 retransmissions is, of course, bad news, and comes from the fact that this RDP sends up to four packets in parallel. Typically the four get put into the send queue in the same tick of the timer, so when the first gets retransmitted, all four do. The moral seems to be: use EACKs even though they aren't required for a conforming implementation.
(2) Lixia Zhang's suggestion that one use the RTT of the SYN to compute the initial timeout estimate appears to work very well.
(3) EACKs may make it possible to all but stomp out RTT feedback (those unfortunate cases where a dropped packet leads to an RTT = (the number of retries * SRTT) + SRTT being used to compute a new SRTT). I've been experimenting with discarding RTTs for out-of-order acks. This is best explained by example. If packets 1, 2, 3 and 4 are sent, and the first ack is an EACK for 3, the implementation uses the RTT for 3 to recompute the SRTT, but will discard the RTTs for 1 and 2 when they are eventually acked (or EACKed).
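The out-of-order rule in (3) can be sketched roughly as follows. This is only an illustration, not the code from the RDP implementation under test: the class name, the method names, and the use of plain floating-point timestamps are all assumptions, and sequence-number wraparound is ignored.

```python
class OutOfOrderRttFilter:
    """Per-packet RTT sampling that discards samples for out-of-order acks.

    Hypothetical sketch: names and structure are assumptions, not taken
    from any real RDP implementation.
    """

    def __init__(self):
        self.sent_at = {}          # sequence number -> time of first send
        self.highest_acked = None  # highest sequence number acked so far

    def on_send(self, seq, now):
        # Keep only the first transmission time; a retransmission does
        # not reset the clock for that packet.
        self.sent_at.setdefault(seq, now)

    def on_ack(self, seq, now):
        """Return an RTT sample, or None if the sample is discarded."""
        sent = self.sent_at.pop(seq, None)
        if sent is None:
            return None  # duplicate ack, or ack for an unknown packet
        if self.highest_acked is not None and seq < self.highest_acked:
            return None  # a later packet was already acked: discard
        self.highest_acked = seq
        return now - sent


def demo():
    f = OutOfOrderRttFilter()
    for seq in (1, 2, 3, 4):
        f.on_send(seq, now=0.0)
    # The EACK for 3 arrives first; 1 and 2 are acked later, then 4.
    return [f.on_ack(3, 0.5), f.on_ack(1, 1.5), f.on_ack(2, 1.5), f.on_ack(4, 2.0)]
```

With the packets 1 through 4 from the example, `demo()` yields a sample for 3, nothing for 1 and 2, and a sample for 4, so only in-order acks feed the SRTT computation.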
The argument in favor of this scheme is that the acks for 1 and 2 probably represent either (a) RTTs for packets that were dropped, so that including them would lead to feedback, or (b) RTTs that reflect an earlier (and slower) state of the network (3 was sent after 1 and 2), so that using them would make the SRTT a worse prediction of the RTT of the next packet. Note that (b) would be more convincing if it weren't the case that 1, 2, 3 and 4 were probably sent within a few milliseconds of each other.
Watching 5 trial runs of 100 64-byte data packets bounced off Goonhilly, this algorithm kept the SRTT within the observed range of real RTTs (as opposed to RTTs for packets that were dropped and had to be retransmitted). Using EACKs but taking the RTT for every packet (again doing 5 trial runs), several cases of RTT feedback were seen. In one case the SRTT soared to ~35 seconds when a few packets were dropped in a short period. Since the implementation uses Mills's suggested changes, which make lowering the SRTT take longer than raising it, the SRTT took some time to recover.
People may be wondering about observed throughput. How fast does RDP run vis-a-vis TCP? That turns out to be very difficult to answer. Identical tests run in parallel or one right after another give throughput rates that vary by factors of 2 or more. As a result it is difficult to get throughput numbers that demonstrably show differences which reflect more than random variation. After running tests for 7 weekends (and millions of packets) I have some theories, but those keep changing as different tests are run.
Craig
P.S. Those millions of packets are almost all over a software loopback. The contribution to network congestion has been small.
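The RTT feedback described in this message, and the slow recovery under an asymmetric smoother, can be illustrated with a small sketch. The smoothing weights below (0.75 when the sample is above the SRTT, 0.9375 when it is below) are assumed values chosen only to make raising faster than lowering; they are not claimed to be the constants Mills proposed or the ones used in the implementation. The function names are likewise hypothetical, and the retransmission timeout is taken to equal the SRTT, as in the RTT formula quoted in the message.

```python
def update_srtt(srtt, sample, alpha_up=0.75, alpha_down=0.9375):
    """Fold one RTT sample into the SRTT, rising faster than falling.

    alpha_up and alpha_down are assumed weights, not published constants.
    """
    alpha = alpha_up if sample > srtt else alpha_down
    return alpha * srtt + (1.0 - alpha) * sample


def feedback_sample(srtt, retries):
    """RTT measured from the first send when `retries` timeouts elapsed.

    Follows the formula in the message: (retries * SRTT) + SRTT.
    """
    return retries * srtt + srtt


def demo():
    srtt = 1.0
    # Three dropped packets, each retransmitted twice, each timed from
    # its first transmission and fed back into the estimator.
    for _ in range(3):
        srtt = update_srtt(srtt, feedback_sample(srtt, retries=2))
    inflated = srtt
    # Ten clean samples at the true RTT of 1.0 second.
    for _ in range(10):
        srtt = update_srtt(srtt, 1.0)
    return inflated, srtt
```

Under these assumptions, three feedback samples push the estimate from 1.0 to 3.375, and ten clean samples bring it only partway back. Because each bad sample is a multiple of the current SRTT, the inflation compounds, which is the explosion that discarding suspect samples is meant to prevent.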
walsh@HARVARD.HARVARD.EDU (Bob Walsh) (12/15/86)
Craig,
Have you thought of using a separate variable to measure the RTT of each packet so that you can update your smoothed RTT using the EACKs?
When I last did RDP work, RDP and TCP were roughly the same speed. Maybe RDP was a bit quicker, even in the LAN environment. The reason RDP did not dominate TCP was that the machines I was using were VAXes, and the RDP checksumming algorithm did not run as fast as it would on a machine with a different byte ordering (like the 68K-based workstations).
bob
mills@HUEY.UDEL.EDU.UUCP (12/16/86)
Craig and Bob,
Keeping roundtrip-delay samples on a per-packet basis really does help (the fuzzballs have been doing that for several years), as does initializing the estimator with the SYN/ACK exchange. Another thing, first pointed out by Jack Haverty of BBN, is the behavior when the window first opens after it has previously closed. If the ACK carrying the window update is lost, performance can lose big. This may be one reason TP-4 uses a different "active ACK" policy.
While at it, consider the receiver policy and when to generate ACKs (delayed or not). Silly implementations that always send 2-3 ACKs for every received packet might actually win under warmonger conditions.
Dave
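Seeding the estimator from the SYN/ACK exchange might look something like the sketch below. It borrows the smoothing and timeout formulas from RFC 793 (SRTT = ALPHA * SRTT + (1 - ALPHA) * RTT, and RTO = min(UBOUND, max(LBOUND, BETA * SRTT))) purely as an assumption. Nothing in this thread says which formulas the RDP or fuzzball code uses, and the class and parameter names are hypothetical.

```python
class RttEstimator:
    """Round-trip estimator whose first SRTT is the measured SYN RTT.

    Formulas follow RFC 793; the default constants are example values
    from the ranges given there, chosen here as assumptions.
    """

    def __init__(self, syn_rtt, alpha=0.875, beta=2.0, lbound=1.0, ubound=60.0):
        self.srtt = syn_rtt  # seed from the SYN/ACK exchange, not a fixed guess
        self.alpha = alpha
        self.beta = beta
        self.lbound = lbound  # lower bound on the timeout, in seconds
        self.ubound = ubound  # upper bound on the timeout, in seconds

    def sample(self, rtt):
        # Exponentially weighted average of per-packet RTT samples.
        self.srtt = self.alpha * self.srtt + (1.0 - self.alpha) * rtt

    def timeout(self):
        # Retransmission timeout, clamped to [lbound, ubound].
        return min(self.ubound, max(self.lbound, self.beta * self.srtt))
```

For instance, `RttEstimator(syn_rtt=2.0).timeout()` gives 4.0 seconds on a long path, while a connection with a SYN RTT of 10 ms is clamped to the 1-second lower bound. Either way the first data packet is timed against a measurement of the actual path rather than a default.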
craig@LOKI.BBN.COM (Craig Partridge) (12/16/86)
> Have you thought of using a separate variable to measure the RTT of each
> packet so that you can update your smoothed RTT using the EACKs?
That's precisely what I'm doing. Then the out-of-order rule is used to discard RTTs that seem likely to cause SRTT explosion.
> When I last did RDP work, RDP and TCP were roughly the same speed. Maybe
> RDP was a bit quicker even in the LAN environment. The reason RDP did
> not dominate TCP was that the machines I was using were VAXes and the
> RDP checksumming algorithm did not run as fast as it would on a machine
> with a different byte ordering (like the 68K based workstations).
Certainly the RDP checksum on the VAX is a real problem. On the SUN the checksum I use is 40% faster than the TCP checksum; on the VAX the checksum is about 3 times *slower* than the TCP checksum. (You probably wrote a better one; I haven't compared them.) And over a perfect network, the checksum performance seems to dictate speed.
But once there is any packet loss on the network, the data handling costs seem to become rather insignificant, and the big issue (I believe) is retransmission mechanisms. Unfortunately, once the network drops packets, there seems to be a very wide variation in throughput from test to test, and it gets hard to say anything definitive.
There's also the problem of whether, when you get a definitive answer, it is a real difference or merely demonstrates an odd quirk of the particular RDP or TCP implementation. (I.e., am I asking the right question?) One quickly develops a healthy respect for TCP.
Craig