[mod.protocols.tcp-ip] TCP RTT woes revisited

craig@LOKI.BBN.COM.UUCP (12/14/86)

    This weekend I had time to start processing Van Jacobson's suggested
fixes/modifications.  Things started working very well after the first fix
which made TCP choose better fragment sizes and increased the time to live
for IP fragments.

    The subsequent testing also revealed some interesting results.  (These
are preliminary and subject to reappraisal).

    (1) EACKs appear to make a huge difference in use of the network.
    After seeing signs that this was the case, I ran the simple test of
    pushing 50,000 data packets through a software loopback that
    dropped 4% of the packets.
    
    With EACKs there were 1,930 retransmissions, of which 1 received
    packet was a duplicate (note that some of the retransmissions were
    also dropped).

    Without EACKs there were 12,462 retransmissions of which 9,344
    received packets were duplicates.

    12,462 retransmissions is, of course, bad news, and comes from
    the fact that this RDP sends up to four packets in parallel.
    Typically the four get put into the send queue in the same
    tick of the timer, so when the first gets retransmitted,
    all four do.  The moral seems to be: use EACKs even though
    they aren't required for a conforming implementation.

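The burst-retransmission effect in (1) can be illustrated with a toy simulation (written for this note in Python; it is not Craig's RDP implementation and its counts won't match his figures, but it shows the same order-of-magnitude gap):

```python
import random

def simulate(total_packets=50_000, loss=0.04, window=4,
             use_eacks=True, seed=1):
    """Toy model of the effect described above: 'window' packets share
    one retransmission timer tick.  With EACKs only the missing packets
    are resent; without them the whole window is resent together, so the
    receiver sees duplicates of the packets that already arrived."""
    random.seed(seed)
    retransmissions = duplicates = sent = 0
    while sent < total_packets:
        burst = min(window, total_packets - sent)
        delivered = [random.random() >= loss for _ in range(burst)]
        while not all(delivered):
            for i, ok in enumerate(delivered):
                if use_eacks and ok:
                    continue                  # already EACKed; never resent
                retransmissions += 1
                if ok:
                    duplicates += 1           # receiver already had this one
                else:
                    delivered[i] = random.random() >= loss
        sent += burst
    return retransmissions, duplicates
```

Running both variants over the same loss rate shows the shape of the numbers above: the no-EACK run retransmits, and duplicates, several times as many packets as the EACK run.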
    (2) Lixia Zhang's suggestion that one use the RTT of the SYN to
    compute the initial timeout estimate appears to work very well.

    (3) EACKs may make it possible to all but stomp out RTT feedback
    (those unfortunate cases where a dropped packet leads to an
    RTT = (the number of retries * SRTT) + SRTT being used to compute
    a new SRTT).  I've been experimenting with discarding RTTs for out of
    order acks.  This is best explained by example.  If packets 1, 2, 3
    and 4 are sent, and the first ack is an EACK for 3, the implementation
    uses the RTT for 3 to recompute the SRTT, but will discard the RTTs
    for 1 and 2 when they are eventually acked (or EACKed).  The
    argument in favor of this scheme is that the acks for 1 and 2
    probably represent either (a) RTTs for packets that were dropped,
    and thus including them would lead to feedback or (b) RTTs that reflect 
    an earlier (and slower) state of the network (3 was sent after 1 and 2)
    and using them would make the SRTT a less good prediction of the
    RTT of the next packet.  Note that (b) would be more convincing
    if it wasn't the case that 1, 2, 3 and 4 were probably sent within
    a few milliseconds of each other.

    Watching 5 trial runs of 100 64-byte data packets bounced off Goonhilly,
    this algorithm kept the SRTT within the observed range of real RTTs
    (as opposed to RTTs for packets that were dropped and had to be
    retransmitted).

    Using EACKs but taking the RTT for every packet (again doing 5 trial
    runs) several cases of RTT-feedback were seen.  In one case the SRTT
    soared to ~35 seconds when a few packets were dropped in a short period.
    Since the implementation uses Mills's suggested changes which make
    lowering the SRTT take longer than raising it, the SRTT took some
    time to recover.
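The out-of-order discard rule in (3), the SYN-seeded initial estimate in (2), and Mills's asymmetric smoothing can be sketched together as follows (a toy Python model; the class name, API, and gain constants are all illustrative guesses, not Craig's RDP code):

```python
class RttEstimator:
    """Sketch of the scheme described in point (3): keep a send
    timestamp per packet, feed an RTT sample into the smoothed
    estimate only when the (E)ACK is not for a packet older than one
    already acknowledged, and (per Mills's suggestion) let the SRTT
    rise faster than it falls.  All constants are illustrative."""

    GAIN_UP = 0.5      # react quickly when measured delay grows
    GAIN_DOWN = 0.9    # decay slowly when measured delay shrinks

    def __init__(self, syn_rtt):
        self.srtt = syn_rtt          # initial estimate from the SYN exchange
        self.send_time = {}          # per-packet send timestamps
        self.highest_acked = -1

    def on_send(self, seq, now):
        self.send_time[seq] = now

    def on_ack(self, seq, now):
        sample = now - self.send_time.pop(seq)
        if seq < self.highest_acked:
            return None              # acked out of order: discard sample
        self.highest_acked = seq
        gain = self.GAIN_UP if sample > self.srtt else self.GAIN_DOWN
        self.srtt = gain * self.srtt + (1 - gain) * sample
        return sample
```

With packets 1-4 in flight and the first EACK arriving for 3, the sample for 3 updates the SRTT while the late acks for 1 and 2 are dropped on the floor, exactly the behavior argued for above.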

People may be wondering about observed throughput.  How fast does RDP
run vis-a-vis TCP?  That turns out to be very difficult to answer.
Identical tests run in parallel or one right after another give
throughput rates that vary by factors of 2 or more.  As a result it
is difficult to get throughput numbers that demonstrably show differences
which reflect more than random variation.   After running tests for 7
weekends (and millions of packets) I have some theories, but those keep
changing as different tests are run.

Craig

P.S.  Those millions of packets are almost all over a software loopback.
The contribution to network congestion has been small.

walsh@HARVARD.HARVARD.EDU (Bob Walsh) (12/15/86)

Craig,

Have you thought of using a separate variable to measure the RTT of each
packet so that you can update your smoothed RTT using the EACKs?

When I last did RDP work, RDP and TCP were roughly the same speed.  Maybe
RDP was a bit quicker even in the LAN environment.  The reason RDP did
not dominate TCP was that the machines I was using were VAXes and the
RDP checksumming algorithm did not run as fast as it would on a machine
with a different byte ordering (like the 68K based workstations).

bob

mills@HUEY.UDEL.EDU.UUCP (12/16/86)

Craig and Bob,

Keeping roundtrip-delay samples on a per-packet basis really does help
(the fuzzballs have been doing that for several years), as does initializing
the estimator with the SYN/ACK exchange. Another thing, first pointed out
by Jack Haverty of BBN, is the behavior when the window first opens
after it has previously closed. If the ACK carrying the window update is
lost, performance can lose big. This may be one reason TP-4 uses a different
"active ACK" policy. While at it, consider the receiver policy and when
to generate ACKs (delayed or not). Silly implementations that always send
2-3 ACKs for every received packet might actually win under warmonger
conditions.

Dave

craig@LOKI.BBN.COM (Craig Partridge) (12/16/86)

> Have you thought of using a separate variable to measure the RTT of each
> packet so that you can update your smoothed RTT using the EACKs?

    That's precisely what I'm doing.  Then the out-of-order rule is used
to discard RTTs that seem likely to cause SRTT explosion.

> When I last did RDP work, RDP and TCP were roughly the same speed.  Maybe
> RDP was a bit quicker even in the LAN environment.  The reason RDP did
> not dominate TCP was that the machines I was using were VAXes and the
> RDP checksumming algorithm did not run as fast as it would on a machine
> with a different byte ordering (like the 68K based workstations).

    Certainly the RDP checksum on the VAX is a real problem.  On the
SUN the checksum I use is 40% faster than the TCP checksum;  on the
VAX the checksum is about 3 times *slower* than the TCP checksum.  (You
probably wrote a better one; I haven't compared them.)  And over a
perfect network, the checksum performance seems to dictate speed.
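For comparison, the TCP checksum both messages measure against is the standard 16-bit one's-complement sum; a generic sketch follows (this is not the RDP checksum, and real implementations sum a word at a time for speed):

```python
def internet_checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum as used by
    TCP.  One reason it ports well across architectures: the
    one's-complement sum comes out the same (modulo a final byte swap)
    whichever byte order the machine sums in, so the inner loop needs
    no per-word swapping."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Appending the computed checksum (big-endian) to the data and summing again yields zero, which is how a receiver verifies a segment.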

    But once there is any packet loss on the network the data handling
costs seem to become rather insignificant, and the big issue (I believe)
is retransmission mechanisms.   Unfortunately, once the network drops
packets, there seems to be a very wide variation in throughput from
test to test and it gets hard to say anything definitive.  There's also
the problem that, when you do get a definitive answer, it may reflect
a real difference or merely an odd quirk of the particular RDP or TCP
implementation.  (I.e., am I asking the right question?)  One quickly
develops a healthy respect for TCP.

Craig