van@LBL-CSAM.ARPA (Van Jacobson) (06/28/87)
Keith - I think that rtt algorithms commonly confuse what's being done with why it's being done. In tcp, "what" is measuring packet round trip time but "why" is to determine when packets have been lost. A good loss estimator may not be a good rtt estimator.

The reasoning behind the mean & deviation estimation was that packets "almost always" get acked within the time interval RTTmean + 2 * RTTdev (by Chebyshev's inequality). So you can set a timer for this interval when you launch a packet and have fair confidence that the packet has been lost if the timer goes off before you get an ack.

The algorithm I gave makes bad predictions when rtt goes down because adding in the absolute value of the prediction error keeps rto at its old value for four or five samples. There are at least a couple of reasons for rtt to go down: (1) the packets might have found a better route or (2) there might be a mis-match between the window size & the path delay, creating a large time gap between the packet at the right edge of a window & the packet at the left edge of the next window.

(1) doesn't happen all that often (at least, not to my packets) and, when it does, the new path is often not as lossy as the old so keeping rto high for a few samples has a negligible effect on performance (because the retransmission timeout becomes less likely).

(2) is typical Internet conditions and you find that when rtt goes down it's going to jump back up again in one or two packets. If you let rto follow rtt down, there's a very high probability that you will do a spurious retransmission of the next packet (remember that we are doing one-ahead predictions: we use the current packet's rtt to predict the loss of the next packet). Rtt's are jumping up and down because the network path is overloaded and your retransmission is going to load it more. It's not hard to show this is a regenerative feedback that leads to congestive collapse.

So, by the principle of least damage, you want the loss prediction to stay high for a short while after rtt goes down. [Arguments from stochastic control theory can make this handwaving a bit more rigorous. I try to make some of these arguments in my paper. More importantly, the paper has some plots of rtt on typical Internet tcp connections that (I think) make it obvious why you want rto to stay high. With luck, I'll finish this paper shortly and find someone interested in publishing it.]

 - Van

(for people not into alphabet soup: rtt = Round Trip Time, rto = Retransmission Timeout)
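
A minimal sketch, in Python, of the kind of mean & deviation estimator discussed above, to show rto staying high after rtt drops. Only the timer rule rto = RTTmean + 2 * RTTdev comes from the message. The gains (1/8 on the mean, 1/4 on the deviation), the starting deviation of zero, and the names RttEstimator and rto_trace are assumptions made for illustration; the algorithm as originally posted may differ.

```python
class RttEstimator:
    """Running mean and mean-deviation of rtt samples (illustrative sketch)."""

    def __init__(self, first_rtt, mean_gain=0.125, dev_gain=0.25, k=2.0):
        # Gains and starting deviation are assumed values, not from the message.
        self.mean = first_rtt
        self.dev = 0.0
        self.mean_gain = mean_gain
        self.dev_gain = dev_gain
        self.k = k  # rto = mean + k * dev, with k = 2 as in the text

    def rto(self):
        return self.mean + self.k * self.dev

    def update(self, rtt):
        # Prediction error of the current estimate against the new sample.
        err = rtt - self.mean
        self.mean += self.mean_gain * err
        # The absolute value of the error feeds the deviation, so a sudden
        # drop in rtt raises dev even though the mean is falling.
        self.dev += self.dev_gain * (abs(err) - self.dev)
        return self.rto()


def rto_trace(samples, first_rtt):
    """Return the rto in effect after each rtt sample in turn."""
    est = RttEstimator(first_rtt)
    return [est.update(s) for s in samples]


if __name__ == "__main__":
    # Steady rtt of 1.0 s, then a sudden drop to 0.5 s.
    for i, rto in enumerate(rto_trace([0.5] * 8, first_rtt=1.0), start=1):
        print(f"sample {i}: rtt=0.5 rto={rto:.3f}")
```

With these assumed gains, the first several samples after the drop leave rto at or above its old value of 1.0, and it only settles toward the new rtt after many more samples, which is the "stay high for a short while" behaviour argued for above.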