[comp.protocols.tcp-ip] Your TCP timer code

van@LBL-CSAM.ARPA (Van Jacobson) (06/28/87)

Keith -

I think that rtt algorithms commonly confuse what's being done
with why it's being done.  In tcp, "what" is measuring packet
round trip time but "why" is to determine when packets have been
lost.  A good loss estimator may not be a good rtt estimator. 

The reasoning behind the mean & deviation estimation was that
packets "almost always" get acked within the time interval
RTTmean + 2 * RTTdev (by Chebyshev's inequality).  So you can
set a timer for this interval when you launch a packet and have
fair confidence that the packet has been lost if the timer goes
off before you get an ack.
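
A minimal sketch of the timer computation described above, in Python.
Only the rto = RTTmean + 2 * RTTdev form comes from the text; the
gains (1/8 and 1/4), the initial deviation, and all names are
assumptions made for illustration, not the original code.

```python
class RttEstimator:
    """Running mean and mean deviation of rtt samples (illustrative only)."""

    def __init__(self, first_rtt, gain_mean=0.125, gain_dev=0.25):
        # Assumed starting point: mean is the first sample, deviation is
        # half of it. Both gains are assumed values, not from the post.
        self.mean = first_rtt
        self.dev = first_rtt / 2.0
        self.gain_mean = gain_mean
        self.gain_dev = gain_dev

    def update(self, rtt):
        """Fold one new rtt measurement into the mean and deviation."""
        err = rtt - self.mean                 # prediction error
        self.mean += self.gain_mean * err     # move mean toward the sample
        # The deviation tracks the absolute value of the prediction error.
        self.dev += self.gain_dev * (abs(err) - self.dev)

    def rto(self):
        """Timer interval to set when the next packet is launched."""
        return self.mean + 2.0 * self.dev
```

Usage would be to call update() on each ack that yields an rtt
measurement and rto() when arming the timer for the next packet.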

The algorithm I gave makes bad predictions when rtt goes down
because adding in the absolute value of the prediction error
keeps rto at its old value for four or five samples.  There are
at least a couple of reasons for rtt to go down: (1) the packets
might have found a better route or (2) there might be a mismatch
between the window size & the path delay, creating a large time
gap between the packet at the right edge of a window & the packet
at the left edge of the next window. 

(1) doesn't happen all that often (at least, not to my packets)
and, when it does, the new path is often not as lossy as the old
so keeping rto high for a few samples has a negligible effect on
performance (because the retransmission timeout becomes less
likely). 

(2) is typical of Internet conditions and you find that when rtt
goes down it's going to jump back up again in one or two packets.
If you let rto follow rtt down, there's a very high probability
that you will do a spurious retransmission of the next packet
(remember that we are doing one-ahead predictions: we use the
current packet's rtt to predict the loss of the next packet).
Rtt's are jumping up and down because the network path is overloaded
and your retransmission is going to load it more.  It's not hard
to show this is a regenerative feedback that leads to congestive
collapse.

So, by the principle of least damage, you want the loss prediction
to stay high for a short while after rtt goes down.
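
To make the "stays high" behavior concrete, here is a small Python
sketch that feeds a sudden rtt drop through a mean-plus-deviation
estimator. The gains (1/8 and 1/4), the zero starting deviation, and
the sample values are all assumptions chosen for illustration; they
are not measurements and not the original code.

```python
def rto_trace(samples, gain_mean=0.125, gain_dev=0.25):
    """Return the rto computed after each rtt sample (illustrative only)."""
    mean = samples[0]   # assumed: start from the first sample
    dev = 0.0           # assumed: start with no deviation
    trace = []
    for rtt in samples:
        err = rtt - mean
        mean += gain_mean * err
        # abs(err) is added whether rtt went up or down, so a drop in
        # rtt raises the deviation term while the mean falls slowly.
        dev += gain_dev * (abs(err) - dev)
        trace.append(mean + 2.0 * dev)
    return trace


# Hypothetical samples: rtt is steady at 1.0, then drops to 0.5.
samples = [1.0] * 3 + [0.5] * 6
trace = rto_trace(samples)
for rtt, rto in zip(samples, trace):
    print(f"rtt={rtt:.2f}  rto={rto:.4f}")
```

With these assumed gains the rto does not follow rtt down: for the
samples right after the drop it stays at or above its old value of
1.0, which is the loss-prediction behavior argued for above.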

[Arguments from stochastic control theory can make this
handwaving a bit more rigorous.  I try to make some of these
arguments in my paper.  More importantly, the paper has some
plots of rtt on typical Internet tcp connections that (I think)
make it obvious why you want rto to stay high.  With luck, I'll
finish this paper shortly and find someone interested in
publishing it.]

  - Van

  (for people not into alphabet soup: 
	rtt = Round Trip Time
	rto = Retransmission Time Out)