craig@loki.bbn.com (Craig Partridge) (10/30/86)
I'm working on an implementation of RDP and am trying to find ways to improve the round-trip time estimates. The timeout algorithm is the same as TCP's, with the values suggested in RFC 889, but I've noticed that choosing the wrong initial value for the estimated round trip time can have a severe impact on throughput if the total number of packets is relatively small and the link is lossy. I'd like to improve that performance by choosing better initial values. This isn't something I know very much about, so I'm soliciting advice. How do other people choose the initial value to put into the round trip estimate equations? What mechanisms do you recommend, strongly discourage, or disparage?

Craig Partridge
CSNET Technical Staff
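[For reference, the TCP-style estimator in question keeps an exponentially smoothed RTT and derives the retransmission timeout from it. A minimal sketch in C, using the example constants from RFC 793; the initial value loaded into srtt before the first measurement is exactly the knob under discussion:]

```c
/* Sketch of the RFC 793 smoothed round-trip time estimator.
 * ALPHA, BETA, and the bounds follow the example values in RFC 793;
 * the initial srtt seed is the open problem this thread is about. */

#define ALPHA  0.9     /* smoothing gain (RFC 793 suggests .8 to .9) */
#define BETA   2.0     /* delay variance factor (1.3 to 2.0) */
#define UBOUND 60.0    /* upper bound on the timeout, seconds */
#define LBOUND 1.0     /* lower bound on the timeout, seconds */

/* Fold a new round-trip measurement (seconds) into the smoothed RTT. */
double srtt_update(double srtt, double measured)
{
    return ALPHA * srtt + (1.0 - ALPHA) * measured;
}

/* Retransmission timeout derived from the smoothed RTT, clamped. */
double rto(double srtt)
{
    double t = BETA * srtt;
    if (t < LBOUND) t = LBOUND;
    if (t > UBOUND) t = UBOUND;
    return t;
}
```

With a bad initial srtt, the high ALPHA means the estimate converges slowly; a transfer of only a few packets may finish (or time out repeatedly) before the estimator learns the true RTT.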
james@ZERMATT.LCS.MIT.EDU ("James William O'Toole, Jr.") (10/30/86)
Date: 30 Oct 1986 04:24-EST
From: CERF@A.ISI.EDU
Subject: Re: Setting Initial Round-trip time

A background process might try to gather data - but this would be like pinging everyone just in case you might want to talk with them, which leads to predictable disaster. A background process could more easily maintain a cache of round trip time data measured from recent traffic. Connections from a given host are probably concentrated on certain destinations, so you ought to be able to do much better than pinging. Of course, you still need to know which measurements to take and how to use them. Mean and variance of round trip time on a per-host basis, with recent data more heavily weighted, perhaps?
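[A per-host cache along the lines Vint suggests might look like the following sketch: smoothed mean and mean deviation per destination, with recent samples weighted more heavily. The structure, the gain, and the fallback default are illustrative assumptions, not a specification:]

```c
/* Hypothetical per-host RTT cache: mean and mean deviation per
 * destination, updated so that recent samples dominate.  GAIN and
 * the 3-second fallback are assumed values for illustration. */

#define GAIN 0.25   /* weight given to the newest sample */

struct rtt_cache_entry {
    unsigned long host;   /* destination address */
    double mean;          /* smoothed mean RTT, seconds */
    double var;           /* smoothed mean deviation, seconds */
    int    valid;         /* nonzero once we have at least one sample */
};

/* Fold a new per-host RTT sample into the cache entry. */
void rtt_cache_update(struct rtt_cache_entry *e, double sample)
{
    if (!e->valid) {
        e->mean = sample;
        e->var = 0.0;
        e->valid = 1;
        return;
    }
    double err = sample - e->mean;
    e->mean += GAIN * err;
    e->var  += GAIN * ((err < 0 ? -err : err) - e->var);
}

/* Initial timeout for a new connection: cached mean plus a few
 * deviations, or a conservative default when there is no history. */
double rtt_cache_initial_rto(const struct rtt_cache_entry *e)
{
    if (!e || !e->valid)
        return 3.0;               /* assumed conservative default */
    return e->mean + 4.0 * e->var;
}
```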
braden@isi.edu (Bob Braden) (10/31/86)
Members of the END2END taskforce have also been worrying about your problem (setting an initial round-trip time). It is an important (but not usually critical) problem for connection-oriented protocols like TCP and RDP; it is more important and more difficult for a transaction-oriented transport protocol of the sort we have been stalking.

As Vint says, a background process to gather data is a VERY bad idea. Nor do I think the hop count is at all useful as a predictor. However, it may not be a bad idea to maintain a cache of historical data about observed round trip times to hosts you have talked to recently. This may help in the case of a small "working set" of hosts you talk to. In the opposite case -- short transfers to a very large collection of randomly chosen hosts -- the answer may very well be that you cannot reasonably expect very good performance when nets are lossy, and should not waste time trying to obtain the impossible. And at system startup, before there is any historical data, you cannot do very well. Someone needs to think a little about the right way to maintain such a cache of historical RTTs per host (considering the dispersion of observations), and about how to use the data.

Someone is going to say "You have to ask the gateways". Well, maybe, but that would seem to come some time after we are able to provide effective type-of-service routing over widely-diverse paths. Or maybe at the same time? Still, it wouldn't hurt for us to pose this requirement (an ICMP message from a host inquiring about the probable delays to a given Internet host) to Dave Mills' INARCH task force, and see what they can come up with. This requirement seems to go deep into the overall routing architecture of the Internet.

Finally, our Internet Architect has a simple answer to your problem: don't use RDP, use NETBLT.
NETBLT shares the principal advantages of RDP (packet-orientation and selective retransmission), but uses timers differently, so that the RTT is not important. Of course, NETBLT has its own set of hard timer problems...

Bob Braden
mckee@MITRE.ARPA (H. Craig McKee) (10/31/86)
You noted that the number of packets was small and the link was lossy. Is the link lossy because of congestion or because of noise? If the former (congestion) then the advice of John Nagle (RFC 896) is relevant. If the latter (noise) then more complex measures are needed; for example, a Session/Presentation layer Forward Error Correction procedure.
markl@JHEREG.LCS.MIT.EDU (10/31/86)
Don't use NETBLT. Not yet. NETBLT is an *experiment* in high-speed bulk data transfer. It works best over long-delay, high-bandwidth (i.e. satellite) networks, although it gives very good performance on other networks as well. Although a proposed spec is available, we are actively discouraging people from implementing it until we decide whether or not the experiment works. We are currently testing NETBLT over the Wideband network with fairly good results, but a lot of work needs to be done tuning the spec before anyone can use NETBLT.

Mark

Internet: markl@jhereg.lcs.mit.edu
MIT Laboratory for Computer Science
Distributed Systems Group
braden@ISI.EDU (Bob Braden) (10/31/86)
The problem with the cache is that

a) you don't know if the values correspond to the routes your traffic is/will take.

b) you may not have enough traffic to enough destinations to maintain statistically valid (fresh) delay information.

It may be possible that things will be static enough that the cache will actually work well most of the time, but if the internet exhibits significant statistical variation in delay/throughput, the cache may be not only misleading but downright harmful.

Vint,

Everything you say is true, but I fail to see how bad cache information will be more harmful than no information at all.

Bob Braden
van@LBL-CSAM.ARPA (Van Jacobson) (10/31/86)
I've got a little bit of hard data and a simple change that may improve things for 4.[23]bsd. About six months ago I instrumented our Internet gateway to record a timestamp, src & dst address and port, and data & ack sequence numbers for each tcp packet. I reduced two days worth of this trace data and found

. the distribution of the number of data packets sent on a connection was roughly exponential with a mean of 4 (e.g., a lot of mail traffic).

. between 7am and 5pm, PDT, the mean number of retransmissions per data packet was 8. The distribution was bi-modal, with traffic through the "mail bridges" in one lobe (with a mode of ~11) and all other traffic in the other lobe (a mode of ~2). Both lobes were approximately Poisson, possibly due to the long "learning time" of round trip timers on the connections.

. For a particular connection, round trip times varied about an order of magnitude. Over all connections, round trip times varied three orders of magnitude. (With the large number of retransmissions, there is some ambiguity in which packets one uses to estimate RTT. I generally used the time from the first use of a sequence number to its first ack, with some ad hoc hackery to accommodate the 50% packet loss through the mail bridges. Given this ambiguity, the uncertainty in any RTT estimate is at least a factor of two.)

. The next hop Internet gateway was strongly correlated with the RTT in the following sense: the average RTT of all packets through a gateway is within a factor of three of the average RTT of each connection through that gateway.

It is not hard to convince yourself that no reasonable setting of the RTT initial value & filter constants is going to accommodate a factor of 1000 variation in four packets worth of learning time. I mentioned the problem to Mike O'Dell & he suggested using the kernel's routing entry to cache the RTT. I made up a kernel that kept an RTT in each route. When a TCP connection was opened, it initialized its RTT from the route.
When the connection closed, its RTT was used to update the route's RTT with a weighted average Rt = A * Rt + (1-A) * Conn (the 4.3bsd kernel changes for all this amount to about 20 lines of C). I took 12 hours of trace data with the filter constant set to .5. The average number of retransmissions *for traffic that originated at the gateway* went down a factor of two (8 to 4).

I was going to take more data and try tuning the connection and route filter constants. Unfortunately, some local political changes intervened and I can no longer make changes or take data on the gateway. However, the initial results were promising enough that I plan to try a similar scheme for machines that sit behind the gateway (i.e., construct pseudo-route entries to cache the RTT).

- Van Jacobson, LBL
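[The scheme Van describes can be sketched roughly as follows: the route keeps a smoothed RTT, a new connection seeds its estimate from the route, and on close the route is updated with Rt = A * Rt + (1-A) * Conn. The structures and the 3-second default seed are stand-ins for illustration, not the actual 4.3bsd changes:]

```c
/* Sketch of caching the RTT in the routing entry.  ROUTE_FILTER is
 * the filter constant A from the experiment (set to .5 there); the
 * struct layouts and the default seed are assumptions. */

#define ROUTE_FILTER 0.5

struct route {
    double rt_rtt;         /* cached smoothed RTT, seconds; 0 = no history */
};

struct conn {
    double srtt;           /* this connection's smoothed RTT */
};

/* On open: seed the connection's estimate from the route's cache,
 * falling back to a default when there is no history yet. */
void conn_open(struct conn *c, const struct route *rt)
{
    c->srtt = (rt->rt_rtt > 0.0) ? rt->rt_rtt : 3.0;  /* assumed default */
}

/* On close: fold the connection's final estimate back into the route
 * with the weighted average Rt = A * Rt + (1-A) * Conn. */
void conn_close(const struct conn *c, struct route *rt)
{
    if (rt->rt_rtt <= 0.0)
        rt->rt_rtt = c->srtt;
    else
        rt->rt_rtt = ROUTE_FILTER * rt->rt_rtt
                   + (1.0 - ROUTE_FILTER) * c->srtt;
}
```

The point of keeping the cache in the route rather than per host is that, per Van's trace data, the next-hop gateway predicts the RTT to within a factor of three, so one cached value per route covers many destinations.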
brescia@CCV.BBN.COM (Mike Brescia) (11/01/86)
Re: analogies ... like having a broken clock which is right twice a day rather than a clock which simply runs fast or slow and is never right. I will let you pick which analogy to apply!

The analogy I wish to apply would be that neither a broken clock nor a miscalibrated clock will ever be right if I am trying to count apples or oranges. I'd like to attack the assumption that knowing the round trip time will compensate for the fact that packets are allowed to be dropped in the system.

(Mom & Apple Pie division)

In the current IP model, a packet may be delayed (10 to 100 seconds and more have been reported), or dropped because of a transmission failure or congestion at some gateway or packet switch. If a packet is delayed, there should be no retransmission, because the second packet will only be delayed behind the first. If it is dropped due to transmission failure, the retransmission should come as soon as possible, so that the end-point hosts see a minimum of disruption. If it is dropped due to congestion, the retransmission should come only as soon as you know the packet can get through or around the congestion; otherwise you are only exacerbating it. If you have arrived at a reasonable round trip time, and you have a packet which has not been acknowledged after (some factor of) that time, can you deduce which of the three things has happened to the packet? If you make the wrong decision, you can make things worse for yourself or the community of users.

(Blue Sky division)

If the Internet could provide a better guarantee of delivery, as the Arpanet once did, retransmission would not need to be so widespread, and a good measure of round trip time would not be so much of a panic. The Internet model would need to be extended so that the effects of transmission losses and congestion could be controlled.

Mike
braden@ISI.EDU (Bob Braden) (11/03/86)
Van,

Your message is wonderful. After years of our sitting around making Aristotelian speculations on this mailing list, you actually took some data, with fascinating results. Can we hope that you will write this data up? It is hard to see how we can make progress in this game unless results like yours get disseminated for peer comment and education. From my viewpoint, an RFC would be the right level... it would not have the formality or publication delays of a "real" paper, and would make this data and your ideas available as soon as possible. If you don't have time to write it up, could we persuade you to come to an appropriate task force meeting and present it?

Bob Braden