jbn@GLACIER.STANFORD.EDU (John B. Nagle) (11/02/86)
In general, it is better to overestimate the round-trip time and wait a bit longer than to underestimate it and cause congestion. Very short initial RTT estimates have caused considerable trouble in the past; at various times, TOPS-20, 4.2BSD, and Symbolics TCP implementations have had unreasonably short initial guesses. And of course, if you consistently use an RTT estimate smaller than the actual RTT, you will fill up the link with multiple copies of the packet and cause congestion collapse. So think big. I would argue for 5 seconds as a good first guess.
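For concreteness, here is a minimal sketch of the classic smoothed-RTT estimator described in RFC 793, seeded with the conservative 5-second first guess argued for above. The constants are the values RFC 793 suggests; the variable names and the sample data are mine, purely for illustration.

#include <stdio.h>

#define ALPHA  0.9    /* smoothing gain; RFC 793 suggests 0.8 to 0.9        */
#define BETA   2.0    /* delay variance factor; RFC 793 suggests 1.3 to 2.0 */
#define UBOUND 60.0   /* upper bound on the timeout, seconds                */
#define LBOUND 1.0    /* lower bound on the timeout, seconds                */

static double srtt = 5.0;    /* smoothed RTT: start big, as argued above */

/* Fold one round-trip measurement (in seconds) into the smoothed
   estimate and return the retransmission timeout for the next segment:
   SRTT = ALPHA*SRTT + (1-ALPHA)*RTT; RTO = clamp(BETA*SRTT).           */
double update_rto(double measured_rtt)
{
    srtt = ALPHA * srtt + (1.0 - ALPHA) * measured_rtt;
    double rto = BETA * srtt;
    if (rto > UBOUND) rto = UBOUND;
    if (rto < LBOUND) rto = LBOUND;
    return rto;
}

int main(void)
{
    /* A path whose true RTT is about 2 seconds: the timeout converges
       downward from the safe initial guess, instead of starting low
       and flooding the link with spurious retransmissions.            */
    double samples[] = { 2.1, 1.9, 2.0, 2.2, 1.8 };
    for (int i = 0; i < 5; i++)
        printf("sample %.1fs -> rto %.2fs\n", samples[i], update_rto(samples[i]));
    return 0;
}

Note that the estimate can converge downward only as fast as ALPHA allows. That is the price of starting safe, and it is a price worth paying.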
Unfortunately, there are problems that lead to a strong desire to use a shorter interval. Most of them reflect bugs or weak design in the systems involved, but they are nonetheless real. Here are a few.

There are some implementations that lose the first packet at the link level due to a simple-minded implementation of ARP. If both source and destination have this problem, the first two packets may be lost. If source and destination are both on LANs interconnected by gateways, everybody uses ARP, and nobody has the relevant entries cached yet, it may be necessary to send a TCP SYN packet FIVE (5) times before a reply makes it back to the source host: each cache miss eats the packet that triggered the ARP request, so misses on the forward path cost a SYN apiece, and each miss on the reply's return path costs yet another SYN retransmission. And this is in the absence of any packet loss for other reasons. (A sketch of the usual fix appears at the end of this message.)

There are some TCP implementations that lose the first SYN packet received; the first SYN triggers the firing off of a server task, but the server task doesn't get the SYN packet and has to wait for a duplicate of it. Again, this is a bad TCP implementation, but such exist.

And then, of course, there is packet loss through congestion, about which I have written before. The previous comment about an observed 90% packet loss rate makes it clear that the problems have become more severe in the past few months. Loss rates like that can only come from vast overloads caused by badly-behaved implementations.

The combination of all these problems does indeed tempt one to use a short RTT in hopes of improving one's own performance at the expense of everybody else. But resist the temptation. It won't fix the problem, which is elsewhere, and it will make the congestion situation worse.

Incidentally, I see nothing wrong with loading up a network to 100% link utilization. If all the proper strategies are in place, this should work just fine. We did this routinely at Ford Aerospace, with file transfers running in the background sopping up all the idle line time while TELNET sessions continued to receive adequate service. It's no worse than running a computer with a mixed batch/time-sharing load at 100% CPU utilization. (UNIX users may feel otherwise, but UNIX has traditionally had a weak CPU dispatcher, being designed for a pure time-sharing load.) The problem is not legitimate load; it's junk traffic due to bad implementations. We know this because if everybody did it right, the net would slow down more or less linearly with load, instead of going into a state of semi-collapse.

If some node is dropping 90% of its packets, somebody should be examining the dropped packets to find out who is sending them. The party responsible should be spoken to. Or disconnected. A little logging, some analysis, and some hard-nosed management can cure this problem.

For most implementations, the fixes exist. They usually just need to be installed. There are still a lot of stock 4.2BSD systems out there blithering away, especially from vendors that don't track Berkeley too closely. As I pointed out in RFC970 (which will appear in IEEE Trans. on Data Communications early next year, by the way), even a simple scheduling algorithm in the gateways should alleviate this problem and prevent one bad host from effectively bringing the net down.
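To make that concrete, here is a minimal sketch of one such discipline in the spirit of RFC970: one queue per source host, served round-robin. The fixed-size tables, the names, and the drop policy (drop the sender's own overflow) are illustrative simplifications of mine, not the RFC text; the point is that a flooding host can fill only its own queue while everyone else's packets keep moving.

#include <stdio.h>

#define MAX_HOSTS 8
#define QUEUE_LEN 16

struct host_queue {
    unsigned src;                 /* source host address; 0 = unused slot */
    int head, count;
    const char *pkts[QUEUE_LEN];  /* FIFO of this host's packets */
};

static struct host_queue q[MAX_HOSTS];
static int rr_next;               /* round-robin service pointer */

/* Enqueue on the sender's own queue; when a flooding host overflows
   its queue, only its own packets get dropped.                       */
void gw_enqueue(unsigned src, const char *pkt)
{
    struct host_queue *h = NULL, *spare = NULL;
    for (int i = 0; i < MAX_HOSTS; i++) {
        if (q[i].src == src) { h = &q[i]; break; }
        if (q[i].src == 0 && spare == NULL) spare = &q[i];
    }
    if (h == NULL) h = spare;
    if (h == NULL || h->count == QUEUE_LEN) return;   /* full: drop */
    h->src = src;
    h->pkts[(h->head + h->count) % QUEUE_LEN] = pkt;
    h->count++;
}

/* Serve the queues in rotation: every host with traffic pending gets
   one packet per turn, no matter how much it has queued.             */
const char *gw_dequeue(void)
{
    for (int i = 0; i < MAX_HOSTS; i++) {
        struct host_queue *h = &q[(rr_next + i) % MAX_HOSTS];
        if (h->count > 0) {
            rr_next = (rr_next + i + 1) % MAX_HOSTS;
            const char *p = h->pkts[h->head];
            h->head = (h->head + 1) % QUEUE_LEN;
            h->count--;
            return p;
        }
    }
    return NULL;    /* all queues empty */
}

int main(void)
{
    for (int i = 0; i < 4; i++)
        gw_enqueue(1, "bulk packet from badly-behaved host 1");
    gw_enqueue(2, "TELNET keystroke from host 2");
    const char *p;
    while ((p = gw_dequeue()) != NULL)
        printf("%s\n", p);   /* host 2's packet goes out second, not fifth */
    return 0;
}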
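Finally, here is a sketch of the usual fix for the ARP problem described earlier: on a cache miss, hold the datagram that triggered the ARP request and transmit it when the reply arrives, instead of dropping it. The toy one-entry cache, the helper names, and the one-deep hold queue are mine, purely for illustration.

#include <stdio.h>
#include <stdbool.h>

struct packet { const char *payload; };

/* Toy one-entry ARP cache and link layer, just enough to run the sketch. */
static unsigned cached_ip;
static bool     cache_valid;

static bool arp_cache_lookup(unsigned ip) { return cache_valid && cached_ip == ip; }
static void arp_send_request(unsigned ip) { printf("ARP: who-has host %u?\n", ip); }
static void link_transmit(struct packet *p) { printf("sent: %s\n", p->payload); }

static struct packet *arp_pending;      /* one-deep hold queue */
static unsigned       arp_pending_ip;

void ip_output(struct packet *p, unsigned next_hop)
{
    if (arp_cache_lookup(next_hop)) {
        link_transmit(p);
        return;
    }
    arp_send_request(next_hop);
    arp_pending    = p;        /* right: hold the packet for the reply */
    arp_pending_ip = next_hop;
    /* The simple-minded implementation discards p here instead, so the
       first packet to any host not yet in the cache is always lost.   */
}

void arp_reply_received(unsigned ip)
{
    cached_ip   = ip;
    cache_valid = true;
    if (arp_pending != NULL && arp_pending_ip == ip) {
        link_transmit(arp_pending);     /* flush the held packet */
        arp_pending = NULL;
    }
}

int main(void)
{
    struct packet syn = { "TCP SYN" };
    ip_output(&syn, 42);       /* cache miss: request goes out, SYN held */
    arp_reply_received(42);    /* reply arrives: the held SYN is sent    */
    return 0;
}

Good luck, everybody.

John Nagle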