[fa.tcp-ip] TCP Timer

tcp-ip@ucbvax.ARPA (06/12/85)

From: zhang%erlang.DEC@decwrl.ARPA  (Lixia Zhang)

I have a few questions concerning setting timers for outstanding TCP
segments.

The following paragraph from the TCP spec (rfc-793, page 10) seems to say
that a timer should be set for each outstanding segment:

"When the TCP transmits a segment containing data, it puts a copy on a
retransmission queue and starts a timer;  when the acknowledgment for that 
data is received, the segment is deleted from the queue.  If the
acknowledgment is not received before the timer runs out, the segment is
retransmitted."

My questions are:

- Should we read this to mean that a timer needs to be set for each
  outstanding segment?

- What is the situation in real implementations?  From what I've heard, most
  implementations use a single timer per connection - am I right?
  I seem to remember a recent message saying that some implementations
  set a timer for each segment while others use a single timer.  What
  percentage of implementations falls on each side?

- Does anyone have any idea, or observations, about the performance
  difference, big or small, between the two different implementations?

Lixia

tcp-ip@ucbvax.ARPA (06/12/85)

From: David C. Plummer in disguise <DCP@SCRC-QUABBIN.ARPA>

    Date: Wednesday, 12 Jun 1985 06:54:15-PDT
    From: zhang%erlang.DEC@decwrl.ARPA  (Lixia Zhang)

Our experience at Symbolics says that a per-connection timer and only
retransmitting the FIRST segment on the retransmission queue is
sufficient.  In a mostly-reliable network medium, most segments do get
through and are then acknowledged.  If one does not get through, it winds up
at the front of the retransmission queue.  The other segments on the
retransmission queue usually got through, but can't be acknowledged
until the first segment is acknowledged.  When the first segment DOES
get through, the whole batch is often acknowledged.
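A minimal Python sketch of the per-connection scheme described above: one timer for the whole connection, and on expiry only the head of the retransmission queue is resent. The class and method names (`Segment`, `SingleTimerSender`, `on_tick`) and the fixed `rto` value are hypothetical illustrations, not taken from the Symbolics code; sequence-number wraparound is ignored.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    seq: int      # sequence number of the first data octet
    data: bytes

    @property
    def end(self) -> int:
        # Sequence number just past the last octet of this segment.
        return self.seq + len(self.data)


class SingleTimerSender:
    """One retransmission timer per connection; on timeout only the
    segment at the head of the retransmission queue is resent."""

    def __init__(self, rto: float):
        self.rto = rto
        self.queue: deque[Segment] = deque()
        self.deadline: Optional[float] = None  # the connection's only timer

    def send(self, seg: Segment, now: float) -> None:
        self.queue.append(seg)
        if self.deadline is None:            # start the timer only if idle
            self.deadline = now + self.rto

    def on_ack(self, ack: int, now: float) -> None:
        # Drop every segment fully covered by the cumulative ACK.
        progressed = False
        while self.queue and self.queue[0].end <= ack:
            self.queue.popleft()
            progressed = True
        if not self.queue:
            self.deadline = None             # nothing left outstanding
        elif progressed:
            self.deadline = now + self.rto   # restart timer for the new head

    def on_tick(self, now: float) -> Optional[Segment]:
        # Return the segment to retransmit if the single timer has expired.
        if self.deadline is not None and now >= self.deadline:
            self.deadline = now + self.rto
            return self.queue[0]
        return None
```

With `rto=1.0` and segments at seq 0 and 100 outstanding, `on_tick(1.0)` hands back only the seq-0 segment; a single cumulative `on_ack(200, ...)` afterwards empties the queue and stops the timer, which is the "whole batch acknowledged" case.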

tcp-ip@ucbvax.ARPA (06/13/85)

From: imagen!geof@su-shasta.ARPA


The first packet in the retransmission queue is the important one for
retransmission purposes, as Dave Plummer points out, since it is the
first packet that will be acked (and removed from the retrans. queue).

Dave's point about only retransmitting the first packet from the
retransmission queue is interesting.  I would go further than his
statement that this is "good enough" -- if a host always retransmits
everything on the retransmission queue, performance could be
drastically affected in certain situations.  Consider, for example, a
host that is transmitting through a gateway, and can send packets
faster than the gateway can forward them (perhaps a gateway from
Ether->Arpa).  Eventually, the gateway runs out of buffers and starts
to miss packets.  It is not uncommon to see a situation, for example,
where a gateway loses the fourth packet from every batch that is
blasted out by a particular host.

This sort of lossage can be seen at the destination host as well.  It
is a common cause of lost connections when a gateway is fragmenting
packets (in reverse of the above: the fast little gateway sends two
fragments very close together, and the big, slower host always misses
the second).  TCP performance can be affected similarly.

Dave's comment that retransmitting the first packet often results in
everything being acked seems to fit this model.  The first packet on
the retransmission queue (the first one that the foreign host didn't
ack) is probably the single packet that was lost.  [Interestingly,
Xerox's XNS SPP implementation also retransmits only the first packet on
the queue, for similar reasons (they have a serial-line gateway with
one packet buffer in it).]

Unfortunately, retransmitting only the first packet on the
retransmission queue, while it works, also has performance problems.
If the foreign host didn't lose just one packet, but lost a whole
string of them, TCP degenerates to a lock step (cf.  TFTP) protocol for
the rest of the string.  Over a very long-haul connection (e.g.,
satellite), this can cause a delay of seconds every time a packet is
lost.  Maybe in practise the number of packets lost in a row is
statistically close enough to 1 that this is not a problem.
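The trade-off between the two retransmission policies can be made concrete with a small idealized model. This sketch assumes (none of it is from the thread) that retransmissions are never lost, that the receiver buffers out-of-order segments and ACKs cumulatively, and that each timeout triggers exactly one retransmission burst. The function name `recover` and the policy strings are hypothetical.

```python
from __future__ import annotations


def recover(n: int, lost: set[int], policy: str) -> tuple[int, int]:
    """Return (timeouts, segments_retransmitted) needed to deliver
    segments 0..n-1 when the original transmissions in `lost` were dropped.

    policy "first": resend only the head of the retransmission queue.
    policy "all":   resend everything still unacknowledged.
    """
    received = set(range(n)) - lost
    acked = 0                      # cumulative ACK: [0, acked) delivered
    timeouts = resent = 0
    while acked < n:
        # Advance the cumulative ACK over everything already received.
        while acked < n and acked in received:
            acked += 1
        if acked == n:
            break
        timeouts += 1              # sender waits out a full timeout
        if policy == "first":
            burst = [acked]
        elif policy == "all":
            burst = list(range(acked, n))
        else:
            raise ValueError(f"unknown policy: {policy}")
        resent += len(burst)
        received.update(burst)
    return timeouts, resent
```

For ten segments with segments 3 through 6 lost in a row, `recover(10, {3, 4, 5, 6}, "first")` gives `(4, 4)`: one timeout per lost segment, the lock-step behaviour Geof describes. `recover(10, {3, 4, 5, 6}, "all")` gives `(1, 7)`: a single timeout, but the whole tail of the queue is pushed back at the gateway.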

I don't know a real answer to this problem (maybe someone else does...)
other than flow control (congestion control?) between a host and the
gateway(s) that it is using on a particular connection (which is not
really possible in the TCP world -- say, wasn't someone asking if there
were any supporting arguments for that anti-TCP article in last month's
SigComm review?:-)).  Perhaps one might arrange to heuristically determine
in the sender that the Nth packet is reliably being lost, and throttle
back on the inter-packet time accordingly.  Sounds complicated.

- Geof

tcp-ip@ucbvax.ARPA (06/13/85)

From: CLYNN@BBNA.ARPA

Lixia,
	Working with the DARPA Protocol Suite on TOPS20s has led to
the same conclusions that Dave has mentioned.  I think that Geof's
statement that "performance could be drastically affected" is weak - it
is drastically affected, especially when the retransmitter does not
have a very good round-trip-time algorithm.

	A refinement of only sending the "first packet" in the queue
is to only retransmit one packet - the "first packet" plus as much
additional data from other packets as will fit into a maximum sized
segment (repacketization).  This is effective in reducing
character-per-round-trip-time echoing in telnet applications (dribble echo)
since most original packets have few data octets and most (if not all)
of the outstanding data will fit into a maximum sized segment.  As Geof
points out, it does present problems if the packets to be retransmitted
are all full size; more below.
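Repacketization as described above can be sketched as a function that starts at the head of the queue and appends following contiguous data until a maximum-sized segment is full. The queue representation as `(seq, data)` pairs, the name `repacketize`, and the `mss` value in the example are assumptions for illustration, not the TOPS20 code.

```python
def repacketize(queue: list, mss: int) -> tuple:
    """Build one retransmission segment from the head of the queue plus as
    much following, contiguous unacknowledged data as fits in mss octets.

    queue: list of (seq, data) pairs in sequence order.
    Returns (seq, payload) for the single segment to retransmit.
    """
    if not queue:
        raise ValueError("nothing to retransmit")
    start = queue[0][0]
    payload = bytearray()
    expected = start
    for seq, data in queue:
        if seq != expected:            # stop at a hole in the queue
            break
        room = mss - len(payload)
        if room <= 0:                  # segment is already full
            break
        payload += data[:room]
        expected = seq + len(data)
    return start, bytes(payload)
```

For telnet-style traffic of one- or two-octet segments, four queued segments collapse into one: `repacketize([(1000, b"l"), (1001, b"s"), (1002, b" "), (1003, b"-l")], mss=536)` returns `(1000, b"ls -l")`. When the queued segments are already full size, the result is just the head segment, which is the case Charlie flags as problematic.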

	The statement that when a packet is retransmitted, it plus all
the others that made it the first time will be acked is true some of
the time.  It fails when the receiver is packet-event driven, i.e.,
the receiver responds to each packet it processes.  In such a case it
will ack the retransmitted packet, then proceed to reassemble the
other packets from its reassembly queue, acking each one in turn.  The
sending host then sees several acks arriving instead of one.  If it has
set a timer for each packet in the retransmission queue, the timer for
the second packet has probably already gone off.  Consequently, when
the ack for the packet which was retransmitted arrives, the second
packet becomes the first and is retransmitted - frequently just before
the ack arrives.
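The interaction described above can be reproduced with a simple timing model. This is a sketch under idealized assumptions that are not in the message: all times are integer milliseconds, segment k is first sent at `k * gap`, only segment 0 is lost, its resend at time `rto` gets through, and the packet-event-driven receiver then emits one ACK per segment every `ack_gap`. The function name and parameters are hypothetical.

```python
def spurious_retransmits(n: int, gap: int, rtt: int, ack_gap: int,
                         rto: int, per_segment_timers: bool) -> int:
    """Count needless retransmissions after segment 0 of n is lost.

    Only the head of the retransmission queue is ever resent.  With
    per-segment timers, each timer runs from that segment's original send
    time; with a single timer, it is restarted whenever an ACK advances.
    """
    spurious = 0
    for k in range(1, n):
        becomes_head = rto + rtt + (k - 1) * ack_gap  # ACK for k-1 arrives
        acked_at = becomes_head + ack_gap             # ACK for k arrives
        if per_segment_timers:
            deadline = k * gap + rto        # timer started when k was sent
        else:
            deadline = becomes_head + rto   # single timer restarted on ACK
        # k is resent if its timer expires while it heads the queue,
        # i.e. before its own ACK shows up.
        if deadline < acked_at:
            spurious += 1
    return spurious
```

With `n=5, gap=10, rtt=500, ack_gap=50, rto=1000`, per-segment timers produce 4 needless retransmissions (every segment that was waiting in the receiver's reassembly queue), while the restarted single timer produces none.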

	Retransmission of everything which is outstanding can make
performance worse, due to the interaction of factors, e.g., window
size, available data, round-trip-time algorithm, gateways, etc.  A
large window with lots of data causes, for example, the x
implementation on the x, to send as many packets as fast as possible.
Many gateways cannot handle the large burst of packets, so they discard a
few, pass a few, drop a few more, etc.  When the retransmission timer
goes off, everything in the queue is retransmitted - some packets get
through and a few more are dropped.  The lost packets seem to make the
round-trip-time estimate grow very large, so throughput goes way down
(wait a minute, send a flood, wait a minute ...).  If the receiving
host can figure out what is happening it can "discourage" such
behavior by using the window to limit the number of packets so that
the gateway will not be swamped (source quench doesn't seem to be well
enough defined and widely enough implemented to be effective).  In
practice, closing the window to just cover the gap of lost packets and
delaying the ack until the reassembler has to stop seems to help a
lot.  (Of course, the sending host should attempt to limit its rate of
packet generation - delays between packets (or fragments in a gateway),
maximum number of outstanding packets (possibly a function of what had
to be retransmitted), etc.)
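The receiver-side trick of "closing the window to just cover the gap" can be sketched as a window computation. This is one possible reading, assuming the receiver knows `rcv_nxt` (the next in-order sequence number it expects) and the starting sequence numbers of segments buffered beyond a hole; the function name and arguments are hypothetical, and the delayed-ACK part of the technique is not shown.

```python
def advertised_window(rcv_nxt: int, buffered_starts: list,
                      normal_window: int) -> int:
    """Receive window to advertise.

    buffered_starts: starting sequence numbers of out-of-order segments
    held in the reassembly queue.  While a hole exists, the window is
    clamped so it covers only the missing data, letting the sender refill
    the gap without sending a fresh burst toward the gateway.
    """
    if not buffered_starts:
        return normal_window
    hole = min(buffered_starts) - rcv_nxt   # octets missing before buffered data
    return max(0, min(normal_window, hole))
```

With `rcv_nxt=1000` and data buffered from 1536 onward, the receiver advertises 536 octets instead of its normal window, so only the lost segment fits.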

	Add TOPS20s to your survey under the "single timer,
retransmits a single repacketized packet, with delayed ack" column.

Charlie

tcp-ip@ucbvax.ARPA (06/13/85)

From: "J. Noel Chiappa" <JNC@MIT-XX.ARPA>

	The TCP I did for Bridge (the one used in the CS-1/T terminal
concentrator) used the same strategy, and for exactly the same
reasons. It only kept a single timer on the oldest data. On timeout,
it sent up to one full packet of un-ack'd data. So yet another system
in that column. (I'm not sure if it still does this, since they
changed things around and ripped some stuff they didn't understand the
use of, like subnet masks, out.)

	This discussion brings up an interesting point, which is that
on all except the slowest lines, network traffic control wants to
deal in units of packets, not bytes, since most overhead is per
packet. Currently TCP is byte-oriented because of the window and
flow control; we need some 'consciousness raising' for higher-level
protocol implementations to orient them to this aspect of IP (if
and when IP ever gets traffic control).

	Noel
-------

tcp-ip@ucbvax.ARPA (06/15/85)

From: CERF@USC-ISI.ARPA

Lixia,

I can't offer any statistics on other implementations, but one would
expect that a timer for the segment at the head of the queue for
any particular connection would be sufficient. If that segment times
out, you retransmit it and perhaps you look to see if any other
segments on the queue have timed out.  When the acknowledgement is
received, you reset the timer for the next segment in the queue
if it has not also been acknowledged.
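A sketch of the head-of-queue timer Vint describes, including the optional check of other segments on timeout. It assumes each queued segment remembers its latest transmission time and that the next segment's timer is reset from its own send time; that is one reading of "reset the timer for the next segment", and it is the reading under which the effect in Charlie's message (the new head's timer has already expired) appears. Restarting from the ACK's arrival time is the other reading. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outstanding:
    seq: int       # first sequence number in the segment
    end: int       # sequence number just past the segment
    sent_at: int   # time of the latest (re)transmission, in ms


def on_timer(queue: List[Outstanding], now: int, rto: int) -> List[int]:
    """Head-of-queue timer fired: resend the head, then any later segment
    whose own age has also reached rto.  Returns the seqs resent."""
    resent = []
    for i, seg in enumerate(queue):
        if i == 0 or now - seg.sent_at >= rto:
            seg.sent_at = now
            resent.append(seg.seq)
    return resent


def on_ack(queue: List[Outstanding], ack: int, rto: int) -> Optional[int]:
    """Drop acknowledged segments; return the new head's deadline, taken
    from that segment's own send time (None if nothing is outstanding)."""
    while queue and queue[0].end <= ack:
        queue.pop(0)
    return queue[0].sent_at + rto if queue else None
```

For example, with segments sent at 0, 200 and 900 ms and `rto=1000`, a timeout at 1200 ms resends the head and the 200 ms segment (both now 1000 ms old) but not the 900 ms one.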

The calculation of the proper timeout in a variable delay 
environment is a challenge. Perhaps Dave Mills has the most experience
in devising ways to cope with varying transmission and propagation
delays.  Others have experimented with solutions to this problem and
I trust they will also respond to your query.  You might also read 
some of Dave Clark's notes to implementors on the subject of handling
retransmission.

Vint Cerf

tcp-ip@ucbvax.ARPA (06/15/85)

From: MILLS@USC-ISID.ARPA

In response to the message sent  15 Jun 1985 06:40-EDT from CERF@USC-ISI.ARPA

Vint,

The problem of estimating the mean of the roundtrip-delay random variable
was discussed in RFC-889. The so-called "RSRE algorithm," which we have
been using for several years, provides only a single sample per roundtrip
interval, which is appropriate for retransmission policies in which only the
first wadge on the retransmission queue is retransmitted. However, convergence
to a good mean-estimate is accelerated if you keep separate timers for
each segment originally transmitted and update the estimate with a new sample
as the ACK-sequence number passes by the first octet of the corresponding
segment, even if the segment timer isn't used for anything else. Under
conditions where many segments can be in flight, the estimate is very much
improved. The estimate can be improved further using a nonlinear smoothing
algorithm, as discussed in RFC-889.
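Dave's per-segment sampling can be sketched as follows: remember the original send time of each segment's first octet, and take one round-trip sample for every segment whose first octet the cumulative ACK has passed. As a stand-in for the RSRE smoothing, this uses the RFC 793 exponential smoother, SRTT = ALPHA*SRTT + (1-ALPHA)*RTT with RTO = min(UBOUND, max(LBOUND, BETA*SRTT)); the exact RFC-889 formulas, including its nonlinear smoothing, are not reproduced. The default constants follow RFC 793's suggested ranges, `initial_rto` is an assumed starting value, and sequence-number wraparound is ignored.

```python
from __future__ import annotations

from typing import Dict, List, Optional


class RttEstimator:
    """Per-segment round-trip sampling with RFC 793-style smoothing."""

    def __init__(self, alpha: float = 0.875, beta: float = 2.0,
                 lbound: float = 1.0, ubound: float = 60.0,
                 initial_rto: float = 3.0):
        self.alpha, self.beta = alpha, beta
        self.lbound, self.ubound = lbound, ubound
        self.initial_rto = initial_rto          # assumed, not from the thread
        self.srtt: Optional[float] = None
        self.sent: Dict[int, float] = {}        # first-octet seq -> send time

    def on_send(self, seq: int, now: float) -> None:
        # setdefault keeps the ORIGINAL send time if the segment is resent.
        self.sent.setdefault(seq, now)

    def on_ack(self, ack: int, now: float) -> List[float]:
        """One sample per segment whose first octet the ACK has passed."""
        samples = []
        for seq in sorted(s for s in self.sent if s < ack):
            sample = now - self.sent.pop(seq)
            samples.append(sample)
            if self.srtt is None:
                self.srtt = sample
            else:
                self.srtt = self.alpha * self.srtt + (1 - self.alpha) * sample
        return samples

    def rto(self) -> float:
        if self.srtt is None:
            return self.initial_rto
        return min(self.ubound, max(self.lbound, self.beta * self.srtt))
```

A single ACK covering two segments sent at 0.0 s and 0.1 s, arriving at 1.0 s, yields two samples (1.0 s and 0.9 s) instead of one, giving an SRTT of about 0.9875 s and an RTO of about 1.975 s. With many segments in flight, this is where the faster convergence comes from.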

All this horsepower was found necessary for a pair of fuzzballs to grumble
to each other via a transatlantic-cable link using a statistical multiplexor.
The delay variance on that circuit you wouldn't believe!

Dave
-------