[mod.protocols.tcp-ip] Analyzing Acknowledgement Strategies, continued

van@LBL-CSAM.ARPA.UUCP (01/17/87)

Boy, was I wrong!  I've had about 100 messages in the past couple
of days saying this discussion should stay on the list.  But this
opus should convince the diehards that they made a mistake...

Dave -

You're absolutely right.  Linear filters, including Kalman
filters, do a lousy job on estimating round trip time (witness
all the spurious retransmissions on our poor, congested
Internet).  And median filters, because they have very high noise
immunity and contain no assumptions about the underlying data
distribution, should do a great job estimating r.t.t.  But
(since I'm running 0-for-3, I have to say "But...") I was
thinking of estimating something slightly different from r.t.t. 
I pictured the "filter" estimating the coefficients of a
mathematical model that describes how a network works.  Then tcp
would use those coefficients to decide what to do when sending
and retransmitting.  Since I didn't even come close to explaining
this in the last message, I'll try here.  (The simple-minded
explanations illustrate my ignorance; they're not intended as an
insult to anyone's intelligence.)


What's the problem?

If you don't get an ack for a packet within some reasonable period
of time, you will have to retransmit.  There are two possible reasons
for not getting an ack and they lead to diametrically opposed
retransmit strategies:

  1) The packet was damaged or lost in transit.  In this case you
     want to retransmit the packet as soon as you can because
     throughput goes down linearly while you wait.  Or to put it
     another way, in this case throughput will increase linearly
     with retransmit rate.

  2) The packet was discarded because of congestion at a gateway.  In
     this case you want to wait a while before retransmitting to give
     the congestion a chance to clear.  In this case throughput will
     decrease exponentially with retransmit rate.
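
To make the opposition concrete, here's a little C sketch of the
two policies.  It's illustrative only: it pretends we have an
oracle for the cause, and guessing the cause is what the rest of
this note is about.

    #include <stdio.h>

    enum loss_cause { LOSS_DAMAGE, LOSS_CONGESTION };

    /* Seconds to wait before the nth retransmission, given a guess
       at why the ack never arrived.  rtt is the mean round trip.  */
    double retransmit_delay(enum loss_cause cause, double rtt, int nth)
    {
        if (cause == LOSS_DAMAGE)
            return rtt;                  /* case 1: resend at once */
        return rtt * (double)(1 << nth); /* case 2: back off and let
                                            the congestion drain   */
    }

    int main(void)
    {
        printf("damage, try 3:     %.0f rtt\n",
               retransmit_delay(LOSS_DAMAGE, 1.0, 3));
        printf("congestion, try 3: %.0f rtt\n",
               retransmit_delay(LOSS_CONGESTION, 1.0, 3));
        return 0;
    }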

"No ack" conveys exactly one bit of information (that a packet was
lost) and there's no way we can squeeze another bit out to tell us
if this is case 1 or 2.  But, if acks are not independent of
one another, we might squeeze something out of the pattern of acks
(the ack "time series") to help us decide between the two cases.


What information can you pull out of the time series?

I'm about to play "mathematician", so I'd better define some
notation.  Let's say P(i) is the send time of the i'th packet,
A(i) is the arrival time of the i'th ack, and R(i) = A(i)-P(i) is
the round trip time of the i'th packet.  All times are measured
at the sender.  All packets are the same size and the receive window
is a constant W packets (packets i through i+W-1 can be sent
prior to A(i) or, equivalently, P(i+W) >= A(i)).  The time to
generate and consume packets is small compared to the network
propagation times (the sender will immediately send a packet when
the window opens and the receiver will immediately ack a packet
when it arrives). 

It's pretty clear that our traffic will introduce correlations in
the ack time series.  This is because A(i), the ack for packet i, is
related to the send time of packet i by
	A(i) = P(i) + R(i)
but packet i was sent when the window opened, i.e., when the ack for
packet i-W arrived.  So P(i) = A(i-W) and
	A(i) = A(i-W) + R(i)
This is what time-series people would call an "auto-regressive"
or AR model.  It's the simplest case of an AR(W) model, one where
current events are related to events W steps in the past with R(i)
adding some randomness so you don't get complacent.
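
To see the self-clocking, here's a toy simulation (constant R(i)
and W = 4 are assumptions, not measurements).  Once the window
fills, every send is clocked by the ack from W packets back, so
A(i) = A(i-W) + R(i) holds exactly:

    #include <stdio.h>

    #define W 4                     /* window, in packets (assumed) */
    #define N 16                    /* packets to simulate          */

    int main(void)
    {
        double P[N], A[N], R = 1.0; /* constant 1 sec round trip    */
        int i;

        for (i = 0; i < N; i++) {
            /* the first W packets go out back to back; after that
               each send is clocked by the ack from W packets ago   */
            P[i] = (i < W) ? 0.0 : A[i - W];
            A[i] = P[i] + R;        /* A(i) = P(i) + R(i)           */
            printf("i=%2d  P=%4.1f  A=%4.1f\n", i, P[i], A[i]);
        }
        return 0;
    }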

[The nice thing about time series models is that some smart
people have devised cookbook ways for a complete idiot to
construct them (a topic called "system identification").  For
example, "Time-series analysis: forecasting and control" by Box
and Jenkins leads you by the hand from `guessing possible models
from inspection of sample data' through `picking the best model
to describe the data' to `using the model once you have it'.]

It's also pretty clear that the R(i) time series will have
internal correlations, at least under congestion.  You can view
the round trip time as being proportional to the amount of "work"
the network is doing (i.e., the number of packets ahead of us in
the gateway queues).  The definition of congestion is that we
aren't done handling the work from time i-1 when the new work for
time i arrives.  The previous statements define an AR(1) model
for round trip time:
	R(i) = a R(i-1) + d + e(i)
where "a" is the fraction of work carried over from the last
step, "d" is the fixed, minimum, end-to-end propagation delay and
"e" is a random "noise" term to account for new traffic entering
and internal state changes (like adaptive routing). 
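
As a sanity check (a sketch only; a = 0.8 and the noise range are
made-up numbers), iterating the model gives the behavior you'd
expect: for a < 1 the r.t.t. settles near (d + mean e)/(1 - a),
and as a approaches 1, i.e. work carries over faster than it
drains, the r.t.t. climbs without bound:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        double a = 0.8;        /* carried-over work (assumed)       */
        double d = 0.1;        /* fixed propagation delay, seconds  */
        double R = d;          /* R(0)                              */
        int i;

        srand(1);
        for (i = 1; i <= 20; i++) {
            /* e(i): uniform noise in [0, 0.05) standing in for new
               traffic and routing changes                          */
            double e = 0.05 * rand() / ((double)RAND_MAX + 1.0);
            R = a * R + d + e; /* R(i) = a R(i-1) + d + e(i)        */
            printf("R(%2d) = %.3f\n", i, R);
        }
        return 0;
    }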

Extrapolating and waving my hands, I was going to make a model
something like
	R(i+1) = b (P(i) - P(i-1)) + a R(i) + e(i) + d
where "b" estimates how much our traffic rate affects the r.t.t.,
"a" estimates how much residual work in the network affects the
r.t.t.  and the variance of "e" estimates how much random
fluctuations contribute to the r.t.t.  (for people who like
acronyms, I think this is called an ARMAX model - Auto-Regressive,
Moving Average with eXogenous inputs).  The filter would estimate
a, b, d and var(e).  (The preceding sounds impressive and difficult
but it's not.  The whole thing consists of about 10 lines of
arithmetic to update 6 numbers which you do each time an ack
arrives.  With the help of a good introductory text, like Peter
Young's "Recursive Estimation and Time-Series Analysis", the
arithmetic even makes sense in a twisted sort of way.)
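
For anyone who wants to see the 10 lines, here's roughly what
they look like: a recursive least-squares sketch (my
reconstruction from the cookbooks, so don't take the details as
gospel) that refits a, b and d on each ack, using the regressor
(R(i), P(i)-P(i-1), 1) against the observation R(i+1), and keeps
a running estimate of var(e):

    #include <stdio.h>

    static double th[3];           /* estimates of a, b, d          */
    static double Pm[3][3] = {     /* covariance; start it large    */
        {100, 0, 0}, {0, 100, 0}, {0, 0, 100}
    };
    static double vare;            /* running estimate of var(e)    */
    static int    n;

    /* One update per ack: r = R(i), dp = P(i)-P(i-1), rn = R(i+1). */
    void rls_update(double r, double dp, double rn)
    {
        double phi[3] = { r, dp, 1.0 };
        double Pphi[3], k[3], denom = 1.0, err = rn;
        int i, j;

        for (i = 0; i < 3; i++) {      /* Pphi = Pm * phi           */
            Pphi[i] = 0.0;
            for (j = 0; j < 3; j++)
                Pphi[i] += Pm[i][j] * phi[j];
        }
        for (i = 0; i < 3; i++) {
            denom += phi[i] * Pphi[i]; /* 1 + phi' Pm phi           */
            err   -= th[i] * phi[i];   /* prediction error, our "e" */
        }
        for (i = 0; i < 3; i++) {
            k[i]   = Pphi[i] / denom;  /* filter gain               */
            th[i] += k[i] * err;       /* new a, b, d               */
        }
        for (i = 0; i < 3; i++)        /* shrink the covariance     */
            for (j = 0; j < 3; j++)
                Pm[i][j] -= k[i] * Pphi[j];

        n++;
        vare += (err * err - vare) / n;
    }

    int main(void)                     /* a few synthetic acks      */
    {
        rls_update(1.00, 0.0, 1.05);
        rls_update(1.05, 0.0, 0.95);
        rls_update(0.95, 0.0, 1.00);
        printf("a=%.3f  b=%.3f  d=%.3f  var(e)=%.4f\n",
               th[0], th[1], th[2], vare);
        return 0;
    }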


So What?

The model should get us at least three ways to choose retransmit
strategy: If a is large or da/di is positive, the network can't
handle its load and loss is probably congestive.  If var(e) is
small or d var(e)/di is negative, the network is probably congested
(this is a consequence of the P-K (Pollaczek-Khinchine) theorem:
as the network nears
saturation, "random" external perturbations have less effect on
it.  Intuitively, the variance of e is a measure of the random
way traffic arrives at and flows through the network.  But at
saturation you can no longer see these random arrivals: Data
arrives and sits in queues while some poor, constipated,
bottleneck gateway chugs on its backlog.  The whole network
behavior reduces to the (deterministic) behavior of this
gateway).  Finally, if db/di has the same sign as dR/di (i.e.,
sending packets faster lowers throughput) the network is probably
congested. 
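
In code, the three tests might look like the sketch below.  The
thresholds and the simple differencing standing in for the d/di
terms are placeholders, not tuned values:

    /* prev_* are the estimates from the previous ack, so each d/di
       test reduces to a simple difference.                         */
    int probably_congested(double a,    double prev_a,
                           double vare, double prev_vare,
                           double b,    double prev_b,
                           double R,    double prev_R)
    {
        if (a > 0.9 || a > prev_a)             /* work piling up    */
            return 1;
        if (vare < 0.001 || vare < prev_vare)  /* randomness gone:
                                                  near saturation   */
            return 1;
        if ((b - prev_b) * (R - prev_R) > 0.0) /* sending faster is
                                                  making it worse   */
            return 1;
        return 0;
    }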

There's a bit more information in the model.  "a" is a measure of
how fast congestion is growing so it tells you how much to change
your packet generation rate to help damp out the congestion.  In
the case of loss by damage, var(e) tells you how long to wait
before retransmitting.  E.g., say you're the only user of a
point-to-point link with 1 sec. mean round trip time and the
variation in the r.t.t. is +-50ms.  The tcp spec says to wait 2
sec. before retransmitting but you can be damn near certain you
need to retransmit after 1.2 sec.  The model should say a = b = 0,
d = 1 sec and var(e) = .1 sec.  If you set your retransmit timer
to d + 2var(e) instead of 2(d + var(e)), you won't do any more
retransmissions and throughput could improve by up to 50%,
depending on the loss rate. 
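
The arithmetic, for anyone who wants to check it (following the
text in treating var(e) as the r.t.t. spread in seconds):

    #include <stdio.h>

    int main(void)
    {
        double d = 1.0, vare = 0.1;  /* the point-to-point example  */

        printf("2(d + var(e)) = %.1f sec\n", 2.0 * (d + vare));
        printf("d + 2 var(e)  = %.1f sec\n", d + 2.0 * vare);
        return 0;
    }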

There are many things in this that need to be checked by
experimentation.  One is the applicability of the network model. 
The whole idea is useless if someone has to do a new model
identification when an IMP is added to a network.  My hope is
that the model is "generic"; that anything serving the function
of "computer network" has a first-order description of this form.
The parameters will change from network to network but those are
estimated dynamically by the filter IF the model form is close to
correct. 

Another check is for the rate of convergence of the model.  This
will have been "an interesting academic exercise" if it takes 50
packets for the model to figure out the parameters.  My past
experience with these kinds of filters (in control systems, not
networks) has been that they converge incredibly quickly (3 to 5
samples) so I'm hopeful. 

And finally, to get back to the start of this whole thing: acks. 
Send-to-ack round trip time is the only way the sender has of
probing the state of the network.  If the receiver can do
something random, acks only probe the composite state of
network+receiver.  If the receiver can go "maybe I'll generate 3
acks, just in case one gets lost", the information an ack carries
for the sender is enormously reduced.

I think this is a protocol design issue.  To do the type of
modelling I've been talking about, the actions of the receiver
have to be deterministic from the point of view of the sender. 
(Note that this prohibits things like delayed acks: the
receiver's behavior is determined by the packet interarrival
times, something the sender cannot know.  Since delayed acks are
a good thing on a local net, I'll modify the opening statement to
"the receiver's behavior should be deterministic when the transit
delay is long".)  In the case of tcp, I'd like the receiver's
behavior to be "one packet in, one ack out", just because I don't
know how to model anything more complicated. 

Congratulations to anyone who got this far.  I'm off to Usenix
for a week but there will be a test over this material a week
from Monday ;-).

 - Van

Mills@LOUIE.UDEL.EDU.UUCP (01/17/87)

Van,

Oh, yummy. Erstwhile grad student Mike Minnich here has been casting for
a dissertation topic. I think he just hooked his fish. It turns out to be
reasonably easy to capture real data to test various estimators, including
those you suggest, using the various fuzzballs scattered over the Internet
(some in positions of power). The easiest way of doing that is with ICMP
Echo messages with controlled spacing and size. The PING test program can
generate these messages under several different scenarios and collect the
RTTs in a file for later analysis. It would not be hard to add to that
program a scenario similar to the one you suggest.  I will include that on
Mike's plate of fish.

I already have a bunch of data collected under several scenarios and would
be glad to make it available to any who ask. The data include the test runs
described in RFC-889 plus the data collected recently and reported to this
list. All we need are some gullible number-cruncher hackers (having no
esthetic principles or agendas at all, I just hack in BASIC) willing to read
the texts you suggest (plus maybe H. van Trees' book(s) - you see how
old I am) and troll their own fishlines.

One of the things that struck me when I was experimenting with algorithms
similar to that suggested by Nagle was the enormous impact of the piggyback
mechanism with TELNET and remote echo. This affects the time series in
interesting ways and suggests a simplification where the traffic flows are
assumed symmetric. Otherwise, I think you may have to split your R(i) into
two components, one going and the other coming back.

Dave

geof@decwrl.DEC.COM@imagen.UUCP (Geof Cooper) (01/19/87)

I did some research on timers in TFTP at MIT in 1983, with Karen
Sollins and John Romkey.  The problem at hand was to develop a 
strategy that made TFTP work on both 1200 baud lines and ethernets.
We had time for exactly one test, and it worked in that test.  After
that I left MIT and no one else was interested in tuning the algorithm
further.  I don't have a SCRIBE implementation that is willing to output
this thing without a fuss.  I hesitate to send the note to the entire
list, although it is short.  Perhaps someone would like to be mailed
a copy and act as an FTP gateway for the rest.

The algorithm differs from earlier ones in that it used
something similar to the "time series" strategy that you outlined.  In
this case, I used the number of retransmissions of successive packets
as an indicator.  I concentrated on avoiding congestion and not on
recovering from lost packets.  The algorithm is interesting in that
it is two algorithms -- one is invoked if everything is going well,
and the other is invoked if recent history suggests that congestion
is going on.

The basic idea was to "get the hell out of there" when you detected
that you were sending multiple retransmissions of each packet.  In
this case, the algorithm increases the round-trip estimate as N^2
with each successive sample.  The estimate comes back down through
a 2-element linear filter [(this+last)/2].
Measurements of the roundtrip time are made only when no retransmissions
occur.  This is the only way that the roundtrip timer can be tightened.
The N^2 backoff can happen any time.

The following is a rough outline of the algorithm.  Recall that in TFTP,
only one data packet can be outstanding.  This packet is retransmitted
until an ACK is received and then the next packet is sent.  Thus, it is
equivalent to a sliding window protocol with window size 1, where each
ACK opens the window.  The particular ACKing strategy is not important,
except that at least one ACK must be generated for each data packet
received.

NT means Number of Transmissions of the current data packet.
(NT = 1) => packet sent once.  When an ACK is received execute:

    MeasuredRoundTripTime := TimeReceivedFirstAck - TimeSentFirstData;
    
    if last_NT = 1 then
        K := K_Init;
    
    if NT = 1 then
        RoundTripTime := ( MeasuredRoundTripTime + RoundTripTime ) / 2
    else
        begin
            if last_NT > 1 then
                K := K + K_incr;
            RoundTripTime := RoundTripTime + K
        end;

    last_NT := NT;
    NT := 0;
    
When sending a packet, execute:

    RetransmitTime := (1.5 * RoundTripTime) * 2^NT;
    NT := NT + 1;

Initial values are chosen to overestimate the roundtrip time.

======================


Also, Raj Jain of DEC (at least he was there then) did some work on retransmission
strategy.  He did some simulations while at MIT's lab for computer science,
and wrote a paper on the results.  I have a draft version of the paper.
It was to be published, but I don't remember where, and I never saw it come
by in print.  Maybe I missed it.  Here is the abstract:

    DIVERGENCE OF TIMEOUT ALGORITHMS FOR PACKET RETRANSMISSIONS
        Raj Jain
        DEC Hudson HLO2-3/N03

    The problem of adaptively setting the timeout interval for retransmitting
a packet has been discussed.  A layered view of the algorithms has been
presented.  It is shown that a timeout algorithm consists of essentially
five layers or procedures which can be independently chosen and modified.
A number of timeout algorithms proposed in the literature have been
decomposed into these five layers.

    One of the key layers not discussed in the literature is that of
determining the sample round trip delay for packets that have been transmitted
more than once.  It is shown that this layer has a significant impact on the
network performance.

    Under repeated packet loss, most timeout algorithms either diverge
or converge to a wrong value.  A number of alternative schemes have been
presented.  It is argued that divergence is preferable to false convergence.
It is a feature that is helpful in reducing network traffic during congestion.

    DEC-TR-329
    Version: August 28, 1985

- Geof

GROSSMAN@Sierra.Stanford.EDU (Stu Grossman) (01/22/87)

When I was at DEC, Raj Jain gave a paper on this subject, but it seemed to be
somewhat more refined than the work you mentioned.  Essentially, he was trying
to deal with congestion Control, not congestion Prevention.

The method he proposed went something like this:
1) Under conditions of no congestion, send as many packets as you want.

2) When congestion is first detected, reduce the number of outstanding
   (unacknowledged) packets to 1.

3) Increase the number of outstanding packets until either:
	a) the number of outstanding packets is equal to three times the
	   number of hops between you and your destination, or
	b) congestion is detected again, in which case go back to step 2.

The number of outstanding packets is increased by one each time you have
successfully transmitted all of your previous outstanding packets without
detecting congestion.

I don't recall the particulars of how the congestion is detected.
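
From that description, the window adjustment would look roughly
like the following sketch (my reconstruction, not DECnet source;
detection itself is left abstract since I don't recall it):

    struct cc {
        int window;              /* outstanding-packet limit        */
        int hops;                /* hops from sender to destination */
    };

    /* step 2: congestion detected, cut back to one packet          */
    void on_congestion(struct cc *c)
    {
        c->window = 1;
    }

    /* called when all outstanding packets have been acked without
       congestion being detected                                    */
    void on_round_ok(struct cc *c)
    {
        if (c->window < 3 * c->hops) /* step 3a: cap at 3 * hops    */
            c->window += 1;
    }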

This method was used in DECnet and did indeed make congested situations
much more palatable.  One of the primary cases where this algorithm helped
a lot was in the case where you had a gateway (pardon me, a DECnet router)
with interfaces to networks of differing speeds on each side (such as an
Ethernet and a 19.2kb line).

			Stu Grossman
-------

mckee@MITRE.ARPA (H. Craig McKee) (01/22/87)

Concerning the design of a filter to estimate rtt - I think 
you're trying to do too much "within" TCP.  ICMP source quench messages
are a good indication of network congestion.  X.25 has a RNR (Receiver
Not Ready) indication that may also be useful.

Regards - Craig