van@LBL-CSAM.ARPA.UUCP (01/17/87)
Boy, was I wrong! I've had about 100 messages in the past couple of days saying this discussion should stay on the list. But this opus should convince the diehards that they made a mistake...

Dave - You're absolutely right. Linear filters, including Kalman filters, do a lousy job of estimating round trip time (witness all the spurious retransmissions on our poor, congested Internet). And median filters, because they have very high noise immunity and contain no assumptions about the underlying data distribution, should do a great job estimating r.t.t.

But (since I'm running 0-for-3, I have to say "But...") I was thinking of estimating something slightly different from r.t.t. I pictured the "filter" estimating the coefficients of a mathematical model that describes how a network works. Then tcp would use those coefficients to decide what to do when sending and retransmitting. Since I didn't even come close to explaining this in the last message, I'll try here. (The simple-minded explanations illustrate my ignorance; they're not intended as an insult to anyone's intelligence.)

What's the problem? If you don't get an ack for a packet within some reasonable period of time, you will have to retransmit. There are two possible reasons for not getting an ack, and they lead to diametrically opposed retransmit strategies:

1) The packet was damaged or lost in transit. In this case you want to retransmit the packet as soon as you can because throughput goes down linearly while you wait. Or, to put it another way, in this case throughput will increase linearly with retransmit rate.

2) The packet was discarded because of congestion at a gateway. In this case you want to wait a while before retransmitting, to give the congestion a chance to clear. In this case throughput will decrease exponentially with retransmit rate.

"No ack" conveys exactly one bit of information (that a packet was lost) and there's no way we can squeeze another bit out of it to tell us whether this is case 1 or 2.
But, if acks are not independent of one another, we might squeeze something out of the pattern of acks (the ack "time series") to help us decide between the cases.

What information can you pull out of the time series? I'm about to play "mathematician", so I'd better define some notation. Let's say P(i) is the send time of the i'th packet, A(i) is the arrival time of the i'th ack, and R(i) = A(i) - P(i) is the round trip time of the i'th packet. All times are measured at the sender. All packets are the same size and the receive window is a constant W packets (packets i through i+W-1 can be sent prior to A(i) or, equivalently, P(i+W) >= A(i)). The time to generate and consume packets is small compared to the network propagation times (the sender will immediately send a packet when the window opens and the receiver will immediately ack a packet when it arrives).

It's pretty clear that our traffic will introduce correlations in the ack time series. This is because A(i), the ack for packet i, is related to the send of i by

    A(i) = P(i) + R(i)

but P(i) was sent when the window opened, i.e., when the ack for i-W arrived. So

    P(i) = A(i-W)   and   A(i) = A(i-W) + R(i)

This is what time-series people would call an "auto-regressive" or AR model. It's the simplest case of an AR(W) model, one where current events are related to events W steps in the past, with R(i) adding some randomness so you don't get complacent.

[The nice thing about time series models is that some smart people have devised cookbook ways for a complete idiot to construct them (a topic called "system identification"). For example, "Time-series analysis: forecasting and control" by Box and Jenkins leads you by the hand from `guessing possible models from inspection of sample data' through `picking the best model to describe the data' to `using the model once you have it'.]

It's also pretty clear that the R(i) time series will have internal correlations, at least under congestion.
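The windowed-ack relation is easy to see in a toy simulation. The sketch below (plain Python; the window size, delay, and noise numbers are invented purely for illustration) generates an ack time series from A(i) = A(i-W) + R(i) and checks that the lag-W differences recover R(i):

```python
import random

# Toy simulation of the AR(W) ack process described above: with a
# constant window of W packets, packet i goes out when the ack for
# packet i-W arrives, so A(i) = A(i-W) + R(i).  All constants here
# are invented for illustration.
W = 4            # window size, in packets
N = 200          # number of packets to simulate
D = 1.0          # fixed propagation delay "d", in seconds
random.seed(1)

A = []           # ack arrival times, measured at the sender
for i in range(N):
    r = D + random.uniform(0.0, 0.2)   # round trip time R(i)
    if i < W:
        A.append(i * 0.001 + r)        # first window goes out back-to-back
    else:
        A.append(A[i - W] + r)         # window opens when ack i-W arrives

# The series is auto-regressive at lag W: the lag-W difference of the
# ack times is exactly the round trip time of the newer packet.
lag_w_diffs = [A[i] - A[i - W] for i in range(W, N)]
assert all(D <= g <= D + 0.2 for g in lag_w_diffs)
```

Nothing here depends on the noise being uniform; the point is only that the window couples each ack to the one W packets earlier.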
You can view the round trip time as being proportional to the amount of "work" the network is doing (i.e., the number of packets ahead of us in the gateway queues). The definition of congestion is that we aren't done handling the work from time i-1 when the new work for time i arrives. The previous statements define an AR(1) model for round trip time:

    R(i) = a R(i-1) + d + e(i)

where "a" is the fraction of work carried over from the last step, "d" is the fixed, minimum, end-to-end propagation delay and "e" is a random "noise" term to account for new traffic entering and internal state changes (like adaptive routing).

Extrapolating and waving my hands, I was going to make a model something like

    R(i+1) = b (P(i) - P(i-1)) + a R(i) + e(i) + d

where "b" estimates how much our traffic rate affects the r.t.t., "a" estimates how much residual work in the network affects the r.t.t. and the variance of "e" estimates how much random fluctuation contributes to the r.t.t. (for people who like acronyms, I think this is called an ARMAX model - Auto-Regressive, Moving Average with eXogenous inputs). The filter would estimate a, b and var(e).

(The preceding sounds impressive and difficult but it's not. The whole thing consists of about 10 lines of arithmetic to update 6 numbers, which you do each time an ack arrives. With the help of a good introductory text, like Peter Young's "Recursive Estimation and Time-Series Analysis", the arithmetic even makes sense in a twisted sort of way.)

So What? The model should get us at least three ways to choose retransmit strategy:

If a is large or da/di is positive, the network can't handle its load and loss is probably congestive.

If e is small or de/di is negative, the network is probably congested (this is a consequence of the P-K (Pollaczek-Khinchine) theorem: as the network nears saturation, "random" external perturbations have less effect on it. Intuitively, the variance of e is a measure of the random way traffic arrives at and flows through the network.
But at saturation you can no longer see these random arrivals: data arrives and sits in queues while some poor, constipated, bottleneck gateway chugs on its backlog. The whole network behavior reduces to the (deterministic) behavior of this gateway).

Finally, if db/di has the same sign as dR/di (i.e., sending packets faster lowers throughput) the network is probably congested.

There's a bit more information in the model. "a" is a measure of how fast congestion is growing, so it tells you how much to change your packet generation rate to help damp out the congestion. In the case of loss by damage, var(e) tells you how long to wait before retransmitting. E.g., say you're the only user of a point-to-point link with 1 sec. mean round trip time and the variation in the r.t.t. is +-50ms. The tcp spec says to wait 2 sec. before retransmitting but you can be damn near certain you need to retransmit after 1.2 sec. The model should say a = b = 0, d = 1 sec and var(e) = .1 sec. If you set your retransmit timer to d + 2var(e) instead of 2(d + var(e)), you won't do any more retransmissions and throughput could improve by up to 50%, depending on the loss rate.

There are many things in this that need to be checked by experimentation. One is the applicability of the network model. The whole idea is useless if someone has to do a new model identification when an IMP is added to a network. My hope is that the model is "generic"; that anything serving the function of "computer network" has a first-order description of this form. The parameters will change from network to network, but those are estimated dynamically by the filter IF the model form is close to correct. Another check is the rate of convergence of the model. This will have been "an interesting academic exercise" if it takes 50 packets for the model to figure out the parameters.
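For concreteness, here is one way the "about 10 lines of arithmetic per ack" might look. This is not Van's filter, just a minimal recursive-least-squares sketch of the simpler AR(1) model R(i) = a R(i-1) + d + e(i) (without the b term), fed with invented numbers matching the point-to-point example:

```python
import random

# Minimal recursive-least-squares sketch of the AR(1) r.t.t. model
#   R(i) = a*R(i-1) + d + e(i)
# An illustration of the "few lines of arithmetic per ack" idea, not
# the actual proposed filter; all constants are invented.
def make_rls_estimator():
    theta = [0.0, 0.0]                     # estimates of [a, d]
    P = [[100.0, 0.0], [0.0, 100.0]]       # large P = weak prior
    state = {"var_e": 0.0, "n": 0}

    def on_ack(r_prev, r):
        x = [r_prev, 1.0]                  # regressor for [a, d]
        Px = [P[0][0]*x[0] + P[0][1]*x[1],
              P[1][0]*x[0] + P[1][1]*x[1]]
        denom = 1.0 + x[0]*Px[0] + x[1]*Px[1]
        k = [Px[0]/denom, Px[1]/denom]     # estimator gain
        err = r - (theta[0]*x[0] + theta[1]*x[1])
        theta[0] += k[0]*err               # correct [a, d] by the
        theta[1] += k[1]*err               # prediction error
        xP = [x[0]*P[0][0] + x[1]*P[1][0],
              x[0]*P[0][1] + x[1]*P[1][1]]
        for i in range(2):
            for j in range(2):
                P[i][j] -= k[i]*xP[j]      # shrink the covariance
        state["n"] += 1                    # running variance of e
        state["var_e"] += (err*err - state["var_e"]) / state["n"]
        return theta[0], theta[1], state["var_e"]

    return on_ack

# The point-to-point example: 1 sec mean r.t.t., +-50 ms jitter,
# no congestion (true a = 0, d = 1 sec).
random.seed(2)
on_ack = make_rls_estimator()
r_prev = 1.0
for _ in range(500):
    r = 1.0 + random.uniform(-0.05, 0.05)
    a, d, var_e = on_ack(r_prev, r)
    r_prev = r

# The fitted prediction a*1.0 + d should sit near the 1 sec mean, so a
# timer of (a*R + d) plus two noise standard deviations lands well
# under the 2 sec that the spec's formula gives.
timer = (a * 1.0 + d) + 2 * (var_e ** 0.5)
```

Each ack costs a handful of multiply-adds to update six numbers (two coefficients, three covariance entries, one variance), which is the scale of per-ack work described above.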
My past experience with these kinds of filters (in control systems, not networks) has been that they converge incredibly quickly (3 to 5 samples), so I'm hopeful.

And finally, to get back to the start of this whole thing: acks. Send-to-ack round trip time is the only way the sender has of probing the state of the network. If the receiver can do something random, acks only probe the composite state of network+receiver. If the receiver can go "maybe I'll generate 3 acks, just in case one gets lost", the information for the sender in an ack is enormously reduced. I think this is a protocol design issue. To do the type of modelling I've been talking about, the actions of the receiver have to be deterministic from the point of view of the sender. (Note that this prohibits things like delayed acks: the receiver's behavior is determined by the packet interarrival times, something the sender cannot know. Since delayed acks are a good thing on a local net, I'll modify the opening statement to "the receiver's behavior should be deterministic when the transit delay is long".) In the case of tcp, I'd like the receiver's behavior to be "one packet in, one ack out", just because I don't know how to model anything more complicated.

Congratulations to anyone who got this far. I'm off to Usenix for a week but there will be a test over this material a week from Monday ;-).

 - Van
Mills@LOUIE.UDEL.EDU.UUCP (01/17/87)
Van,

Oh, yummy. Erstwhile grad student Mike Minnich here has been casting for a dissertation topic. I think he just hooked his fish.

It turns out to be reasonably easy to capture real data to test various estimators, including those you suggest, using the various fuzzballs scattered over the Internet (some in positions of power). The easiest way of doing that is with ICMP Echo messages with controlled spacing and size. The PING test program can generate these messages under several different scenarios and collect the RTTs in a file for later analysis. It would not be hard to add to that program a scenario similar to the one you suggest. I will include that on Mike's plate of fish.

I already have a bunch of data collected under several scenarios and would be glad to make it available to any who ask. The data include the test runs described in RFC-889 plus the data collected recently and reported to this list. All we need are some gullible number-cruncher hackers (having no esthetic principles or agendae at all, I just hack in BASIC) willing to read the texts you suggest (plus maybe H. van Trees' book(s) - you see how old I am) and troll their own fishlines.

One of the things that struck me when I was experimenting with algorithms similar to the one suggested by Nagle was the enormous impact of the piggyback mechanism with TELNET and remote echo. This affects the time series in interesting ways and suggests a simplification where the traffic flows are assumed symmetric. Otherwise, I think you may have to split your R(i) into two components, one going and the other coming back.

Dave
geof@decwrl.DEC.COM@imagen.UUCP (Geof Cooper) (01/19/87)
I did some research on timers in TFTP at MIT in 1983, with Karen Sollins and John Romkey. The problem at hand was to develop a strategy that made TFTP work on both 1200 baud lines and ethernets. We had time for exactly one test, and it worked in that test. After that I left MIT and no one else was interested in tuning the algorithm further.

I don't have a SCRIBE implementation that is willing to output this thing without a fuss. I hesitate to send the note to the entire list, although it is short. Perhaps someone would like to be mailed a copy and act as an FTP gateway for the rest.

The algorithm is different from ones previous to it in that it used something similar to the "timeline" strategy that you outlined. In this case, I used the number of retransmissions of successive packets as an indicator. I concentrated on avoiding congestion and not on recovering from lost packets. The algorithm is interesting in that it is really two algorithms -- one is invoked if everything is going well, and the other is invoked if recent history suggests that congestion is going on. The basic idea was to "get the hell out of there" when you detected that you were sending multiple retransmissions of each packet. In this case, the timeout increases as N^2 with each successive sample. The algorithm comes down using a 2-element linear filter [(this+last)/2]. Measurements of the roundtrip time are made only when no retransmissions occur. This is the only way that the roundtrip timer can be tightened. The N^2 backoff can happen any time.

The following is a rough outline of the algorithm. Recall that in TFTP, only one data packet can be outstanding. This packet is retransmitted until an ACK is received and then the next packet is sent. Thus, it is equivalent to a sliding window protocol with window size 1, where each ACK opens the window. The particular ACKing strategy is not important, except that at least one ACK must be generated for each data packet received.
NT means Number of Transmissions of the current data packet; (NT = 1) => packet sent once.

When an ACK is received, execute:

    MeasuredRoundTripTime := TimeReceivedFirstAck - TimeSentFirstData;
    if last_NT = 1 then
        K := K_Init;
    if NT = 1 then
        RoundTripTime := (MeasuredRoundTripTime + RoundTripTime) / 2
    else begin
        if last_NT > 1 then
            K := K + K_incr;
        RoundTripTime := RoundTripTime + K
    end;
    last_NT := NT;
    NT := 0;

When sending a packet, execute:

    RetransmitTime := (1.5 * RoundTripTime) * 2^NT;
    NT := NT + 1;

Initial values are chosen to overestimate the roundtrip time.

======================

Also, Raj Jain of DEC (at least he was there then) did some work on retransmission strategy. He did some simulations while at MIT's Lab for Computer Science, and wrote a paper on the results. I have a draft version of the paper. It was to be published, but I don't remember where, and I never saw it come by in print. Maybe I missed it. Here is the abstract:

    DIVERGENCE OF TIMEOUT ALGORITHMS FOR PACKET RETRANSMISSIONS

    Raj Jain
    DEC Hudson HLO2-3/N03

The problem of adaptively setting the timeout interval for retransmitting a packet has been discussed. A layered view of the algorithms has been presented. It is shown that a timeout algorithm consists of essentially five layers or procedures which can be independently chosen and modified. A number of timeout algorithms proposed in the literature have been decomposed into these five layers. One of the key layers not discussed in the literature is that of determining the sample round trip delay for packets that have been transmitted more than once. It is shown that this layer has a significant impact on network performance. Under repeated packet loss, most timeout algorithms either diverge or converge to a wrong value. A number of alternative schemes have been presented. It is argued that divergence is preferable to false convergence. It is a feature that is helpful in reducing network traffic during congestion.
DEC-TR-329 Version: August 28, 1985 - Geof
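[Geof's outline transcribes almost line-for-line into runnable form. In the sketch below, the class name and the K_INIT/K_INCR values are guesses, since the message doesn't give the initial constants; units are seconds.]

```python
# Runnable transcription of the TFTP timer algorithm above.  K_INIT
# and K_INCR were not specified in the message; values here are
# guesses.  The initial r.t.t. is deliberately overestimated.
K_INIT = 0.5     # seconds (assumed)
K_INCR = 0.5     # seconds (assumed)

class TftpTimer:
    def __init__(self, initial_rtt=5.0):
        self.rtt = initial_rtt      # RoundTripTime, overestimated
        self.k = K_INIT
        self.nt = 0                 # NT: transmissions of current packet
        self.last_nt = 1

    def on_send(self):
        """Return the retransmit timeout for this (re)transmission."""
        timeout = (1.5 * self.rtt) * (2 ** self.nt)   # backs off per retry
        self.nt += 1
        return timeout

    def on_ack(self, measured_rtt):
        """Update the estimate when the first ACK for the packet arrives.
        measured_rtt = TimeReceivedFirstAck - TimeSentFirstData."""
        if self.last_nt == 1:
            self.k = K_INIT
        if self.nt == 1:
            # No retransmissions: the only time the timer may tighten,
            # via the 2-element linear filter (this+last)/2.
            self.rtt = (measured_rtt + self.rtt) / 2
        else:
            # Congestion regime: grow the estimate by K, and grow K
            # itself when successive packets needed retransmission.
            if self.last_nt > 1:
                self.k += K_INCR
            self.rtt += self.k
        self.last_nt = self.nt
        self.nt = 0
```

Per the prose above, the estimate ratchets up quadratically across successive congested packets (K grows linearly and is added each time) but comes back down only through clean, retransmission-free measurements.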
GROSSMAN@Sierra.Stanford.EDU (Stu Grossman) (01/22/87)
When I was at DEC, Raj Jain gave a paper on this subject, but it seemed to be somewhat more refined than the work you mentioned. Essentially, he was trying to deal with congestion Control, not congestion Prevention. The method he proposed went something like this:

1) Under conditions of no congestion, send as many packets as you want.

2) When congestion is first detected, reduce the number of outstanding (unacknowledged) packets to 1.

3) Increase the number of outstanding packets until either:
   a) the number of outstanding packets is equal to three times the number of hops between you and your destination, or
   b) congestion is detected again; in that case, go back to step 2.

The number of outstanding packets is increased by one each time you have successfully transmitted all of your previous outstanding packets without detecting congestion. I don't recall the particulars of how the congestion is detected.

This method was used in DECnet and did indeed make congested situations much more palatable. One of the primary cases where this algorithm helped a lot was the case where you had a gateway (pardon me, a DECnet router) with interfaces to networks of differing speeds on each side (such as an Ethernet and a 19.2kb line).

Stu Grossman
-------
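[Stu's three steps can be sketched as a window-update rule. Congestion detection is left as an input flag, since the particulars weren't recalled; the function name and the example numbers are hypothetical.]

```python
# Sketch of the DECnet-style congestion scheme described above.  How
# congestion is *detected* was not given, so it is passed in as a
# flag; everything else follows the three numbered steps.
def next_window(window, hops, all_acked, congestion_detected):
    """Return the new count of allowed outstanding packets."""
    ceiling = 3 * hops                 # step 3a: cap at 3 x hop count
    if congestion_detected:
        return 1                       # step 2: drop back to one packet
    if all_acked and window < ceiling:
        return window + 1              # step 3: open by one per clean round
    return window

# Example walk-through: 2 hops, so the ceiling is 6 outstanding packets.
w = 1
for _ in range(10):
    w = next_window(w, hops=2, all_acked=True, congestion_detected=False)
assert w == 6                          # grew to the ceiling and stopped
w = next_window(w, hops=2, all_acked=True, congestion_detected=True)
assert w == 1                          # congestion: back to one packet
```

The additive open / collapse-to-one shape is what makes a slow router between fast and slow links palatable: the window climbs only one packet per fully acknowledged round, but retreats all the way the moment congestion is seen.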
mckee@MITRE.ARPA (H. Craig McKee) (01/22/87)
Concerning the design of a filter to estimate rtt - I think you're trying to do too much "within" TCP. ICMP source quench messages are a good indication of network congestion. X.25 has a RNR (Receiver Not Ready) indication that may also be useful. Regards - Craig