craig@loki.bbn.com (Craig Partridge) (10/30/86)
I'm working on an implementation of RDP and am trying to find ways to improve the round-trip time estimates. The timeout algorithm is the same as TCP's, with the values suggested in RFC 889, but I've noticed that choosing the wrong initial value for the estimated round trip time can have a severe impact on throughput if the total number of packets is relatively small and the link is lossy. I'd like to improve that performance by choosing better initial values. This isn't something I know very much about, so I'm soliciting advice. How do other people choose the initial value to put into the round trip estimate equations? What mechanisms do you recommend, strongly discourage, or disparage?

Craig Partridge
CSNET Technical Staff
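[For reference, the TCP-style estimator in question keeps an exponentially smoothed RTT and derives the retransmission timeout from it. A minimal sketch in C, using the example constants from RFC 793; the initial value loaded into srtt before the first measurement is exactly the knob under discussion:]

```c
/* Sketch of the RFC 793 smoothed round-trip time estimator.
 * ALPHA, BETA, and the bounds follow the example values in RFC 793;
 * the initial srtt seed is the open problem this thread is about. */

#define ALPHA  0.9     /* smoothing gain (RFC 793 suggests .8 to .9) */
#define BETA   2.0     /* delay variance factor (1.3 to 2.0) */
#define UBOUND 60.0    /* upper bound on the timeout, seconds */
#define LBOUND 1.0     /* lower bound on the timeout, seconds */

/* Fold a new round-trip measurement (seconds) into the smoothed RTT. */
double srtt_update(double srtt, double measured)
{
    return ALPHA * srtt + (1.0 - ALPHA) * measured;
}

/* Retransmission timeout derived from the smoothed RTT, clamped. */
double rto(double srtt)
{
    double t = BETA * srtt;
    if (t < LBOUND) t = LBOUND;
    if (t > UBOUND) t = UBOUND;
    return t;
}
```

With a bad initial srtt, the high ALPHA means the estimate converges slowly; a transfer of only a few packets may finish (or time out repeatedly) before the estimator learns the true RTT.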
james@ZERMATT.LCS.MIT.EDU ("James William O'Toole, Jr.") (10/30/86)
Date: 30 Oct 1986 04:24-EST
From: CERF@A.ISI.EDU
Subject: Re: Setting Initial Round-trip time

A background process might try to gather data - but this would be like pinging everyone just in case you might want to talk with them, which leads to predictable disaster. A background process could more easily maintain a cache of round trip time data measured from recent traffic. Connections from a given host are probably concentrated on certain destinations, so you ought to be able to do much better than pinging. Of course, you still need to know which measurements to take and how to use them. Mean and variance of round trip time on a per-host basis, with recent data more heavily weighted, perhaps?
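[A per-host cache along the lines Vint suggests might look like the following sketch: smoothed mean and mean deviation per destination, with recent samples weighted more heavily. The structure, the gain, and the fallback default are illustrative assumptions, not a specification:]

```c
/* Hypothetical per-host RTT cache: mean and mean deviation per
 * destination, updated so that recent samples dominate.  GAIN and
 * the 3-second fallback are assumed values for illustration. */

#define GAIN 0.25   /* weight given to the newest sample */

struct rtt_cache_entry {
    unsigned long host;   /* destination address */
    double mean;          /* smoothed mean RTT, seconds */
    double var;           /* smoothed mean deviation, seconds */
    int    valid;         /* nonzero once we have at least one sample */
};

/* Fold a new per-host RTT sample into the cache entry. */
void rtt_cache_update(struct rtt_cache_entry *e, double sample)
{
    if (!e->valid) {
        e->mean = sample;
        e->var = 0.0;
        e->valid = 1;
        return;
    }
    double err = sample - e->mean;
    e->mean += GAIN * err;
    e->var  += GAIN * ((err < 0 ? -err : err) - e->var);
}

/* Initial timeout for a new connection: cached mean plus a few
 * deviations, or a conservative default when there is no history. */
double rtt_cache_initial_rto(const struct rtt_cache_entry *e)
{
    if (!e || !e->valid)
        return 3.0;               /* assumed conservative default */
    return e->mean + 4.0 * e->var;
}
```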
braden@isi.edu (Bob Braden) (10/31/86)
Members of the END2END taskforce have also been worrying about your problem (setting an initial round-trip time). It is an important (but not usually critical) problem for connection-oriented protocols like TCP and RDP; it is more important and more difficult for a transaction-oriented transport protocol of the sort we have been stalking.

As Vint says, a background process to gather data is a VERY bad idea. Nor do I think the hop count is at all useful as a predictor. However, it may not be a bad idea to maintain a cache of historical data about observed round trip times to hosts you have talked to recently. This may help in the case of a small "working set" of hosts you talk to. In the opposite case -- short transfers to a very large collection of randomly chosen hosts -- the answer may very well be that you cannot reasonably expect very good performance when nets are lossy, and should not waste time trying to obtain the impossible. And at system startup, before there is any historical data, you cannot do very well. Someone needs to think a little about the right way to maintain such a cache of historical RTTs per host (considering the dispersion of observations), and about how to use the data.

Someone is going to say "You have to ask the gateways". Well, maybe, but that would seem to come some time after we are able to provide effective type-of-service routing over widely-diverse paths. Or maybe at the same time? Still, it wouldn't hurt for us to pose this requirement (an ICMP message from a host inquiring about the probable delays to a given Internet host) to Dave Mills' INARCH task force, and see what they can come up with. This requirement seems to go deep into the overall routing architecture of the Internet.

Finally, our Internet Architect has a simple answer to your problem: don't use RDP, use NETBLT.
NETBLT shares the principal advantages of RDP (packet-orientation and selective retransmission), but uses timers differently, so that the RTT is not important. Of course, NETBLT has its own set of hard timer problems...

Bob Braden
mckee@MITRE.ARPA (H. Craig McKee) (10/31/86)
You noted that the number of packets was small and the link was lossy. Is the link lossy because of congestion or because of noise? If the former (congestion) then the advice of John Nagle (RFC 896) is relevant. If the latter (noise) then more complex measures are needed; for example, a Session/Presentation layer Forward Error Correction procedure.
markl@JHEREG.LCS.MIT.EDU (10/31/86)
Don't use NETBLT. Not yet. NETBLT is an *experiment* in high-speed bulk data transfer. It works best over long-delay, high-bandwidth (i.e. satellite) networks, although it gives very good performance on other networks as well. Although a proposed spec is available, we are actively discouraging people from implementing it until we decide whether or not the experiment works. We are currently testing NETBLT over the Wideband network with fairly good results, but a lot of work needs to be done tuning the spec before anyone can use NETBLT.

Mark

Internet: markl@jhereg.lcs.mit.edu
MIT Laboratory for Computer Science
Distributed Systems Group
braden@ISI.EDU (Bob Braden) (10/31/86)
The problem with the cache is that

a) you don't know if the values correspond to the routes your traffic is/will take.

b) you may not have enough traffic to enough destinations to maintain statistically valid (fresh) delay information.

It may be possible that things will be static enough that the cache will actually work well most of the time, but if the internet exhibits significant statistical variation in delay/throughput, the cache may be not only misleading but downright harmful.

Vint,

Everything you say is true, but I fail to see how bad cache information will be more harmful than no information at all.

Bob Braden
van@LBL-CSAM.ARPA (Van Jacobson) (10/31/86)
I've got a little bit of hard data and a simple change that may improve things for 4.[23]bsd. About six months ago I instrumented our Internet gateway to record a timestamp, src & dst address and port, and data & ack sequence numbers for each tcp packet. I reduced two days worth of this trace data and found

. the distribution of the number of data packets sent on a connection was roughly exponential with a mean of 4 (e.g., a lot of mail traffic).

. between 7am and 5pm, PDT, the mean number of retransmissions per data packet was 8. The distribution was bi-modal, with traffic through the "mail bridges" in one lobe (with a mode of ~11) and all other traffic in the other lobe (a mode of ~2). Both lobes were approximately Poisson, possibly due to the long "learning time" of round trip timers on the connections.

. For a particular connection, round trip times varied about an order of magnitude. Over all connections, round trip times varied three orders of magnitude. (With the large number of retransmissions, there is some ambiguity in which packets one uses to estimate RTT. I generally used the time from the first use of a sequence number to its first ack, with some ad hoc hackery to accommodate the 50% packet loss through the mail bridges. Given this ambiguity, the uncertainty in any RTT estimate is at least a factor of two.)

. The next hop Internet gateway was strongly correlated with the RTT in the following sense: the average RTT of all packets through a gateway is within a factor of three of the average RTT of each connection through that gateway.

It is not hard to convince yourself that no reasonable setting of the RTT initial value & filter constants is going to accommodate a factor of 1000 variation in four packets worth of learning time. I mentioned the problem to Mike O'Dell & he suggested using the kernel's routing entry to cache the RTT. I made up a kernel that kept an RTT in each route. When a TCP connection was opened, it initialized its RTT from the route.
When the connection closed, its RTT was used to update the route's RTT with a weighted average Rt = A * Rt + (1-A) * Conn (the 4.3bsd kernel changes for all this amount to about 20 lines of C). I took 12 hours of trace data with the filter constant set to .5. The average number of retransmissions *for traffic that originated at the gateway* went down a factor of two (8 to 4).

I was going to take more data and try tuning the connection and route filter constants. Unfortunately, some local political changes intervened and I can no longer make changes or take data on the gateway. However, the initial results were promising enough that I plan to try a similar scheme for machines that sit behind the gateway (i.e., construct pseudo-route entries to cache the RTT).

- Van Jacobson, LBL
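[The scheme Van describes can be sketched roughly as follows: the route keeps a smoothed RTT, a new connection seeds its estimate from the route, and on close the route is updated with Rt = A * Rt + (1-A) * Conn. The structures and the 3-second default seed are stand-ins for illustration, not the actual 4.3bsd changes:]

```c
/* Sketch of caching the RTT in the routing entry.  ROUTE_FILTER is
 * the filter constant A from the experiment (set to .5 there); the
 * struct layouts and the default seed are assumptions. */

#define ROUTE_FILTER 0.5

struct route {
    double rt_rtt;         /* cached smoothed RTT, seconds; 0 = no history */
};

struct conn {
    double srtt;           /* this connection's smoothed RTT */
};

/* On open: seed the connection's estimate from the route's cache,
 * falling back to a default when there is no history yet. */
void conn_open(struct conn *c, const struct route *rt)
{
    c->srtt = (rt->rt_rtt > 0.0) ? rt->rt_rtt : 3.0;  /* assumed default */
}

/* On close: fold the connection's final estimate back into the route
 * with the weighted average Rt = A * Rt + (1-A) * Conn. */
void conn_close(const struct conn *c, struct route *rt)
{
    if (rt->rt_rtt <= 0.0)
        rt->rt_rtt = c->srtt;
    else
        rt->rt_rtt = ROUTE_FILTER * rt->rt_rtt
                   + (1.0 - ROUTE_FILTER) * c->srtt;
}
```

The point of keeping the cache in the route rather than per host is that, per Van's trace data, the next-hop gateway predicts the RTT to within a factor of three, so one cached value per route covers many destinations.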
brescia@CCV.BBN.COM (Mike Brescia) (11/01/86)
Re: analogies ... like having a broken clock which is right twice a day rather than a clock which simply runs fast or slow and is never right. I will let you pick which analogy to apply!

The analogy I wish to apply would be that neither a broken clock nor a miscalibrated clock will ever be right if I am trying to count apples or oranges. I'd like to attack the assumption that knowing the round trip time will compensate for the fact that packets are allowed to be dropped in the system.

(Mom & Apple Pie division)

In the current IP model, a packet may be delayed (10 to 100 seconds and more have been reported), or dropped because of a transmission failure or congestion at some gateway or packet switch. If a packet is delayed, there should be no retransmission, because the second packet will only be delayed behind the first. If it is dropped due to transmission failure, the retransmission should come as soon as possible, so that the end-point hosts see a minimum of disruption. If it is dropped due to congestion, the retransmission should come only as soon as you know the packet can get through or around the congestion; otherwise you are only exacerbating it. If you have arrived at a reasonable round trip time, and you have a packet which has not been acknowledged after (some factor of) that time, can you deduce which of the three things has happened to the packet? If you make the wrong decision, you can make things worse for yourself or the community of users.

(Blue Sky division)

If the Internet could provide a better guarantee of delivery, as the Arpanet once did, retransmission would not need to be so widespread, and a good measure of round trip time would not be so much of a panic. The Internet model would need to be extended so that the effects of transmission losses and congestion could be controlled.

Mike
braden@ISI.EDU (Bob Braden) (11/03/86)
Van,

Your message is wonderful. After years of our sitting around making Aristotelian speculations on this mailing list, you actually took some data, with fascinating results. Can we hope that you will write this data up? It is hard to see how we can make progress in this game unless results like yours get disseminated for peer comment and education. From my viewpoint, an RFC would be the right level... it would not have the formality or publication delays of a "real" paper, and would make this data and your ideas available as soon as possible. If you don't have time to write it up, could we persuade you to come to an appropriate task force meeting and present it?

Bob Braden