tcp-ip@ucbvax.ARPA (06/12/85)
From: zhang%erlang.DEC@decwrl.ARPA (Lixia Zhang)

I have a few questions concerning setting timers for outstanding TCP segments. The following paragraph from the TCP spec (RFC-793, page 10) sounds as if a timer should be set for each outstanding segment:

    "When the TCP transmits a segment containing data, it puts a copy on a
    retransmission queue and starts a timer; when the acknowledgment for that
    data is received, the segment is deleted from the queue. If the
    acknowledgment is not received before the timer runs out, the segment is
    retransmitted."

My questions are:

- Should we understand it as one timer needing to be set for each outstanding segment?
- What is the situation in real implementations? From what I've heard, most implementations use a single timer per connection - am I right? I seem to remember a recent message mentioning that some implementations set a timer for each segment while others use a single timer. What is the population percentage on each side?
- Does anyone have any idea, or observations, about the performance difference, big or small, between the two different implementations?

Lixia
tcp-ip@ucbvax.ARPA (06/12/85)
From: David C. Plummer in disguise <DCP@SCRC-QUABBIN.ARPA>

    Date: Wednesday, 12 Jun 1985 06:54:15-PDT
    From: zhang%erlang.DEC@decwrl.ARPA (Lixia Zhang)

    [...] - Should we understand it as one timer need be set for each
    outstanding segment? - What are the situations in real implementations?
    - Does anyone have any idea, or observations, about the performance
    difference, big or small, between the two different implementations?

Our experience at Symbolics says that a per-connection timer, retransmitting only the FIRST segment on the retransmission queue, is sufficient. In a mostly-reliable network medium, most segments do get through and are then acknowledged. If one does not get through, its copy winds up at the front of the retransmission queue. The other segments on the retransmission queue usually got through, but can't be acknowledged until the first segment is acknowledged. When the first segment DOES get through, the whole batch is often acknowledged.
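[The per-connection scheme Dave describes can be sketched roughly as follows. This is a toy Python model with invented names (`RetransmitQueue`, `on_ack`, etc.), not code from any implementation mentioned in this thread: one timer for the whole connection, restarted on each ACK, and only the head of the queue resent on timeout.]

```python
import time
from collections import deque

class RetransmitQueue:
    """Toy per-connection retransmission queue: a single timer for the
    connection; on timeout, only the oldest unacked segment is resent."""

    def __init__(self, rto=1.0):
        self.rto = rto                # retransmission timeout, seconds
        self.queue = deque()          # unacked segments: (seq, data)
        self.timer_expiry = None      # the ONE timer for this connection

    def send(self, seq, data):
        self.queue.append((seq, data))
        if self.timer_expiry is None:             # start timer only if idle
            self.timer_expiry = time.monotonic() + self.rto

    def on_ack(self, ack_seq):
        # Cumulative ACK: drop every segment fully covered by ack_seq.
        while self.queue and self.queue[0][0] + len(self.queue[0][1]) <= ack_seq:
            self.queue.popleft()
        # Restart the timer if data is still outstanding, else stop it.
        self.timer_expiry = time.monotonic() + self.rto if self.queue else None

    def on_timeout(self):
        # Retransmit ONLY the first (oldest) segment on the queue.
        if self.queue:
            self.timer_expiry = time.monotonic() + self.rto
            return self.queue[0]
        return None
```

A lost first segment keeps its place at the head of the queue, so each timeout resends exactly the segment the receiver is waiting on, and one ACK can then clear the whole batch.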
tcp-ip@ucbvax.ARPA (06/13/85)
From: imagen!geof@su-shasta.ARPA

The first packet in the retransmission queue is the important one for retransmission purposes, as Dave Plummer points out, since it is the first packet that will be acked (and removed from the retransmission queue).

Dave's point about only retransmitting the first packet from the retransmission queue is interesting. I would put it more strongly than his statement that this is "good enough" -- if a host always retransmits everything on the retransmission queue, performance can be drastically affected in certain situations. Consider, for example, a host that is transmitting through a gateway and can send packets faster than the gateway can forward them (perhaps a gateway from Ether->Arpa). Eventually, the gateway runs out of buffers and starts to miss packets. It is not uncommon to see a situation, for example, where a gateway loses the fourth packet of every batch that is blasted out by a particular host. This sort of lossage can be seen at the destination host as well. It is a common cause of lost connections when a gateway is fragmenting packets (in reverse of the above: the fast little gateway sends two fragments very close together, and the big, slower host always misses the second). TCP performance can be affected similarly.

Dave's comment that retransmitting the first packet often results in everything being acked seems to fit this model. The first packet on the retransmission queue (the first one that the foreign host didn't ack) is probably the single packet that was lost. [Interestingly, Xerox's XNS SPP implementation also retransmits only the first packet on the queue, for similar reasons (they have a serial-line gateway with one packet buffer in it).]

Unfortunately, retransmitting only the first packet on the retransmission queue, while it works, also has performance problems. If the foreign host didn't lose just one packet, but lost a whole string of them, TCP degenerates to a lock-step protocol (cf. TFTP) for the rest of the string. Over a very long-haul connection (e.g., satellite), this can cause a delay of seconds every time a packet is lost. Maybe in practice the number of packets lost in a row is statistically close enough to 1 that this is not a problem. I don't know a real answer to this problem (maybe someone else does...) other than flow control (congestion control?) between a host and the gateway(s) that it is using on a particular connection (which is not really possible in the TCP world -- say, wasn't someone asking if there were any supporting arguments for that anti-TCP article in last month's SigComm review? :-)). Perhaps one might arrange to heuristically determine in the sender that the Nth packet is reliably being lost, and throttle back on the inter-packet time accordingly. Sounds complicated.

- Geof
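[The lock-step degeneration Geof describes can be made concrete with a toy model (my own sketch, not from any message here): with cumulative ACKs and retransmit-first-only, each timeout recovers exactly one missing segment, so a burst of losses costs one round trip per lost packet, while segments that survived the original burst are covered by cumulative ACKs for free.]

```python
def recovery_rtts(delivered):
    """Count round trips needed to recover after a burst, where
    delivered[i] says whether segment i survived the original transmission.
    Toy assumptions: cumulative ACKs, retransmit-first-only, and
    retransmissions themselves are never lost."""
    received = {i for i, ok in enumerate(delivered) if ok}
    next_needed = 0
    rtts = 0
    while next_needed < len(delivered):
        if next_needed in received:
            next_needed += 1          # already there: cumulative ACK, free
        else:
            received.add(next_needed) # one timeout + one retransmission
            rtts += 1
    return rtts
```

So a single lost packet costs one RTT regardless of window size, but a run of N losses costs N RTTs -- the TFTP-like lock step over a satellite path that the message worries about.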
tcp-ip@ucbvax.ARPA (06/13/85)
From: CLYNN@BBNA.ARPA

Lixia, Working with the DARPA protocol suite on TOPS20s has led to the same conclusions that Dave has mentioned. I think Geof's statement that "performance could be drastically affected" is too weak - it IS drastically affected, especially when the retransmitter does not have a very good round-trip-time algorithm.

A refinement of sending only the "first packet" in the queue is to retransmit only one packet: the "first packet" plus as much additional data from other packets as will fit into a maximum-sized segment (repacketization). This is effective in reducing character-per-round-trip-time echoing in telnet applications (dribble echo), since most original packets have few data octets and most (if not all) of the outstanding data will fit into a maximum-sized segment. As Geof points out, it does present problems if the packets to be retransmitted are all full size; more below.

The statement that when a packet is retransmitted, it plus all the others that made it the first time will be acked is true some of the time. It fails when the receiver is packet-event driven, i.e., the receiver responds to each packet it processes. In such a case it will ack the retransmitted packet, then proceed to reassemble the other packets from its reassembly queue, acking each one in turn. The sending host then sees several acks arriving instead of one. If it has set a timer for each packet in the retransmission queue, the timer for the second packet has probably already gone off. Consequently, when the ack for the packet which was retransmitted arrives, the second packet becomes the first and is retransmitted - frequently just before its own ack arrives.

Retransmission of everything which is outstanding can make performance worse, due to the interaction of factors, e.g., window size, available data, round-trip-time algorithm, gateways, etc. A large window with lots of data causes, for example, the x implementation on the x, to send as many packets as fast as possible. Many gateways cannot handle the large burst of packets, so they discard a few, pass a few, drop a few more, etc. When the retransmission timer goes off, everything in the queue is retransmitted - some packets get through and a few more are dropped. The lost packets tend to make the round-trip-time estimate grow very large, so throughput goes way down (wait a minute, send a flood, wait a minute ...). If the receiving host can figure out what is happening, it can "discourage" such behavior by using the window to limit the number of packets so that the gateway will not be swamped (source quench doesn't seem to be well enough defined or widely enough implemented to be effective). In practice, closing the window to just cover the gap of lost packets and delaying the ack until the reassembler has to stop seems to help a lot. (Of course, the sending host should also attempt to limit its rate of packet generation - delays between packets (or fragments in a gateway), a maximum number of outstanding packets (possibly a function of what had to be retransmitted), etc.)

Add TOPS20s to your survey under the "single timer, retransmits a single repacketized packet, with delayed ack" column.

Charlie
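[The repacketization refinement Charlie describes can be sketched as follows. This is a hypothetical helper of my own naming, assuming the queued segments are contiguous in sequence space: the retransmitted segment starts at the oldest unacked octet and is filled from subsequent queued segments up to the maximum segment size.]

```python
def repacketize(queue, mss):
    """Build ONE retransmission segment from a queue of unacked
    (seq, data) tuples, contiguous in sequence space: the oldest
    unacked data first, padded out with later queued data until the
    maximum segment size `mss` is reached.  Returns (seq, payload),
    or None if nothing is outstanding."""
    if not queue:
        return None
    start_seq = queue[0][0]
    payload = bytearray()
    for seq, data in queue:
        room = mss - len(payload)
        if room <= 0:
            break
        payload += data[:room]        # take as much as still fits
    return start_seq, bytes(payload)
```

For the telnet "dribble echo" case, many one-octet segments collapse into a single small retransmission; for bulk transfer with full-sized segments, only the first segment's worth fits, and the scheme reduces to retransmit-first-only.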
tcp-ip@ucbvax.ARPA (06/13/85)
From: "J. Noel Chiappa" <JNC@MIT-XX.ARPA>

The TCP I did for Bridge (the one used in the CS-1/T terminal concentrator) used the same strategy, and for exactly the same reasons. It kept only a single timer, on the oldest data. On timeout, it sent up to one full packet of un-acked data. So add yet another system to that column. (I'm not sure it still does this, since they changed things around and ripped out some stuff they didn't understand the use of, like subnet masks.)

This discussion brings up an interesting point: on all except the slowest lines, network traffic control wants to deal in units of packets, not bytes, since most overhead is per packet. Currently TCP is byte oriented because of the window and flow control; we need some 'consciousness raising' for higher-level protocol implementations to orient them to this aspect of IP (if and when IP ever gets traffic control).

Noel
-------
tcp-ip@ucbvax.ARPA (06/15/85)
From: CERF@USC-ISI.ARPA

Lixia, I can't offer any statistics on other implementations, but one would expect that a timer for the segment at the head of the queue for any particular connection would be sufficient. If that segment times out, you retransmit it and perhaps look to see whether any other segments on the queue have timed out. When the acknowledgement is received, you reset the timer for the next segment in the queue if it has not also been acknowledged.

The calculation of the proper timeout in a variable-delay environment is a challenge. Perhaps Dave Mills has the most experience in devising ways to cope with varying transmission and propagation delays. Others have experimented with solutions to this problem and I trust they will also respond to your query. You might also read some of Dave Clark's notes to implementors on the subject of handling retransmission.

Vint Cerf
tcp-ip@ucbvax.ARPA (06/15/85)
From: MILLS@USC-ISID.ARPA

In response to the message sent 15 Jun 1985 06:40-EDT from CERF@USC-ISI.ARPA

Vint, The problem of estimating the mean of the roundtrip-delay random variable was discussed in RFC-889. The so-called "RSRE algorithm," which we have been using for several years, provides only a single sample per roundtrip interval, which is appropriate for retransmission policies in which only the first wadge on the retransmission queue is retransmitted. However, convergence to a good mean estimate is accelerated if you keep separate timers for each segment originally transmitted and update the estimate with a new sample as the ACK sequence number passes the first octet of the corresponding segment, even if the segment timer isn't used for anything else. Under conditions where many segments can be in flight, the estimate is very much improved. The estimate can be improved further using a nonlinear smoothing algorithm, as discussed in RFC-889.

All this horsepower was found necessary for a pair of fuzzballs to grumble at each other via a transatlantic-cable link using a statistical multiplexor. The delay variance on that circuit you wouldn't believe!

Dave
-------
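[Dave's per-segment sampling idea can be sketched like this. The class and method names are my own invention; the smoothing shown is the classic linear filter from the RFC-793 appendix (SRTT = alpha*SRTT + (1-alpha)*sample, RTO = beta*SRTT), used here as a simple stand-in for the RFC-889 algorithms he refers to. The point illustrated is only the sampling discipline: timestamp every segment sent, and take one RTT sample per segment as the cumulative ACK passes its first octet.]

```python
class RttEstimator:
    """Toy smoothed round-trip-time estimator with one sample per
    segment: record a send time per segment, and fold in a sample each
    time the cumulative ACK advances past that segment's first octet."""

    def __init__(self, alpha=0.875, beta=2.0, srtt=1.0):
        self.alpha = alpha       # smoothing gain (close to 1 = slow to adapt)
        self.beta = beta         # RTO = beta * SRTT
        self.srtt = srtt         # smoothed round-trip-time estimate
        self.sent_at = {}        # first octet of each segment -> send time

    def on_send(self, seq, now):
        self.sent_at[seq] = now

    def on_ack(self, ack_seq, now):
        # One sample for every segment whose first octet the ACK passed,
        # oldest first -- many samples per window when many are in flight.
        for seq in sorted(s for s in self.sent_at if s < ack_seq):
            sample = now - self.sent_at.pop(seq)
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * sample

    @property
    def rto(self):
        return self.beta * self.srtt
```

With many segments in flight, this yields several samples per round trip instead of one, which is exactly why the estimate converges faster than the one-sample-per-RTT policy.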