karn@petrus.UUCP (Phil R. Karn) (12/23/85)
Recently I've been thinking about ways to improve the performance of link level protocols on noisy links, and I came across a very interesting idea. In the paper "ALOHA Packet Broadcasting - A Retrospect", by Binder, et. al., published in the 1975 (10 years ago!) National Computer Conference, are the following interesting sections: [KA9Q Note: the "MENEHUNE was the ALOHANET's packet switch, analogous to our TNC. Its name is an obscure wordplay. The ARPANET people were calling their packet switches "IMPs" (Interface Message Processors), so the ALOHA people simply chose the Hawaiian word for "imp", as in "mischevious dwarf".] "Error control for broadcast channel data packets (MENEHUNE to user nodes) involves some special considerations. For efficient operation, the usual positive acknowledgment scheme in which the ACK's themselves are not acknowledged depends on a high probability of the ACK's being successfully received. However, an ACK sent from user nodes must compete with data traffic in the random access channel. At full channel loading each random access packet must be retransmitted an average of 1.7 times, which means each data packet or ACK must be sent a total of 2.7 times on the average before it is successfully received. But in order to force retransmission of the ACK's, the data packet being acknowledged must also be sent an average of 2.7 times by the MENEHUNE - even though it was received correctly the first time!.... [ka9q note: these numbers come from the fact that running a pure ALOHA channel at the optimum throughput level of 1/2e or 18.4% requires an "offered load" of 1/2 or 50%. I.e., the channel will have at LEAST one carrier on it 50% of the time, but only 18% of the time will there be ONLY one carrier. This means that each packet has a probability of (1/2e) / (1/2) = 1/e = 36.8% of making it across, and therefore packets must be sent on the average 1/(1/e) = e = 2.718 times to make it across once.] "...to eliminate the unnecessary repetitions of data packets from the MENEHUNE, it is also necessary to acknowledge the ACK. That is, the ACK sent by a user node is timed out and retransmitted until an acknowledgement for it is received, just as for data packets. If another packet is waiting for transmission to the node at this time, its transmission with the next sequence number constitutes the ACK to the ACK; otherwise, a short ACK-ACK packet is sent by the MENEHUNE. This can be easily shown to result in significantly less total channel overhead, at the expense of more complication in the node implementation." I was intrigued by this idea and decided to study it further. What follows is my initial analysis of the ACK-ACK protocol and how it compares to the current AX.25 protocol. In AX.25, ACKs are not acknowledged. This means that if the probability of getting a packet across a given channel is p, then on the average, A = 1/p transmissions of the ACK by the receiver will be necessary for the sender to get one back. To elicit 1/p ACKs from the receiver, the sender has to send D = 1/p * A = 1/(p**2) data packets. For example, if p = .5 (50% probability of getting a packet across the channel), then you can expect, on the average, A = 2 transmissions of the ACK and D = 4 transmissions of the data packet before sender can go on to the next data packet. If p = .25, then A = 4 and D = 16. Note I've assumed for sake of illustration that p is equal in both directions; in the "real world" a data packet is a bigger target than an ACK and is therefore even more likely to get clobbered. To show the effect of ACK-ACKs, I'll define a number N, the ratio between the sender's data-retransmission timer to the receiver's ACK-retransmission timer. We assume that this number is always greater than 1. (If it wasn't, the sender would always time out and resend the data before the receiver ever gets a chance to retransmit its ACK). N therefore represents the number of times the receiver will attempt to retransmit its ACK before the sender unnecessarily resends its data packet (which is what we'd like to avoid). The probability P of getting an ACK through at least once in N transmission attempts when each attempt has probability p of succeeding is P = 1 - (1 - p)**N Make N large enough, and P approaches 1 (i.e., the ACK will almost certainly get through.) Of course there will still be plenty of data packet retransmissions, simply because (1-p) of them will never make it to the receiver in the first place. So the total expected number of data packet transmissions D changes from 1/(p**2) in the present case to (1/p) * (1/P) with ACK-ACKs. Let's plug in some numbers and see what effect this has on D, the expected number of data packet transmissions. p = .5, N = 5, no ACK-ACK case: D = A * 1/p = 1/(p**2) = 1/(.5*2) = 4 same, with ACK-ACKs: P = .969 D = (1/p) * (1/P) = 2.06, a 49% decrease p = .25, N = 5, no ACK-ACKs: D = 16 same, with ACK-ACKs: P = .763 D = 5.24, a 67% decrease You may be asking "what about the extra overhead of the ACK-ACK"? Recall that a new data packet can serve as an ACK-ACK. So long as there is a steady stream of traffic, "standalone" ACK-ACKs are never transmitted; the next data packet provides that function. Only at the end of a traffic burst is an extra packet generated, and this seems a small price to pay to save all those unnecessary retransmissions. So clearly we have something here. The question is how to implement it. I am not certain that this technique can be adapted easily to LAPB, the connection-oriented upper sublayer of AX.25 Level 2. It may be possible, but it would take some work. LAPB was originally intended for highly reliable point-to-point links, and radical surgery may be necessary. In any event, LAPB is an unnecessarily complex link level protocol for a datagram network, so I already have a motivation for doing a lobotomy instead. Fortunately, there are "hooks" in AX.25 that allow us to dispense with LAPB and implement the ACK-ACK protocol while still conforming to the AX.25 addressing format. We need three types of packets to implement the ACK-ACK protocol: a data packet, a regular ACK packet (sent in response to the data packet) and an ACK-ACK packet (sent in response to the regular acknowledgment when no additional data is available). Here's one possible assignment: data packet - UI frame with or without Poll bit set regular ack packet - UA, with or without Final bit set ACK-ACK - empty UI frame with Poll bit cleared The poll/final bits allow either or both acknowledgments to be optional. For example, stations in a datagram network operating on high quality links with very low packet loss rates might elect to operate without the overhead of link level acknowledgments; in this case, each data packet is sent in a UI frame with the poll bit cleared. Similarly, when regular acknowledgments are requested (i.e., with the poll bit set in UI data frames), the receiver uses the final bit in the UA to indicate whether an ACK-ACK is requested. Since a new data packet can be sent in place of an ACK-ACK, making both out of UI frames simplifies the receiver implementation. In each case, setting the poll/final bit means that some sort of response is expected from the other side, which also simplfies the implementation. Notice that there are no sequence numbers, since a sliding window protocol such as LAPB is unnecessary on a half duplex channel. The protocol operates in a lock-step fashion; there is always a one-for-one relationship between a data packet and its acknowledgement, and this simplifies things enormously. Since packets cannot "cross in the mail" on a shared half-duplex channel, there is never any ambiguity as to which data packet an ACK refers to (note that this assumption is violated and the protocol breaks down in two situations: a long digipeater string where the end stations cannot hear each other directly, and on a long-delay satellite channel where the round trip delay exceeds the packet transmission time. Digipeaters we're already getting rid of, and I'll worry about high speed satellite channels when we get one.) The only "state" required in the protocol is a buffer for the most recently sent or received packet. (The transmitter may need to resend it and the receiver needs to detect and filter out duplicates transmissions). However, once a packet has been sent and fully acknowledged, no state needs to be maintained, a MAJOR advantage when servicing a large number of mostly idle local "links" with limited memory. Comments and suggestions on this idea are welcome. 73, Phil Karn, KA9Q
koning@koning.DEC (Paul Koning -- LAS Engineering) (03/13/86)
On 23-Dec-1985, Phil Karn (KA9Q) posted a paper about ACK acknowledgement as a way to improve performance on noisy channels. After thinking about the details for a while, I feel that the analysis is inaccurate and that there is in fact little benefit from the proposal; it seems that the ALOHAnet people didn't think things through all the way. Here's why: The basic notion is that one should not retransmit Data messages if it is not necessary to do so. The main reason is that data messages are large and it is better to transmit small control messages instead. There are two cases where the sender of a data message fails to receive an ACK: 1. The data message was lost. In this case, the receiver never sent the ACK, and the optimal recovery is to resend the data. 2. The data message was received but the ACK was lost. In this case, the optimal recovery avoids resending the data. It's pretty obvious (and Phil shows the details) that detecting the second case as a special case is worthwhile IF you assume that the two cases have equal probability. Phil also points out that they in fact do not have equal probability but doesn't follow this through. Actually, the probability of a packet being in error is approximately proportional to the packet size, since the BIT error rate is constant. Ignoring digipeaters, an ACK is 19 bytes long and a full-size data packet is 276 bytes long. This means the probability of case 1 is more than ten times that of case 2. So for more than 9 out of 10 lost ACKs, the recovery as specified in AX.25 is in fact the optimal one. This is the worst case analysis, of course. If you have digipeaters (which are a declining feature) or shorter data packets, ACKs are not as small compared to the data packets, and the lost-ACK case makes up a larger fraction, but rarely would it be even close to 50% of the total. So in short, I'm not convinced this is a case worth worrying about. But if it is, there is a better alternative than ACK acking. Here's why: ACK acking requires both the sender and receiver to do timeouts. The receiver has to keep a timeout >= the round trip delay -- this is used to generate retransmitted ACKs. The sender has to use a timeout which is at least twice or three times the receiver's timeout since otherwise the sender would retransmit the data before the receiver gets a chance to retransmit the ACK. The consequence of having the sender use this extra long timer is that in the lost data packet case (which is the majority case, as discussed above) the sender waits far longer than optimal before retransmitting. Another problem is that there are two timers, one at each station, whose values are tied together; best performance requires that the one is somehow maintained as 2 or 3 times the other so long as the connection is up. An alternative approach is the one used in DDCMP. Here, the receiver does not have to keep a timer. When the sender fails to receive an ACK for its data packet, rather than resending the data packet it sends a special control packet (REP, for "Reply requested"). When the receiver gets the REP, it responds by resending the latest ACK. The sender then can decide which data packets (if any) have to be retransmitted. This alternative is better because it does not require the sender to keep an artificially long timeout. The sender can use the optimal timeout (slightly greater than the round trip delay). Also, it is simpler since the receiver does not need to maintain a timer of its own. On a slightly different subject... Near the end of his message, Phil describes an alternative protocol that does not use sequence numbers. I agree that the sliding window protocol as in AX.25 is excessively complex (indeed AX.25 is far too complex in lots of places and yet has some serious omissions, but that's for another note). But you can't build a correct protocol without sequence numbers if you use retransmission for error recovery. As a minimum, you need a one-bit sequence number, as is done in BISYNC. This is necessary because otherwise you can't distinguish these two cases: 1. A sends data packet 1, B acks it, A receives ack, sends data packet 2 2. A sends data packet 1, B acks it, ack is lost, A times out and retransmits data packet 1. Note that ACK acking doesn't eliminate the problem since case 2 still applies if 2 or 3 acks in a row are lost and A eventually times out anyway. Phil appears to suggest that you can tell the two apart by having the receiver keep a copy of the last received data packet and comparing this with the next packet. Apart from the fact that this is costly (having to do a string comparison on each packet) it doesn't work because it is perfectly legal to send two consecutive packets with the same data fields. So minimally you need 1 bit in each direction for a sequence number. AX.25 uses 3 bits, which really makes no difference; the amount of state to be maintained in AX.25 per connection is no different (give or take a byte or two) from that required for the "bare bones" protocol. The reason to argue against AX.25 is not the amount of state info required but rather the unnecessary complexity and the errors in the protocol. 73, paul koning, pa0pkg/w1
karn@petrus.UUCP (Phil R. Karn) (03/19/86)
This is in response to Paul Koning's article of 13 March 1986 commenting on my posting in December on an experimental new link layer protocol. That posting represented my ideas at a very early stage; WB6RQN and I wrote a paper for the 5th ARRL Computer Networking Conference that incorporates a revised version of the protocol as part of a larger paper on link level performance issues. Some of your questions and/or objections are answered in the final paper; it's available as <chepponis>ack-ack.ms (ms macro source) or <chepponis>ack-ack.txt (printable output) on c.cs.cmu.edu; use anonymous FTP. You are correct that my assumption of equal loss probability for data and ACK packets is probably too optimistic. However, I do not agree that "there is in fact little benefit from the proposal". Most packets are lost due to collisions, not random thermal noise, so the relationship between packet size and loss probability is not as clear. Monitoring the local packet radio frequency reveals a LOT of unnecessary data packet retransmissions, for whatever reason. ACK-ACK would clearly help in this case. On the other hand, given a choice I would certainly prefer to have channels so good that ACK-ACK wasn't needed... Your alternate proposal (the Reply Requested control packet) is already in AX.25 Version Two in the form of the Poll/Final recovery procedures. However, I think that in the very lossy environment for which my protocol is designed, receiver ACK retransmission is more efficient because the sender wouldn't have to send anything to elicit the ACK retransmission. (Remember that the poll can itself be clobbered). My goal in all of this has been to minimize the amount of channel time needed to transfer a given amount of data; since we operate on a shared channel, I consider the delay as seen by one user to be much less important than optimizing the efficiency of the channel as a whole. It ought to be pretty obvious that our protocol was motivated by the desire for a simpler and more efficient means of relaying IP datagrams. The host-host (transport level) protocols (e.g., TCP) used atop IP are designed to reject duplicates created by retransmissions within the network. However, a little analysis shows that the number of duplicates would snowball (i.e., increase exponentially) as a packet propagates across a series of links, and performance would be pretty bad. Therefore, the published version of the protocol includes an "ID" field whose purpose is to detect and filter out the majority (but not necessarily all) duplicates caused by link layer retransmission. Paul, thanks very much for your thoughtful note. Brian and I would be very interested in any further comments after you've had a chance to look over the revised paper, as it contains some further refinements and simplifications I haven't mentioned here. We'd also be interested in hearing your comments on AX.25. 73, Phil Karn, KA9Q