karn@petrus.UUCP (Phil R. Karn) (12/23/85)
Recently I've been thinking about ways to improve the performance of
link level protocols on noisy links, and I came across a very interesting
idea.
In the paper "ALOHA Packet Broadcasting - A Retrospect", by Binder, et. al.,
published in the 1975 (10 years ago!) National Computer Conference, are the
following interesting sections:
[KA9Q Note: the "MENEHUNE was the ALOHANET's packet switch, analogous to
our TNC. Its name is an obscure wordplay. The ARPANET people were calling their
packet switches "IMPs" (Interface Message Processors), so the ALOHA people
simply chose the Hawaiian word for "imp", as in "mischevious dwarf".]
"Error control for broadcast channel data packets (MENEHUNE to user nodes)
involves some special considerations. For efficient operation, the usual
positive acknowledgment scheme in which the ACK's themselves are not
acknowledged depends on a high probability of the ACK's being successfully
received. However, an ACK sent from user nodes must compete with data
traffic in the random access channel. At full channel loading each
random access packet must be retransmitted an average of 1.7 times,
which means each data packet or ACK must be sent a total of 2.7 times on
the average before it is successfully received. But in order to force
retransmission of the ACK's, the data packet being acknowledged must
also be sent an average of 2.7 times by the MENEHUNE - even though it
was received correctly the first time!....
[ka9q note: these numbers come from the fact that running a pure ALOHA
channel at the optimum throughput level of 1/2e or 18.4% requires
an "offered load" of 1/2 or 50%. I.e., the channel will have at LEAST
one carrier on it 50% of the time, but only 18% of the time will there
be ONLY one carrier. This means that each packet has a probability of
(1/2e) / (1/2) = 1/e = 36.8% of making it across, and therefore packets
must be sent on the average 1/(1/e) = e = 2.718 times to make it across
once.]
"...to eliminate the unnecessary repetitions of data packets from the
MENEHUNE, it is also necessary to acknowledge the ACK. That is, the ACK
sent by a user node is timed out and retransmitted until an
acknowledgement for it is received, just as for data packets. If another
packet is waiting for transmission to the node at this time, its
transmission with the next sequence number constitutes the ACK to the
ACK; otherwise, a short ACK-ACK packet is sent by the MENEHUNE. This can
be easily shown to result in significantly less total channel overhead,
at the expense of more complication in the node implementation."
I was intrigued by this idea and decided to study it further. What follows
is my initial analysis of the ACK-ACK protocol and how it compares to
the current AX.25 protocol.
In AX.25, ACKs are not acknowledged. This means that if the probability of
getting a packet across a given channel is p, then on the average, A = 1/p
transmissions of the ACK by the receiver will be necessary for the sender to
get one back. To elicit 1/p ACKs from the receiver, the sender has to send
D = 1/p * A = 1/(p**2) data packets. For example, if p = .5 (50% probability
of getting a packet across the channel), then you can expect, on the average,
A = 2 transmissions of the ACK and D = 4 transmissions of the data packet
before sender can go on to the next data packet. If p = .25, then A = 4 and
D = 16.
Note I've assumed for sake of illustration that p is equal in both directions;
in the "real world" a data packet is a bigger target than an ACK and is
therefore even more likely to get clobbered.
To show the effect of ACK-ACKs, I'll define a number N, the ratio between
the sender's data-retransmission timer to the receiver's ACK-retransmission
timer. We assume that this number is always greater than 1. (If it wasn't,
the sender would always time out and resend the data before the receiver
ever gets a chance to retransmit its ACK). N therefore represents the number
of times the receiver will attempt to retransmit its ACK before the sender
unnecessarily resends its data packet (which is what we'd like to avoid).
The probability P of getting an ACK through at least once in N transmission
attempts when each attempt has probability p of succeeding is
P = 1 - (1 - p)**N
Make N large enough, and P approaches 1 (i.e., the ACK will almost certainly
get through.) Of course there will still be plenty of data packet
retransmissions, simply because (1-p) of them will never make it to the
receiver in the first place. So the total expected number of data packet
transmissions D changes from 1/(p**2) in the present case to (1/p) * (1/P)
with ACK-ACKs. Let's plug in some numbers and see what effect this has
on D, the expected number of data packet transmissions.
p = .5, N = 5, no ACK-ACK case:
D = A * 1/p = 1/(p**2) = 1/(.5*2) = 4
same, with ACK-ACKs:
P = .969
D = (1/p) * (1/P) = 2.06, a 49% decrease
p = .25, N = 5, no ACK-ACKs:
D = 16
same, with ACK-ACKs:
P = .763
D = 5.24, a 67% decrease
You may be asking "what about the extra overhead of the ACK-ACK"? Recall
that a new data packet can serve as an ACK-ACK. So long as there is
a steady stream of traffic, "standalone" ACK-ACKs are never transmitted;
the next data packet provides that function. Only at the end of a traffic
burst is an extra packet generated, and this seems a small price to pay to
save all those unnecessary retransmissions.
So clearly we have something here. The question is how to implement it.
I am not certain that this technique can be adapted easily to LAPB,
the connection-oriented upper sublayer of AX.25 Level 2. It may be possible,
but it would take some work. LAPB was originally intended for highly reliable
point-to-point links, and radical surgery may be necessary. In any event,
LAPB is an unnecessarily complex link level protocol for a datagram network,
so I already have a motivation for doing a lobotomy instead.
Fortunately, there are "hooks" in AX.25 that allow us to dispense with LAPB
and implement the ACK-ACK protocol while still conforming to the AX.25
addressing format. We need three types of packets to implement the ACK-ACK
protocol: a data packet, a regular ACK packet (sent in response to the data
packet) and an ACK-ACK packet (sent in response to the regular acknowledgment
when no additional data is available). Here's one possible assignment:
data packet - UI frame with or without Poll bit set
regular ack packet - UA, with or without Final bit set
ACK-ACK - empty UI frame with Poll bit cleared
The poll/final bits allow either or both acknowledgments to be optional. For
example, stations in a datagram network operating on high quality links with
very low packet loss rates might elect to operate without the overhead of
link level acknowledgments; in this case, each data packet is sent in a UI
frame with the poll bit cleared. Similarly, when regular acknowledgments are
requested (i.e., with the poll bit set in UI data frames), the receiver uses
the final bit in the UA to indicate whether an ACK-ACK is requested. Since a
new data packet can be sent in place of an ACK-ACK, making both out of UI
frames simplifies the receiver implementation. In each case, setting the
poll/final bit means that some sort of response is expected from the other
side, which also simplfies the implementation.
Notice that there are no sequence numbers, since a sliding window protocol such
as LAPB is unnecessary on a half duplex channel. The protocol operates in a
lock-step fashion; there is always a one-for-one relationship between a data
packet and its acknowledgement, and this simplifies things enormously.
Since packets cannot "cross in the mail" on a shared half-duplex channel,
there is never any ambiguity as to which data packet an ACK refers to
(note that this assumption is violated and the protocol breaks down in two
situations: a long digipeater string where the end stations cannot hear each
other directly, and on a long-delay satellite channel where the round trip
delay exceeds the packet transmission time. Digipeaters we're already getting
rid of, and I'll worry about high speed satellite channels when we get
one.)
The only "state" required in the protocol is a buffer for the most recently
sent or received packet. (The transmitter may need to resend it and the
receiver needs to detect and filter out duplicates transmissions). However,
once a packet has been sent and fully acknowledged, no state needs to be
maintained, a MAJOR advantage when servicing a large number of mostly idle
local "links" with limited memory.
Comments and suggestions on this idea are welcome.
73,
Phil Karn, KA9Qkoning@koning.DEC (Paul Koning -- LAS Engineering) (03/13/86)
On 23-Dec-1985, Phil Karn (KA9Q) posted a paper about ACK acknowledgement as a way to improve performance on noisy channels. After thinking about the details for a while, I feel that the analysis is inaccurate and that there is in fact little benefit from the proposal; it seems that the ALOHAnet people didn't think things through all the way. Here's why: The basic notion is that one should not retransmit Data messages if it is not necessary to do so. The main reason is that data messages are large and it is better to transmit small control messages instead. There are two cases where the sender of a data message fails to receive an ACK: 1. The data message was lost. In this case, the receiver never sent the ACK, and the optimal recovery is to resend the data. 2. The data message was received but the ACK was lost. In this case, the optimal recovery avoids resending the data. It's pretty obvious (and Phil shows the details) that detecting the second case as a special case is worthwhile IF you assume that the two cases have equal probability. Phil also points out that they in fact do not have equal probability but doesn't follow this through. Actually, the probability of a packet being in error is approximately proportional to the packet size, since the BIT error rate is constant. Ignoring digipeaters, an ACK is 19 bytes long and a full-size data packet is 276 bytes long. This means the probability of case 1 is more than ten times that of case 2. So for more than 9 out of 10 lost ACKs, the recovery as specified in AX.25 is in fact the optimal one. This is the worst case analysis, of course. If you have digipeaters (which are a declining feature) or shorter data packets, ACKs are not as small compared to the data packets, and the lost-ACK case makes up a larger fraction, but rarely would it be even close to 50% of the total. So in short, I'm not convinced this is a case worth worrying about. But if it is, there is a better alternative than ACK acking. Here's why: ACK acking requires both the sender and receiver to do timeouts. The receiver has to keep a timeout >= the round trip delay -- this is used to generate retransmitted ACKs. The sender has to use a timeout which is at least twice or three times the receiver's timeout since otherwise the sender would retransmit the data before the receiver gets a chance to retransmit the ACK. The consequence of having the sender use this extra long timer is that in the lost data packet case (which is the majority case, as discussed above) the sender waits far longer than optimal before retransmitting. Another problem is that there are two timers, one at each station, whose values are tied together; best performance requires that the one is somehow maintained as 2 or 3 times the other so long as the connection is up. An alternative approach is the one used in DDCMP. Here, the receiver does not have to keep a timer. When the sender fails to receive an ACK for its data packet, rather than resending the data packet it sends a special control packet (REP, for "Reply requested"). When the receiver gets the REP, it responds by resending the latest ACK. The sender then can decide which data packets (if any) have to be retransmitted. This alternative is better because it does not require the sender to keep an artificially long timeout. The sender can use the optimal timeout (slightly greater than the round trip delay). Also, it is simpler since the receiver does not need to maintain a timer of its own. On a slightly different subject... Near the end of his message, Phil describes an alternative protocol that does not use sequence numbers. I agree that the sliding window protocol as in AX.25 is excessively complex (indeed AX.25 is far too complex in lots of places and yet has some serious omissions, but that's for another note). But you can't build a correct protocol without sequence numbers if you use retransmission for error recovery. As a minimum, you need a one-bit sequence number, as is done in BISYNC. This is necessary because otherwise you can't distinguish these two cases: 1. A sends data packet 1, B acks it, A receives ack, sends data packet 2 2. A sends data packet 1, B acks it, ack is lost, A times out and retransmits data packet 1. Note that ACK acking doesn't eliminate the problem since case 2 still applies if 2 or 3 acks in a row are lost and A eventually times out anyway. Phil appears to suggest that you can tell the two apart by having the receiver keep a copy of the last received data packet and comparing this with the next packet. Apart from the fact that this is costly (having to do a string comparison on each packet) it doesn't work because it is perfectly legal to send two consecutive packets with the same data fields. So minimally you need 1 bit in each direction for a sequence number. AX.25 uses 3 bits, which really makes no difference; the amount of state to be maintained in AX.25 per connection is no different (give or take a byte or two) from that required for the "bare bones" protocol. The reason to argue against AX.25 is not the amount of state info required but rather the unnecessary complexity and the errors in the protocol. 73, paul koning, pa0pkg/w1
karn@petrus.UUCP (Phil R. Karn) (03/19/86)
This is in response to Paul Koning's article of 13 March 1986 commenting on my posting in December on an experimental new link layer protocol. That posting represented my ideas at a very early stage; WB6RQN and I wrote a paper for the 5th ARRL Computer Networking Conference that incorporates a revised version of the protocol as part of a larger paper on link level performance issues. Some of your questions and/or objections are answered in the final paper; it's available as <chepponis>ack-ack.ms (ms macro source) or <chepponis>ack-ack.txt (printable output) on c.cs.cmu.edu; use anonymous FTP. You are correct that my assumption of equal loss probability for data and ACK packets is probably too optimistic. However, I do not agree that "there is in fact little benefit from the proposal". Most packets are lost due to collisions, not random thermal noise, so the relationship between packet size and loss probability is not as clear. Monitoring the local packet radio frequency reveals a LOT of unnecessary data packet retransmissions, for whatever reason. ACK-ACK would clearly help in this case. On the other hand, given a choice I would certainly prefer to have channels so good that ACK-ACK wasn't needed... Your alternate proposal (the Reply Requested control packet) is already in AX.25 Version Two in the form of the Poll/Final recovery procedures. However, I think that in the very lossy environment for which my protocol is designed, receiver ACK retransmission is more efficient because the sender wouldn't have to send anything to elicit the ACK retransmission. (Remember that the poll can itself be clobbered). My goal in all of this has been to minimize the amount of channel time needed to transfer a given amount of data; since we operate on a shared channel, I consider the delay as seen by one user to be much less important than optimizing the efficiency of the channel as a whole. It ought to be pretty obvious that our protocol was motivated by the desire for a simpler and more efficient means of relaying IP datagrams. The host-host (transport level) protocols (e.g., TCP) used atop IP are designed to reject duplicates created by retransmissions within the network. However, a little analysis shows that the number of duplicates would snowball (i.e., increase exponentially) as a packet propagates across a series of links, and performance would be pretty bad. Therefore, the published version of the protocol includes an "ID" field whose purpose is to detect and filter out the majority (but not necessarily all) duplicates caused by link layer retransmission. Paul, thanks very much for your thoughtful note. Brian and I would be very interested in any further comments after you've had a chance to look over the revised paper, as it contains some further refinements and simplifications I haven't mentioned here. We'd also be interested in hearing your comments on AX.25. 73, Phil Karn, KA9Q