[net.ham-radio.packet] Link layer thoughts

karn@petrus.UUCP (Phil R. Karn) (12/23/85)

Recently I've been thinking about ways to improve the performance of
link level protocols on noisy links, and I came across a very interesting
idea.

In the paper "ALOHA Packet Broadcasting - A Retrospect", by Binder, et. al.,
published in the 1975 (10 years ago!) National Computer Conference, are the
following interesting sections:

[KA9Q Note: the "MENEHUNE was the ALOHANET's packet switch, analogous to
our TNC. Its name is an obscure wordplay. The ARPANET people were calling their
packet switches "IMPs" (Interface Message Processors), so the ALOHA people
simply chose the Hawaiian word for "imp", as in "mischevious dwarf".]

"Error control for broadcast channel data packets (MENEHUNE to user nodes)
involves some special considerations. For efficient operation, the usual
positive acknowledgment scheme in which the ACK's themselves are not
acknowledged depends on a high probability of the ACK's being successfully
received. However, an ACK sent from user nodes must compete with data
traffic in the random access channel. At full channel loading each
random access packet must be retransmitted an average of 1.7 times,
which means each data packet or ACK must be sent a total of 2.7 times on
the average before it is successfully received. But in order to force
retransmission of the ACK's, the data packet being acknowledged must
also be sent an average of 2.7 times by the MENEHUNE - even though it
was received correctly the first time!....

[ka9q note: these numbers come from the fact that running a pure ALOHA
channel at the optimum throughput level of 1/2e or 18.4% requires
an "offered load" of 1/2 or 50%. I.e., the channel will have at LEAST
one carrier on it 50% of the time, but only 18% of the time will there
be ONLY one carrier. This means that each packet has a probability of
(1/2e) / (1/2) = 1/e = 36.8% of making it across, and therefore packets
must be sent on the average 1/(1/e) = e = 2.718 times to make it across
once.]

"...to eliminate the unnecessary repetitions of data packets from the
MENEHUNE, it is also necessary to acknowledge the ACK. That is, the ACK
sent by a user node is timed out and retransmitted until an
acknowledgement for it is received, just as for data packets. If another
packet is waiting for transmission to the node at this time, its
transmission with the next sequence number constitutes the ACK to the
ACK; otherwise, a short ACK-ACK packet is sent by the MENEHUNE. This can
be easily shown to result in significantly less total channel overhead,
at the expense of more complication in the node implementation."

I was intrigued by this idea and decided to study it further. What follows
is my initial analysis of the ACK-ACK protocol and how it compares to
the current AX.25 protocol.

In AX.25, ACKs are not acknowledged. This means that if the probability of
getting a packet across a given channel is p, then on the average, A = 1/p
transmissions of the ACK by the receiver will be necessary for the sender to
get one back. To elicit 1/p ACKs from the receiver, the sender has to send
D = 1/p * A = 1/(p**2) data packets. For example, if p = .5 (50% probability
of getting a packet across the channel), then you can expect, on the average,
A = 2 transmissions of the ACK and D = 4 transmissions of the data packet
before sender can go on to the next data packet. If p = .25, then A = 4 and
D = 16.

Note I've assumed for sake of illustration that p is equal in both directions;
in the "real world" a data packet is a bigger target than an ACK and is
therefore even more likely to get clobbered.

To show the effect of ACK-ACKs, I'll define a number N, the ratio between
the sender's data-retransmission timer to the receiver's ACK-retransmission
timer. We assume that this number is always greater than 1. (If it wasn't,
the sender would always time out and resend the data before the receiver
ever gets a chance to retransmit its ACK). N therefore represents the number
of times the receiver will attempt to retransmit its ACK before the sender
unnecessarily resends its data packet (which is what we'd like to avoid).

The probability P of getting an ACK through at least once in N transmission
attempts when each attempt has probability p of succeeding is

P = 1 - (1 - p)**N

Make N large enough, and P approaches 1 (i.e., the ACK will almost certainly
get through.) Of course there will still be plenty of data packet
retransmissions, simply because (1-p) of them will never make it to the
receiver in the first place. So the total expected number of data packet
transmissions D changes from 1/(p**2) in the present case to (1/p) * (1/P)
with ACK-ACKs. Let's plug in some numbers and see what effect this has
on D, the expected number of data packet transmissions.

p = .5, N = 5, no ACK-ACK case:
     D = A * 1/p = 1/(p**2) = 1/(.5*2) = 4
same, with ACK-ACKs:
     P = .969
     D = (1/p) * (1/P) = 2.06, a 49% decrease

p = .25, N = 5, no ACK-ACKs:
     D = 16
same, with ACK-ACKs:
 P = .763
     D = 5.24, a 67% decrease

You may be asking "what about the extra overhead of the ACK-ACK"? Recall
that a new data packet can serve as an ACK-ACK. So long as there is
a steady stream of traffic, "standalone" ACK-ACKs are never transmitted;
the next data packet provides that function. Only at the end of a traffic
burst is an extra packet generated, and this seems a small price to pay to
save all those unnecessary retransmissions.

So clearly we have something here. The question is how to implement it.

I am not certain that this technique can be adapted easily to LAPB,
the connection-oriented upper sublayer of AX.25 Level 2.  It may be possible,
but it would take some work. LAPB was originally intended for highly reliable
point-to-point links, and radical surgery may be necessary. In any event,
LAPB is an unnecessarily complex link level protocol for a datagram network,
so I already have a motivation for doing a lobotomy instead.

Fortunately, there are "hooks" in AX.25 that allow us to dispense with LAPB
and implement the ACK-ACK protocol while still conforming to the AX.25
addressing format. We need three types of packets to implement the ACK-ACK
protocol: a data packet, a regular ACK packet (sent in response to the data
packet) and an ACK-ACK packet (sent in response to the regular acknowledgment
when no additional data is available). Here's one possible assignment:

data packet - UI frame with or without Poll bit set
regular ack packet - UA, with or without Final bit set
ACK-ACK - empty UI frame with Poll bit cleared

The poll/final bits allow either or both acknowledgments to be optional. For
example, stations in a datagram network operating on high quality links with
very low packet loss rates might elect to operate without the overhead of
link level acknowledgments; in this case, each data packet is sent in a UI
frame with the poll bit cleared.  Similarly, when regular acknowledgments are
requested (i.e., with the poll bit set in UI data frames), the receiver uses
the final bit in the UA to indicate whether an ACK-ACK is requested. Since a
new data packet can be sent in place of an ACK-ACK, making both out of UI
frames simplifies the receiver implementation. In each case, setting the
poll/final bit means that some sort of response is expected from the other
side, which also simplfies the implementation.

Notice that there are no sequence numbers, since a sliding window protocol such
as LAPB is unnecessary on a half duplex channel. The protocol operates in a
lock-step fashion; there is always a one-for-one relationship between a data
packet and its acknowledgement, and this simplifies things enormously.
Since packets cannot "cross in the mail" on a shared half-duplex channel,
there is never any ambiguity as to which data packet an ACK refers to
(note that this assumption is violated and the protocol breaks down in two
situations: a long digipeater string where the end stations cannot hear each
other directly, and on a long-delay satellite channel where the round trip
delay exceeds the packet transmission time. Digipeaters we're already getting
rid of, and I'll worry about high speed satellite channels when we get
one.)

The only "state" required in the protocol is a buffer for the most recently
sent or received packet. (The transmitter may need to resend it and the
receiver needs to detect and filter out duplicates transmissions). However,
once a packet has been sent and fully acknowledged, no state needs to be
maintained, a MAJOR advantage when servicing a large number of mostly idle
local "links" with limited memory.

Comments and suggestions on this idea are welcome.

73,

Phil Karn, KA9Q

koning@koning.DEC (Paul Koning -- LAS Engineering) (03/13/86)

On 23-Dec-1985, Phil Karn (KA9Q) posted a paper about ACK acknowledgement
as a way to improve performance on noisy channels.
 
After thinking about the details for a while, I feel that the analysis
is inaccurate and that there is in fact little benefit from the proposal;
it seems that the ALOHAnet people didn't think things through all the way.
Here's why:
 
The basic notion is that one should not retransmit Data messages if it
is not necessary to do so.  The main reason is that data messages are
large and it is better to transmit small control messages instead.
There are two cases where the sender of a data message fails to receive
an ACK:
 
1. The data message was lost.  In this case, the receiver never sent
   the ACK, and the optimal recovery is to resend the data.
 
2. The data message was received but the ACK was lost.  In this case,
   the optimal recovery avoids resending the data.
 
It's pretty obvious (and Phil shows the details) that detecting the
second case as a special case is worthwhile IF you assume that the
two cases have equal probability.  Phil also points out that they in
fact do not have equal probability but doesn't follow this through.
 
Actually, the probability of a packet being in error is approximately
proportional to the packet size, since the BIT error rate is constant.
Ignoring digipeaters, an ACK is 19 bytes long and a full-size data
packet is 276 bytes long.  This means the probability of case 1 is
more than ten times that of case 2.  So for more than 9 out of 10
lost ACKs, the recovery as specified in AX.25 is in fact the optimal
one.
 
This is the worst case analysis, of course.  If you have digipeaters
(which are a declining feature) or shorter data packets, ACKs are
not as small compared to the data packets, and the lost-ACK case
makes up a larger fraction, but rarely would it be even close to
50% of the total.
 
So in short, I'm not convinced this is a case worth worrying about.
But if it is, there is a better alternative than ACK acking.
Here's why:
 
ACK acking requires both the sender and receiver to do timeouts.
The receiver has to keep a timeout >= the round trip delay -- this
is used to generate retransmitted ACKs.  The sender has to use a
timeout which is at least twice or three times the receiver's timeout
since otherwise the sender would retransmit the data before the receiver
gets a chance to retransmit the ACK.
 
The consequence of having the sender use this extra long timer is that
in the lost data packet case (which is the majority case, as discussed
above) the sender waits far longer than optimal before retransmitting.
 
Another problem is that there are two timers, one at each station,
whose values are tied together; best performance requires that the
one is somehow maintained as 2 or 3 times the other so long as the
connection is up.
 
An alternative approach is the one used in DDCMP.  Here, the receiver
does not have to keep a timer.  When the sender fails to receive
an ACK for its data packet, rather than resending the data packet it
sends a special control packet (REP, for "Reply requested").  When
the receiver gets the REP, it responds by resending the latest ACK.
The sender then can decide which data packets (if any) have to be
retransmitted.
 
This alternative is better because it does not require the sender
to keep an artificially long timeout.  The sender can use the optimal
timeout (slightly greater than the round trip delay).  Also, it is
simpler since the receiver does not need to maintain a timer of
its own.
 
On a slightly different subject...
 
Near the end of his message, Phil describes an alternative protocol
that does not use sequence numbers.  I agree that the sliding window
protocol as in AX.25 is excessively complex (indeed AX.25 is far
too complex in lots of places and yet has some serious omissions, but
that's for another note).  But you can't build a correct protocol
without sequence numbers if you use retransmission for error
recovery.  As a minimum, you need a one-bit sequence number,
as is done in BISYNC.  This is necessary because otherwise you can't
distinguish these two cases:
 
1. A sends data packet 1, B acks it, A receives ack, sends data packet 2
 
2. A sends data packet 1, B acks it, ack is lost, A times out and
   retransmits data packet 1.
 
Note that ACK acking doesn't eliminate the problem since case 2 still
applies if 2 or 3 acks in a row are lost and A eventually times out
anyway.
 
Phil appears to suggest that you can tell the two apart by having the
receiver keep a copy of the last received data packet and comparing
this with the next packet.  Apart from the fact that this is costly
(having to do a string comparison on each packet) it doesn't work
because it is perfectly legal to send two consecutive packets with
the same data fields.
 
So minimally you need 1 bit in each direction for a sequence number.
AX.25 uses 3 bits, which really makes no difference; the amount of
state to be maintained in AX.25 per connection is no different
(give or take a byte or two) from that required for the "bare bones"
protocol.  The reason to argue against AX.25 is not the amount of
state info required but rather the unnecessary complexity and the
errors in the protocol.
 
73,
	paul koning, pa0pkg/w1

karn@petrus.UUCP (Phil R. Karn) (03/19/86)

This is in response to Paul Koning's article of 13 March 1986 commenting
on my posting in December on an experimental new link layer protocol.

That posting represented my ideas at a very early stage; WB6RQN and I
wrote a paper for the 5th ARRL Computer Networking Conference that
incorporates a revised version of the protocol as part of a larger paper
on link level performance issues. Some of your questions and/or
objections are answered in the final paper; it's available as
<chepponis>ack-ack.ms (ms macro source) or <chepponis>ack-ack.txt (printable
output) on c.cs.cmu.edu; use anonymous FTP.

You are correct that my assumption of equal loss probability for data
and ACK packets is probably too optimistic. However, I do not agree that
"there is in fact little benefit from the proposal". Most packets are
lost due to collisions, not random thermal noise, so the relationship
between packet size and loss probability is not as clear. Monitoring the
local packet radio frequency reveals a LOT of unnecessary data packet
retransmissions, for whatever reason. ACK-ACK would clearly help in this
case. On the other hand, given a choice I would certainly prefer to have
channels so good that ACK-ACK wasn't needed...

Your alternate proposal (the Reply Requested control packet) is already
in AX.25 Version Two in the form of the Poll/Final recovery procedures.
However, I think that in the very lossy environment for which my
protocol is designed, receiver ACK retransmission is more efficient
because the sender wouldn't have to send anything to elicit the ACK
retransmission.  (Remember that the poll can itself be clobbered). My
goal in all of this has been to minimize the amount of channel time
needed to transfer a given amount of data; since we operate on a shared
channel, I consider the delay as seen by one user to be much less
important than optimizing the efficiency of the channel as a whole.

It ought to be pretty obvious that our protocol was motivated by the
desire for a simpler and more efficient means of relaying IP
datagrams. The host-host (transport level) protocols (e.g., TCP) used
atop IP are designed to reject duplicates created by retransmissions
within the network. However, a little analysis shows that the number of
duplicates would snowball (i.e., increase exponentially) as a packet
propagates across a series of links, and performance would be pretty
bad. Therefore, the published version of the protocol includes an "ID"
field whose purpose is to detect and filter out the majority (but not
necessarily all) duplicates caused by link layer retransmission.

Paul, thanks very much for your thoughtful note. Brian and I would be
very interested in any further comments after you've had a chance to
look over the revised paper, as it contains some further refinements and
simplifications I haven't mentioned here. We'd also be interested in
hearing your comments on AX.25.

73,

Phil Karn, KA9Q