ron@BRL.ARPA.UUCP (07/19/86)
Dave Clark once again observes that a token ringnet outperforms an
Ethernet in handling back-to-back packets. The ringnet has an
automatic retransmission function built into the network
interface, and will retransmit rejected packets until they get
accepted, while an Ethernet interface loses subsequent packets if
they follow the first one too closely.
In theory, yes; but let us look at two fielded devices: the INTERLAN N1010A
Ethernet interface and the PROTEON 10MB RINGNET. The INTERLAN handles
multiple incoming packets by buffering some number of messages coming
in from the net in interface memory while waiting for the host to begin
the data transfer. The PROTEON can not accept back to back messages
because the board does not reset to copying messages from the interface
after the end of the first message so it misses the header of the second
message. There is no automatic retransmit because the source board drains
the ring until it sees its own message come back, which should be at the
beginning of the train of messages. It can't leave it in the ring, because
it will be eaten by another interface that has transmitted a message. It
can't retransmit until the token comes by. I've tried reenabling copy
as soon as the DMA has finished, but there is still a delay, and I also
feel that something is amiss in the interrupt logic when I do this.
You still come out slightly ahead, because it is possible for the lower
levels to tell when retransmission is needed; however, a lot of
retransmission is needed because of the misdesign of the interface,
significantly more than is ever needed on our Ethernets.
Not to say that I am down on the Proteons; much of what we are doing at
BRL would be difficult or impossible without them. I just wish they
could double-buffer so that you would not miss the header of successive
packets.
-Ron
JNC@XX.LCS.MIT.EDU ("J. Noel Chiappa") (07/20/86)
First, let me correct some misstatements in your characterization of both ring systems (one of which I had a hand in designing at MIT). The ring interface does not automatically retransmit packets; perhaps the person was confusing it with Ethernet interfaces, which do retransmit packets automatically in case of a collision. Also, the 80MB ring interface does have an on-board packet buffer and will receive back-to-back packets without host intervention, although the 10MB ring interface does not. What both the 10MB and 80MB interfaces *do* have (as you alluded to) is a low level *acknowledgment*; i.e., you know (with some reasonable degree of probability) whether or not the intended recipient got the packet.

The reason I think this is important is that it is starting to become clear that dropping packets is a Bad Thing in terms of the effect on performance. Any losses and retransmissions have serious effects on the performance of 'single-ACK' protocols like TCP, especially when you are running at high data rates. The single ACK is such a weak mechanism that it should only be used as a backstop for rare failures; if you have to use it a lot, you lose a lot of performance. Since we are stuck with TCP -> you should not drop packets.

The point of all this is that nets ought to have a reasonable hardware acknowledgement feature that would let you know when a host could not accept a packet destined to it. I don't know why they didn't put one in Ethernet; it would have been really trivial. The CHAOS hardware built at the MIT AI Lab (which was a 4MB/sec Ether-like system) had such a feature: the recipient jammed the cable (causing a collision) if a packet for that destination could not be handled. What amazes me is that although the IEEE Ethernet spec did make all sorts of changes to the spec, of little or no practical utility as far as I can see (e.g. Ethernet demonstrably works fine without a length field), they didn't fix this glaring defect! A typical standards committee: they make all sorts of gratuitous changes to an existing widespread spec, resulting in a massive incompatibility problem, without fixing any real problems. I guess it is true that with a multi-buffered interface you are less likely to drop packets, but a low-level ack still helps.

Also, low-level acks can help gateways a lot. If the next-hop gateway on a route dies, you can detect it and reroute around it. You can also give more informative error messages. (How I hate 'connection timed out' - I want to know why! The annoying thing is that even when gateways go to great lengths to send back ICMP error messages, most hosts do not reflect them to users. When I put ICMP error support into a gateway, I could not find a host at MIT that would take the messages and use them to make an intelligent error message for the user!)

	Noel
-------
Lixia@XX.LCS.MIT.EDU.UUCP (07/20/86)
I wrote the original paragraph:
Dave Clark once again observes that a token ringnet outperforms an
Ethernet in handling back-to-back packets. The ringnet has an
automatic retransmission function built into the network
interface, and will retransmit rejected packets until they get
accepted, while an Ethernet interface loses subsequent packets if
they follow the first one too closely.
So I'd better clean up my own mistake. As Noel has pointed out, the
ringnet interface returns an acknowledgment; therefore when the receiving
interface cannot catch up with incoming packets, the source host network
driver quickly retransmits any negatively acknowledged packet until the
interface returns a positive ACK (or until it hits some maximum
retransmission count).
For Ethernet, if the receiving interface doesn't get a packet right, the
packet is lost.
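To make that driver behavior concrete, here is a minimal sketch in C of a
retransmit-until-positive-ACK loop; the names send_to_ring() and MAX_RETRANS
are invented for illustration, and this is not actual Proteon driver code.

/* Minimal sketch of a host driver that resends a packet whenever the
 * ring interface reports a negative acknowledgment, up to a fixed limit.
 */
#include <stdio.h>

#define MAX_RETRANS 8

enum ring_status { RING_ACK, RING_NAK };

/* Stand-in for the hardware transmit; here it NAKs twice, then ACKs. */
static enum ring_status send_to_ring(const char *pkt)
{
    static int busy = 2;
    (void)pkt;
    return (busy-- > 0) ? RING_NAK : RING_ACK;
}

/* Returns 0 on success, -1 if the retransmission limit is exceeded. */
int ring_transmit(const char *pkt)
{
    int tries;

    for (tries = 0; tries < MAX_RETRANS; tries++) {
        if (send_to_ring(pkt) == RING_ACK)
            return 0;           /* receiving interface copied the frame */
        /* negative ACK: receiver was busy, send it again */
    }
    return -1;                  /* give up; let higher layers recover */
}

int main(void)
{
    printf("transmit %s\n", ring_transmit("hello") == 0 ? "ok" : "failed");
    return 0;
}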
Having buffers at the interface is helpful, for both Ethernet and ring.
Ethernet is a loser in that, in Noel's words, it lacks a low-level ACK.
Lixia
-------
CERF@A.ISI.EDU.UUCP (07/21/86)
Noel,

Please elaborate on the "single ACK" problem. As you know, TCP ACKs are at least "inclusive", so that a subsequent ACK can make up for a lost one, if more data is sent and received. This isn't perfect, of course, since an "inclusive" ACK doesn't help if data was lost at the receiver. Perhaps you are thinking about ACKs which cover data received past data lost (selective ACKs)? We looked at this several times, but the complexity of the mechanism did not seem to buy enough to justify it.

Vint
mark@cbosgd.ATT.COM.UUCP (07/21/86)
In article <12224206784.24.JNC@XX.LCS.MIT.EDU> you write:
>	The point of all this is that nets ought to have a reasonable
>hardware acknowledgement feature that would let you know when a host could
>not accept a packet destined to it. I don't know why they didn't put one in
>Ethernet; it would have been really trivial. The CHAOS hardware built at
>the MIT AI Lab (which was a 4MB/sec Ether-like system) had such a feature:
>the recipient jammed the cable (causing a collision) if a packet for that
>destination could not be handled.

I note that 802.2 has a whole bunch of "connection oriented" features added onto the side of Ethernet. While I assume most of us are ignoring them (I gather they were put there by X.25 types), I wonder if there would be a clean way to use these facilities to get an ack or nak back for normal IP-type datagrams? Since I gather we have some standards to work out regarding ARP on 802.2 anyway, maybe this would be a good time to adopt some other conventions too?

Mark
jas@BRUBECK.PROTEON.COM (07/21/86)
I'll answer two open questions at once here.

The 802.2 connection-oriented features probably really exist so that IBM could run SNA on "their" 802.5 Token-Ring Network. SNA absolutely requires a reliable data-link layer; this is essentially the only level where there are any data integrity features in the SNA architecture. That's why IBM's Token-Ring board has a complete 802.2 connection-oriented (VC) implementation in the firmware of their PC board, along with an extended XID frame for SNA.

I don't think that using a VC data link for IP is going to help you on a LAN. First, nobody is going to manage to write a 6-10 megabit/sec 802.2 VC layer. Second, stacking VC layers does not always work well. Third, this is not really the way TCP/IP was intended to be used. (Of course, on slow nets like the ARPANET, the VC code does not get in the way.) Fourth, the sequence numbering is only modulo-128. This can get consumed rapidly by tinygrams, and you will go into sequence-number wait.

On the issue of the single ACK in TCP, this has to do with degenerative congestion when packets are being dropped. The sender sends 5120 bytes in ten TCP packets. The second one gets dropped due to congestion. The ACK of the first comes back. The last 8 packets get retransmitted. The second one (the original fourth) gets dropped due to congestion. Repeat. What we have is a tendency towards instability when packets start getting lost. Note that the congestion is getting worse for everyone due to this, since packets are being sent many extra times. This sort of problem is why people are developing protocols with what I call "ACK vectors", such as NETBLT at MIT and NETEX from Network Systems. These provide the fate of the last 'n' packets in ACKs, rather than an ACK point. Only the dropped packet gets retransmitted in these protocols.

john shriver
proteon
-------
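To put numbers on the pattern John describes, here is a toy C sketch
comparing how many packets get resent under a single cumulative ACK point
versus an "ACK vector"; the window size and drop pattern are assumptions
made up for the example, not NETBLT or NETEX behavior.

/* With a single ACK point, everything past the first loss is resent;
 * with an ACK vector, only the packets actually reported lost are.
 */
#include <stdio.h>

#define NPKTS 10

int main(void)
{
    int dropped[NPKTS] = {0, 1, 0, 0, 0, 0, 0, 0, 0, 0}; /* packet 2 lost */
    int cumulative = 0, selective = 0;
    int i, first_hole = -1;

    for (i = 0; i < NPKTS; i++)
        if (dropped[i] && first_hole < 0)
            first_hole = i;

    /* Cumulative ACK: the first hole and everything after it goes again. */
    if (first_hole >= 0)
        cumulative = NPKTS - first_hole;

    /* ACK vector: only the packets actually marked lost go again. */
    for (i = 0; i < NPKTS; i++)
        selective += dropped[i];

    printf("resent with cumulative ACK: %d packets\n", cumulative);
    printf("resent with ACK vector:     %d packets\n", selective);
    return 0;
}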
leong@ANDREW.CMU.EDU (John Leong) (07/21/86)
In IEEE 802.5 (a.k.a. the IBM token ring), there are two low-level acknowledgements of a sort in the MAC-layer encapsulation - at the end of the frame. When a station grabs a token for transmission, it sets the A and C bits (Address Recognized and Frame Copied) to 0. As the frame zips around the ring, if all goes well, the destination station will receive the frame and set both the A and C bits to 1. When the frame continues on its merry way back to the sender for purging, the sender can deduce from the status of the A and C bits what has happened. If A and C are both set to 1, all's well. If A and C are 0, there is a good probability that the destination is not up or not on the net. If A is 1 and C is 0, then the receiving station has a congestion problem. If A is 0 and C is 1, we have something really strange going on.

Note that the acknowledgement is all done within one ring rotation, since the A and C bits are flipped on the fly by the receiver, and it is very efficient. There is no explicit ACK frame involved.

Furthermore, the IBM token ring has a nifty feature built into the chip set. If an interface detects a congestion situation, it will send out a special frame (MAC frame) to tell whoever wants to know (a network monitoring station) that a soft error situation has been detected. It is really useful for network management and planning.

Leong
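As a small illustration of how a sender might act on the returning A and C
bits, here is a C sketch; the bit positions and names are made up for the
example and do not follow the actual 802.5 frame-status encoding.

/* Interpret the A (Address Recognized) and C (Frame Copied) bits when a
 * sender's own frame comes back around the ring.
 */
#include <stdio.h>

#define BIT_A 0x2   /* address recognized */
#define BIT_C 0x1   /* frame copied */

static const char *ring_frame_status(unsigned char fs)
{
    int a = (fs & BIT_A) != 0;
    int c = (fs & BIT_C) != 0;

    if (a && c)   return "delivered: destination copied the frame";
    if (a && !c)  return "congestion: destination saw it but could not copy";
    if (!a && !c) return "destination not up or not on the ring";
    return "something really strange (copied but not recognized)";
}

int main(void)
{
    unsigned char cases[] = { BIT_A | BIT_C, BIT_A, 0, BIT_C };
    int i;

    for (i = 0; i < 4; i++)
        printf("A=%d C=%d -> %s\n",
               (cases[i] & BIT_A) != 0, (cases[i] & BIT_C) != 0,
               ring_frame_status(cases[i]));
    return 0;
}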
leong@ANDREW.CMU.EDU (John Leong) (07/21/86)
Mark,

Re: 802.2 Type 2 operation. 802.2 offers you Type 1 or Type 2 operation. Type 1 is pure datagram stuff with the ARPANET's "take your chances" approach, while Type 2 goes to the other extreme and does both flow control and error recovery. The general idea is that if you are going for a heavyweight Transport layer already, such as TCP or TP-4, you should leave everything to that layer and choose Type 1. If you are going to use a lightweight Transport layer such as TP-0, then Type 2 is for you. (Interestingly, IBM is using Type 2, since under SNA the link layer is the only level that will do error recovery.)

Hence, unless we can get IEEE 802.2 to create a Type 1.5, we don't think it is worth our while to spend the cycles required for Type 2. (Actually, having a Type 1.5 that will do low-level acknowledgement but without flow control and error recovery procedures may be quite useful - particularly for network-level gateway machines.)

Leong
jas@BRUBECK.PROTEON.COM (07/23/86)
There is a proposed Type 3 for 802.2 under consideration, which is a reliable datagram service. Still, the A and C bits help so much that I'm not so sure this will be valuable for TCP/IP.

john shriver
proteon
-------
jbn@GLACIER.STANFORD.EDU.UUCP (07/31/86)
1. If you are losing packets due to having too few
receiving buffers in your Ethernet controller,
get a modern Ethernet controller. The worst known
offender is the old 3COM Multibus Ethernet controller
used in early SUN systems; not only does it have only
two receiving buffers, it has no overrun detection, and
thus the software never tallies the many packets it tends
to lose.
2. If you are losing packets due to congestion problems in a
TCP-based system, this can be fixed; see my various RFCs
on the subject. "Improving" the protocol by adding extra
acknowledgements or fancier retransmission schemes is
NOT the answer. I've developed some workable solutions
that are documented in RFCs and implemented in 4.3BSD.
3. The real need for link-level acknowledgements, or at least
some indication of non-delivery that works most of the
time, is for routing around faults. Ethernets transmit
happily into black holes; when the destination dies,
the source never knows.
When the destination Ethernet node is a gateway,
and said gateway goes down, there is no low-level way for
the sending Ethernet node to notice this and divert to an
alternate gateway. This is a serious problem in hi-rel
systems, because we have no standard way for a host on
a multi-gateway Ethernet to behave which will cause it
to divert from one gateway to another when one gateway
fails. There are a number of approaches to this
problem, all of them lousy:
- Ignore it and put up with at least minutes and perhaps
indefinite downtime when a supposedly redundant gateway fails.
(Considered unacceptable in military systems)
- Shorten the ARP timeout to 10 seconds or so and spend
excessive resources sending ARPs.
(Tends to cause one retransmit every 10 seconds due
to non-clever ARP implementations).
- Let the hosts participate in some kind of nonstandard
routing protocol so they can tell when a gateway dies.
(No good for off-the-shelf hosts).
- Let the transport layer inform the datagram layer when
a retransmit occurs, so that the datagram layer can trigger
the selection of a different gateway; if this causes
selection of an up but ill-chosen gateway, a redirect
from that gateway corrects the situation. (Some code
to do this is in 4.2BSD, but it wasn't fully implemented;
a rough sketch of the idea follows below.)
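Here is a rough sketch in C of that last approach; the function names and
the round-robin gateway choice are illustrative assumptions, not the actual
4.2BSD code.

/* When the transport layer notices a retransmission, it advises the
 * datagram layer, which switches to the next candidate gateway; a later
 * ICMP redirect can move traffic back to the preferred first hop.
 */
#include <stdio.h>

#define NGATEWAYS 2

static const char *gateways[NGATEWAYS] = { "gw-a", "gw-b" };
static int current_gw = 0;

/* Called by the transport layer when a segment had to be retransmitted. */
void transport_advise_retransmit(void)
{
    current_gw = (current_gw + 1) % NGATEWAYS;
    printf("datagram layer: switching first-hop gateway to %s\n",
           gateways[current_gw]);
}

/* Called when an ICMP redirect names a better first hop. */
void icmp_redirect(int better_gw)
{
    current_gw = better_gw;
    printf("datagram layer: redirect says use %s\n", gateways[current_gw]);
}

int main(void)
{
    printf("sending via %s\n", gateways[current_gw]);
    transport_advise_retransmit();   /* gw-a looks dead, try gw-b */
    icmp_redirect(0);                /* a redirect points back at gw-a */
    return 0;
}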
It's all so much easier if you have link-level failure-to-deliver
indications.
	John Nagle
JNC@XX.LCS.MIT.EDU ("J. Noel Chiappa") (08/05/86)
Right, I was referring to selective ACK's; i.e. a bit vector or an
array of ack ranges or something which allows you to say 'I did get this
stuff but not that' and describe holes, etc. (Just out of interest, protocol
archaeologists and old fogies may remember that the Xerox PUP BSP had such
ACK's!)
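As an illustration of the idea, here is a small C sketch of a receiver
reporting the ranges it actually holds so the sender can see the holes; the
fixed-size range array and the sequence numbers are assumptions for the
example, not the PUP BSP (or any other protocol's) encoding.

/* Given the ranges the receiver acknowledges, list the holes the sender
 * would need to retransmit.
 */
#include <stdio.h>

struct ack_range { unsigned long first, last; };  /* inclusive */

void show_holes(const struct ack_range *r, int n, unsigned long highest_sent)
{
    unsigned long expect = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (r[i].first > expect)
            printf("hole: %lu..%lu\n", expect, r[i].first - 1);
        expect = r[i].last + 1;
    }
    if (expect <= highest_sent)
        printf("hole: %lu..%lu\n", expect, highest_sent);
}

int main(void)
{
    /* Receiver holds bytes 0..511 and 1024..2047 out of 0..4095 sent. */
    struct ack_range got[] = { {0, 511}, {1024, 2047} };

    show_holes(got, 2, 4095);
    return 0;
}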
As far as the whole question of engineering tradeoffs on ACK's goes,
there are a lot of different interacting factors and criteria. The two big
questions seem to be whether to ack packets or bytes, and whether to have
single or multiple ack's. (Following is expansion for those who aren't
familiar with the tradeoffs.)
The correct answer seems to be conditioned by a couple of design
criteria. The first is what effective data rates you expect to see, and the
second is what packet loss rate the system has. If you want high data rates,
either a) the net has to have an extremely low packet loss rate, or b) you
need a smarter acknowledgement strategy. In case b), it would seem that,
since the overhead of processing ack's on a per-byte basis is too high, the
thing to do is to ack on a per-packet basis. It seems that in a lossy
system, ack'ing on a per-byte basis (which allows retransmissions to be
coalesced) is the right thing for slow connections.
I'm not sure what the right answer is. I really don't go back far
enough to know what the discussions in the early days of TCP ('76 or so, I
would imagine) made of all the issues and tradeoffs. I talked to Dave Clark,
who does remember, and in retrospect the problem was fairly fully
understood; the impact of packet losses on high data rate transfers was
clear (although perhaps the degree to which a single loss could affect very
high speed transfers was not appreciated). Apparently, the system was
assumed to have a low loss rate, modulo congestion, which was supposed to be
handled via a separate mechanism. (The fact that the original design of this
mechanism didn't work and a new one has yet to be created is the cause of a
lot of our current problems.) The per-byte acks were part of the flow
control, which wanted to be on a per byte basis.
I guess we won't really know if the right decision was made until
the system as a whole either is made to obey the design criterion that is
currently being violated (low loss rates) or it proves impossible to meet
that constraint. In the latter case, a different mechanism would be
indicated.
It seems to be another case of the 'simple safety net' philosophy;
as long as some mechanism is not used much, it doesn't matter if the design
is optimal: it's used so rarely. Ack's are in precisely this boat: if you
don't lose many packets, you don't need a sophisticated ack strategy.
Noel
-------