ron@BRL.ARPA.UUCP (07/19/86)
Dave Clark once again observes that a token ringnet outperforms an Ethernet in handling back-to-back packets: the ringnet has an automatic retransmission function built into the network interface and will retransmit rejected packets until they get accepted, while an Ethernet interface loses subsequent packets if they follow the first one too closely. In theory. But let us look at two fielded devices: the INTERLAN N1010A Ethernet interface and the PROTEON 10MB RINGNET.

The INTERLAN handles multiple incoming packets by buffering some number of messages coming in from the net in interface memory while waiting for the host to begin the data transfer. The PROTEON cannot accept back-to-back messages because the board does not resume copying messages from the ring after the end of the first message, so it misses the header of the second message. There is no automatic retransmit, because the source board drains the ring until it sees its own message come back, which should be at the beginning of the train of messages. It can't leave the message in the ring, because it would be eaten by another interface that had transmitted a message, and it can't retransmit until the token comes by. I've tried reenabling copy as soon as the DMA has finished, but there is still a delay, and I also suspect something is amiss in the interrupt logic when I do this.

You are still at a slight win because the lower levels can tell when retransmission is needed; however, a lot of retransmission is needed because of the misdesign of the interface, significantly more than is ever needed on our Ethernets. Not to say that I am down on the Proteons; much of what we are doing at BRL would be difficult or impossible without them. I just wish they could double buffer so that you would not miss the header of successive packets.

-Ron
JNC@XX.LCS.MIT.EDU ("J. Noel Chiappa") (07/20/86)
First, let me correct some misstatements in the characterization of the ring systems (one of which I had a hand in designing at MIT). The ring interface does not automatically retransmit packets; perhaps the author was confusing it with Ethernet interfaces, which do retransmit packets automatically in case of a collision. Also, the 80MB ring interface does have an on-board packet buffer and will receive back-to-back packets without host intervention, although the 10MB ring interface does not.

What both the 10MB and 80MB rings *do* have (as you alluded to) is a low-level *acknowledgment*; i.e., you know (with some reasonable degree of probability) whether or not the intended recipient got the packet. I think this is important because it is starting to become clear that dropping packets is a Bad Thing in terms of its effect on performance. Losses and retransmissions have serious effects on the performance of 'single-ACK' protocols like TCP, especially when you are running at high data rates. The single ACK is such a weak mechanism that it should only be used as a backstop for rare failures; if you have to use it a lot, you lose a lot of performance. Since we are stuck with TCP, you should not drop packets.

The point of all this is that nets ought to have a reasonable hardware acknowledgement feature that would let you know when a host could not accept a packet destined to it. I don't know why they didn't put one in Ethernet; it would have been really trivial. The CHAOS hardware built at the MIT AI Lab (which was a 4MB/sec Ether-like system) had such a feature: the recipient jammed the cable (causing a collision) if a packet for that destination could not be handled. What amazes me is that although the IEEE Ethernet spec made all sorts of changes to the original, of little or no practical utility as far as I can see (e.g., Ethernet demonstrably works fine without a length field), they didn't fix this glaring defect!
A typical standards committee: they make all sorts of gratuitous changes to an existing widespread spec, resulting in a massive incompatibility problem, without fixing any real problems. I grant that with a multi-buffered interface you are less likely to drop packets, but a low-level ack still helps even so. Also, low-level acks can help gateways a lot. If the next-hop gateway on a route dies, you can detect it and reroute around it. You can also give more informative error messages. (How I hate 'connection timed out' - I want to know why! The annoying thing is that even when gateways go to great lengths to send back ICMP error messages, most hosts do not reflect them to users. When I put ICMP error support into a gateway, I could not find a host at MIT that would take the messages and use them to make an intelligent error message for the user!)

Noel
-------
Lixia@XX.LCS.MIT.EDU.UUCP (07/20/86)
I wrote the original paragraph: "Dave Clark once again observes that a token ringnet outperforms an Ethernet in handling back-to-back packets. The ringnet has an automatic retransmission function built into the network interface, and will retransmit rejected packets until they get accepted, while an Ethernet interface loses subsequent packets if they follow the first one too closely." So I'd better clean up my own mistake.

As Noel has pointed out, the ringnet interface returns an acknowledgment; therefore, when the receiving interface cannot keep up with incoming packets, the source host's network driver quickly retransmits any negatively acknowledged packet until the interface returns a positive ACK (or until it hits some maximum retransmission count). For Ethernet, if the receiving interface doesn't get a packet right, the packet is simply lost. Having buffers at the interface is helpful for both Ethernet and ring. Ethernet is a loser in that, to use Noel's word, it lacks a low-level ACK.

Lixia
-------
CERF@A.ISI.EDU.UUCP (07/21/86)
Noel,

Please elaborate on the "single ACK" problem. As you know, TCP ACKs are at least "inclusive," so that a subsequent ACK can make up for a lost one if more data is sent and received. This isn't perfect, of course, since an "inclusive" ACK doesn't help if data was lost at the receiver. Perhaps you are thinking of ACKs which cover data received past data lost (selective ACKs)? We looked at this several times, but the complexity of the mechanism did not seem to buy enough to justify it.

Vint
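The "inclusive" behavior Vint describes can be sketched in a few lines. This is an editorial illustration of cumulative-ACK semantics, not TCP code; the function name and segment numbering are invented.

```python
def cumulative_ack(received_segments):
    """Return the highest segment number N such that segments
    1..N have ALL arrived -- a TCP-style cumulative ('inclusive')
    acknowledgment."""
    got = set(received_segments)
    n = 0
    while n + 1 in got:
        n += 1
    return n

# A lost ACK is harmless: a later ACK covering segment 3 also
# covers segments 1 and 2.
print(cumulative_ack([1, 2, 3]))     # -> 3

# Lost *data* is what hurts: segment 2 is missing, so even though
# 3, 4, and 5 arrived, the receiver can still only acknowledge 1.
print(cumulative_ack([1, 3, 4, 5]))  # -> 1
```

The second call shows exactly why the inclusive ACK "doesn't help if data was lost at the receiver": the acknowledgment point stalls at the hole no matter how much later data gets through.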
mark@cbosgd.ATT.COM.UUCP (07/21/86)
In article <12224206784.24.JNC@XX.LCS.MIT.EDU> you write:
> The point of all this is that nets ought to have a reasonable
> hardware acknowledgement feature that would let you know when a host
> could not accept a packet destined to it. I don't know why they didn't
> put one in Ethernet; it would have been really trivial. The CHAOS
> hardware built at the MIT AI Lab (which was a 4MB/sec Ether like
> system) had such a feature; the recipient jammed the cable (causing a
> collision) if a packet for that destination could not be handled.

I note that 802.2 has a whole bunch of "connection oriented" features added onto the side of Ethernet. While I assume most of us are ignoring them (I gather they were put there by X.25 types), I wonder if there would be a clean way to use these facilities to get an ack or nak back for normal IP-type datagrams? Since I gather we have some standards to work out regarding ARP on 802.2 anyway, maybe this would be a good time to adopt some other conventions too?

Mark
jas@BRUBECK.PROTEON.COM (07/21/86)
I'll answer two open questions at once here.

The 802.2 connection-oriented features probably really exist so that IBM could run SNA on "their" 802.5 Token-Ring Network. SNA absolutely requires a reliable data-link layer; this is essentially the only level where there are any data-integrity features in the SNA architecture. That's why IBM's Token-Ring board has a complete 802.2 connection-oriented (VC) implementation in the firmware of their PC board, along with an extended XID frame for SNA.

I don't think that using a VC data link for IP is going to help you on a LAN. First, nobody's going to manage to write a 6-10 megabit/sec 802.2 VC layer. Second, stacking VC layers does not always work well. Third, this is not really the way TCP/IP was intended to be used. (Of course, on slow nets like the ARPANET, the VC code does not get in the way.) Fourth, the sequence numbering is only modulo-128. This can get consumed rapidly by tinygrams, and you will go into sequence-number wait.

On the issue of the single ACK in TCP: this has to do with degenerative congestion when packets are being dropped. The sender sends 5120 bytes in ten TCP packets. The second one gets dropped due to congestion. The ACK of the first comes back. The last 8 packets get retransmitted. The second one (the original fourth) gets dropped due to congestion. Repeat. What we have is a tendency toward instability when packets start getting lost. Note that the congestion is getting worse for everyone, since packets are being sent many extra times.

This sort of problem is why people are developing protocols with what I call "ACK vectors," such as NETBLT at MIT and NETEX from Network Systems. These provide the fate of the last 'n' packets in each ACK, rather than a single ACK point. Only the dropped packet gets retransmitted in these protocols.

john shriver
proteon
-------
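Shriver's degenerative-congestion scenario can be made concrete with a toy simulation (an editorial sketch: the function names, the fixed drop pattern, and the packet counts are invented for illustration, not taken from any real trace).

```python
def go_back_n(n_packets, doomed):
    """Count total transmissions for a go-back-N sender moving
    n_packets, when the network drops the frames whose wire-order
    index (1st, 2nd, ... frame carried) appears in `doomed`."""
    sent = acked = carried = 0
    while acked < n_packets:
        in_order = True
        for seq in range(acked + 1, n_packets + 1):
            sent += 1
            carried += 1
            if carried in doomed:
                # Frame lost; everything after it in this window
                # arrives out of order and the receiver discards it.
                in_order = False
            elif in_order:
                acked = seq   # delivered and accepted in order
    return sent

def selective_repeat(n_packets, doomed):
    """Same network, but only the lost packets are retransmitted
    (the 'ACK vector' style of NETBLT and NETEX)."""
    sent = carried = 0
    pending = list(range(1, n_packets + 1))
    while pending:
        lost = []
        for seq in pending:
            sent += 1
            carried += 1
            if carried in doomed:
                lost.append(seq)
        pending = lost
    return sent

# Ten packets, with the 2nd and 12th frames carried being dropped:
print(go_back_n(10, {2, 12}))         # -> 27 transmissions
print(selective_repeat(10, {2, 12}))  # -> 11 transmissions
```

Two drops cost go-back-N seventeen extra transmissions but selective repeat only one, which is the instability Shriver describes: every wasted retransmission adds to the very congestion that caused the loss.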
leong@ANDREW.CMU.EDU (John Leong) (07/21/86)
In IEEE 802.5 (a.k.a. the IBM token ring), there are two low-level acknowledgements of a sort in the MAC-layer encapsulation, at the end of the frame. When a station grabs a token for transmission, it sets the A and C bits (Address Recognized and Frame Copied) to 0. As the frame zips around the ring, if all goes well, the destination station will receive the frame and set both the A and C bits to 1. When the frame continues on its merry way back to the sender for purging, the sender can deduce from the status of the A and C bits what has happened. If A and C are both 1, all's well. If A and C are both 0, there is a good probability that the destination is not up or not on the net. If A is 1 and C is 0, the receiving station has a congestion problem. If A is 0 and C is 1, we have something really strange going on. Note that the acknowledgement is all done within one ring rotation, as the A and C bits are flipped on the fly by the receiver; it is very efficient, and there is no explicit ACK frame involved.

Furthermore, the IBM token ring has a nifty feature built into the chip set. If an interface detects a congestion situation, it will send out a special frame (a MAC frame) to tell whoever wants to know (e.g., a network monitoring station) that a soft-error situation has been detected. This is really useful for network management and planning.

Leong
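The four A/C-bit outcomes Leong lists map cleanly onto a tiny decoder. This is an illustrative sketch; the function name and result strings are invented, and a real driver would read the bits out of the returning frame's status field.

```python
def interpret_frame_status(a_bit, c_bit):
    """Interpret the 802.5 Address-Recognized (A) and Frame-Copied
    (C) bits as seen by the sender when its own frame comes back
    around the ring."""
    if a_bit and c_bit:
        return "delivered"            # recognized and copied: all's well
    if not a_bit and not c_bit:
        return "destination absent"   # nobody recognized the address
    if a_bit and not c_bit:
        return "receiver congested"   # seen, but no buffer to copy into
    return "protocol error"           # copied but not recognized?!
```

Because the sender learns this within one ring rotation, the host driver can retransmit a "receiver congested" frame immediately, which is the low-level ACK Ethernet lacks.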
leong@ANDREW.CMU.EDU (John Leong) (07/21/86)
Mark,

Re: 802.2 Type 2 operation. 802.2 offers you Type 1 or Type 2 operation. Type 1 is pure datagram stuff, with the ARPANET's "take your chances" approach, while Type 2 goes to the other extreme and does both flow control and error recovery. The general idea is that if you are going to run a heavyweight Transport layer already, such as TCP or TP-4, you should leave everything to that layer and choose Type 1. If you are going to use a lightweight Transport layer such as TP-0, then Type 2 is for you. (Interestingly, IBM is using Type 2, since under SNA the link layer is the only level that does error recovery.) Hence, unless we can get IEEE 802.2 to create a "Type 1.5," we don't think it is worth our while to spend the cycles required for Type 2. (Actually, a Type 1.5 that did low-level acknowledgement, but without the flow control and error recovery procedures, might be quite useful - particularly for network-level gateway machines.)

Leong
jas@BRUBECK.PROTEON.COM (07/23/86)
There is a proposed 802.2 Type 3 under consideration, which is a reliable datagram service. Still, the A and C bits help so much that I'm not sure this will be valuable for TCP/IP.

john shriver
proteon
-------
jbn@GLACIER.STANFORD.EDU.UUCP (07/31/86)
1. If you are losing packets due to having too few receive buffers in your Ethernet controller, get a modern Ethernet controller. The worst known offender is the old 3COM Multibus Ethernet controller used in early SUN systems; not only does it have only two receive buffers, it has no overrun detection, and thus the software never tallies the many packets it tends to lose.

2. If you are losing packets due to congestion problems in a TCP-based system, this can be fixed; see my various RFCs on the subject. "Improving" the protocol by adding extra acknowledgements or fancier retransmission schemes is NOT the answer. I've developed some workable solutions that are documented in RFCs and implemented in 4.3BSD.

3. The real need for link-level acknowledgements, or at least some indication of non-delivery that works most of the time, is for routing around faults. Ethernets transmit happily into black holes; when the destination dies, the source never knows. When the destination Ethernet node is a gateway, and said gateway goes down, there is no low-level way for the sending Ethernet node to notice this and divert to an alternate gateway. This is a serious problem in high-reliability systems, because we have no standard way for a host on a multi-gateway Ethernet to behave which will cause it to divert from one gateway to another when one fails. There are a number of approaches to this problem, all of them lousy:

- Ignore it, and put up with at least minutes and perhaps indefinite downtime when a supposedly redundant gateway fails. (Considered unacceptable in military systems.)

- Shorten the ARP timeout to 10 seconds or so and spend excessive resources sending ARPs. (Tends to cause one retransmit every 10 seconds due to non-clever ARP implementations.)

- Let the hosts participate in some kind of nonstandard routing protocol so they can tell when a gateway dies. (No good for off-the-shelf hosts.)

- Let the transport layer inform the datagram layer when a retransmit occurs, so that the datagram layer can trigger the selection of a different gateway; if this causes selection of an up but ill-chosen gateway, a redirect from that gateway corrects the situation. (Some code to do this is in 4.2BSD, but it wasn't fully implemented.)

It's all so much easier if you have link-level failure-to-deliver indications.

John Nagle
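Nagle's last approach (retransmission hints from the transport layer driving gateway selection) can be sketched as follows. All names here are invented for illustration; a real implementation would live inside the IP output path and the TCP timer code, not in a standalone class.

```python
class GatewaySelector:
    """Sketch of retransmit-driven gateway failover: a TCP
    retransmission timeout hints that the current first hop may be
    dead, so later datagrams try the next candidate gateway; an
    ICMP redirect from a live but ill-chosen gateway then fixes
    the route."""

    def __init__(self, gateways):
        self.gateways = list(gateways)
        self.current = 0

    def next_hop(self):
        return self.gateways[self.current]

    def transport_retransmit_hint(self):
        # Transport layer reports a retransmission: rotate to the
        # next gateway for subsequent datagrams.
        self.current = (self.current + 1) % len(self.gateways)

    def icmp_redirect(self, better_gateway):
        # A redirect names a better first hop; adopt it.
        self.current = self.gateways.index(better_gateway)

sel = GatewaySelector(["gw-a", "gw-b"])
print(sel.next_hop())            # -> gw-a
sel.transport_retransmit_hint()  # a TCP segment timed out
print(sel.next_hop())            # -> gw-b
```

Note the weakness Nagle implies: a retransmission caused by loss elsewhere in the path will also rotate gateways needlessly, which is why a genuine link-level non-delivery indication is the cleaner trigger.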
JNC@XX.LCS.MIT.EDU ("J. Noel Chiappa") (08/05/86)
Right, I was referring to selective ACKs; i.e., a bit vector or an array of ACK ranges or something which allows you to say 'I did get this stuff but not that' and describe holes, etc. (Just out of interest, protocol archaeologists and old fogies may remember that the Xerox PUP BSP had such ACKs!)

As for the whole question of engineering tradeoffs on ACKs, there are a lot of different interacting factors and criteria. The two big questions seem to be whether to ACK packets or bytes, and whether to have single or multiple ACKs. (The following is an expansion for those who aren't familiar with the tradeoffs.) The correct answer seems to be conditioned by a couple of design criteria: first, what effective data rates you expect to see, and second, what packet loss rate the system has. If you want high data rates, either a) the net has to have an extremely low packet loss rate, or b) you need a smarter acknowledgement strategy. In case b), since the overhead of processing ACKs on a per-byte basis is too high, the thing to do is ACK on a per-packet basis. On the other hand, in a lossy system, ACKing on a per-byte basis (which allows retransmissions to be coalesced) seems to be the right thing for slow connections. I'm not sure what the right answer is.

I really don't go back far enough to know what the discussions in the early days of TCP ('76 or so, I would imagine) made of all these issues and tradeoffs. I talked to Dave Clark, who does remember, and in retrospect the problem was fairly fully understood; the impact of packet losses on high-data-rate transfers was clear (although perhaps the degree to which a single loss could affect very high-speed transfers was not appreciated). Apparently, the system was assumed to have a low loss rate, modulo congestion, which was supposed to be handled via a separate mechanism.
(The fact that the original design of this mechanism didn't work, and a new one has yet to be created, is the cause of a lot of our current problems.) The per-byte ACKs were part of the flow control, which wanted to be on a per-byte basis.

I guess we won't really know whether the right decision was made until the system as a whole either is made to obey the design criterion that is currently being violated (low loss rates) or it proves impossible to meet that constraint. In the latter case, a different mechanism would be indicated. It seems to be another case of the 'simple safety net' philosophy: as long as some mechanism is not used much, it doesn't matter whether its design is optimal, since it's used so rarely. ACKs are in precisely this boat: if you don't lose many packets, you don't need a sophisticated ACK strategy.

Noel
-------
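The 'array of ACK ranges' Noel describes - saying 'I did get this stuff but not that' - can be sketched concretely. This is an editorial illustration with an invented function name and encoding, not any protocol's actual wire format.

```python
def sack_ranges(received):
    """Summarize receiver state as (cumulative_ack, blocks): the
    highest in-order segment number, plus the contiguous blocks of
    out-of-order data received beyond it. The gaps between blocks
    are the holes the sender should retransmit."""
    got = sorted(set(received))
    got_set = set(got)
    cum = 0
    while cum + 1 in got_set:
        cum += 1
    blocks = []
    for seq in got:
        if seq <= cum:
            continue
        if blocks and seq == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], seq)  # extend current block
        else:
            blocks.append((seq, seq))          # start a new block
    return cum, blocks

# Segments 4, 7, and 8 are the holes:
print(sack_ranges([1, 2, 3, 5, 6, 9]))  # -> (3, [(5, 6), (9, 9)])
```

With this report, the sender retransmits only segments 4, 7, and 8, rather than everything past the single cumulative ACK point of 3.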