[mod.protocols.tcp-ip] RING vs. ETHER - Theory and practice.

ron@BRL.ARPA.UUCP (07/19/86)

      Dave Clark once again observes that a token ringnet outperforms an
      Ethernet in handling back-to-back packets.  The ringnet has an
      automatic retransmission function built into the network
      interface, and will retransmit rejected packets until they get
      accepted, while an Ethernet interface loses subsequent packets if
      they follow the first one too closely.

In theory, yes, but let us look at two fielded devices: the INTERLAN N1010A
Ethernet interface and the PROTEON 10MB RINGNET.  The INTERLAN handles
multiple incoming packets by buffering some number of messages coming
in from the net in interface memory while waiting for the host to begin
the data transfer.  The PROTEON cannot accept back-to-back messages
because the board does not reset to copying messages from the ring
after the end of the first message, so it misses the header of the second
message.  There is no automatic retransmit, because the source board drains
the ring until it sees its own message come back, which should be at the
beginning of the train of messages.  It can't leave the message on the ring,
because it will be eaten by another interface that has transmitted a message.
It can't retransmit until the token comes by.  I've tried re-enabling copy
as soon as the DMA has finished, but there is still a delay, and I also
suspect that something is amiss in the interrupt logic when I do this.
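
A toy model of the difference (not the actual INTERLAN or PROTEON hardware;
the buffer counts, timings, and names here are invented for illustration):
with a single receive buffer the interface cannot re-arm until the host
drains it, so the second of two back-to-back frames is lost, while a
double-buffered board catches both.

    # Frames arriving while no receive buffer is free are dropped.
    def receive(frames, n_buffers, host_drain_time, gap):
        free = n_buffers          # buffers currently free
        drained_at = []           # times at which occupied buffers free up
        got, lost, t = [], [], 0.0
        for f in frames:
            done = [d for d in drained_at if d <= t]     # host finished copying these
            drained_at = [d for d in drained_at if d > t]
            free += len(done)
            if free > 0:
                free -= 1
                drained_at.append(t + host_drain_time)   # host DMA finishes later
                got.append(f)
            else:
                lost.append(f)
            t += gap              # the next frame arrives 'gap' time units later
        return got, lost

    # back-to-back frames arrive much faster than the host drains a buffer
    print(receive(["p1", "p2"], n_buffers=1, host_drain_time=2.0, gap=0.1))
    # -> (['p1'], ['p2'])     the second frame is lost
    print(receive(["p1", "p2"], n_buffers=2, host_drain_time=2.0, gap=0.1))
    # -> (['p1', 'p2'], [])   double buffering catches both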

The ring is still a slight win because the lower levels can tell when
retransmission is needed.  However, a lot of retransmission is needed
because of the misdesign of the interface, significantly more than is
ever needed on our Ethernets.

That is not to say that I am down on the Proteons; much of what we are doing
at BRL would be difficult or impossible without them.  I just wish they
could double-buffer so that you would not miss the header of successive
packets.

-Ron

JNC@XX.LCS.MIT.EDU ("J. Noel Chiappa") (07/20/86)

	First, let me correct some misstatements in the characterization of
the ring systems (one of which I had a hand in designing at MIT). The ring
interface does not automatically retransmit packets; perhaps the writer was
confusing it with Ethernet interfaces, which do retransmit automatically in
case of a collision. Also, the 80MB ring interface does have an on-board
packet buffer and will receive back-to-back packets without host
intervention, although the 10MB ring interface does not.

	What both the 10M and 80M rings *do* have (as you alluded to) is a
low-level *acknowledgment*; i.e., you know (with some reasonable degree of
probability) whether or not the intended recipient got the packet. The
reason I think this is important is that it is becoming clear that dropping
packets is a Bad Thing in terms of the effect on performance. Any losses and
retransmissions have serious effects on the performance of 'single-ACK'
protocols like TCP, especially when you are running at high data rates. The
single ACK is such a weak mechanism that it should only be used as a backstop
for rare failures; if you have to use it a lot, you lose a lot of performance.
Since we are stuck with TCP, it follows that you should not drop packets.
	The point of all this is that nets ought to have a reasonable
hardware acknowledgement feature that would let you know when a host could
not accept a packet destined to it. I don't know why they didn't put one in
Ethernet; it would have been really trivial. The CHAOS hardware built at
the MIT AI Lab (which was a 4MB/sec Ether like system) had such a feature;
the recipient jammed the cable (causing a collision) if a packet for that
destination could not be handled.
	What amazes me is that although the IEEE committee made all sorts of
changes to the Ethernet spec, of little or no practical utility as far as I
can see (e.g. Ethernet demonstrably works fine without a length field), they
didn't fix this glaring defect! A typical standards committee: they make all
sorts of gratuitous changes to an existing widespread spec, resulting in a
massive incompatibility problem, without fixing any real problems.

	I guess it is true that with a multi-buffered interface you are less
likely to drop packets, but a low-level ack still helps. Also, low-level acks
can help gateways a lot: if the next-hop gateway on a route dies, you can
detect it and reroute around it. You can also give more informative error
messages. (How I hate 'connection timed out' - I want to know why! The
annoying thing is that even when gateways go to great lengths to send back
ICMP error messages, most hosts do not reflect them to users. When I put ICMP
error support into a gateway, I could not find a host at MIT that would take
the messages and use them to make an intelligent error message for the user!)

	Noel
-------

Lixia@XX.LCS.MIT.EDU.UUCP (07/20/86)

I wrote the original paragraph:

      Dave Clark once again observes that a token ringnet outperforms an
      Ethernet in handling back-to-back packets.  The ringnet has an
      automatic retransmission function built into the network
      interface, and will retransmit rejected packets until they get
      accepted, while an Ethernet interface loses subsequent packets if
      they follow the first one too closely.

So I'd better clean up my own mistake.  As Noel has pointed out, the
ringnet interface returns an acknowledgment; therefore, when the receiving
interface cannot keep up with incoming packets, the source host's network
driver quickly retransmits any negatively acknowledged packet until the
interface returns a positive ACK (or until it hits some maximum
retransmission count).  For Ethernet, if the receiving interface doesn't get
a packet right, the packet is simply lost.
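
In driver terms, the loop is roughly the following (a sketch only; send_frame,
the status strings, and MAX_RETRANS are invented names, not the actual ring
driver interface):

    MAX_RETRANS = 8

    def ring_send(send_frame, packet):
        """Retransmit on a negative hardware acknowledgment; give up after
        MAX_RETRANS tries.  Returns True if the receiver copied the frame."""
        for attempt in range(MAX_RETRANS):
            status = send_frame(packet)   # "ACK" or "NAK" from the interface
            if status == "ACK":
                return True
            # "NAK": the receiver saw its address but had no buffer free;
            # retransmit right away instead of waiting for a transport timeout.
        return False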

Having buffers at the interface is helpful for both Ethernet and ring.
Ethernet is a loser in that, in Noel's words, it lacks a low-level ACK.

Lixia
-------

CERF@A.ISI.EDU.UUCP (07/21/86)

Noel,

Please elaborate on the "single ACK" problem. As you know, TCP ACKs are at
least "inclusive", so that a subsequent ACK can make up for a lost one if
more data is sent and received. This isn't perfect, of course, since an
"inclusive" ACK doesn't help if data was lost at the receiver. Perhaps you
are thinking about ACKs which cover data received past data lost (selective
ACKs)? We looked at this several times, but the complexity of the mechanism
did not seem to buy enough to justify it.
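
To make the "inclusive" point concrete (a toy illustration, not TCP code; the
byte numbers are invented): an ACK names the next byte expected, so a later
ACK subsumes an earlier lost one, but it can never report bytes received
beyond a hole.

    # Cumulative ("inclusive") acknowledgment: ack the highest contiguous byte.
    def cumulative_ack(received_segments):
        """received_segments: (start_byte, length) pairs the receiver has."""
        have = set()
        for start, length in received_segments:
            have.update(range(start, start + length))
        ack = 0
        while ack in have:
            ack += 1
        return ack            # next byte expected; everything below is acked

    # Segments 0-511 and 1024-1535 arrive; 512-1023 is lost in the net:
    print(cumulative_ack([(0, 512), (1024, 512)]))   # -> 512
    # The sender learns nothing about bytes 1024-1535; reporting those
    # is exactly what a selective ACK would add.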

Vint

mark@cbosgd.ATT.COM.UUCP (07/21/86)

In article <12224206784.24.JNC@XX.LCS.MIT.EDU> you write:
>	The point of all this is that nets ought to have a reasonable
>hardware acknowledgement feature that would let you know when a host could
>not accept a packet destined to it. I don't know why they didn't put one in
>Ethernet; it would have been really trivial. The CHAOS hardware built at
>the MIT AI Lab (which was a 4MB/sec Ether like system) had such a feature;
>the recipient jammed the cable (causing a collision) if a packet for that
>destination could not be handled.

I note that 802.2 has a whole bunch of "connection oriented" features
added onto the side of Ethernet.  While I assume most of us are ignoring
them (I gather they were put there by X.25 types), I wonder if there would
be a clean way to use these facilities to get an ack or nak back for
normal IP-type datagrams.  Since I gather we have some standards to
work out regarding ARP on 802.2 anyway, maybe this would be a good
time to adopt some other conventions too?

	Mark

jas@BRUBECK.PROTEON.COM (07/21/86)

I'll answer two open questions at once here.

The 802.2 connection-oriented features probably really exist so that IBM
could run SNA on "their" 802.5 Token-Ring Network. SNA absolutely requires
a reliable data-link layer; this is essentially the only level where there
are any data integrity features in the SNA architecture. That's why IBM's
Token-Ring board has a complete 802.2 connection-oriented (VC) implementation
in the firmware of their PC board, along with an extended XID frame for SNA.

I don't think that using a VC data link for IP is going to help you on a
LAN. First of all, nobody's going to manage to write a 6-10 megabit/sec
802.2 VC layer. Secondly, stacking VC layers does not always work well.
Third, this is not really the way TCP/IP was intended to be used. (Of
course, on slow nets like the ARPANET, the VC code does not get in the
way.) Fourth, the sequence numbering is only modulo-128; it can get
consumed rapidly by tinygrams, and you will go into sequence-number
wait.
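
A rough feel for the modulo-128 problem (illustrative arithmetic only; the
frame size and the assumption that no acks return in the interval are the
editor's, not John's):

    # With 64-byte tinygrams on a 10 megabit/sec ring, all 128 sequence
    # numbers are consumed in a few milliseconds, after which the sender
    # must sit in sequence-number wait until acknowledgments catch up.
    frame_bits = 64 * 8                  # one 64-byte tinygram
    link_bps = 10_000_000                # 10 megabit/sec
    time_per_frame = frame_bits / link_bps
    print(128 * time_per_frame)          # about 0.0066 seconds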

On the issue of the single ACK in TCP: this has to do with degenerative
congestion when packets are being dropped. The sender sends 5120 bytes
in ten TCP packets. The second one gets dropped due to congestion. The
ACK of the first comes back. The remaining nine packets get retransmitted.
The second of those (the original third) gets dropped due to congestion.
Repeat. What we have is a tendency towards instability when packets start
getting lost. Note that the congestion is getting worse for everyone,
since packets are being sent many extra times. This sort of problem is
why people are developing protocols with what I call "ACK vectors", such
as NETBLT at MIT and NETEX from Network Systems. These provide the fate
of the last 'n' packets in each ACK, rather than a single ACK point. Only
the dropped packet gets retransmitted in these protocols.
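
A back-of-the-envelope way to see the blow-up (a toy model, not NETBLT or
NETEX; the rule of exactly one drop per retransmission round is a
simplification of the scenario above):

    # Toy comparison of "retransmit everything past the ACK point" (single
    # cumulative ACK) versus retransmitting only the dropped packet (an
    # "ACK vector" / selective scheme).  Assumes one drop per round.
    def go_back_n(n_packets, n_drops):
        """Each round the sender resends everything from the first missing packet on."""
        sent = n_packets                       # first transmission of all packets
        first_missing = 1                      # packet #2 (index 1) is dropped first
        for _ in range(n_drops):
            sent += n_packets - first_missing  # resend the whole tail
            first_missing += 1                 # the next round drops a later packet
        return sent

    def selective(n_packets, n_drops):
        """Only the dropped packet is resent each round."""
        return n_packets + n_drops

    print(go_back_n(10, 3))   # 10 + 9 + 8 + 7 = 34 packet transmissions
    print(selective(10, 3))   # 10 + 3         = 13 packet transmissions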

					john shriver
 					proteon
-------

leong@ANDREW.CMU.EDU (John Leong) (07/21/86)

In IEEE 802.5 (a.k.a. the IBM token ring), there are two low-level
acknowledgment bits of a sort in the MAC-layer encapsulation, at the end of
the frame. When a station grabs a token for transmission, it sets the A and C
bits (Address Recognized and Frame Copied) to 0. As the frame zips around the
ring, if all goes well, the destination station will receive the frame and
set both the A and C bits to 1. When the frame continues on its merry way
back to the sender for purging, the sender can deduce from the status of the
A and C bits what has happened.

If A and C are both 1, all is well.
If A and C are both 0, there is a good probability that the destination is
not up or not on the net.
If A is 1 and C is 0, the receiving station has a congestion problem.
If A is 0 and C is 1, we have something really strange going on.

Note that the acknowledgment is all done within one ring rotation, since the
A and C bits are flipped on the fly by the receiver, which is very efficient.
There is no explicit ACK frame involved.
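
The sender's decision table is small enough to write down (a sketch only; the
function name and result strings are invented, not taken from the 802.5 spec
or any driver):

    # Decode the A (Address Recognized) and C (Frame Copied) bits a frame
    # carries when it comes back to its sender on an 802.5 ring.
    def frame_status(a_bit, c_bit):
        if a_bit and c_bit:
            return "delivered"              # recognized and copied: all is well
        if not a_bit and not c_bit:
            return "destination absent"     # nobody recognized the address
        if a_bit and not c_bit:
            return "receiver congested"     # recognized, but no buffer to copy into
        return "protocol error"             # copied without being recognized?

    for a in (0, 1):
        for c in (0, 1):
            print(a, c, frame_status(a, c))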

Furthermore, the IBM token ring has a nifty feature built into the chip set.
If an interface detects a congestion situation, it sends out a special (MAC)
frame to tell whoever wants to know (e.g. a network monitoring station) that
a soft error situation has been detected. This is really useful for network
management and planning.

Leong

leong@ANDREW.CMU.EDU (John Leong) (07/21/86)

Mark,

Re : 802.2 Type 2 operation

802.2 offers you Type 1 or Type 2 operation. Type 1 is pure datagram stuff,
with the ARPANET's "take your chances" approach, while Type 2 goes to the
other extreme and does both flow control and error recovery.

The general idea is that if you are already going for a heavyweight Transport
layer such as TCP or TP-4, you should leave everything to that layer and
choose Type 1. If you are going to use a lightweight Transport layer such as
TP-0, then Type 2 is for you. (Interestingly, IBM is using Type 2, since
under SNA the link layer is the only level that will do error recovery.)

Hence, unless we can get IEEE 802.2 to create a Type 1.5, we don't think it
is worth our while to spend the cycles required for Type 2. (Actually, a Type
1.5 that did low-level acknowledgment but without the flow control and error
recovery procedures might be quite useful, particularly for network-level
gateway machines.)

Leong

jas@BRUBECK.PROTEON.COM (07/23/86)

There is a proposed Type 3 for 802.2 under consideration, which is a reliable
datagram service. Still, the A and C bits already help so much that I'm not
sure Type 3 will add much value for TCP/IP.
					john shriver
					proteon
-------

jbn@GLACIER.STANFORD.EDU.UUCP (07/31/86)

      1.  If you are losing packets because your Ethernet controller
	  has too few receive buffers, get a modern Ethernet
	  controller.  The worst known offender is the old 3COM
	  Multibus Ethernet controller used in early SUN systems;
	  not only does it have only two receive buffers, it has no
	  overrun detection, and thus the software never tallies the
	  many packets it tends to lose.

      2.  If you are losing packets due to congestion problems in a
	  TCP-based system, this can be fixed; see my various RFCs
	  on the subject.  "Improving" the protocol by adding extra
	  acknowledgements or fancier retransmission schemes is
	  NOT the answer.  I've developed some workable solutions
	  that are documented in RFCs and implemented in 4.3BSD.

      3.  The real need for link-level acknowledgments, or at least
	  some indication of non-delivery that works most of the
	  time, is for routing around faults.  Ethernets transmit
	  happily into black holes; when the destination dies,
	  the source never knows.
	  When the destination Ethernet node is a gateway,
	  and said gateway goes down, there is no low-level way for
	  the sending Ethernet node to notice this and divert to an
	  alternate gateway.  This is a serious problem in
	  high-reliability systems, because we have no standard way
	  for a host on a multi-gateway Ethernet to behave that will
	  cause it to divert from one gateway to another when one
	  gateway fails.  There are a number of approaches to this
	  problem, all of them lousy:

	  - Ignore it and put up with minutes or perhaps indefinite
	    downtime when a supposedly redundant gateway fails.
	    (Considered unacceptable in military systems.)
	  - Shorten the ARP timeout to 10 seconds or so and spend
	    excessive resources sending ARPs.
	    (Tends to cause one retransmit every 10 seconds with
	    non-clever ARP implementations.)
	  - Let the hosts participate in some kind of nonstandard
	    routing protocol so they can tell when a gateway dies.
	    (No good for off-the-shelf hosts.)
	  - Let the transport layer inform the datagram layer when
	    a retransmit occurs, so that the datagram layer can trigger
	    the selection of a different gateway; if this causes
	    selection of an up but ill-chosen gateway, a redirect
	    from that gateway corrects the situation.  (Some code
	    to do this is in 4.2BSD, but it was never fully
	    implemented.)  A sketch of this last approach follows
	    the list.
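
	  The sketch mentioned above (all names are hypothetical;
	  the real 4.2BSD code is organized quite differently):

            # TCP calls advise_retransmit() on a retransmission timeout;
            # the datagram layer then rotates to another candidate first-hop
            # gateway.  An ICMP redirect later corrects a poor guess.
            class GatewayChooser:
                def __init__(self, gateways):
                    self.gateways = list(gateways)   # candidate first-hop gateways
                    self.current = 0

                def first_hop(self):
                    return self.gateways[self.current]

                def advise_retransmit(self):
                    # The current first hop may be a black hole; try the next one.
                    self.current = (self.current + 1) % len(self.gateways)

                def redirect(self, better_gateway):
                    # An up-but-ill-chosen gateway sends a redirect; follow it.
                    if better_gateway in self.gateways:
                        self.current = self.gateways.index(better_gateway)

            g = GatewayChooser(["gw-a", "gw-b"])
            print(g.first_hop())     # gw-a
            g.advise_retransmit()    # TCP timed out: suspect gw-a is down
            print(g.first_hop())     # gw-b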

	  It's all so much easier if you have link-level
	  failure-to-deliver indications.

					John Nagle

JNC@XX.LCS.MIT.EDU ("J. Noel Chiappa") (08/05/86)

	Right, I was referring to selective ACKs; i.e., a bit vector or an
array of ack ranges or something which allows you to say 'I did get this
stuff but not that' and describe holes, etc. (Just out of interest, protocol
archaeologists and old fogies may remember that the Xerox PUP BSP had such
ACKs!)

	As far as the whole question of engineering tradeoffs on ACKs goes,
there are a lot of different interacting factors and criteria. The two big
questions seem to be whether to ack packets or bytes, and whether to have
single or multiple ACKs. (The following is an expansion for those who aren't
familiar with the tradeoffs.)
	The correct answer seems to be conditioned by a couple of design
criteria. The first is what effective data rates you expect to see, and the
second is what packet loss rate the system has. If you want high data rates,
either a) the net has to have an extremely low packet loss rate, or b) you
need a smarter acknowledgment strategy. In case b), since the overhead of
processing acks on a per-byte basis is too high, the thing to do is to ack
on a per-packet basis. In a lossy system, acking on a per-byte basis (which
allows retransmissions to be coalesced) seems to be the right thing for slow
connections.
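
	A small illustration of the coalescing point (a toy model; the byte
numbers are invented): two lost segments can be resent as one larger segment,
and a byte-based cumulative ack still describes the result exactly, whereas a
per-packet scheme would have to preserve the original packet boundaries.

    # (start_byte, length) of two lost, adjacent 256-byte segments
    lost = [(512, 256), (768, 256)]

    def coalesce(segments):
        """Merge adjacent lost segments into one retransmission."""
        segments = sorted(segments)
        start = segments[0][0]
        end = max(s + l for s, l in segments)   # assumes the segments are contiguous
        return (start, end - start)

    print(coalesce(lost))    # -> (512, 512): one retransmission covers both losses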

	I'm not sure what the right answer is. I really don't go back far
enough to know what the discussions in the early days of TCP ('76 or so, I
would imagine) made of all the issues and tradeoffs. I talked to Dave Clark,
who does remember, and in retrospect the problem was fairly fully
understood; the impact of packet losses on high-data-rate transfers was
clear (although perhaps the degree to which a single loss could affect very
high speed transfers was not appreciated). Apparently, the system was
assumed to have a low loss rate, modulo congestion, which was supposed to be
handled via a separate mechanism. (The fact that the original design of this
mechanism didn't work, and a new one has yet to be created, is the cause of
a lot of our current problems.) The per-byte acks were part of the flow
control, which wanted to be on a per-byte basis.
	I guess we won't really know if the right decision was made until
the system as a whole either is made to obey the design criterion that is
currently being violated (low loss rates) or it proves impossible to meet
that constraint. In the latter case, a different mechanism would be
indicated.

	It seems to be another case of the 'simple safety net' philosophy:
as long as some mechanism is not used much, it doesn't matter whether its
design is optimal. ACKs are in precisely this boat: if you don't lose many
packets, you don't need a sophisticated ack strategy.

	Noel
-------