vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (10/09/89)
(I'm sorry to talk about ethernets here, but I don't think enough knowledgeable people read other forums.)

At last week's Interop, I was told that a large workstation company's products can be configured to use non-standard retransmission-after-collision delays.

Is this true? Does it cause big problems, as my informant implied? Exactly which parameter(s) is(are) changed--the exponent, the multiplier, the algorithm? Does it really help NFS that much? Is there a relevant de-facto or de-jure standard? Should Silicon Graphics do the same thing? Are the Protocol Police about to squash the idea?

Vernon Schryver
Silicon Graphics
vjs@sgi.com
nagle@well.UUCP (John Nagle) (10/09/89)
In article <42686@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>At last week's Interop, I was told that a large workstation company's
>products can be configured to use non-standard retransmission-after-collision
>delays.
>
>Is this true? Does it cause big problems, as my informant implied?

The issues here are very subtle. Hosts that use shortened backoff delays are engaging in economic warfare with other hosts on the cable for a bigger share of the network bandwidth. Attempts to do this on nets with more than a few hosts may result in congestion collapse, at the link rather than the datagram layer.

Packet networks are actually computational ecologies in which the hosts are competing for resources. Certain types of competition result in economic instability. Ethernet hosts contend for resources, but under a uniform discipline that makes the contention fair. By violating the rules, a host can improve its share of the resource at the expense of other hosts. However, in doing so, it increases the number of collisions on tries other than the first. This reduces the overall capacity of the cable, because a higher percentage of the bandwidth is lost to collisions. Thus, the optimal strategy for the host is suboptimal for the network. Continued pursuit of the optimal strategy for the host results in the network operating in a grossly suboptimal way. In networks, we call this congestion collapse. In economics, it's called the "tragedy of the commons."

Back in 1984, somebody at Sun turned down the retransmit delay in 4.2BSD's TCP, apparently hoping to "improve performance". The resulting mess caused trouble in the Internet for years. (See my paper in IEEE Trans. on Communications, April 1987, for details.) Is it Sun again?

General advice: DON'T DO THIS unless you can establish by both mathematical calculation and tests on large networks that you are not introducing instability.
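Nagle's "economic warfare" argument can be made concrete with a toy simulation (a hypothetical Python sketch of my own, not anything from the thread; the host count, window sizes, and seed are arbitrary choices): a host that refuses to grow its backoff window captures a disproportionate share of successful transmissions from the hosts that play by the rules.

```python
import random

def contention_sim(n_hosts=10, n_slots=20000, cheater=None, seed=1):
    """Toy slotted model of always-backlogged hosts.  Each host waits a
    random number of slots drawn from its current backoff window; a lone
    sender in a slot succeeds and resets its window, while simultaneous
    senders collide and double their windows (up to 2**10 slots).
    The cheater, if any, keeps its window pinned at 2 slots."""
    rng = random.Random(seed)
    window = [2] * n_hosts              # current backoff window, in slots
    wait = [rng.randrange(w) for w in window]
    wins = [0] * n_hosts                # successful transmissions per host
    for _ in range(n_slots):
        senders = [h for h in range(n_hosts) if wait[h] == 0]
        for h in range(n_hosts):
            if wait[h] > 0:
                wait[h] -= 1
        if len(senders) == 1:           # success: reset that host's window
            h = senders[0]
            wins[h] += 1
            window[h] = 2
            wait[h] = rng.randrange(window[h])
        elif len(senders) > 1:          # collision: law-abiding hosts back off
            for h in senders:
                if h != cheater:
                    window[h] = min(2 * window[h], 1024)
                wait[h] = rng.randrange(window[h])
    return wins

fair_wins = contention_sim()            # everyone obeys the discipline
cheat_wins = contention_sim(cheater=0)  # host 0 never backs off further
```

In runs of this toy model the cheater's share of successes dwarfs everyone else's; whether aggregate throughput also collapses, as Nagle describes for real networks, depends on load and parameters, which is exactly why he asks for analysis and large-scale tests before changing anything.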
Operationally, the way this class of problem manifests itself is that everything works fine until a momentary network overload occurs, and then throughput drops while net traffic remains heavy. After a while, connections fail, network traffic drops, and things return to normal, leaving users angry and puzzled.

John Nagle
ug051@crayamid.cray.com (Michael Nittmann) (10/10/89)
comment: ethernet retransmission upon collision happens way down below IP, and at the TCP/IP level you should not bother with it.

The "standard" retransmission serves to scatter the retransmission events in time, since a collision is always detected on all neighbouring hosts on an ethernet trunk: the electrical signals spreading out on the trunk overlap (at least partially) and garble each other. By the way, the detection is done purely at the hardware level: when two signals arrive, the energy on the cable is higher, and this sets off collision detect.

When all hosts use the same retransmission-timing algorithm, which contains a statistical element (a random number that comes out differently on each host), you have a pretty good guarantee that they will not (yes, NOT) retransmit at the same time. When you start using different algorithms, there might be accidental "resonances" between the different algorithms, causing, during some instances of retransmission, two hosts to retransmit within a time interval that leads to new collisions. Even if the new algorithm seems "better", in a fairly local region (about the length of an ethernet packet, or between bridges) all hosts should use about the same algorithm to avoid accidental synchronized retransmissions.

Even in an environment where collisions are frequent, I would not change the retransmission algorithms, but would scope the line to see who spoils it.

michael
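One way to see michael's point about the shared random element (a small Python sketch of my own, assuming the standard doubling window; nothing here is from the post): if two hosts that just collided each draw a retransmission slot uniformly from the same window of 2^k slots, they re-collide with probability 1/2^k, so every doubling of the window halves the chance of a repeat collision. Two *different* algorithms give no such guarantee.

```python
from fractions import Fraction

def recollision_probability(attempt, truncate=10):
    """Probability that two hosts which just collided pick the same
    slot on retransmission attempt 'attempt', when both draw uniformly
    from a shared window of 2**min(attempt, truncate) slots."""
    window = 2 ** min(attempt, truncate)
    return Fraction(1, window)

for n in (1, 2, 3, 11):
    # window doubles each attempt, so the repeat-collision chance halves
    print(n, recollision_probability(n))
```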
philf@xymox.metaphor.com (Phil Fernandez) (10/13/89)
In article <14015@well.UUCP> nagle@well.UUCP (John Nagle) writes:
> (...re. collision backoff pros and cons)
> General advice: DON'T DO THIS unless you can establish by both
>mathematical calculation and tests on large networks that you are not
>introducing instability...

This general area has been a topic of much conversation around Metaphor lately, where we're looking into some excessive collision problems.

Let me ask: just how does one change the Ethernet backoff algorithm when it's implemented completely within an IC such as the AMD Lance? Sun, for example, uses the Lance on their desktop workstations, and I don't know how they might have changed the backoff scheme even if they wanted to. How often does one even get the opportunity to reimplement this algorithm these days?

pmf

+-----------------------------+----------------------------------------------+
| Phil Fernandez              | philf@metaphor.com                           |
|                             | ...!{apple|decwrl}!metaphor!philf            |
| Metaphor Computer Systems   |"Does the body rule the mind, or does the mind|
| Mountain View, CA           | rule the body? I dunno..." - Morrissey       |
+-----------------------------+----------------------------------------------+
nowicki@ENG.SUN.COM (Bill Nowicki) (10/14/89)
From: nagle@well.UUCP (John Nagle)
Subject: Re: ether collision backoff
Date: 9 Oct 89 16:53:56 GMT
References: <42686@sgi.sgi.com>

    Back in 1984, somebody at Sun turned down the retransmit delay
    in 4.2BSD's TCP, apparently hoping to "improve performance".
    The resulting mess caused trouble in the Internet for years.

The above is **NOT TRUE**. For some reason our competitors are always starting such false rumors. The truth is that Sun just ported 4.2BSD, like many other vendors. Sun just happened to have faster hardware than most of the others. The 4.2BSD TCP implementors did not give much thought to the dynamics of retransmission timers, as everybody who has read any recent papers on the subject should know by now. Some work was done at Berkeley to improve this in the 4.3BSD project, which Sun incorporated into the SunOS 3.4 release, and even more work was done for the "4.3BSD Tahoe" release on congestion control that was included in SunOS 4.0.

I could certainly believe that Sun might have subtle bugs in one of its Ethernet implementations. These are not intentional. Please report any known violations of the spec to the vendor through the appropriate customer support channels. Spreading rumors is a bad idea; if it is broken, get the vendor to fix it!
mogul@decwrl.dec.com (Jeffrey Mogul) (10/18/89)
In article <42686@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

    At last week's Interop, I was told that a large workstation company's
    products can be configured to use non-standard retransmission-after-
    collision delays.

To my knowledge this is a false rumor, which has been circulating for years.

    Is this true? Does it cause big problems, as my informant implied?
    Exactly which parameter(s) is(are) changed--the exponent, the
    multiplier, the algorithm? Does it really help NFS that much? Is there
    a relevant de-facto or de-jure standard? Should Silicon Graphics do
    the same thing? Are the Protocol Police about to squash the idea?

Others have already pointed out that this is an extremely bad idea. It might help to prevent misguided experimentation if I try to explain why it is a bad idea, instead of simply stating "it's against the law" (which, in fact, it is).

The rationale behind the "binary exponential backoff" algorithm required by the standard is explained in Robert Metcalfe's PhD thesis, although the original CACM paper on Ethernet might be more available. At any rate, the point is that if there are Q hosts simultaneously wanting to send a packet on an Ethernet, then in order to avoid congestive instability each should attempt to transmit with probability 1/Q. The problem is that each individual host has no global knowledge of the number of other hosts wanting to transmit. It must therefore estimate this number based on the only property of the network it can observe, namely the number of times it has tried to send the current packet and had a collision. Clearly, it is important that this estimate be nearly right, and that any remaining error be biased in the direction of inefficiency rather than congestive collapse.
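Mogul's 1/Q target has a consequence worth spelling out (a hypothetical Python sketch, not part of the original post; the function name is mine): if Q backlogged hosts each transmit in a slot with probability 1/Q, the chance that a slot carries exactly one packet tends to 1/e (about 0.368) as Q grows, which is why a decent estimate of Q keeps the channel efficient instead of collapsing.

```python
def success_probability(q):
    """Chance that exactly one of q backlogged hosts transmits in a
    given slot, when each transmits independently with probability 1/q:
    q * (1/q) * (1 - 1/q)**(q - 1).  Tends to 1/e (~0.368) as q grows."""
    p = 1.0 / q
    return q * p * (1 - p) ** (q - 1)

print(success_probability(2))     # two hosts: 0.5
print(success_probability(1024))  # large population: close to 1/e
```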
It turns out (if you believe the math in Metcalfe's thesis; I don't intend to check it) that the "binary exponential backoff" algorithm gives the right behaviour; the "estimate" of Q embodied in the backoff count (which is really the base-2 logarithm of Q) is close enough to work well even under heavy load. (See the paper I co-authored with Dave Boggs and Chris Kent in Proc. SIGCOMM '88; we measured a real Ethernet under moderate overload and showed that it does work.)

One aspect of the Standard Ethernet is that, although a host is supposed to attempt transmission up to 16 times, it is supposed to stop doubling the delay after the 10th time; otherwise, the delays would get unreasonably large. Of these two constants, "16" is sort of an arbitrary choice; "10" is not! This is because the maximum number of hosts on an Ethernet is set to be 1024 = 2^10; truncating the delay at less than 2^10 slot times could (if someone was stupid enough to build a net that large) conceivably cause congestive collapse. Truncating the delay at more than 2^10 slot times, on the other hand, doesn't buy you anything.

The reason why one is supposed to give up after 16 attempts is that after a while it is pointless to continue. Moreover, in order to maintain both fairness and reasonable performance, it is important to limit the time over which "history" of previous network behaviour is retained. That is why one starts the next transmission attempt with a delay-count of 0, rather than trying to be clever and use something based on the delay you used for the previous packet.

Note that there are some pathological situations where, with small numbers of very busy hosts, the Ethernet can be measurably unfair over rather long time scales. In general, though, the Ethernet works right.

So: obey the law, it's for your own good. This law, at any rate.

-Jeff
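The constants Mogul cites fit in a few lines. Here is a sketch of the truncated binary exponential backoff he describes (Python of my own, purely illustrative; real implementations live in MAC hardware such as the LANCE): the window doubles up to the 10th attempt, is capped at 2^10 = 1024 slot times, and the 16th failure abandons the packet.

```python
import random

MAX_BACKOFF_EXP = 10   # stop doubling after the 10th attempt (2**10 = 1024 slots)
MAX_ATTEMPTS = 16      # give up ("excessive collisions") after 16 attempts

def backoff_slots(attempt, rng=random):
    """Truncated binary exponential backoff: after the n-th collision on
    a packet, wait a random number of slot times drawn uniformly from
    0 .. 2**min(n, 10) - 1; after the 16th attempt, abandon the packet."""
    if attempt > MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: transmission abandoned")
    window = 2 ** min(attempt, MAX_BACKOFF_EXP)
    return rng.randrange(window)
```

Starting each new packet back at attempt 1 (a two-slot window) is exactly the "limit the history" property Mogul mentions: no estimate of past congestion is carried over from one packet to the next.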