vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (10/09/89)
(I'm sorry to talk about ethernets here, but I don't think enough knowledgeable people read other forums.)

At last week's Interop, I was told that a large workstation company's products can be configured to use non-standard retransmission-after-collision delays.

Is this true? Does it cause big problems, as my informant implied? Exactly which parameter(s) is(are) changed--the exponent, the multiplier, the algorithm? Does it really help NFS that much? Is there a relevant de-facto or de-jure standard? Should Silicon Graphics do the same thing? Are the Protocol Police about to squash the idea?

Vernon Schryver
Silicon Graphics
vjs@sgi.com
nagle@well.UUCP (John Nagle) (10/09/89)
In article <42686@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>At last week's Interop, I was told that a large workstation company's
>products can be configured to use non-standard retransmission-after-collision
>delays.
>
>Is this true? Does it cause big problems, as my informant implied?

The issues here are very subtle. Hosts that use shortened backoff delays are engaging in economic warfare with other hosts on the cable for a bigger share of the network bandwidth. Attempts to do this on nets with more than a few hosts may result in congestion collapse, at the link rather than the datagram layer.

Packet networks are actually computational ecologies in which the hosts are competing for resources. Certain types of competition result in economic instability. Ethernet hosts contend for resources, but under a uniform discipline that makes the contention fair. By violating the rules, a host can improve its share of the resource at the expense of other hosts. However, in doing so, it increases the number of collisions on tries other than the first. This reduces the overall capacity of the cable, because a higher percentage of the bandwidth is lost to collisions. Thus, the optimal strategy for the host is suboptimal for the network. Continued pursuit of the optimal strategy for the host results in the network operating in a grossly suboptimal way. In networks, we call this congestion collapse. In economics, it's called the "tragedy of the commons."

Back in 1984, somebody at Sun turned down the retransmit delay in 4.2BSD's TCP, apparently hoping to "improve performance". The resulting mess caused trouble in the Internet for years. (See my paper in IEEE Trans. on Communications, April 1987, for details.) Is it Sun again?

General advice: DON'T DO THIS unless you can establish by both mathematical calculation and tests on large networks that you are not introducing instability.
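Nagle's "economic warfare" argument can be made concrete with a toy simulation (a hypothetical Python sketch of my own, not anything from the thread; the host count, window sizes, and seed are arbitrary choices): a host that refuses to grow its backoff window captures a disproportionate share of successful transmissions from the hosts that play by the rules.

```python
import random

def contention_sim(n_hosts=10, n_slots=20000, cheater=None, seed=1):
    """Toy slotted model of always-backlogged hosts.  Each host waits a
    random number of slots drawn from its current backoff window; a lone
    sender in a slot succeeds and resets its window, while simultaneous
    senders collide and double their windows (up to 2**10 slots).
    The cheater, if any, keeps its window pinned at 2 slots."""
    rng = random.Random(seed)
    window = [2] * n_hosts              # current backoff window, in slots
    wait = [rng.randrange(w) for w in window]
    wins = [0] * n_hosts                # successful transmissions per host
    for _ in range(n_slots):
        senders = [h for h in range(n_hosts) if wait[h] == 0]
        for h in range(n_hosts):
            if wait[h] > 0:
                wait[h] -= 1
        if len(senders) == 1:           # success: reset that host's window
            h = senders[0]
            wins[h] += 1
            window[h] = 2
            wait[h] = rng.randrange(window[h])
        elif len(senders) > 1:          # collision: law-abiding hosts back off
            for h in senders:
                if h != cheater:
                    window[h] = min(2 * window[h], 1024)
                wait[h] = rng.randrange(window[h])
    return wins

fair_wins = contention_sim()            # everyone obeys the discipline
cheat_wins = contention_sim(cheater=0)  # host 0 never backs off further
```

In runs of this toy model the cheater's share of successes dwarfs everyone else's; whether aggregate throughput also collapses, as Nagle describes for real networks, depends on load and parameters, which is exactly why he asks for analysis and large-scale tests before changing anything.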
Operationally, the way this class of problem manifests itself is that everything works fine until a momentary network overload occurs, and then throughput drops while net traffic remains heavy. After a while, connections fail, network traffic drops, and things return to normal, leaving users angry and puzzled.

John Nagle
ug051@crayamid.cray.com (Michael Nittmann) (10/10/89)
comment: ethernet retransmission upon collision happens way down below IP, and at the TCP/IP level you should not bother with it.

The "standard" retransmission serves to scatter the retransmission events in time, since a collision is always detected on all neighbouring hosts on an ethernet trunk: the electrical signals spreading out on the trunk overlap (at least partially) and garble each other. By the way, the detection is done purely at the hardware level: when two signals arrive, the energy on the cable is higher, and this sets off collision detect.

When all hosts use the same retransmission-timing algorithm, which contains a statistical element (a random number that comes out differently on each host), you have a pretty good guarantee that they will not (yes, NOT) retransmit at the same time. When you start using different algorithms, there might be accidental "resonances" between the different algorithms, causing, during some instances of retransmission, two hosts to retransmit within a time interval that leads to new collisions. Even if the new algorithm seems "better", in a fairly local region (about the length of an ethernet packet, or between bridges) all hosts should use about the same algorithm to avoid accidental synchronized retransmissions.

Even in an environment where collisions are frequent, I would not change the retransmission algorithms, but would scope the line to see who spoils it.

michael
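One way to see michael's point about the shared random element (a small Python sketch of my own, assuming the standard doubling window; nothing here is from the post): if two hosts that just collided each draw a retransmission slot uniformly from the same window of 2^k slots, they re-collide with probability 1/2^k, so every doubling of the window halves the chance of a repeat collision. Two *different* algorithms give no such guarantee.

```python
from fractions import Fraction

def recollision_probability(attempt, truncate=10):
    """Probability that two hosts which just collided pick the same
    slot on retransmission attempt 'attempt', when both draw uniformly
    from a shared window of 2**min(attempt, truncate) slots."""
    window = 2 ** min(attempt, truncate)
    return Fraction(1, window)

for n in (1, 2, 3, 11):
    # window doubles each attempt, so the repeat-collision chance halves
    print(n, recollision_probability(n))
```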
philf@xymox.metaphor.com (Phil Fernandez) (10/13/89)
In article <14015@well.UUCP> nagle@well.UUCP (John Nagle) writes:
> (...re. collision backoff pros and cons)
> General advice: DON'T DO THIS unless you can establish by both
>mathematical calculation and tests on large networks that you are not
>introducing instability...

This general area has been a topic of much conversation around Metaphor lately, where we're looking into some excessive collision problems.

Let me ask: just how does one change the Ethernet backoff algorithm when it's implemented completely within an IC such as the AMD Lance? Sun, for example, uses the Lance on their desktop workstations, and I don't know how they might have changed the backoff scheme even if they wanted to. How often does one even get the opportunity to reimplement this algorithm these days?

pmf

+-----------------------------+----------------------------------------------+
| Phil Fernandez              | philf@metaphor.com                           |
|                             | ...!{apple|decwrl}!metaphor!philf            |
| Metaphor Computer Systems   |"Does the body rule the mind, or does the mind|
| Mountain View, CA           | rule the body? I dunno..." - Morrissey       |
+-----------------------------+----------------------------------------------+
nowicki@ENG.SUN.COM (Bill Nowicki) (10/14/89)
From: nagle@well.UUCP (John Nagle)
Subject: Re: ether collision backoff
Date: 9 Oct 89 16:53:56 GMT
References: <42686@sgi.sgi.com>

    Back in 1984, somebody at Sun turned down the retransmit delay
    in 4.2BSD's TCP, apparently hoping to "improve performance".
    The resulting mess caused trouble in the Internet for years.

The above is **NOT TRUE**. For some reason our competitors are always starting such false rumors. The truth is that Sun just ported 4.2BSD, like many other vendors. Sun just happened to have faster hardware than most of the others. The 4.2BSD TCP implementors did not give much thought to the dynamics of retransmission timers, as everybody who has read any recent papers on the subject should know by now. Some work was done at Berkeley to improve this in the 4.3BSD project, which Sun incorporated into the SunOS 3.4 release, and even more work was done for the "4.3BSD Tahoe" release on congestion control that was included in SunOS 4.0.

I could certainly believe that Sun might have subtle bugs in one of its Ethernet implementations. These are not intentional. Please report any known violations of the spec to the vendor through the appropriate customer support channels. Spreading rumors is a bad idea; if it is broken, get the vendor to fix it!
mogul@decwrl.dec.com (Jeffrey Mogul) (10/18/89)
In article <42686@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

    At last week's Interop, I was told that a large workstation company's
    products can be configured to use non-standard retransmission-after-
    collision delays.

To my knowledge this is a false rumor, which has been circulating for years.

    Is this true? Does it cause big problems, as my informant implied?
    Exactly which parameter(s) is(are) changed--the exponent, the
    multiplier, the algorithm? Does it really help NFS that much? Is there
    a relevant de-facto or de-jure standard? Should Silicon Graphics do
    the same thing? Are the Protocol Police about to squash the idea?

Others have already pointed out that this is an extremely bad idea. It might help to prevent misguided experimentation if I try to explain why it is a bad idea, instead of simply stating "it's against the law" (which, in fact, it is).

The rationale behind the "binary exponential backoff" algorithm required by the standard is explained in Robert Metcalfe's PhD thesis, although the original CACM paper on Ethernet might be more available. At any rate, the point is that if there are Q hosts simultaneously wanting to send a packet on an Ethernet, then in order to avoid congestive instability each should attempt to transmit with probability 1/Q. The problem is that each individual host has no global knowledge of the number of other hosts wanting to transmit. It must therefore estimate this number based on the only property of the network it can observe, namely the number of times it has tried to send the current packet and had a collision. Clearly, it is important that this estimate be nearly right, and that any remaining error be biased in the direction of inefficiency rather than congestive collapse.
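Mogul's 1/Q target has a consequence worth spelling out (a hypothetical Python sketch, not part of the original post; the function name is mine): if Q backlogged hosts each transmit in a slot with probability 1/Q, the chance that a slot carries exactly one packet tends to 1/e (about 0.368) as Q grows, which is why a decent estimate of Q keeps the channel efficient instead of collapsing.

```python
def success_probability(q):
    """Chance that exactly one of q backlogged hosts transmits in a
    given slot, when each transmits independently with probability 1/q:
    q * (1/q) * (1 - 1/q)**(q - 1).  Tends to 1/e (~0.368) as q grows."""
    p = 1.0 / q
    return q * p * (1 - p) ** (q - 1)

print(success_probability(2))     # two hosts: 0.5
print(success_probability(1024))  # large population: close to 1/e
```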
It turns out (if you believe the math in Metcalfe's thesis; I don't intend to check it) that the "binary exponential backoff" algorithm gives the right behaviour; the "estimate" of Q embodied in the backoff count (which is really the base-2 logarithm of Q) is close enough to work well even under heavy load. (See the paper I co-authored with Dave Boggs and Chris Kent in Proc. SIGCOMM '88; we measured a real Ethernet under moderate overload and showed that it does work.)

One aspect of the Standard Ethernet is that, although a host is supposed to attempt transmission up to 16 times, it is supposed to stop doubling the delay after the 10th time; otherwise, the delays would get unreasonably large. Of these two constants, "16" is sort of an arbitrary choice; "10" is not! This is because the maximum number of hosts on an Ethernet is set to be 1024 = 2^10; truncating the delay at less than 2^10 slot times could (if someone was stupid enough to build a net that large) conceivably cause congestive collapse. Truncating the delay at more than 2^10 slot times, on the other hand, doesn't buy you anything.

The reason why one is supposed to give up after 16 attempts is that after a while it is pointless to continue. Moreover, in order to maintain both fairness and reasonable performance, it is important to limit the time over which "history" of previous network behaviour is retained. That is why one starts the next transmission attempt with a delay-count of 0, rather than trying to be clever and use something based on the delay you used for the previous packet.

Note that there are some pathological situations where, with small numbers of very busy hosts, the Ethernet can be measurably unfair over rather long time scales. In general, though, the Ethernet works right.

So: obey the law, it's for your own good. This law, at any rate.

-Jeff
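The constants Mogul cites fit in a few lines. Here is a sketch of the truncated binary exponential backoff he describes (Python of my own, purely illustrative; real implementations live in MAC hardware such as the LANCE): the window doubles up to the 10th attempt, is capped at 2^10 = 1024 slot times, and the 16th failure abandons the packet.

```python
import random

MAX_BACKOFF_EXP = 10   # stop doubling after the 10th attempt (2**10 = 1024 slots)
MAX_ATTEMPTS = 16      # give up ("excessive collisions") after 16 attempts

def backoff_slots(attempt, rng=random):
    """Truncated binary exponential backoff: after the n-th collision on
    a packet, wait a random number of slot times drawn uniformly from
    0 .. 2**min(n, 10) - 1; after the 16th attempt, abandon the packet."""
    if attempt > MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: transmission abandoned")
    window = 2 ** min(attempt, MAX_BACKOFF_EXP)
    return rng.randrange(window)
```

Starting each new packet back at attempt 1 (a two-slot window) is exactly the "limit the history" property Mogul mentions: no estimate of past congestion is carried over from one packet to the next.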