[comp.dcom.lans] Ethernet Collisions

bob@pirates.UUCP (Bob Fawcett) (12/04/90)

I would like to know what can cause excessive collisions on Ethernet.  I 
realize that collisions are a fact of life on ethernet.
My net analyzer shows a large number of local collisions (as high as 20%).
I can't isolate to just one or two machines.  I also don't have any one
machine which is completely non functional.  In other words I haven't
nailed it down to a problem with one machine (bad card or ?).  What other
things should I look for?  Ground problems?  My cable scanner doesn't show
any bad segments of cable.
Most of the machines are on Thinnet.  There is fiber between buildings.
Total of about 150 stations campus wide.

Any Ideas?
Thanks in advance.
Bob Fawcett
Director, Academic Computing
Armstrong State College
bob@pirates.uucp

dave@monu6.cc.monash.edu.au (Dave Schwarz) (12/05/90)

In article <467@pirates.UUCP>, bob@pirates.UUCP (Bob Fawcett) writes:
> I would like to know what can cause excessive collisions on Ethernet.  I 
> realize that collisions are a fact of life on ethernet.
> My net analyzer shows a large number of local collisions (as high as 20%).
> I can't isolate to just one or two machines.  I also don't have any one
> machine which is completely non functional.  In other words I haven't
> nailed it down to a problem with one machine (bad card or ?).  What other
> things should I look for?  Ground problems?  My cable scanner doesn't show
> any bad segments of cable.
> Most of the machines are on Thinnet.  There is fiber between buildings.
> Total of about 150 stations campus wide.
> 
> Any Ideas?

Try looking at any AUI cables you may have; we just had a very similar
problem with one of our subnets.  The transfer rate had dropped from approx
200 Kbytes/sec to 20 Kbytes/sec.
The problem only appeared when we swapped a Dempa for a Delni (two DEC
products); both were driving a 20m piece of AUI cable connected to a F/O
repeater.  When we swapped we got the reduction in bandwidth.  After much
checking we discovered that the cable was too long; we shortened it back
to 5 meters and, oddly, it worked.
It seemed that the Delni didn't have enough oomph to drive long bits of cable.


dave....
-- 
Dave Schwarz @ Monash Uni Caulfield Campus               This space now for hire
900 Dandynong Rd,East Caulfield,Vic,Australia         (and I know that DANDENONG
dave@monu6.cc.monash.edu.au                            doesn't have a Y in it !)
Dave@banyan.cc.monash.edu.au              Dave@vx24.cc.monash.edu.au (Yuk a vax)

gaj@hpctdja.HP.COM (Gordon Jensen) (12/06/90)

>I would like to know what can cause excessive collisions on Ethernet.  I 
>realize that collisions are a fact of life on ethernet.
>My net analyzer shows a large number of local collisions (as high as 20%).
>I can't isolate to just one or two machines.  I also don't have any one
>machine which is completely non functional.  In other words I haven't
>nailed it down to a problem with one machine (bad card or ?).  What other
>things should I look for?  Ground problems?  My cable scanner doesn't show
>any bad segments of cable.

>Bob Fawcett

First thing to check for is grounding problems, especially on your
ThinLAN.  If there are two grounds on a segment, 60 Hz can flow in
the shield.  Enough IR drop can occur to trigger a repeater's carrier
sense circuit.  Since there is no signal to lock to, repeaters that
I've seen just source 10 MHz out all ports.  This is bad.  Extra 
grounds can occur when the T connector isn't covered with its
prophylactic.  A quick test is to throw an *analog* scope on the
cable and sync to line, with the timebase set to show multiple
cycles of 60 Hz.  


Good luck,

Gordon

iiitih@cybaswan.UUCP (Ivan Izikowitz) (12/06/90)

If you don't think you have a faulty card, then you probably have a 
heavily loaded network. Just take a look at any of the published
performance curves for the 802.3 protocol - throughput is severely
degraded once the offered load exceeds a certain value (I think about
40% of channel capacity?)

   Ivan 

 @ The Institute for Industrial Information Technology, Innovation Centre
                             Swansea SA2 8PP
    Phone: (+44) 792 295213     |    JANET:     iiitih@uk.ac.swan.pyr 
  Fax:    (+44) 792 295532      |    UUCP:  ..!ukc!cybaswan.UUCP!iiitih

john@newave.UUCP (John A. Weeks III) (12/07/90)

In article <467@pirates.UUCP> bob@pirates.UUCP (Bob Fawcett) writes:
> I would like to know what can cause excessive collisions on Ethernet.  I 
> realize that collisions are a fact of life on ethernet.

> What other things should I look for?  Ground problems?  My cable scanner
> doesn't show any bad segments of cable.

I was recently fighting a problem like this.  It turned out to be a bad
connection between a transceiver and a drop cable to a router.  Unknown to
me at the time, the number of retries was very high.  This extra traffic
led to excessive collisions.

-john-

-- 
===============================================================================
John A. Weeks III               (612) 942-6969               john@newave.mn.org
NeWave Communications                 ...uunet!rosevax!tcnet!wd0gol!newave!john
===============================================================================

spurgeon@.uucp (Charles E. Spurgeon) (12/12/90)

In article <2184@cybaswan.UUCP> iiitih@cybaswan.UUCP (Ivan Izikowitz) writes:
>If you don't think you have a faulty card, then you probably have a 
>heavily loaded network. Just take a look at any of the published
>performance curves for the 802.3 protocol - throughput is severely
>degraded once the offered load exceeds a certain value (I think about
>40% of channel capacity?)
>

I think that the 40% figure you refer to comes from simulations of
"Ethernet" that don't happen to reflect the real Ethernet protocol all
that well.  Ethernet traffic tends to be bursty, and one second
samples of traffic showing 40% utilization would be nothing to get
worried about.  Even a constant load of 40% on an Ethernet (a
situation that is unusual) would still not be all that big a deal.

For the empirical evidence as to Ethernet's ability to move data, see
the SIGCOMM paper presented a couple of years back.  Here's the access
info from the Network Manager's Reading List:

      The following technical report from the Digital  Equip-
      ment   Corporation's  Western  Research  Lab  documents
      empirical evidence showing that the 10 megabit Ethernet
      system is capable of transmitting large amounts of data
      in a reliable fashion.  The report is also  useful  for
      its  analysis of what makes a good Ethernet implementa-
      tion.  Included is a brief set of  guidelines  for  the
      network  manager who wants their Ethernet system to run
      as well as possible.

      o    Measured Capacity of an Ethernet: Myths and  Real-
           ity
           David R. Boggs, Jeffrey C. Mogul,  Christopher  A.
           Kent.
           Proceedings of the SIGCOMM '88 Symposium  on  Com-
           munications   Architectures   and  Protocols,  ACM
           SIGCOMM, Stanford, CA., August 1988, 31 pps.

      From the Abstract:

      "Ethernet, a 10 Mbit/sec CSMA/CD network, is one of the
      most  successful LAN technologies.  Considerable confu-
      sion exists as to the actual capacity of  an  Ethernet,
      especially  since  some of the theoretical studies have
      examined operating regimes that are not  characteristic
      of actual networks.  Based on measurements of an actual
      implementation, we show that for a wide class of appli-
      cations,  Ethernet  is  capable of carrying its nominal
      bandwidth  of  useful  traffic,   and   allocates   the
      bandwidth fairly."

      This paper is also  available  over  the  Internet  via
      electronic  mail  from the DEC Western Research archive
      server.  Send a message to the following  address  with
      the  word "help" in the Subject line of the message for
      detailed   instructions.    The   address    is    WRL-
      Techreports@decwrl.dec.com.

      You may also request a copy of the report  through  the
      U.S. postal system by writing to:

           Technical Report Distribution
           DEC Western Research Laboratory, UCO-4
           100 Hamilton Avenue
           Palo Alto, California 94301

henry@zoo.toronto.edu (Henry Spencer) (12/12/90)

In article <2184@cybaswan.UUCP> iiitih@cybaswan.UUCP (Ivan Izikowitz) writes:
>... Just take a look at any of the published
>performance curves for the 802.3 protocol - throughput is severely
>degraded once the offered load exceeds a certain value (I think about
>40% of channel capacity?)

What published performance curves for what protocol?  The throughput of
802.3, aka Ethernet, is monotonic increasing as load increases.  There
is no "severe degradation".  Even under massive overload it continues
to move data, although collisions limit it to something like 70% of the
theoretical channel capacity under those conditions.

(Note, this assumes multiple sources of traffic.  A single source of
traffic can run an Ethernet at circa 100% of theoretical, so 70% is down
somewhat compared to that ideal state.)

Many of the early simulation studies of "Ethernet" were actually studying
different protocols with inferior performance, either because the folks
involved thought they could "improve" Ethernet or because they didn't
understand it very well to begin with (often both).  The numbers and curves
from those studies are completely irrelevant to real Ethernet, although
myths derived from them are persistent among Ethernet's detractors.
-- 
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry

cornutt@freedom.msfc.nasa.gov (David Cornutt) (12/15/90)

The percent utilization of the channel capacity of an Ethernet (or any
CSMA-type network) depends not so much on the total volume of traffic as
on the number of nodes that have traffic ready to transmit simultaneously. 
As Henry Spencer noted in a previous article, an Ethernet with only one
node transmitting can get pretty close to 100% utilization.  (Such
situations do occur; we have an application here where there may be about
70 nodes on a net, but only one or two nodes generating the lion's share of
the traffic.)  The limiting factor is the probability of getting a
collision, which is roughly proportional to the number of nodes that are
generating large amounts of traffic.  There is a derivation in the
Tanenbaum book (*Computer Networks*, second edition, Prentice Hall, 1988)
which can be expanded to show the expected utilization for n number of
nodes transmitting (or attempting to) simultaneously.  The theoretical
worst case occurs about n = 100, where the channel utilization is down to
about 37%.  (I have seen values close to this in a campus network that I
once worked on.)  In practice, an Ethernet starts to break down at this
point as controllers begin giving up due to exceeding their max retry
settings.
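The ~37% figure can be illustrated numerically.  The sketch below is my
own transcription of the slotted-contention bound (not Tanenbaum's exact
derivation): assume each of n backlogged stations transmits in a slot with
probability 1/n, which maximizes the chance that exactly one station
transmits.

```python
def slot_utilization(n):
    """P(exactly one of n stations transmits) when each transmits
    with probability 1/n: n * (1/n) * (1 - 1/n)**(n - 1)."""
    return (1 - 1 / n) ** (n - 1)

# Per-slot success probability falls toward 1/e ~ 0.368 as n grows,
# i.e. channel utilization bottoms out at about 37%:
for n in (2, 10, 100):
    print(n, round(slot_utilization(n), 4))
```

At n = 100 this gives roughly 0.37, matching the worst-case value quoted
above.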

There are ways to make CSMA networks get better utilization under these
conditions by introducing a random go/no-go decision into retransmissions.
A node attempting to retransmit picks a random number such that it has an x%
chance of attempting the retransmit; if the random draw loses, the node does
not attempt retransmission but backs off again.  The lower x gets, the
better the overall channel utilization gets.  The tradeoff is that the
average latency for individual packets becomes very long as x decreases,
which is why you don't see many commercial implementations of this scheme. 
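The go/no-go draw described above might be sketched like this (a toy model;
the function name and node count are mine, not from any real controller):

```python
import random

def retransmit_decision(x_percent):
    """Go/no-go draw: True means attempt the retransmit now,
    False means back off again without transmitting."""
    return random.random() < x_percent / 100

# With many colliding nodes, a smaller x thins out the retry burst,
# trading longer per-packet latency for fewer simultaneous attempts:
random.seed(1)
nodes = 50
for x in (100, 50, 10):
    attempts = sum(retransmit_decision(x) for _ in range(nodes))
    print(f"x={x}%: {attempts} of {nodes} nodes retry this slot")
```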

Of course, none of the above figures take into account the overhead
introduced by upper layer protocols.


-- 
David Cornutt, New Technology Inc., Huntsville, AL  (205) 461-6457
(cornutt@freedom.msfc.nasa.gov; some insane route applies)
"The opinions expressed herein are not necessarily those of my employer,
not necessarily mine, and probably not necessary."

mart@csri.toronto.edu (Mart Molle) (12/15/90)

In article <1990Dec14.191255.20529@freedom.msfc.nasa.gov> cornutt@freedom.msfc.nasa.gov (David Cornutt) writes:
>The percent utilization of the channel capacity of an Ethernet (or any
>CSMA-type network) depends not so much on the total volume of traffic as
>on the number of nodes that have traffic ready to transmit simultaneously. 
>As Henry Spencer noted in a previous article, an Ethernet with only one
>node transmitting can get pretty close to 100% utilization.  (Such
>situations do occur; we have an application here where there may be about
>70 nodes on a net, but only one or two nodes generating the lion's share of
>the traffic.)  The limiting factor is the probability of getting a
>collision, which is roughly proportional to the number of nodes that are
>generating large amounts of traffic.  There is a derivation in the
>Tannenbaum book (*Computer Networks*, second edition, Prentice Hall, 1988)
>which can be expanded to show the expected utilization for n number of
>nodes transmitting (or attempting to) simultaneously.  The theoretical
>worst case occurs about n = 100, where the channel utilization is down to
>about 37%.  (I have seen values close to this in a campus network that I
>once worked on.)  In practice, an Ethernet starts to break down at this
>point as controllers begin giving up due to exceeding their max retry
>settings.

Pay no attention to Tanenbaum's calculation of Ethernet throughput.  It is
a gross simplification, based on a model first put forward by Metcalfe and
Boggs in 1976 that assumes slotted operation, a ``global queue in the sky''
backoff algorithm, etc.  Also, you've neglected to include their full model
which includes the effect of packet lengths.  Basically, this model says
the channel consists of a repeating pattern of ``cycles'' each of which
consists of a run of [short] ``wasted'' slots (whose analysis is assumed
to be the same as slotted Aloha) followed by a single [long] ``useful'' slot.
The 37% figure you quote above is for minimal-length packets, where ``useful''
slots are the same size as ``wasted'' slots and the whole thing degenerates
into slotted Aloha.  If you make the ``useful'' slots bigger (i.e., you put
a non-trivial amount of data into each packet), then the model predicts
much higher attainable throughputs.  For example if the ``useful'' slots
are 1/a times longer than ``wasted'' slots, the capacity is about

	1/ (1 + a * (e-1)),

where e is the base of the natural logarithm and 1/e is the capacity of
slotted Aloha.
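Plugging numbers into that expression (a direct transcription of the
formula above; the function name is my own):

```python
import math

def csma_capacity(a):
    """Capacity when ``useful'' slots are 1/a times longer than
    ``wasted'' contention slots: 1 / (1 + a * (e - 1))."""
    return 1 / (1 + a * (math.e - 1))

# a = 1 (minimal-length packets) degenerates to slotted Aloha's 1/e;
# smaller a (non-trivial amounts of data per packet) does much better:
for a in (1.0, 0.1, 0.01):
    print(a, round(csma_capacity(a), 3))
```

With a = 0.01 the predicted capacity is above 98%, which is the point
being made: big packets make the contention overhead nearly negligible.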

If you want a more accurate analysis of the throughput for unslotted
1-persistent CSMA/CD used in Ethernet, go read the articles by Sohraby,
Molle and Venetsanopoulos, and by Takagi and Kleinrock, in IEEE Transactions
on Communications, February 1987.  (BTW, neither paper appears in the 
widely referenced ``Myths and Reality'' paper from Sigcomm 88, which
includes an earlier paper by Takagi and Kleinrock that gave totally wrong
answers due to errors in the analysis....)  These papers show that CSMA/CD
can get very high channel efficiencies even in the limit of infinitely
many active stations.  However, they fail to include the truly bizarre
influences of the truncated binary exponential backoff algorithm used
on Ethernet and thus are not the last word on the subject.

>There are ways to make CSMA networks get better utilization under these
>conditions by introducing a random go/no-go decision into retransmissions. 

[Description of p-persistent CSMA deleted]

There are lots of other CSMA protocols in the world that look better than
Ethernet.  I don't think p-persistent stands out in any way in this group.
Obviously, there are other reasons (like compatibility) that make people
stick with the standard...

Mart L. Molle
Computer Systems Research Institute
University of Toronto
Toronto, Canada M5S 1A4
(416)978-4928

lws@comm.wang.com (Lyle Seaman) (12/21/90)

iiitih@cybaswan.UUCP (Ivan Izikowitz) writes:

>If you don't think you have a faulty card, then you probably have a 
>heavily loaded network. Just take a look at any of the published

Right.

>performance curves for the 802.3 protocol - throughput is severely
>degraded once the offered load exceeds a certain value (I think about
>40% of channel capacity?)

Rong.   Yeah, a lot of the papers say that, but they're talking
about the entire load evenly distributed over the entire network.
And even then, they cite figures of 65% of capacity.  I routinely
see usage approaching 90% with very little trouble  (admittedly,
an unusual LAN configuration, as well).

-- 
Lyle                  Wang           lws@capybara.comm.wang.com
508 967 2322     Lowell, MA, USA     Source code: the _ultimate_ documentation.

pcg@cs.aber.ac.uk (Piercarlo Grandi) (12/22/90)

On 14 Dec 90 21:33:57 GMT, mart@csri.toronto.edu (Mart Molle) said:

mart> In article <1990Dec14.191255.20529@freedom.msfc.nasa.gov>
mart> cornutt@freedom.msfc.nasa.gov (David Cornutt) writes:

cornutt> The percent utilization of the channel capacity of an Ethernet
cornutt> (or any CSMA-type network) depends not so much on the total
cornutt> volume of traffic as on the number of nodes that have traffic
cornutt> ready to transmit simultaneously. [ ... look at Tanenbaum's book
cornutt> and see that ... ] The theoretical worst case occurs about n =
cornutt> 100, where the channel utilization is down to about 37%.

mart> Pay no attention to Tanenbaum's calculation of Ethernet
mart> throughput.  It is a gross simplification, [ ... ] For example if
mart> the ``useful'' slots are 1/a times longer than ``wasted'' slots,
mart> the capacity is about 1/ (1 + a * (e-1)), where e is the base of
mart> the natural logarithm and 1/e is the capacity of slotted Aloha.

mart> If you want a more accurate analysis of the throughput for
mart> unslotted 1-persistent CSMA/CD used in Ethernet, go read the
mart> articles by Sohraby, Molle and Venetsanopoulos, and by Takagi and
mart> Kleinrock, in IEEE Transactions on Communications, February 1987.
mart> [ ... ] These papers show that CSMA/CD can get very high channel
mart> efficiencies even in the limit of infinitely many active stations.

But this is just the utilization factor of the channel.  Okay, Ethernet is
not slotted Aloha, and it can get very high utilization factors, basically
because latency is very small, and in the occurrence of a collision the
abort is nearly instantaneous and the retry by another station has a good
chance of success.

However, what about delay?  The medium gets near its rated throughput, but
the average station will have to wait a pretty long time, retrying quite a
bit.  Suppose we have 100 stations on the net: each of them, if efficiency
is 100%, will get almost 10 Kbytes per second of bandwidth, 1% of the
total, and wait (assuming equal-sized packets) 99% of the time for a chance
to send its packet, by waiting for silence on the wire or for retransmit
timeouts.  Even with 10 stations, say communicating in pairs, things are
fairly bleak.
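That back-of-the-envelope share works out as follows (assuming the nominal
10 Mbit/s rate and perfectly fair sharing; the raw figure lands a bit above
the post's "almost 10 Kbytes", which leaves room for framing overhead):

```python
# Raw per-station share of a fully utilized 10 Mbit/s Ethernet,
# split evenly among 100 stations:
capacity_bytes_per_sec = 10_000_000 / 8   # 1.25 Mbytes/sec raw
stations = 100
per_station = capacity_bytes_per_sec / stations
waiting_fraction = 1 - 1 / stations
print(f"{per_station / 1000:.1f} Kbytes/sec each, "
      f"waiting {waiting_fraction:.0%} of the time")
```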

How do we square the model that says that we should get near optimal
efficiency with the reality that we do not get it? A hint is given by
comparing:

cornutt> (I have seen values close to this in a campus network that I
cornutt> once worked on.)  In practice, an Ethernet starts to break down
cornutt> at this point as controllers begin giving up due to exceeding
cornutt> their max retry settings.

mart> However, they fail to include the truly bizarre influences of the
mart> truncated binary exponential backoff algorithm used on Ethernet
mart> and thus are not the last word on the subject.

In practice even if the medium efficiency is high, the big problem is
that the network interfaces are bad, and are not fast enough. If you
take into account the limitations of network interfaces the picture
changes suddenly, in particular for rooted communications patterns, in
which a lot of the traffic goes to a single interface, which gets
swamped.

If the idea that Ethernet-the-wire-and-protocol can achieve 100%
efficiency (but with long and variable delays) is true, and I think this
is now established, it is interesting but not very relevant, because the
real bottleneck is the network interface, and those are usually quite
horrid.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

rauletta@gmuvax2.gmu.edu (R. J. Auletta) (04/15/91)

We have been having some problems with an ethernet installation
that our Computer Network Services organization seems unwilling
to resolve. I am looking for some insight from those who might
have a sense of whether what we are seeing is normal.

The problem is characterized as follows.

1) Interactive sessions (typing, etc.) tend to get periodically
interrupted every couple of seconds for a tenth of a second or more (the
echo time becomes longer than the time to type a 5-10 character word).
(This seems directly related to any burst of ethernet traffic
over about 10,000 bytes/sec as reported by etherd on a Sun.)
(When the ethernet load is low, the ethernet is very
responsive.)

2) The following indications on an American Network Connections
ANC-80 8-port fanout transceiver 802.3 while the problem is present.

RCV light intermittently active
REM COL blinks as the RCV goes out [every time]. (Remote Collision)
LOC COL off
TRVR PRES light is dimly on.        (?)
SQE is off.

3) Traffic on the ethernet is (as reported by etherd on a
Sun workstation) about 25K-75K bytes per second running
about 50-150 packets per second when (1) is observed.
Most of the traffic is between just two Sun workstations.

4) Running netx (a tcp exerciser) on a Vax3600 to a VS2000
showing a network load of about 10% shows almost continuous collisions even
when only the two machines are active on the network.
(Every blink of the RCV light on the fanout unit results in the REM COL
blinking, the LOC COL stays off.)

Is this normal? At the described load would one expect to experience
poor interactive response that ethernets are known for? Might
this be due to the "supposed" problem with Sun's interpretation
of the ethernet standard in regards to back to back packets?

The general form of the ethernet is a thick riser with one thinnet
transceiver and several AUI transceivers with a bridge to
a fiber-optic segment.

What I am looking for are some suggestions as to what we might
look for to isolate the problem (such as "this sounds like a noise
problem", or "excessive reflections", or "load is just too high").

Characterized but still confused,

R J Auletta
rauletta@sitevax.gmu.edu

andrew@jhereg.osa.com (Andrew C. Esh) (04/16/91)

In article <4150@gmuvax2.gmu.edu> rauletta@gmuvax2.gmu.edu (R. J. Auletta) writes:
>We have been having some problems with an ethernet installation
>that our Computer Network Services organization seems unwilling
>to resolve. I am looking for some insight from those who might
>have a sense of whether what we are seeing is normal.
>

Unwilling to resolve? Pardon me, but this sounds like an attitude problem.
It's their job to resolve just this sort of thing. Maybe they are stuck,
and can't think of an approach. Keep at them.

>The problem is characterized as follows.
>
>1) Interactive sessions (typing etc) tends to get periodically
>interrupted every couple of seconds for a tenth of a second or
>more (the echo time becomes longer than the time to type a 5-10 character word.)
>(This seems directly related to any burst of ethernet traffic
>over about 10,000 bytes/sec as reported by etherd on a Sun.)
>(When the ethernet load is low, the ethernet is very
>responsive.)
>
>2) The following indications on an American Network Connections
>ANC-80 8-port fanout transceiver 802.3 while the problem is present.
>
>RCV light intermittently active
>REM COL blinks as the RCV goes out [everytime]. (Remote Collision)
>LOC COL off
>TRVR PRES light is dimly on.        (?)
>SQE is off.

Collisions! There's most of it right there!

>
>3) Traffic on the ethernet is (as reported by etherd on a
>Sun workstation) about 25K-75K bytes per second running
>about 50-150 packets per second when (1) is observed.
>Most of the traffic is between just two Sun workstations.
>
>4) Running netx (a tcp exerciser) on a Vax3600 to a VS2000
>showing a network load of about 10% shows almost continuous collisions even
>when only the two machines are active on the network.
>(Every blink of the RCV light on the fanout unit results in the REM COL
>blinking, the LOC COL stays off.)
>
>Is this normal? At the described load would one expect to experience
>poor interactive response that ethernets are known for? Might
>this be due to the "supposed" problem with Sun's interpretation
>of the ethernet standard in regards to back to back packets?
>
>The general form of the ethernet is a thick riser with one thinnet
>transceiver and several AUI transceivers with a bridge to
>a fiber-optic segment.
>
>What I am looking for are some suggestions as to what we might
>look for to isolate the problem (such as "this sounds like a noise
>problem", or "excessive reflections", or "load is just too high").
>
>Characterized but still confused,
>
>R J Auletta
>rauletta@sitevax.gmu.edu

I would suggest checking everything between the ANC-80 and the main
backbone (or whatever it connects to).  I would concentrate on the cable,
but check transceivers too.  If you can get a cable scanner or a TDR, that
will probably show you that the cable is bad.  Check the end connectors,
and try it with a VOM, testing for a short between the shield and the
conductor.  Wiggle and twist the ends as you do this test, since it could
be intermittent.  Also find a convenient ground and see if there is any
voltage potential between the shield and ground.  A ground-faulted shield
makes sending a signal down the wire like trying to blow a marble through
a garden hose full of holes.  Bad cable will usually give you the kind of
reflections that cause the collisions you seem to be seeing.  Even mediocre
cable will run at 20% load with less than one collision per second.

Also, could you say more about this ANC-80 thing?  Is it a multiport
repeater, or a bridge, or what?  An 8-port fanout transceiver?  Not sure
what that might be.

-- 
Andrew C. Esh			andrew@osa.com
Open Systems Architects, Inc.
Mpls, MN 55416-1528		Punch down, turn around, do a little crimpin'
(612) 525-0000			Punch down, turn around, plug it in and go ...

brian@telebit.com (Brian Lloyd) (04/16/91)

SQE simply asserts the collision signal momentarily during the
interpacket gap.  This is to let the interface know that the
transceiver is still alive.  Normally the interface ignores SQE
because it sees it at a particular time immediately following the
transmission of a packet but when you plug the transceiver into a
fanout box, everyone sees the SQE/collision signal and interprets it
as a remote collision because they didn't just send a packet.

I personally find the behavior of SQE to be annoying for this and
other reasons.  Turn off SQE and your "remote collision" problems will
be greatly reduced.

-- 
Brian Lloyd, WB6RQN                              Telebit Corporation
Network Systems Architect                        1315 Chesapeake Terrace 
brian@napa.telebit.com                           Sunnyvale, CA 94089-1100
voice (408) 745-3103                             FAX (408) 734-3333