[comp.dcom.sys.cisco] strange behaviour involving repeaters

XJELDC@gemini.ldc.lu.se (Jan Engvald LDC) (07/06/90)

>Date: Tue, 19 Jun 90 17:13:43 -0700
>From: Greg Wohletz <greg%duke.cs.unlv.edu@RELAY.CS.NET>
>Subject: strange behaviour involving repeaters
>To: cisco@spot.Colorado.EDU
>Message-id: <9006200655.AA23113@spot.Colorado.EDU>
>
>We are experiencing some very strange problems.  First let me draw
>you a diagram of part of our network.
...
>        o       our cisco has 6 ethernet interfaces, the problem exists on
>                all of them.
> 
>So, for some reason the gateway is ignoring, or throwing away most packets that
>have passed through the repeater.  However, all our other machines can send
>and receive packets that pass through the repeater without any
>problems.

I don't think the problem is the repeater, but the Cisco. We have the same
problems in a setup where we have two MT800s cascaded, to which about 10
Retix bridges, a Dataco remote bridge, a Dec VAX and two Ciscos are connected.
Everybody can talk to anybody, except that hosts behind the remote bridge cannot
talk through the Cisco. The Cisco decides that 80% of those packets are bad,
but an Ethernet monitor and all other equipment says they are OK.

This problem was introduced when we upgraded from old Ethernet cards to the
MCI cards. We swapped cards back and forth a couple of times and with old
cards we got 0 lost packets in 100000, with MCI we got 80% lost from the
Dataco bridge and 0.1% - 0.2% from any of the Retix bridges.

(Whenever we have had an error rate above 0.01% it has always been due to
some faulty hardware or configuration, so 0.2% is MUCH too high!)

The above problems with MCI cards were reported to Cisco via our Swedish
representative in autumn 1989, but Cisco didn't believe it then. Later
this problem was recognised, as it has occurred at other places too, and I 
have been told that there will be a firmware fix to the MCI card soon that 
will correct it.

I have no information on what the MCI card is doing wrong and how that
is corrected, though. For Ethernet transceivers there is dedicated test
equipment that can tell you if the device under test is within allowed
limits, but I have not found any such test equipment that can test the
other end of the transceiver cable, the controller.
                                             
Jan Engvald, Lund University Computing Center
________________________________________________________________________
   Address: Box 783                E-mail: xjeldc@ldc.lu.se
            S-220 07 LUND     Earn/Bitnet: xjeldc@seldc52
            SWEDEN           (Span/Hepnet: Sweden::Gemini::xjeldc)
    Office: Soelvegatan 18         VAXPSI: psi%24020031020720::xjeldc
 Telephone: +46 46 107458          (X.400: C=se; A=TeDe; P=Sunet; O=lu;
   Telefax: +46 46 138225                  OU=ldc; S=Engvald; G=Jan)
     Telex: 33533 LUNIVER S

"James_W._Morrison.ESAE"@Xerox.COM (07/06/90)

Greg;

Try changing the transceivers on either side of the repeater
one at a time.  Sometimes transceivers will break packets when
heavily loaded with traffic (especially if they go flaky and
don't totally break).  If swapping the transceivers doesn't fix
the problem, then the repeater has to be breaking packets, either
because it can't handle the traffic load or because it's flaky.

If all else fails hook an o-scope up to the ethernet using
a "T" adapter in series with the terminated end and look for
broken packets (packets with an amplitude of more than -2 volts).
A broken packet may be mangled throughout, or only in a portion
near the end.  Disconnect the repeater and see if the
broken packet condition still exists.  If so, you have a machine
or transceiver somewhere on that physical segment that's
flaky/broken.  Using the buddy system, go and disconnect each
machine on that segment one at a time until you find the
mangler.  Voila, fix this guy and all will be well.

The reason this causes problems on both sides of the repeater
is that it looks like an open and repeaters only pass what they see
and don't try to "qualify" any packet, good or broken.

hope this helps,
Jim Morrison
Network Systems Analyst
Xerox Corp.
El Segundo, CA.

robelr@bronze.ucs.indiana.edu (Allen Robel) (07/07/90)

>I don't think the problem is the repeater, but the Cisco. We have the same
>problems in a setup where we have two MT800s cascaded, to which about 10

Hmmm.  We recently evaluated a product called NQA The Prophet that is
basically a Physical/MAC layer LAN analyser.  The first thing this
product told us was that the MCI on the cisco for this LAN was
clocking 3 times too fast.  I talked to cisco about this and they
requested information on this analyser.  After looking over what I had
sent them, they did admit that this was a problem with some of their
earlier interfaces.  I asked them how one could differentiate these
interfaces from their newer ones and have since not gotten a response.

Anyway, could this problem manifest itself in the symptoms mentioned
above and in earlier notes?  As this is a physical layer problem,
tools like the Sniffer, LANWatch, etc wouldn't catch it.  Just a
thought.

regards,


Allen Robel                               robelr@bronze.ucs.indiana.edu 
University Computing Services             ROBELR@IUJADE.BITNET 
Network Research & Planning               voice: (812)855-7171
Indiana University                        FAX:   (812)855-8299

hedrick@cs.rutgers.edu (07/07/90)

I'm always suspicious of vendors that claim somebody is sending data
"too fast for an Ethernet".  You may recall that a number of people
claimed that Sun was violating Ethernet specs, when it turns out that
their interfaces and/or software were simply not capable of dealing
with high traffic levels.  We heard a claim recently from a vendor of
fiber Ethernet that MCI's ran "faster than 10Mbps".  When we finally
traced this down through their technical people, we believe the
problem is that they simply can't handle as high packet rates as the
MCI.  The 10Mbps speed is a fairly fundamental feature of Ethernet
which should be determined by the controller chip (though I guess it
probably depends upon something like a crystal to do timing, so we
have to assume the cisco designer was competent enough to use the
right frequency crystal).  Apparently the firmware does determine the
minimum interpacket spacing.  If you suspect you are dealing with a
device that can't take packets as fast as the MCI can generate them,
you can always use the "transmitter-delay" interface parameter to
insert additional delay.
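
As a rough illustration of the spacing argument, here is a minimal
sketch in Python.  It assumes the standard 10 Mbps Ethernet figures
(8 bytes of preamble, a 64-byte minimum frame, and a 96-bit minimum
interpacket gap).  The function name and the extra_delay_us parameter
are made up for this example; they only model the idea of adding
spacing and are not cisco's implementation of "transmitter-delay".

```python
# Back-of-the-envelope Ethernet timing, assuming standard 10 Mbps figures.
BIT_RATE = 10_000_000      # bits per second on the wire
PREAMBLE_BYTES = 8         # preamble plus start-of-frame delimiter
MIN_FRAME_BYTES = 64       # minimum frame size, including CRC
MIN_GAP_BITS = 96          # minimum interpacket gap (9.6 microseconds)


def max_frames_per_second(frame_bytes=MIN_FRAME_BYTES, extra_delay_us=0.0):
    """Peak back-to-back frame rate from a single sender.

    extra_delay_us is a hypothetical knob: additional idle time, in
    microseconds, added on top of the minimum interpacket gap.
    """
    bits_on_wire = (PREAMBLE_BYTES + frame_bytes) * 8 + MIN_GAP_BITS
    seconds_per_frame = bits_on_wire / BIT_RATE + extra_delay_us * 1e-6
    return 1.0 / seconds_per_frame


if __name__ == "__main__":
    print("minimum gap only:   %.0f frames/s" % max_frames_per_second())
    print("plus 20 us of delay: %.0f frames/s"
          % max_frames_per_second(extra_delay_us=20.0))
```

With minimum-size frames and only the minimum gap, a sender that
really keeps up can offer roughly 14,880 frames per second while the
bit rate stays at exactly 10 Mbps.  A receiver that cannot keep up
with that packet rate may look as if it is being sent data "too
fast", and extra spacing lowers the frame rate without changing the
wire speed.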

robelr@bronze.ucs.indiana.edu (Allen Robel) (07/07/90)

>I'm always suspicious of vendors that claim somebody is sending data
>"too fast for an Ethernet".  You may recall that a number of people
>claimed that Sun was violating Ethernet specs, when it turns out that


The problem WAS with the crystal cisco was using for timing and it is
a problem that cisco has admitted to.


Allen Robel                               robelr@bronze.ucs.indiana.edu 
University Computing Services             ROBELR@IUJADE.BITNET 
Network Research & Planning               voice: (812)855-7171
Indiana University                        FAX:   (812)855-8299

BILLW@mathom.cisco.com (WilliamChops Westfield) (07/07/90)

Ok, here is the complete story.

In early 1989, a batch of MCI boards were built with an incorrect TYPE
of crystal.  This incorrect crystal caused the wire clocking to run
.03% faster than 10 MHz.  The Ethernet Spec only allows .01% variation,
so we were 3x over the variation limit (Which is not nearly the same
thing as being 3x too fast!)  In most cases, the interfaces continued
to work just fine, and interoperated with all other devices on the
ethernet cable.  However, some devices, notably some twisted pair
ethernet transceivers, didn't like it.  Most of the MCIs that were
causing problems in the field have been fixed.
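
The arithmetic behind "3x over the variation limit" versus "3x too
fast" can be sketched in a few lines of Python.  This is only an
illustration using the figures quoted above (10 MHz nominal, .01%
allowed, .03% observed); the function names are invented for the
example.

```python
NOMINAL_HZ = 10_000_000    # nominal 10 MHz wire clock
TOLERANCE_PCT = 0.01       # allowed variation, per the figures above
BAD_CRYSTAL_PCT = 0.03     # offset of the incorrect crystal batch


def clock_hz(offset_pct):
    """Actual clock frequency for a given percentage offset from nominal."""
    return NOMINAL_HZ * (1 + offset_pct / 100.0)


def times_over_tolerance(offset_pct, tolerance_pct=TOLERANCE_PCT):
    """How many times the allowed variation a given offset represents."""
    return abs(offset_pct) / tolerance_pct


if __name__ == "__main__":
    hz = clock_hz(BAD_CRYSTAL_PCT)
    print("clock: %.0f Hz (%.4fx nominal)" % (hz, hz / NOMINAL_HZ))
    print("over tolerance by a factor of %.1f"
          % times_over_tolerance(BAD_CRYSTAL_PCT))
```

The clock ends up about 3 kHz above nominal, a ratio of 1.0003, while
the offset itself is 3 times the allowed variation.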

All boards manufactured since May 1989 should have the correct
crystals, including all version 3 MCIs.

Boards with the wrong crystals still interoperate with most other
equipment.

You can identify suspect boards by inspecting the ethernet encoder
crystals, which are near the ethernet connectors on the board.  The
out-of-spec crystals are tiny little things (.2 x .2 x .5 inches or so)
labeled "fs200".  Anything else is correct.

William  Westfield
cisco Engineering.
-------

hedrick@athos.rutgers.edu (Chuck Hedrick) (07/10/90)

One thing to be careful about with MCI's:  in normal operation an
MCI will report more errors than older interfaces.  Collisions cause
fragments.  I have a feeling that the MCI sees these as individual
packets with errors, and older cards don't see them at all.  At any
rate, on busy networks, our MCI's show .1 to 1% input errors.
However tests with ping show pretty clearly that there are no actual
errors occurring.  We've tended to ignore input errors unless they get
over 1% or there are other symptoms of problems.  This doesn't mean
that you have nothing to worry about.  I don't know your situation,
so I can't tell.  But simply the fact that you have higher rates
reported by the MCI than by older interfaces does not automatically
mean there are problems with the MCI's.
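
The rule of thumb above can be written down as a small Python sketch.
It assumes you have already read an input packet count and an input
error count off the interface by hand; the function names are
hypothetical, the 1% threshold is the one mentioned above, and
dividing errors by packets received is an assumption about how the
percentage is computed.

```python
def input_error_pct(input_packets, input_errors):
    """Input errors as a percentage of packets received (assumed ratio)."""
    if input_packets <= 0:
        return 0.0
    return 100.0 * input_errors / input_packets


def worth_investigating(input_packets, input_errors,
                        threshold_pct=1.0, other_symptoms=False):
    """Flag an interface only above the threshold or with other symptoms."""
    rate = input_error_pct(input_packets, input_errors)
    return other_symptoms or rate > threshold_pct


if __name__ == "__main__":
    # Example counters, invented for illustration.
    for packets, errors in [(100000, 200), (100000, 80000)]:
        print("%d packets, %d errors: %.2f%%, investigate=%s"
              % (packets, errors, input_error_pct(packets, errors),
                 worth_investigating(packets, errors)))
```

A reported rate under the threshold is not proof that all is well,
which is why the other_symptoms flag overrides it; a ping test is
still the better check for real packet loss.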