[comp.dcom.sys.cisco] help needed debugging DDS 56kb prob

dyer@spdcc.COM (Steve Dyer) (09/08/90)

This isn't specifically Cisco-related, but I thought that the readers
here might be familiar with this kind of scenario.

I've just installed a 56kb DDS circuit with DSU/CSUs on each end and a
Cisco CGS/2 on my end (there's a larger Cisco on the other end.)  The
line itself came up with only a little difficulty, but I am having
regular trouble with the serial line staying up for any length of time
without the Cisco dropping DTR and resetting the line.  The number of
carrier transitions reported by "show interface" is almost mindboggling--
after 10 minutes from powerup, it reports 817 carrier transitions and
8 interface resets.  Naturally, this is wreaking havoc with applications,
which get ICMP unreachables and close prematurely.

I had the telco run loopback tests on both ends of the line for as long
as 15 minutes and they reported that it looked as clean as a whistle,
with no data or clocking errors.

I'm really at a loss for where to look next.  Anyone have any pointers?

-- 
Steve Dyer
dyer@ursa-major.spdcc.com aka {ima,harvard,rayssd,linus,m2c}!spdcc!dyer
dyer@arktouros.mit.edu, dyer@hstbme.mit.edu

dyer@spdcc.COM (Steve Dyer) (09/09/90)

In article <3951@ursa-major.SPDCC.COM> dyer@ursa-major.spdcc.COM (Steve Dyer) writes:
>[problems of hundreds of carrier transitions and serial line resets]

The problem has disappeared, and the superficial solution was simple
(if that was in fact the source of the problem.)  I'd specified the
high-speed serial interface on my CGS/2 even though I was planning to
run at 56kb so that I could upgrade simply to fractional T1 speeds if my
downstream link ever upgraded also.  During the initial configuration
of the serial interface, we neglected to specify the line bandwidth as 56kb,
since that was what had been assumed for other CGS/2 models and was
the default for the Cisco on the other end of the DDS line.  Apparently,
though, the default for the high-speed serial interface is T1 rate.
Once I reset this to 56kb, things settled down within minutes, and
I've had only a few carrier transitions and no interface resets since.

I suppose that the "bandwidth" field is used for the initial values
of the HDLC timers, and having such a mismatch between the two sides
(not to mention the actual bandwidth) could cause problems, no? 

By the way, I really have to say that I'm impressed by how easy it
is to install one of these gateways and have it all work--it's
practically a "turnkey" operation.  Things have come a long way
since the original LSI-11 gateways when I was at BBN... :-)

-- 
Steve Dyer
dyer@ursa-major.spdcc.com aka {ima,harvard,rayssd,linus,m2c}!spdcc!dyer
dyer@arktouros.mit.edu, dyer@hstbme.mit.edu

pte900@jatz.aarnet.edu.au (Peter Elford) (09/10/90)

In article <3961@ursa-major.SPDCC.COM>, dyer@spdcc.COM (Steve Dyer) writes:
|> In article <3951@ursa-major.SPDCC.COM> dyer@ursa-major.spdcc.COM (Steve Dyer) writes:
|> >[problems of hundreds of carrier transitions and serial line resets]
|> 
|> The problem has disappeared, and the superficial solution was simple
|> (if that was in fact the source of the problem.)  I'd specified the
|> high-speed serial interface on my CGS/2 even though I was planning to
|> run at 56kb so that I could upgrade simply to fractional T1 speeds if my
|> downstream link ever upgraded also.  During the initial configuration
|> of the serial interface, we neglected to specify the line bandwidth as 56kb
|> (since that was what had been assumed for other CGS/2 models, and was
|> the default for the Cisco on the other end of the DDS line.  Apparently
|> though, the default for the high-speed serial interface is T1 rate.
|> Once I reset this to 56kb, things settled down within minutes, and
|> I've had only a few carrier transitions and no interface resets since.
|> 
|> I suppose that the "bandwidth" field is used for the initial values
|> of the HDLC timers, and having such a mismatch between the two sides
|> (not to mention the actual bandwidth) could cause problems, no? 

I understood that the bandwidth sub-command "sets an informational parameter
only; you cannot adjust the actual bandwidth of an interface with this command"
(Gateway Server Manual p. 4-22). If this is not the case, then I would like
to know about it, because on some of our 48K DDS services we see similar
very high transition and reset counts.

Peter Elford,                           	e-mail: P.Elford@aarnet.edu.au
Network Co-ordinator,	 			phone: +61 6 249 3542
Australian Academic Research Network,		fax: +61 6 247 3425
c/o, Computer Services Centre,			post: PO Box 4
Australian National University			      Canberra 2601
Canberra, AUSTRALIA

dyer@spdcc.COM (Steve Dyer) (09/10/90)

In article <25924@boulder.Colorado.EDU> pte900@jatz.aarnet.edu.au (Peter Elford) writes:
>I understood that the bandwidth sub-command "sets an informational parameter
>only; you cannot adjust the actual bandwidth of an interface with this command"
>(Gateway Server Manual p. 4-22). If this is not the case, then I would like
>to know about it, because on some of our 48K DDS services we see similar
>very high transition and reset counts,

I wasn't suggesting that it adjusted the actual bandwidth.  I was
hypothesizing that the "informational parameter" might have been used
for HDLC timers and that a severe mismatch might cause the line to go
down or behave erratically.  I've since been told by folks at Cisco
that the bandwidth parameter is only used within IGRP for routing
information.
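
For reference, here is a minimal sketch of how the bandwidth value would
feed an IGRP metric.  It assumes the commonly documented composite formula
with default constants (K1 = K3 = 1, the others 0) and a 20000 microsecond
default delay for a serial interface; the software discussed in this thread
may differ in detail, and the function name is only illustrative.

```python
def igrp_metric(bandwidth_kbps: int, delay_usec: int) -> int:
    """Composite IGRP metric, assuming default constants (K1 = K3 = 1)."""
    # Bandwidth term: 10^7 divided by the configured bandwidth in kbit/s.
    bandwidth_term = 10_000_000 // bandwidth_kbps
    # Delay term: configured delay expressed in units of 10 microseconds.
    delay_term = delay_usec // 10
    return bandwidth_term + delay_term


SERIAL_DELAY_USEC = 20_000  # assumed default delay for a serial interface

metric_56k = igrp_metric(56, SERIAL_DELAY_USEC)    # bandwidth set to 56kb
metric_t1 = igrp_metric(1544, SERIAL_DELAY_USEC)   # bandwidth left at T1 rate

print("56kb metric:", metric_56k)
print("T1 metric:  ", metric_t1)
```

Under these assumptions a 56kb link left at the T1 default looks roughly
twenty times more attractive to the routing protocol than it should, but
nothing in the calculation touches the line itself.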

In any event, the "fix" appears to have been a coincidence.  Having moved the
equipment to its permanent resting place, the carrier transitions and
interface resets (after a stable, peaceful weekend) have returned with a
vengeance, even with the parameter set to 56kb.  I think I'm in the
nether world of flaky cables/modems, and will pursue it on that level.

-- 
Steve Dyer
dyer@ursa-major.spdcc.com aka {ima,harvard,rayssd,linus,m2c}!spdcc!dyer
dyer@arktouros.mit.edu, dyer@hstbme.mit.edu

kannan@osc.edu (Kannan Varadhan) (09/10/90)

Thus spake dyer@ursa-major.spdcc.COM (Steve Dyer)
>In any event, the "fix" appears to have been a coincidence.  Having moved the
>equipment to its permanent resting place, the carrier transitions and
>interface resets (after a stable, peaceful weekend) have returned with a
>vengeance, even with the parameter set to 56kb.  I think I'm in the
>nether world of flaky cables/modems, and will pursue it on that level.

We have a 9.6kb line on which we see a similar occurrence.

We hypothesized that what's happening is that the line is saturating
with data, and then some, causing even the router's memory to fill up.
Thereafter, once the routers miss 3 HDLC keepalives, they reset
the line etc. etc. etc.
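
A rough sketch of the hypothesis above, under assumptions that are not
from the thread: keepalives are sent every 10 seconds, they wait in the
same FIFO queue as data, and the far end resets the line after 3 missed
keepalives.  A real router may queue keepalives differently, so treat this
as back-of-the-envelope arithmetic only.

```python
def keepalive_wait_seconds(queued_bytes: int, line_bps: int) -> float:
    """Time a keepalive waits behind data already queued for the line."""
    return queued_bytes * 8 / line_bps


def line_would_reset(queued_bytes: int, line_bps: int,
                     keepalive_interval_s: float = 10.0,  # assumed interval
                     missed_limit: int = 3) -> bool:
    """True if the wait exceeds the window of `missed_limit` keepalives."""
    wait = keepalive_wait_seconds(queued_bytes, line_bps)
    return wait > keepalive_interval_s * missed_limit


# Hypothetical backlog: forty 1500-byte packets queued for output.
backlog_bytes = 40 * 1500

wait_9600 = keepalive_wait_seconds(backlog_bytes, 9600)
print("wait on 9.6kb line:", wait_9600, "seconds")
print("9.6kb line resets: ", line_would_reset(backlog_bytes, 9600))
print("56kb line resets:  ", line_would_reset(backlog_bytes, 56000))
```

With that backlog the 9.6kb line needs 50 seconds to drain, longer than
the assumed 30-second keepalive window, while the 56kb line drains in
under 9 seconds.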

The salesperson's suggestion was that we turn off the keepalives on that
circuit, so that such interface resets did not happen.  You might try
that.

We did not try that, because we were actually working on a different
problem, and thought this was the cause of it, when it wasn't.  Having
fixed the other problem, we curled into our own cute li'll cubby holes,
and went back to hibernation yet again :-).

Check the throughput and errors on the line, and see if they occur when
there is a large volume of data flowing through them.   You could
recreate this by, say, flood-pinging the line with ultra-large packets,
and watching the status of the line.  Track traffic patterns for a
while, and see if there is any correlation between these and interface
resets.
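
One way to do the tracking suggested above is to log traffic volume and
reset counts per sampling interval and compute a correlation coefficient.
The sketch below uses a plain Pearson correlation; the sample numbers are
invented for illustration and are not measurements from any real line.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples, one pair per sampling interval.
kbits_per_interval = [5, 8, 40, 52, 9, 55, 6, 50]
resets_per_interval = [0, 0, 2, 3, 0, 4, 0, 3]

r = pearson(kbits_per_interval, resets_per_interval)
print(f"correlation between load and resets: {r:.2f}")
```

A coefficient near 1 would support the saturation theory; a value near 0
would point back toward cables, clocks, or the DSU/CSU.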


Best I can think of in a pinch...hope these help,

Kannan
-- 
Kannan Varadhan, Internet Engineer, OARNet
Ohio Supercomputer Center, Columbus, OH 43212	+1 (614) 292-4137
email:	kannan@oar.net	|  osu-cis!malgudi.oar.net!kannan

fortinp@bcars223.bnr.ca (Pierre Fortin) (09/11/90)

In article <3970@ursa-major.SPDCC.COM>, dyer@spdcc.COM (Steve Dyer) writes:
> 
> In any event, the "fix" appears to have been a coincidence.  Having moved the
> equipment to its permanent resting place, the carrier transitions and
> interface resets (after a stable, peaceful weekend) have returned with a
> vengeance, even with the parameter set to 56kb.  I think I'm in the
> nether world of flaky cables/modems, and will pursue it on that level.

Here's a short list of the V.35 problems I've encountered over the last 
17 months:
 
  - V.35 applique Rev 3: inverted clocks; mods applied to some boards

  - V.35 applique Rev 4: inverted clocks; mods for Rev 3 boards applied
                         inadvertently to these boards (I have personally
                         seen at least three different attempts at clearing
                         up this problem)
                         We even had one unit in which the RxD line was 
                         leaking back out over one of the clock leads,
                         giving the appearance of a bad DCE.

  - DL551V T1 CSU/DSU:   most units bad (power supply drifting, incorrect
                         factory options, bad repairs, poor QC, etc.)  All
                         units were to be returned to Digital Link for 
                         checkout; don't know current status of this.

  - Screws (Yes, SCREWS!):  WHY is it that something as simple as a screw 
                         (actually screw threads) can cause problems; I 
                         guess we need patience testers...   V.35 connectors
                         are available with EITHER single- or double-helix
                         retaining screws!  Go figure...

  - V.35 cables:         Nearly all cables we tested were of poor quality.
                         We designed our own cable (verified to 70 feet)
                         where each pair is individually shielded (with the
                         source end only grounded); then an overall shield
                         grounded at both ends via a six-inch pig-tail to 
                         a spade-lug.  Major improvements!!!

  - MCI HDLC controller: The Rockwell HDLC controller chips with a date-code
                         prior to a certain date (contact cisco for date)
                         had bugs when the applied voltage was less than 
                         EXACTLY 5.00V.  In some literature I received from
                         cisco, there was a small piece of pink paper which
                         said that this should only affect short X.25 packets
                         with odd packet lengths; but, where there's smoke...

  - V.35 applique Rev 6: This applique is correct.  We are continuing to 
                         replace all pre-Rev-6 appliques with these newer ones.

  - Operations:          Our operations personnel used to make statements 
                         like:  "The CSU/DSUs never work in loopback"; this
                         has since been corrected.  Any comm gear which does
                         not work in loopback when the manufacturer claims
                         it does should be highly suspect.

                         In another instance, two links from different 
                         remotes (should have been [A]---[B0&B1]---[C]) were
                         connected incorrectly ([A]---[B1&B0]--[C]).  With
                         our fully redundant mesh topology, the ciscos never
                         complained, but you should have seen the highly
                         inefficient routing, BUT IT WORKED from the users'
                         perspective (a tip of the hat to cisco!!).

  - "Nah!" ;^)           Someone I talked to in another company which shall
                         forever remain nameless managed to find a cable and
                         screw it in.  Upon closer inspection, the cable was
                         found to have a female connector.  If you fail to 
                         spot the humor in this, check the sex of the V.35
                         appliques on your cisco...
  
These are the major points I can think of off the top of my head at this time
(it's 02:45).  
 
The general thrust of my message here is that, for what should be a mature
interface, expect problems ANYWHERE.  Likely you will find MORE than one 
problem.  Perhaps this has to do with the fact (someone please prove me wrong)
that there is no V.35 standard beyond the original which only covers the
approx. 40KB speed.  
 
Other things to note about V.35:
  - the DCE supplies both clocks SCR and SCT
  - the DTE *returns* the Tx clock (SCTE) to the DCE to account for 
    varying cable lengths at higher speeds 
  - double-check cable polarities on ALL pairs
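
To put a number on the cable-length point in the list above, here is a
rough sketch comparing round-trip cable delay with the bit period.  It
assumes about 5 ns of propagation delay per metre of cable (roughly two
thirds of the speed of light) and ignores driver and receiver delays; the
21 m figure is just the 70-foot cable mentioned earlier.

```python
PROPAGATION_NS_PER_M = 5.0  # assumed cable propagation delay


def round_trip_skew_fraction(cable_m: float, bit_rate_bps: float) -> float:
    """Round-trip cable delay as a fraction of one bit period.

    Without SCTE, the transmit clock travels DCE -> DTE and the data
    travels DTE -> DCE, so the DCE sees data skewed by the round trip.
    """
    round_trip_ns = 2 * cable_m * PROPAGATION_NS_PER_M
    bit_period_ns = 1e9 / bit_rate_bps
    return round_trip_ns / bit_period_ns


CABLE_M = 21.0  # about 70 feet

skew_56k = round_trip_skew_fraction(CABLE_M, 56_000)
skew_t1 = round_trip_skew_fraction(CABLE_M, 1_544_000)

print(f"56kb: skew is {skew_56k:.1%} of a bit period")
print(f"T1:   skew is {skew_t1:.1%} of a bit period")
```

Under these assumptions the skew is negligible at 56kb but a sizeable
fraction of a bit at T1 rate, which is the situation the returned clock
is meant to handle.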
 
Otherwise, V.35 was a piece of cake....  in the face!

...and now we're getting ready (hah!!) for the onslaught of T3 equipment...
Huh?  Why am I standing on this chair with a V.35 cable around my neck?

  

Pierre Fortin
fortinp@bnr.ca

P.S.: If enough people send in their $35 (easy to remember ;^) ), we will
      seriously consider publishing  "Living with V.35" complete with over
      400 pages of tests, results, multi-channel scope traces, levels,
      power supply voltages, pictures of modifications, plus much more...
      :^)    :^)    P^)   ;^)

fortinp@bcars223.bnr.ca (Pierre Fortin) (09/11/90)

In article <890@manhandler.osc.edu>, kannan@osc.edu (Kannan Varadhan) writes:
> We hypothesized that what's happenning is that the line is saturating
> with data, and then some, causing even the router's memory to fill up.
> Thereafter, once the router's miss 3 hdlc keepalive patterns, they reset
> the line etc. etc. etc.

This reminds me:  when using the older CSC-T cards, always "shutdown" any
unused interfaces.  I don't recall whether the cisco crashed, or the in-service
links had problems, but issuing the shutdown command on all unused links 
cleared up our problems (this was about a year ago).  I don't recall any 
similar problems with the cisco-designed cards (MCI & SCI), but I wasn't 
about to take a chance, so the rule here is "shutdown".

> 
> Best I can think of in a pinch...hope these help,

See my other posting re "V.35 problems"...

Pierre Fortin
fortinp@bnr.ca