[comp.sys.proteon] Serial Line Problems

stan@H1.GCY.NYTEL.COM (07/24/88)

For the rest of you on the p4200 net, Cliff Frost and I had a long telephone
conversation on Friday about his message regarding serial line problems.
We are experiencing identical, unexplained failures on some of our DDS serial
lines.  Other DDS lines that appear to be exactly the same do not exhibit
any of the maintenance/self-test failure modes.  One of Cliff's ideas was
that Proteon may be entering self-test mode after a single maintenance
failure, which would contradict the literature, which says it does so after
three maintenance failures.

I have access to a lot of test equipment and a DDS link that runs
approximately 25 miles from my location to a relatively lightly loaded p4200
with two ethernets on it.  Both boxes are running 8.0 software with all the
known ECOs.  Both boxes are using COM-2 boards for their serial links.  I
hooked up an HP4955 HDLC monitor to the link and waited for an idle traffic
sequence.  I then disabled the serial port at the local end.  

   |       local p4200       |       remote p4200
---|-------------------------|----------------------------
 1 |                         | <- 08 00 maint
 2 |   08 00 maint ->        |
 3 |   port disabled         |
 4 |                         | <- 08 00 maint
 5 |                         | <- 08 00 maint
 6 |                         | <- 08 00 maint
 7 |                         | <- 01 00 - 3E self test

As you can see, the remote Proteon sent three maintenance packets and then
went into the self-test mode.  Sorry, Cliff, but that doesn't seem to be the
problem.
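
The behaviour the trace confirms can be written down as a small counter.
This is only a sketch of the documented rule (self-test after three
consecutive maintenance failures); the function name, the boolean trace
encoding and the threshold parameter are illustrative assumptions, not
anything taken from Proteon's software.

```python
# Sketch of the documented rule: self-test starts after `threshold`
# consecutive unanswered maintenance polls. All names are hypothetical.

def polls_until_self_test(replies, threshold=3):
    """Return the 1-based poll index at which self-test starts, or None.

    replies: iterable of booleans, True if the peer answered that
    maintenance poll, False if the poll went unanswered.
    """
    misses = 0
    for index, answered in enumerate(replies, start=1):
        # Any answered poll resets the count of consecutive misses.
        misses = 0 if answered else misses + 1
        if misses >= threshold:
            return index
    return None


# First trace: one answered exchange, then the local port is disabled.
trace = [True, False, False, False]
print(polls_until_self_test(trace))               # 4
print(polls_until_self_test(trace, threshold=1))  # 2
```

With the threshold at three, self-test begins on the third unanswered poll,
which matches frames 4 through 7 in the trace.  A threshold of one, which
was Cliff's hypothesis, would have triggered it after frame 4 alone.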

I then wrote a small program on the HP analyzer to trap the self-test frames
and it wasn't long before I captured one.  The link that I am testing has
an error rate of about 10**-5 at the remote receive.  The local receive is
better than 10**-8.  

   |       local p4200       |       remote p4200
---|-------------------------|----------------------------
 1 |                         | <- 08 00 maint
 2 |   08 00 maint ->        |
 3 |   01 00 - 45 data ->    |
 4 |   01 00 - 45 data ->    |
 5 |   01 00 - 45 data ->    |
 6 |   01 00 - 45 data ->    |
 7 |   01 00 - 45 data ->    |
 8 |   01 00 - 45 data ->    |
 9 |   01 00 - 45 data ->    |
10 |   01 00 - 45 data ->    |
11 |   01 00 - 45 data ->    |
12 |   01 00 - 45 data ->    |
13 |                         | <- 08 00 maint
14 |   08 00 maint ->        |
15 |   08 00 maint ->        |
16 |                         | <- 01 00 - 3E self test
17 |   02 00 ACK ->          |
18 |   08 00 maint ->        |
19 |                         | <- 08 00 maint
20 |   08 00 maint ->        |
21 |                         | <- 08 00 maint


As you can see from the above sequence, everything was normal in exchanges
one and two.  The local Proteon then sent two ping frames, followed by
eight RIP frames (frames 3 through 12).  The remote sent a maintenance frame
and the local sent two maintenance frames immediately thereafter, but the
remote Proteon then went into self-test.  I have two theories about this
problem.

A.  One or more maintenance packets from the local Proteon were lost due to
DDS line errors.  I can't tell what actually happened because I don't know
what was received at the remote end.
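
A rough feel for how likely theory A is can be had from the measured error
rate.  The sketch below assumes independent bit errors (real DDS errors
often come in bursts, so this is optimistic) and a 64-bit maintenance
frame; the frame size is a guess, since I have not measured it, and the
function names are only illustrative.

```python
# Back-of-envelope estimate for theory A. Assumptions: independent bit
# errors and a hypothetical 64-bit maintenance frame.

def frame_loss_probability(bit_error_rate, frame_bits):
    """Probability that at least one bit of a frame is corrupted."""
    return 1.0 - (1.0 - bit_error_rate) ** frame_bits


def all_lost_probability(bit_error_rate, frame_bits, count):
    """Probability that `count` consecutive frames are all corrupted."""
    return frame_loss_probability(bit_error_rate, frame_bits) ** count


FRAME_BITS = 64      # assumed maintenance frame size, not measured
REMOTE_BER = 1e-5    # error rate at the remote receive, from the text

one = frame_loss_probability(REMOTE_BER, FRAME_BITS)
two = all_lost_probability(REMOTE_BER, FRAME_BITS, 2)
print(f"one frame lost:  {one:.1e}")   # about 6.4e-04
print(f"two in a row:    {two:.1e}")   # about 4.1e-07
```

Under these assumptions, losing both frames 14 and 15 to random bit errors
would be rare, which is one reason to look harder at theory B.  Bursty
errors would raise the figure considerably.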

B.  The theory I like the most.  After watching these two boxes for a couple
of hours on the monitor in an attempt to figure out the protocol, timing
patterns started to materialize.  It appears that data has a higher priority
for transmit than maintenance packets.  I would see a large exchange of data
followed by numerous maintenance packets from both ends, not spaced evenly
over their idle line time of four seconds. It might be that frames 14 and
15 from the local Proteon did not make it to the remote Proteon in time to 
prevent the self-test sequence due to the number of consecutive data frames
it had just sent. 
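
To illustrate theory B, here is a minimal sketch of a transmit queue in
which data frames always go ahead of maintenance frames.  Everything in it
is an assumption made for illustration: the strict-priority rule is my
guess from watching the line, the 56 kb/s rate and the frame sizes are
hypothetical, and none of the names come from Proteon.

```python
import heapq

LINE_BPS = 56_000    # assumed line rate; not stated for this link
DATA, MAINT = 0, 1   # lower number is sent first (assumed strict priority)


def send_order(frames, line_bps=LINE_BPS):
    """Simulate a strict-priority transmit queue.

    frames: list of (kind, size_bytes), all queued at time zero.
    Returns a list of (kind, finish_time_seconds) in transmit order.
    """
    # The sequence number keeps frames of the same kind in FIFO order.
    heap = [(kind, seq, size) for seq, (kind, size) in enumerate(frames)]
    heapq.heapify(heap)
    clock = 0.0
    sent = []
    while heap:
        kind, _, size = heapq.heappop(heap)
        clock += size * 8 / line_bps   # serialization time of this frame
        sent.append((kind, clock))
    return sent


# A maintenance frame queued first, then a burst of ten data frames.
# Sizes (8 and 576 bytes) are hypothetical.
queued = [(MAINT, 8)] + [(DATA, 576)] * 10
order = send_order(queued)
maint_done = [t for kind, t in order if kind == MAINT][0]
print(round(maint_done, 3))   # 0.824
```

Even though the maintenance frame was queued first, it leaves last, and its
delay grows with the size of the data burst.  Whether that delay is enough
to trip the remote's timer depends on timeouts I do not know.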

Could it be that Proteon has a queuing/timing problem that can be 
exacerbated by serial line errors?  Any ideas/comments?

Stan
-----------------------------------------------------------------------
Real Name : Stanfield L. Smith		 E-mail : stan@h1.gcy.nytel.com
Company   : New York Telephone Co.	 USmail : Room 203 LAB
Phone     : 516-294-7170	       		: 100 Garden City Plaza
FAX G3	  : 516-248-8489			: Garden City NY, 11530
-----------------------------------------------------------------------

CLIFF@UCBCMSA.BITNET (Cliff Frost {415} 642-5360) (07/25/88)

Hi,
Stan's work with the monitor is certainly helpful.  My theory that a
single maint-failure caused a self-test was based on the fact that
the logging in the p4200 (T 2) would show a single maint-fail and then
go into self-test.  Also, the Statistics counters climb in sequence.
Now, clearly, this could just be that no one bothers to report a maint-fail
condition until 3 maint-tests have failed in a row, but I haven't been
able to get confirmation of this from Proteon.

Next week we will be taking Torben's advice (and Stan's also) and spending
a lot of time on our cables.  I'm a little concerned because I think that
Stan has already done this at NYSERNet and it hasn't helped them, but it
sure won't hurt to try.
        Many thanks for the help,
                              Cliff

dlw@VIOLET.BERKELEY.EDU (David Wasley) (07/25/88)

I believe Proteon listens to this list: I would like to hear their
recommendation regarding cables, both for low speed (<= 64Kb/s) and
high speed connections between the COM boards and external equipment.
	David

jas@proteon.COM (John A. Shriver) (07/25/88)

We are indeed listening to this discussion very carefully, you may be
assured.  We want to see this problem solved.

We certainly recommend that your cables be built correctly.  The pairs
just have to be twisted correctly.  I guess a lot of vendors are silly
and pair the RS-449 cables 2-3, 4-5, 6-7, instead of the correct (but
more subtle) 2-20, 3-21, 4-22, 5-23, 6-24.  There are some specs on
cable in RS-422 in sections 4.3 and 7.1.  However, they never
explicitly say that you should twist together the two wires of each
differential pair; they assume common sense here.
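
The correct pairs listed above all follow one pattern: pin n is twisted
with pin n+18.  A quick way to check a cable's pairing list against that
pattern is sketched below.  The function names are illustrative, and
applying the rule to pins beyond the pairs listed above is an assumption
that should be checked against the RS-449 pinout.

```python
# Check a cable pairing list against the n / n+18 pattern seen in the
# pairs listed above. Extending the rule to other pins is an assumption.

def expected_mate(pin):
    """Pin expected to share a twisted pair with `pin` (n <-> n+18)."""
    return pin + 18 if pin <= 19 else pin - 18


def mispaired(pairs):
    """Return the (a, b) pairs that do not follow the n / n+18 pattern."""
    return [(a, b) for a, b in pairs if expected_mate(a) != b]


# The sequential pairing described above as wrong:
print(mispaired([(2, 3), (4, 5), (6, 7)]))
# prints [(2, 3), (4, 5), (6, 7)]

# The pairing described above as correct:
print(mispaired([(2, 20), (3, 21), (4, 22), (5, 23), (6, 24)]))
# prints []
```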

V.35 is not as explicit about cables (in fact, the pinout is not in
V.35, only AT&T PUB 41450).  PUB 41450 says about the same thing about
cables as RS-422 does.  The V.35 voltages are much lower than RS-422
(+/- 0.55V as compared to +/- 6.0V).

Individual shields might be overkill, but it would depend on your
electrical environment, cable length, and common-mode rejection ratio
of the DSU/CSU.  Our DDS line works fine with just twisted pairs, but
our cable is maybe 5 or 10 meters long.  We use a "genuine Bell" AT&T
2556 DSU/CSU.

Obviously, everything is more critical at T1.

eshop@JUPITER.UCSC.EDU (Jim Warner) (07/26/88)

>Obviously, everything is more critical at T1.

Statements like this can be misleading.  Problems with crosstalk
are related to the speed of the edges and not to, e.g., the
fundamental frequency of square wave clocks.

The pairs to be the *most* careful with are the ones that are edge
sensitive.  In RS-449, that's TT, ST and RT.

jim warner