stan@H1.GCY.NYTEL.COM (07/24/88)
For the rest of you on the p4200 net: Cliff Frost and I had a long telephone conversation on Friday about his message regarding serial line problems. We are experiencing identical, unexplained failures on some of our DDS serial lines. Other DDS lines that appear to be exactly the same do not exhibit any of the maintenance/self-test failure modes.

One of Cliff's ideas was that the Proteon may be entering self-test mode after a single maintenance failure. That would contradict the literature, which says it does so after three maintenance failures.

I have access to a lot of test equipment and a DDS link that runs approximately 25 miles from my location to a relatively lightly loaded p4200 with two Ethernets on it. Both boxes are running 8.0 software with all the known ECOs, and both use COM-2 boards for their serial links. I hooked up an HP4955 HDLC monitor to the link, waited for an idle traffic sequence, and then disabled the serial port at the local end:

| # | local p4200 | remote p4200 |
|---|---|---|
| 1 |  | <- 08 00 maint |
| 2 | 08 00 maint -> |  |
| 3 | port disabled |  |
| 4 |  | <- 08 00 maint |
| 5 |  | <- 08 00 maint |
| 6 |  | <- 08 00 maint |
| 7 |  | <- 01 00 - 3E self test |

As you can see, the remote Proteon sent three maintenance packets and then went into self-test mode. Sorry, Cliff, but that doesn't seem to be the problem.

I then wrote a small program on the HP analyzer to trap the self-test frames, and it wasn't long before I captured one. The link that I am testing has an error rate of about 10**-5 at the remote receive; the local receive is better than 10**-8.
| # | local p4200 | remote p4200 |
|---|---|---|
| 1 |  | <- 08 00 maint |
| 2 | 08 00 maint -> |  |
| 3 | 01 00 - 45 data -> |  |
| 4 | 01 00 - 45 data -> |  |
| 5 | 01 00 - 45 data -> |  |
| 6 | 01 00 - 45 data -> |  |
| 7 | 01 00 - 45 data -> |  |
| 8 | 01 00 - 45 data -> |  |
| 9 | 01 00 - 45 data -> |  |
| 10 | 01 00 - 45 data -> |  |
| 11 | 01 00 - 45 data -> |  |
| 12 | 01 00 - 45 data -> |  |
| 13 |  | <- 08 00 maint |
| 14 | 08 00 maint -> |  |
| 15 | 08 00 maint -> |  |
| 16 |  | <- 01 00 - 3E self test |
| 17 | 02 00 ACK -> |  |
| 18 | 08 00 maint -> |  |
| 19 |  | <- 08 00 maint |
| 20 | 08 00 maint -> |  |
| 21 |  | <- 08 00 maint |

As you can see from the above sequence, everything was normal in exchanges 1 and 2. The local Proteon then sent two ping frames followed by eight RIP frames (3 through 12). The remote sent a maintenance frame (13) and the local sent two maintenance frames immediately thereafter (14 and 15), but the remote Proteon then went into self-test (16).

I have two theories about this problem:

A. One or more maintenance packets from the local Proteon were lost due to DDS line errors. I can't tell what actually happened, because I don't know what was received at the remote end.

B. The theory I like the most. After watching these two boxes on the monitor for a couple of hours in an attempt to figure out the protocol, I started to see timing patterns. It appears that data has a higher transmit priority than maintenance packets. I would see a large exchange of data followed by numerous maintenance packets from both ends, rather than maintenance packets spaced evenly over the four-second idle-line interval. It might be that, because of the number of consecutive data frames the local Proteon had just sent, frames 14 and 15 did not reach the remote Proteon in time to prevent the self-test sequence.

Could it be that Proteon has a queuing/timing problem that can be exacerbated by serial line errors? Any ideas/comments?

Stan
-----------------------------------------------------------------------
Real Name : Stanfield L. Smith
E-mail    : stan@h1.gcy.nytel.com
Company   : New York Telephone Co.
USmail    : Room 203 LAB, 100 Garden City Plaza, Garden City NY, 11530
Phone     : 516-294-7170
FAX G3    : 516-248-8489
-----------------------------------------------------------------------
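Theory A above can be weighed with a little arithmetic. If bit errors are independent (a simplifying assumption; real line errors often arrive in bursts), a frame of n bits arrives intact with probability (1 - BER)^n, so the chance it fails its check and is discarded is 1 - (1 - BER)^n. The sketch below applies that to the two measured error rates. The frame sizes and the helper names `frame_error_probability` and `all_lost` are hypothetical; the thread does not give the length of a maintenance frame.

```python
import math


def frame_error_probability(ber: float, frame_bytes: int) -> float:
    """Chance that a frame of `frame_bytes` octets contains at least one bit error.

    Assumes independent bit errors, so P = 1 - (1 - ber) ** bits.
    """
    bits = 8 * frame_bytes
    # log1p/expm1 keep the result accurate when ber is very small
    return -math.expm1(bits * math.log1p(-ber))


def all_lost(ber: float, frame_bytes: int, count: int) -> float:
    """Chance that `count` independent frames are all corrupted."""
    return frame_error_probability(ber, frame_bytes) ** count


if __name__ == "__main__":
    # 1e-5 = Stan's remote receive, 1e-8 = his local receive.
    # Frame sizes are hypothetical; the thread does not give them.
    for ber in (1e-5, 1e-8):
        for size in (64, 1500):
            p = frame_error_probability(ber, size)
            print(f"BER {ber:.0e}, {size:4d}-byte frame: P(corrupted) = {p:.2e}")
    # e.g. both of two back-to-back 64-byte maintenance frames lost at 1e-5
    print(f"two 64-byte frames lost at 1e-5: {all_lost(1e-5, 64, 2):.1e}")
```

Under this model a short frame at 10**-5 is corrupted roughly half a percent of the time, and losing two in a row (like frames 14 and 15) is roughly the square of that. Error bursts, which the model ignores, would make consecutive losses much more likely.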
CLIFF@UCBCMSA.BITNET (Cliff Frost {415} 642-5360) (07/25/88)
Hi,

Stan's work with the monitor is certainly helpful. My theory that a single maint failure caused a self-test was based on the fact that the logging in the p4200 (T 2) would show a single maint-fail and then go into self-test. The statistics counters also climb in the same sequence.

Of course, this could simply mean that the box doesn't report a maint-fail condition until three maint tests have failed in a row, but I haven't been able to get confirmation of this from Proteon.

Next week we will be taking Torben's advice (and Stan's as well) and spending a lot of time on our cables. I'm a little concerned because I think Stan has already done this at NYSERNet and it hasn't helped them, but it sure won't hurt to try.

Many thanks for the help,
Cliff
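To make the two readings of the counters concrete, here is a minimal sketch of a keepalive rule that enters self-test after `max_misses` consecutive maintenance intervals with no maintenance frame heard from the peer. The rule, the one-check-per-frame timing, and the names `remote_frames`, `MAINT` and `SELF_TEST` are assumptions for illustration, not Proteon's implementation. Replaying Stan's first trace (peer answers once, then its port is disabled) with `max_misses=3` reproduces the four maintenance frames followed by self-test that he captured; `max_misses=1` would have gone into self-test after a single unanswered maintenance frame.

```python
# Hypothetical "three strikes" keepalive rule, NOT Proteon's code.
# Assumption: once per maintenance interval the box sends a maintenance frame,
# then checks whether the peer answered during that interval.

MAINT = "08 00 maint"
SELF_TEST = "01 00 - 3E self test"


def remote_frames(peer_answered, max_misses=3):
    """Frames one side would send, given whether the peer answered each interval."""
    frames = []
    misses = 0  # consecutive unanswered maintenance intervals
    for answered in peer_answered:
        frames.append(MAINT)
        misses = 0 if answered else misses + 1
        if misses >= max_misses:
            frames.append(SELF_TEST)  # give up on the line and run self-test
            break
    return frames


if __name__ == "__main__":
    # Stan's first trace: the peer answers once, then its port is disabled.
    trace = [True, False, False, False]
    print(remote_frames(trace))                # documented rule (3 failures)
    print(remote_frames(trace, max_misses=1))  # Cliff's single-failure hypothesis
```

Because any answered interval resets the counter, this rule only trips when the misses are consecutive, which is why a few lost or delayed maintenance frames in a row (theories A and B) would be enough to trigger it.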
dlw@VIOLET.BERKELEY.EDU (David Wasley) (07/25/88)
I believe Proteon listens to this list. I would like to hear their recommendations regarding cables, for both low-speed (<= 64 kb/s) and high-speed connections between the COM boards and external equipment.

David
jas@proteon.COM (John A. Shriver) (07/25/88)
You may be assured that we are indeed listening to this discussion very carefully. We want to see this problem solved.

We certainly recommend that your cables be built correctly. The pairs just have to be twisted correctly. I guess a lot of vendors are silly and pair the RS-449 cable wires 2-3, 4-5, 6-7 instead of the correct (but more subtle) 2-20, 3-21, 4-22, 5-23, 6-24. RS-422 has some cable specifications in sections 4.3 and 7.1. However, it never explicitly says that the two wires of each differential pair should be twisted together; it assumes common sense here.

V.35 is not as explicit about cables (in fact, the pinout is not in V.35 at all, only in AT&T PUB 41450). PUB 41450 says about the same thing about cables as RS-422 does. The V.35 voltages are much lower than RS-422's (+/- 0.55V compared with +/- 6.0V). Individual shields might be overkill, but that would depend on your electrical environment, cable length, and the common-mode rejection ratio of the DSU/CSU. Our DDS line works fine with just twisted pairs, but our cable is only about 5 to 10 meters long. We use a "genuine Bell" AT&T 2556 DSU/CSU.

Obviously, everything is more critical at T1.
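John's pairing rule can be written down as data and checked mechanically. The sketch below lists the A/B pin numbers for several RS-449 differential circuits as commonly published for the 37-pin connector (each B lead is 18 pins above its A lead, the same pattern as his 2-20, 3-21, ... list) and reports any circuit whose two leads are not twisted together. The table and the `unpaired_circuits` helper are illustrative; confirm pin assignments against RS-449 and your DSU/CSU documentation before building a cable.

```python
# (A lead, B lead) on the RS-449 37-pin connector, as commonly published.
# Verify against the standard before relying on it.
RS449_PAIRS = {
    "SD": (4, 22),   # Send Data
    "ST": (5, 23),   # Send Timing (edge sensitive)
    "RD": (6, 24),   # Receive Data
    "RS": (7, 25),   # Request to Send
    "RT": (8, 26),   # Receive Timing (edge sensitive)
    "CS": (9, 27),   # Clear to Send
    "DM": (11, 29),  # Data Mode
    "TR": (12, 30),  # Terminal Ready
    "RR": (13, 31),  # Receiver Ready
    "TT": (17, 35),  # Terminal Timing (edge sensitive)
}


def unpaired_circuits(twisted_pairs):
    """Names of circuits whose A and B leads are not in the same twisted pair."""
    pairs = {frozenset(p) for p in twisted_pairs}
    return sorted(
        name for name, (a, b) in RS449_PAIRS.items() if frozenset((a, b)) not in pairs
    )


if __name__ == "__main__":
    good = list(RS449_PAIRS.values())               # 4-22, 5-23, 6-24, ...
    bad = [(n, n + 1) for n in range(2, 36, 2)]     # 2-3, 4-5, 6-7, ...
    print(unpaired_circuits(good))
    print(unpaired_circuits(bad))
```

With the "adjacent pin" wiring John describes, every differential circuit in the table comes back as unpaired, including the timing leads TT, ST and RT.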
eshop@JUPITER.UCSC.EDU (Jim Warner) (07/26/88)
>Obviously, everything is more critical at T1.
Statements like this can be misleading. Problems with crosstalk
are related to the speed of the edges, not to, e.g., the
fundamental frequency of square-wave clocks.
The pairs to be *most* careful with are the ones that are edge
sensitive. In RS-449, those are TT, ST and RT.
jim warner
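Jim's point can be put in numbers with two standard rules of thumb: an edge with 10-90% rise time t_r has significant spectral content up to roughly 0.35/t_r, and the current it couples through a mutual capacitance C_m into a neighbouring wire is C_m * dV/dt. Neither expression contains the clock rate. The rise time, coupling capacitance and example line rates below are hypothetical; the 1.1 V transition simply spans the +/- 0.55V V.35 level quoted earlier in the thread.

```python
def edge_bandwidth_hz(rise_time_s: float) -> float:
    """Rule of thumb: significant spectral content extends to about 0.35 / t_r."""
    return 0.35 / rise_time_s


def coupled_current_a(mutual_capacitance_f: float, delta_v: float, rise_time_s: float) -> float:
    """Capacitively coupled current i = C_m * dV/dt, with dV/dt ~ delta_v / t_r."""
    return mutual_capacitance_f * delta_v / rise_time_s


if __name__ == "__main__":
    rise = 20e-9   # hypothetical 20 ns driver edge
    c_m = 10e-12   # hypothetical 10 pF of coupling between neighbouring wires
    swing = 1.1    # -0.55 V to +0.55 V, spanning the V.35 level quoted above
    for name, clock_hz in (("56 kb/s DDS", 56e3), ("T1", 1.544e6)):
        print(f"{name}: clock {clock_hz / 1e3:.0f} kHz, "
              f"edge content to ~{edge_bandwidth_hz(rise) / 1e6:.1f} MHz, "
              f"coupled current ~{coupled_current_a(c_m, swing, rise) * 1e3:.2f} mA")
```

Both lines report the same edge bandwidth and coupled current even though the clock rates differ by a factor of about 27: with the same drivers, a slow DDS clock lead is as prone to crosstalk per edge as a T1 one.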