enger@SEKA.SCC.COM (Robert M. Enger) (03/19/90)
Folks:

As many of you unfortunately know, Pronet-80 boards can exhibit a
sensitivity to the contents of the frames they are asked to convey.
33hex is always specified as one of the data patterns that annoys
the Pronet-80 boards.

Does anyone know what 33hex maps into under the 4-into-6 coding
used by Proteon? Is 33hex the worst-case test for the Pronet-80
given the 4-into-6 mapping they use? If not, is there a more
sensitive test that we can use to detect when our boards are on
their way to the great repair-depot in the sky?

Can anyone offer a technical explanation of the situation? Is it
that the "rf" (120 Mbps) stages become mistuned, causing waveform
deformation beyond the ability of the receivers to accurately
"demodulate" (loose terminology) the signal? Why is their design
so sensitive? These boards keep going bad. Are the drivers
(amplifiers) too weak to fight the cable capacitance or something?

Has anyone found a way to "help" the situation? Perhaps a way to
appease the deficiency of the design (or whatever the problem is)
by using lower-capacitance cabling, or cabling with a different
characteristic impedance, etc.?

Thanks for any insight anyone has to offer,

Bob Enger
Contel Federal Systems
enger@seka.scc.com
sting@LAOTSE.CAM.NIST.GOV (s. ting) (03/19/90)
Our ProNET-80 ring has experienced the problem. It has not been
resolved yet. I too would like Proteon to explain the problem in
great detail and present, if there is one, an effective solution.

Questions:

. What causes the p3280 or ProNET-80 CTL card to go out of
  alignment?
. Why do they go out of alignment so easily?
. Why do they have problems with characters like x33?
. What is the quick and effective way to find which p3280 or CTL
  card, among the many on the ring, is already out of alignment?
. Is it true that Proteon, concentrating on FDDI as the replacement
  for ProNET-80, is not putting full effort into resolving the
  problem?
. Should a customer without a hardware maintenance agreement pay
  the repair cost for units that have gone out of alignment?

Michael Ting, NIST
CLIFF@UCBCMSA.BITNET (Cliff Frost {415} 642-5360) (03/20/90)
Hi,

We have had a fair amount of experience with this problem here at
UC Berkeley, and I think we have essentially banished it in this
form. With Proteon's help, you should be able to also.

> Does anyone know what 33Hex maps into under the 4-into-6 bit coding
> used by Proteon? etc...

Hex 33 is useful because it maps to the ASCII character "3", so you
can easily fill a file with the character "3" (but don't put too
many newlines or carriage returns in).

Hex 33 maps into:

    100011100011

which when followed by another hex 33 becomes a series of 3 zeros
followed by 3 ones. I believe it is the lack of transitions over 3
bits that is hard for controllers that are drifting out of spec.

There are several other data patterns that are at least as bad as
this, hex: 36, 63, 66, BE, BB, EB, EE, and undoubtedly more.

> Can anyone offer a technical explanation of the situation? Is it that
> the "rf" (120mbps) stages become mistuned, ...etc?

Well, I'm a software kind of guy, and our hardware techs sometimes
use the phrase "programmer with a screwdriver" in a sarcastic way,
so take what I say with some sized grain of salt. ;-)

Each active device on the p80 ring reads the data that comes in
using its own clock to decode it. If the data is for a node
downstream, the device regenerates it, again using its own clock.
This means that all the devices on the ring had better have clocks
that are in close alignment with each other. The clocks are all
supposed to be at 120 MHz +/- a tiny fraction (10 kHz?). These
clocks tick totally independently of each other; there is no
"master" clock.

This design appears (to me) to lead to some difficult debugging
situations. You can have a ring that is working ok but has some
clocks at the ragged edge, introduce a new node, and all of a sudden
your ring is shot. The new node may actually be OK, but you might
"fix" the problems by putting in a different controller. Or you
might "fix" the problems by plugging the controllers in in a
different order.
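The run-length claim about hex 33 can be checked mechanically. What
follows is a minimal Python sketch, not anything from Proteon: it
uses only the 12-bit pattern quoted above for hex 33 (the rest of
the 4-into-6 table is not reproduced here), and the names CODE_33,
run_lengths and longest_run are made up for the illustration.

```python
from itertools import groupby

# 12-bit line code for one byte of hex 33, as quoted in this message.
# No other entry of the 4-into-6 table is assumed here.
CODE_33 = "100011100011"


def run_lengths(bits):
    """Lengths of the consecutive runs of identical bits in `bits`."""
    return [len(list(group)) for _, group in groupby(bits)]


def longest_run(bits, repeats=4):
    """Longest run of identical bits when `bits` is sent back to back."""
    return max(run_lengths(bits * repeats))


# The ASCII character "3" is the byte hex 33, so a file of 3's is a
# file of hex 33 bytes.
assert ord("3") == 0x33
```

With the pattern above, longest_run(CODE_33) gives 3, which matches
the "3 zeros followed by 3 ones" description. The same function
could be applied to the other patterns (36, 63, 66, ...) once their
codes are known.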
P3280s seem to have the worst problems. Maybe it's because they have
two independent clocks, or maybe because they get too hot in their
little boxes, or maybe their circuitry is really different (big
help, huh?).

> Has anyone found a way to "help" the situation?

>> What is the quick and effective way to find which p3280 or CTL
>> card among the many on the ring already out of alignment?

The only way I know how to deal with this requires real work, but it
is what you have to do:

1) First you have to determine what order the nodes are in the ring.
   This is crucial because of the way the data is clocked and
   regenerated by each node. In order to pinpoint a problem node you
   have to know the exact path that data will take through your
   ring. To do this, you go and look at your wire center. Data will
   flow around it in a counter-clockwise direction.

   IMPORTANT: You have to realize that at the link level each packet
   is going to go all the way around the ring. Node A sends it to
   node B, and if all goes well node B sends it back with the ACK
   bit set. If all doesn't go well (either the ACK bit is off or the
   packet is trashed), node A will retransmit the packet (up to
   several times). You need to keep this in mind. This is the root
   mechanism that causes duplicate packets to show up. Also, if the
   path from B to A is bad, A will spend a certain amount of time
   retransmitting unnecessarily and this will slow down throughput
   from A to B--although not nearly as much as from B to A.

2) Next you have to have a way to test each node. Let's say you have
   p4200 routers which have a p80 interface and some others, say an
   ethernet. You need access to one of the ethernets from each
   router.

   What you do is ship data across your ring. From point A to point
   B you ship (e.g.) a file with nothing but 3's in it. Then you
   ship the same size file with 1's in it (1's are innocuous). Then
   do the same tests from B to A.

   -If the 3's are causing problems, you will see very different
    throughput rates.
   -If there is only one broken node in the ring you will see that
    the throughput for the 3's file is dramatically worse in one
    direction than the other.
   -If there are several broken nodes in the ring you have a much
    more difficult hunt, but you can USUALLY get pretty far if not
    all the way.

   I've seen some strange things with this. Sometimes I've had to
   reorder things in the ring to find a bad component.

3) If you note any funniness across your p3280 links, get your
   p3280s upgraded to the latest revs. We have not had this problem
   with our p3280s since we did this. (We have had a couple of total
   failures, but that is at least pretty easily identifiable.)

=====

I have some tools that can help. They are available for anonymous
ftp from jade.berkeley.edu (128.32.136.9).

1) pub/ping.c and pub/ping.8: This lets you specify the data fill
   pattern for the packets sent. This helps you spot the problem
   early. Since each ping packet goes in both directions it is no
   help in pinpointing the problem.

2) pub/netout.c: This sends data to the TCP discard port of a remote
   machine. You can specify the data fill pattern. This is easier to
   use for pinpointing things than ftp, since you don't need an
   account on the remote host. Unfortunately, not everybody has
   implemented the TCP discard port code.

=====

We can identify when we are starting to have problems in a couple of
ways. One is from SNMP collected output errors on the p80 ring
interfaces. Another is looking at "T 2" in the router consoles and
seeing lots of 8704 errors on the p80 interfaces. "Lots" is defined
very fuzzily in my mind--it's based on experience...

I don't mind discussing these problems with folks. I hope this is
helpful to someone, my hands are tired. ;-)

Cliff Frost                   (415) 642-5360
Central Computing Services    <cliff@berkeley.edu>
University of California      CLIFF AT UCBCMSA
Berkeley, CA 94720
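The one-direction test in step 2 (ship 3's, then 1's, and compare
throughput) can be sketched in a few lines of Python. This is only
an illustration of the idea and is NOT the netout.c from jade. The
function names make_fill, send_fill and slowdown are hypothetical,
the remote host is assumed to run a TCP discard service on the
standard port 9, and the rate is measured at the sender, so it is
only meaningful for transfers much larger than the socket buffers.

```python
import socket
import time

DISCARD_PORT = 9  # standard TCP discard service port


def make_fill(pattern, size):
    """Return `size` bytes made by repeating the bytes in `pattern`."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    return (pattern * (size // len(pattern) + 1))[:size]


def send_fill(host, pattern, total_bytes, port=DISCARD_PORT, chunk=8192):
    """Send `total_bytes` of fill to host:port; return bytes per second.

    Hypothetical helper. For multi-byte patterns, pick a `chunk` that
    is a multiple of the pattern length so the pattern stays in phase.
    """
    payload = make_fill(pattern, chunk)
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=30) as sock:
        while sent < total_bytes:
            n = min(chunk, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    elapsed = time.monotonic() - start
    return sent / elapsed if elapsed > 0 else float("inf")


def slowdown(ones_rate, threes_rate):
    """How many times slower the 3's transfer ran than the 1's."""
    return ones_rate / threes_rate
```

A run from A to B might call send_fill("host-b", b"3", 10_000_000)
and then send_fill("host-b", b"1", 10_000_000), where "host-b" is a
placeholder, and pass the two rates to slowdown(). Repeating the
pair from B to A gives the second figure to compare, as described
in step 2.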
enger@SEKA.SCC.COM (Robert M. Enger) (03/20/90)
Cliff:
Thanks for the info. Yes, it is helpful.
Do you have the mapping table for the 4->6 coding used by Proteon?
Can I get a copy?
Do any of the allowed codes result in strings of four zeros, four ones?
Are any of the pronet-80 interface statistics of any real use in
locating the culprit? Given the frequency and magnitude of the
problem, I would hope that some parameter they report is useful.
I do not have any fiber equipment here to blame the problem on.
I have 8 P4200s, sitting in the same room, connected to a
ganged wire-center. Intuition would lead one to believe
that this would be a "piece of cake" installation.
Since all boxes are subject to the same environmental conditions
all the clocks and chips should suffer temperature drift in the
same direction, if not the same amount. The AC power supplied
to all the units should be closely matched. Even the pronet-80
cables are short, so capacitive effects should be small.
I guess I should express an overdue thanks. We have been using
the special ping program from jade for quite some time.
That is how we have been poking at the ring!
From your description of the problem, is it correct to sum up that
the problem is not one of waveform deformation, but rather
receiver time-base instability? (i.e., the receiver can't be trusted
to sample the signal near the middle of the bit-time?)
thanks,
Bob
CLIFF@UCBCMSA.BITNET (Cliff Frost {415} 642-5360) (03/20/90)
Bob,

> From your description of the problem, is it correct to sum up that
> the problem is not one of wave-form deformation, but rather
> receiver time-base instability? (ie, receiver can't be trusted
> to sample signal near the middle of the bit-time?)

Keep in mind what I said about grains of salt. I *think* this is
correct, but I'm not an engineer by any means. There may be more
than one thing going on; this is the only one we've gotten a handle
on.

> Do you have the mapping table for the 4-> coding used by proteon?
> Can I get a copy?

I have a reprint of an article by Howard Salwen, Alan C. Marshall,
and Nathan K. Salwen. I have no recollection of where it came from;
you should ask Proteon for a copy. It's called "ProNET An 80 MBIT/S
Token Ring For High-Speed LAN Applications".

> Do any of the allowed codes result in strings of four zeros, four ones?

No. I think you can get runs of 4 and 3, but not 4 and 4.

> I do not have any fiber equipment here to blame the problem on.
> I have 8 P4200s, sitting in the same room, connected to a
> ganged wire-center. Intuition would lead one to believe
> that this would be a "piece of cake" installation.

Are you hinting something nasty about Proteon's Quality Assurance?
Or, are you asking for advice on how to proceed? ;-)

I would try the file transfer method. If it works you will be left
with the question: "is it the transmitter or the receiver?" I.e., it
won't tell you exactly which p4200 is having the problem. What you
do then is swap the p4200s' positions in the wire center. In the
"classic" case the problem follows one or the other. In the nasty
case the problem goes away for a while or does something else weird.

> Are any of the pronet-80 interface statistics of any real use in
> locating the culprit? Given the frequency and magnitude of the
> problem, I would hope that some parameter they report is useful.

The "Output bad format" and "Input parity error" counters are useful
to watch.
Proteon will tell you a trouble-shooting method that uses the Input
parity error counter. It has sometimes been useful here, and I
should have mentioned it.

Cliff
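Watching those two counters comes down to comparing snapshots taken
some time apart and seeing where the numbers grow. A minimal Python
sketch of that bookkeeping follows. It does not reproduce Proteon's
trouble-shooting method, which is not described in this thread, and
it says nothing about whether a rising counter implicates the node
itself or a neighbour on the ring. The node names and all numbers
are invented for illustration, and the snapshots are assumed to
have been copied by hand from the router consoles or from SNMP.

```python
def counter_growth(before, after):
    """Per-node increase of each counter between two snapshots."""
    return {
        node: {name: after[node][name] - before[node][name]
               for name in before[node]}
        for node in before
    }


def rank_nodes(growth, counter):
    """Node names sorted by growth of one counter, largest first."""
    return sorted(growth, key=lambda node: growth[node][counter],
                  reverse=True)


# Invented snapshots (hypothetical nodes, made-up values).
before = {
    "router-a": {"Input parity error": 10, "Output bad format": 2},
    "router-b": {"Input parity error": 4, "Output bad format": 0},
    "router-c": {"Input parity error": 7, "Output bad format": 1},
}
after = {
    "router-a": {"Input parity error": 12, "Output bad format": 2},
    "router-b": {"Input parity error": 250, "Output bad format": 3},
    "router-c": {"Input parity error": 9, "Output bad format": 1},
}

growth = counter_growth(before, after)
ranking = rank_nodes(growth, "Input parity error")
```

In this made-up example "router-b" comes out first in the ranking,
which would make it the place to start looking, together with the
file transfer and position-swapping tests described above.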