JMWOBUS@SUVM.BITNET ("John M. Wobus") (06/02/89)
We've had trouble with our P4200s dropping routes, presumably because they are discarding incoming RIP information. Proteon suggested we spend more money replacing what they just sold us with their newer equipment, which we have done to some extent. We have also eliminated all RIP information about networks other than our own subnets from our network (using static default routes instead).

Has anyone else had this sort of problem with P4200s? If so, how did you solve it?

It has bothered us to no end that we bought into this stuff, then had to spend additional money buying new versions before we got a working network (by "working network", I mean a network that would not spontaneously disconnect telnet users several times a day). Also, it strikes me that software can be written to do other things besides drop routes when things get busy--we've never had such a problem with our non-Proteon routers. It seems to me that each of our remaining P4200-10s is a time bomb ready to start killing routes again when things get busy, and that our P4200-31s would do the same at some busier level, given the priorities of the software.

John Wobus
Syracuse University
leehm@ITSGW.RPI.EDU (Herb Lee) (06/02/89)
> We've had trouble with our P4200s dropping routes, presumably because they
> are discarding incoming RIP information...

We have seen similar problems here, but they have been sporadic and therefore difficult to track down. In our case, however, the problem did not seem to be related to unusually busy conditions - telnet calls were cleared under both heavy and light load periods. Not all of our P4200s are at the rev 8 load, so I have been biding my time until all the routers are at the same rev before pursuing this.

But can you elaborate a bit on the recommended "new equipment"? Is Proteon's position that they will not address the problem in "old" boxes?

We considered but did not use the static route solution, and for the past few months (including the end-of-semester crunch) the problem has not recurred. I do share your concerns about a lurking problem, and hope that we can identify a more proactive approach.

Herb Lee
Rensselaer Polytechnic Institute
kwe@BU-IT.BU.EDU (06/03/89)
In article <8906021509.AA21099@devvax.TN.CORNELL.EDU> you write:
>We've had trouble with our P4200s dropping routes, presumably because they
>are discarding incoming RIP information. Proteon suggested we spend more
>money replacing what they just sold us with their newer equipment, which
>we have done to some extent. We have also eliminated all RIP information
>about networks other than our own subnets from our network (using static
>default routes instead).
>
>Has anyone else had this sort of problem with P4200s?

Not exactly. More info needed.

Are you using a default *net* route? What about a default subnet route? Are the routes that are getting dropped subnet routes or net routes?

If the problem is with net routes, how many net routes are there? All the NSFnet or all the arpanet or both? It is possible that some router is overloaded and is not sending out routing updates sufficiently often. This is particularly likely with gated routers, since the RIP process is a user process and routing is in the kernel. We have been bitten by this when the gated router gets clogged with traffic. Look to the default net router and its interaction with the external net router.

If it is subnet routes getting dropped, make sure it isn't something simple. Make sure that subnets are being advertised everywhere ("enable sending subnet routes" on all your subnetted interfaces) and that there is no default subnet (at least until you fix the problem). I apologize if this is so trivial that you have already done all these things. If it isn't something obvious like this, then I can say I have never seen anything like that with subnet routes before. :-)

BTW, what ever happened to the p4200 and info-proteon merger? Is it like the CSnet/BITnet merger? :-)

Kent England, Boston University
medin@NSIPO.NASA.GOV ("Milo S. Medin", NASA ARC NSI Project Office) (06/03/89)
I've not seen this myself, and if you have relatively few nets bouncing around your swamp, you shouldn't be seeing this problem. We run p4200-10's in a fairly large net with no problems. Do you have a new ethernet controller? Are you running DECNET? How many routes are you dealing with?

As for not deleting routes upon the timeout of that route, the RIP spec says you're supposed to do that. Ford algorithms have enough problems as it is without 'hold-ups' causing more pain. They might buy you something in your case, but could mess up the net you are connected to.

Thanks,
Milo
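To make the timeout rule concrete, here is a minimal Python sketch of route aging. It assumes the default timers from RFC 1058 (updates every 30 seconds, a route expiring after 180 seconds of silence); whether the P4200 software uses exactly these values is not stated in this thread, and the function names are invented for illustration.

```python
# Timer defaults taken from RFC 1058; the P4200's actual values are an assumption.
UPDATE_INTERVAL = 30   # seconds between periodic RIP updates
ROUTE_TIMEOUT = 180    # seconds of silence before a route is considered expired
INFINITY = 16          # RIP metric meaning "unreachable"


def route_metric(last_heard, now, metric):
    """Return the metric to use for a route at time `now` (times in seconds)."""
    if now - last_heard >= ROUTE_TIMEOUT:
        return INFINITY  # expired: the route must no longer be used
    return metric


def missed_updates_before_timeout():
    """Number of consecutive updates that must be lost before a route expires."""
    return ROUTE_TIMEOUT // UPDATE_INTERVAL


print(missed_updates_before_timeout())
print(route_metric(last_heard=0, now=179, metric=6))
print(route_metric(last_heard=0, now=180, metric=6))
```

Under these assumptions a neighbor's route survives five lost updates in a row and expires on the sixth, so a router that times routes out regularly is losing most of what it is sent.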
kurt@pprg.unm.edu (Kurt Zeilenga [LANL]) (06/03/89)
We are having similar problems....

We have a P4200 with two 56kbps lines, a T1, and two ethernets. The routing information coming in (RIP) on one of the 56kbps lines is always getting timed out. None of the other interfaces on any of the four p4200s have this problem.

Our temporary solution was to use static routing... (we already had infinity static routes used to restrict outgoing info... so just changed the 16 to a 6). We don't have a real solution... We also went from using the "pseudo" IP addresses to using real IP addresses and that didn't help.

Here is our config.... the problem is with routes on intf 1 (192.43.188.193) coming from 192.43.188.194. Any help would be greatly appreciated....

Kurt

UNM> Hostname: pro.unm.edu
UNM> Maximum packet size: [autoconfigured]
UNM> Maximum number of buffers: [autoconfigured]
UNM> Number of Restarts before a Reload/Dump: 64
UNM> Logging disposition: detached
UNM> Console inactivity timer (minutes): 30
UNM> Physical console login: enabled
UNM> Modem control: disabled
UNM>
UNM> Configurable Protocols:
UNM>    0  DOD-IP
UNM>    3  Address Resolution
UNM>    9  Authentication
UNM>   10  Simple GW Monitoring
UNM>
UNM> 3226 bytes of configuration memory free
UNM>
UNM> Portable MC68010 C Gateway pro.unm.edu S/N 1654 V8.0
UNM> Boot ROM version 5.1
UNM>
UNM> 4 Protocols:
UNM>  Num  Nm   Protocol
UNM>    0  IP   DOD-IP
UNM>    3  ARP  Address Resolution
UNM>    9  AP   Authentication
UNM>   10  GMP  Simple GW Monitoring
UNM>
UNM> Ifc 0 (COM-2 High speed sync line): CSR 900500, CSR2 FF0000, vector 0
UNM> Ifc 1 (COM-2 High speed sync line): CSR 900540, CSR2 FF0000, vector 0
UNM> Ifc 2 (COM-2 High speed sync line): CSR A00500, CSR2 FF0010, vector 1
UNM> Ifc 3 (Proteon Ethernet): CSR FF4280, vector 2
UNM> Ifc 4 (Proteon Ethernet): CSR FF4380, vector 3
UNM>
UNM> Interface addresses
UNM> IP addresses for each interface:
UNM>    intf 0   192.43.188.65    Local wire broadcast, fill 1
UNM>    intf 1   192.43.188.193   Local wire broadcast, fill 1
UNM>    intf 2   192.43.188.129   Local wire broadcast, fill 1
UNM>    intf 3   192.31.154.9     Network broadcast, fill 1
UNM>    intf 4   129.24.13.39     Network broadcast, fill 1
UNM>
UNM> Routing
UNM> Default network gateway: 192.43.188.130, cost 14
UNM>
UNM> Subnetted net    Subnet mask   Default gateway
UNM> 129.24.0.0       FFFFF800      129.24.8.193
UNM> 192.43.188.0     FFFFFFFC      192.43.188.129
UNM>
UNM> route to 192.5.175.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.196.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.197.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.198.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.199.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.200.0 via 192.43.188.66, 16 hops
UNM> route to 128.149.0.0 via 192.43.188.194, 6 hops
UNM> route to 192.12.18.0 via 192.43.188.194, 6 hops
UNM> route to 192.12.19.0 via 192.43.188.194, 6 hops
UNM> route to 192.12.20.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.43.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.92.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.93.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.237.0 via 192.43.188.194, 6 hops
UNM> route to 192.33.14.0 via 192.43.188.194, 6 hops
UNM> route to 192.33.17.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.163.0 via 192.43.188.194, 6 hops
UNM> route to 192.41.208.0 via 192.43.188.194, 6 hops
UNM> route to 192.12.184.0 via 192.43.188.130, 16 hops
UNM> route to 192.41.163.0 via 192.43.188.194, 6 hops
UNM> route to 130.202.0.0 via 192.43.188.66, 16 hops
UNM> route to 131.215.0.0 via 192.43.188.194, 2 hops
UNM>
UNM> Protocols
UNM> ARP Subnet routing: disabled
UNM> RIP: enabled
UNM> Originate default on
UNM> Per-interface address flags:
UNM>    intf 0   192.43.188.65    Send static routes
UNM>                              Override static routes
UNM>    intf 1   192.43.188.193   Send static routes
UNM>                              Override static routes
UNM>    intf 2   192.43.188.129   Send static routes
UNM>                              Override static routes
UNM>    intf 3   192.31.154.9     Send static and default routes
UNM>                              Received RIP packets are ignored.
UNM>                              Override static routes
UNM>    intf 4   129.24.13.39     Send subnet and static routes
UNM>                              Override static routes
UNM> EGP: disabled
UNM> Routing information interchange: off
UNM> Advertise RIP or static route hop count as EGP metric.
UNM>
UNM> EGP neighbors:
UNM> [NONE]

I cannot seem to get to the other box at the moment..... mmmmm.... well, you get the idea, I hope.
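One quick sanity check on a config like this is to confirm that each serial interface and its next-hop neighbor fall on the same subnet under the listed mask. The sketch below does that with Python's standard `ipaddress` module; the addresses and hex masks are copied from the dump above, while the helper name `subnet_of` is made up for the example.

```python
import ipaddress


def subnet_of(addr, hex_mask):
    """Return the subnet containing `addr`, given a mask written in hex (e.g. FFFFFFFC)."""
    mask = str(ipaddress.IPv4Address(int(hex_mask, 16)))
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)


# (local interface address, neighbor address used as next hop) from the dump
serial_links = [
    ("192.43.188.65", "192.43.188.66"),    # intf 0
    ("192.43.188.193", "192.43.188.194"),  # intf 1, the link losing routes
    ("192.43.188.129", "192.43.188.130"),  # intf 2
]

for local, neighbor in serial_links:
    same = subnet_of(local, "FFFFFFFC") == subnet_of(neighbor, "FFFFFFFC")
    print(local, neighbor, subnet_of(local, "FFFFFFFC"), same)

# The ethernet on net 129.24 uses the FFFFF800 mask.
print(subnet_of("129.24.13.39", "FFFFF800"))
```

All three serial pairs land on matching four-address subnets, which suggests the addressing itself is consistent and points the search elsewhere.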
kurt@pprg.unm.edu (Kurt Zeilenga [LANL]) (06/03/89)
More information....

The UNM proteon sees 480 routes... most come from a cisco via RIP on one of the ethernets. The rest come in via the serial lines (the ones I have statics for).

We've checked everything simple... like broadcast addresses (remember, we do get occasional updates, but we miss more than we get).. etc.

The sending proteon has about 30 routes, most from the UNM proteon. Others come from various sources... all regularly (including those from UNM). UNM just doesn't see most of the RIP packets sent by this puppy.

Oh, they are running 8.1 now (but this problem existed when we were running 8.0 and 7.whatever).

Errors/ovfl are minimal....

Kurt
kurt@WON-TON.DSPO.GOV (Kurt Zeilenga) (06/03/89)
I guess I should clarify, the net looks like this:

                        DSPO p4200
                            |
                            |
                            |
Caltech p4200 ------- UNM p4200 ------- ANL p4200
                      /         \
        ====PPRGnet====         ======UNMnet====
                      \         /          \
                      PPRG Sun GW           \
                                         UNM Cisco (Westnet)

This is a backdoor network... only "local" routing info is exchanged. Right now, only 20 nets or so are configured to be passed between the p4200s.

The UNM p4200 is dropping routes coming from Caltech (though not all updates). The UNM p4200 does see about 500 nets coming from the UNM Cisco (same as PPRG Sun does), but these routes are NOT exchanged via the backdoors.

The other p4200s all pick up non-local routes, but only on the order of 30-50 routes (because they do more "default" routing than UNM does).

Thanks for the help.

Kurt
kwe@BU-IT.BU.EDU (06/03/89)
The problem is most likely between the unm p4200 and the westnet cisco. The unm p4200 is hitting the westnet cisco with 500 routes (500 + 20 from behind unm-p4200), most of which are poison reverse (metric 16) routes. The cisco could be losing some of the 20 routes from behind unm-p4200 in the mass of poison routes. The p4200 should not be losing updates from the westnet cisco.

You should have a look at the RIP exchanges going on on UNMnet. You could use the PPRG Sun GW and tcpdump. Then you should have a look at the debugging console data on the westnet-cisco and watch your 20 local nets behind unm-p4200 come and go. Do the same on the unm-p4200 and see if any interesting routes are coming and going.

Also, the unm-p4200 and the PPRG-Sun-GW could be sync'ed on RIP updates, leading to ~1000 routes hitting the cisco at the same time. Tough on the old bird. Likewise the Sun and cisco could be pounding the p4200. See what tcpdump says. You may have to turn off routing on the Sun-GW to make sure that tcpdump gets enough cpu and all the packets it can.

Good luck.

Kent England, Boston University
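For readers unfamiliar with poison reverse, the following Python sketch shows why an update sent back onto UNMnet can be dominated by metric-16 entries. It is a simplified illustration and not the P4200's code: the table layout, interface labels, and network names are invented, and the route counts (500 learned from the cisco, 20 from behind the p4200) are the round figures used in this thread.

```python
INFINITY = 16  # RIP metric meaning "unreachable"


def build_advertisement(table, out_ifc):
    """Build the (net, metric) list to send on `out_ifc`.

    `table` maps net -> (metric, interface the route was learned on).
    Routes learned on the outgoing interface are advertised back with
    metric 16 (split horizon with poisoned reverse).
    """
    adv = []
    for net, (metric, learned_ifc) in sorted(table.items()):
        if learned_ifc == out_ifc:
            adv.append((net, INFINITY))  # poisoned reverse entry
        else:
            adv.append((net, metric))
    return adv


# Synthetic table: 500 routes heard on UNMnet, 20 from the serial side.
table = {f"ext-{i}": (3, "unmnet") for i in range(500)}
table.update({f"local-{i}": (2, "serial") for i in range(20)})

adv = build_advertisement(table, "unmnet")
poisoned = sum(1 for _, metric in adv if metric == INFINITY)
print(len(adv), poisoned)  # prints: 520 500
```

Only 20 of the 520 entries carry usable information for the cisco; the rest are there solely to prevent routing loops.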
dlw@VIOLET.BERKELEY.EDU (David Wasley) (06/03/89)
Depending on how old the software is in the cisco, it may be dropping RIP packets. There was a bug whereby they could only queue at most 5 packets for the RIP process before starting to drop them.

David

(This was fixed about 9 months ago, as I recall.)
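Some rough arithmetic shows how quickly a 5-packet queue would overflow. RFC 1058 limits a RIP message to 25 route entries, so the sketch below counts the packets needed for the route totals mentioned in this thread. The drop estimate is a worst case under a stated assumption: the whole burst arrives before the RIP process removes anything from the queue. The function names are invented for the example.

```python
import math

MAX_ENTRIES_PER_PACKET = 25  # per RFC 1058
QUEUE_LIMIT = 5              # queue depth described in the post above


def rip_packets(route_count):
    """Number of RIP packets needed to carry `route_count` routes."""
    return math.ceil(route_count / MAX_ENTRIES_PER_PACKET)


def worst_case_drops(packet_count, queue_limit=QUEUE_LIMIT):
    """Packets lost if the entire burst arrives before the queue is serviced."""
    return max(0, packet_count - queue_limit)


for routes in (520, 1000):
    packets = rip_packets(routes)
    print(routes, packets, worst_case_drops(packets))
```

A 520-route update needs 21 packets and a synchronized ~1000-route burst needs 40, so under that assumption most of each burst would be discarded by an affected cisco.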
rrv@uxc.cso.uiuc.edu (06/04/89)
Hi Kurt,

We struggled for a long time with problems that somewhat resemble yours. It turned out to be some kind of hardware problem in the slow-speed serial boards of the p4200. As I recall, the receiver side of the USART chip went deaf until it was reset as the result of a maintenance test failure. The t 2 process offers some clue that the tests are failing. While the chip was deaf, routes would be dropped and unreachables would be issued.

If your Proteon person can't help, I'll put you in touch with mine.

Ross