kwe@bu-cs.BU.EDU (kwe@bu-it.bu.edu (Kent W. England)) (01/13/89)
I tracked some local reachability problems back to a RIP interaction between my cisco jvnc-net router and one of my local p4200s. I had one or two of my 18 local subnets that were coming and going in the cisco routing table.

My campus backbone is a Pronet-80 with p4200s on it. The jvnc-gw lives on one subnet and gets its other local subnet routes from a backbone p4200. This p4200 insisted it had routes to all my valid subnets and all the other hosts on campus agreed. Why were some local subnet routes in my cisco jvnc-gw coming and going?

My cisco jvnc-gw advertises, using RIP, about 325 networks onto this subnet that are reachable thru jvncnet. My p4200 advertises all the other 17 local subnets, one other external link, and was sending poison reverse entries for all the jvncnet nets (325 entries!) back out onto this poor subnet. Turns out I had the p4200 in question configured to send net routes, which is unnecessary, but it set up this interesting problem. So here was a cisco and a p4200 each RIPing over 300 entries every 30 sec. To make matters worse, the two routers were synchronized, spewing forth RIP updates simultaneously.

I am not sure whether the p4200 is able to process all the cisco table entries. There are 325 entries and I haven't done a line-by-line comparison. (A little more time with SNMP would help.) However, I know the cisco wasn't able to process all the p4200 entries. I know that because I could easily eyeball 18 subnet entries. The Sun hosts on this same subnet had no trouble keeping all 18 subnets in their tables, but they weren't trying to process 325 table entries at the same time they were trying to send 325 table entries.

I reconfigured the p4200 to stop sending all those poison reverse routes back out to the jvnc-gw. That dropped the update size to 19 entries. The cisco has no trouble now. So long as only one router is sending mega RIP updates there is no problem.
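
To put rough numbers on updates of this size, here is a minimal sketch. It assumes RIP version 1 framing as described in RFC 1058 (a 4-byte header, 20 bytes per route entry, at most 25 entries per datagram); none of that is stated in the post, and the function name update_load is made up for illustration. Only the 30-second interval and the entry counts (325 and 19) come from the text above.

```python
import math

# Assumed RIP version 1 framing (RFC 1058); not taken from the post.
RIP_HEADER_BYTES = 4
RIP_ENTRY_BYTES = 20
MAX_ENTRIES_PER_DATAGRAM = 25

# Update interval quoted in the post.
UPDATE_INTERVAL_SEC = 30


def update_load(route_count):
    """Return (datagrams, rip_payload_bytes) for one full RIP update.

    The byte count covers RIP headers and entries only; UDP, IP and
    link-level overhead are left out.
    """
    datagrams = math.ceil(route_count / MAX_ENTRIES_PER_DATAGRAM)
    payload = datagrams * RIP_HEADER_BYTES + route_count * RIP_ENTRY_BYTES
    return datagrams, payload


if __name__ == "__main__":
    for label, routes in [("before the fix", 325), ("after the fix", 19)]:
        datagrams, payload = update_load(routes)
        print(f"{label}: {routes} entries -> {datagrams} datagrams, "
              f"{payload} bytes of RIP payload every {UPDATE_INTERVAL_SEC} sec")
```

Under those assumptions a 325-entry update is a burst of 13 back-to-back datagrams, while the 19-entry update fits in a single datagram. When two routers send bursts like that at the same instant, each has to receive the other's burst while transmitting its own.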
I fixed the situation by reconfiguring my p4200, but what if this p4200 needed to advertise all the arpanet connectivity? The regional router and the arpanet router would get sync'ed together, get overloaded trying to send and receive at the same time, and reachability would get flaky.

Has anyone else seen symptoms like these trying to juggle 300 net entries between external well-connected routers using RIP? How are others dealing with this situation? EGP? Defaults? Which router is improperly synchronizing?

Kent England, Boston University
fedor@PATTON.NYSER.NET (01/14/89)
Kent,

This is an old story, but.....

While doing gated development back at Cornell, I noticed routes mysteriously oscillating. Routes were even being deleted every now and then. It took many "netstat -r -n | grep <network>", gated logs, "t 2's" on the proteon, and a bunch of windows on my 3B2 (gag here) to figure out that our VAX 750 gateways (yes, gag again) were dropping RIP packets off the end of the receive buffer queue.

This was back at 100-150 networks. The p4200's kept up fine. The Vaxen couldn't process the RIP packets fast enough. I just increased the receive buffer size, but that was only delaying the inevitable. I'm sure that the ciscos and proteons have a limit also....

Who would have ever thought that we would be RIPping 350+ networks around? Sounds like a real-life Stephen King Horror......

Cheers,
Mark
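
The post does not say how the receive buffer was enlarged. As one possible illustration, here is a minimal sketch that assumes the buffer in question is the per-socket one controlled by the BSD SO_RCVBUF option, shown through Python's standard socket module rather than the C that gated is written in. The helper name open_udp_socket and the 64 KB figure are made up for the example.

```python
import socket


def open_udp_socket(requested_rcvbuf_bytes):
    """Create a UDP socket and ask the kernel for a larger receive buffer.

    Returns the socket and the buffer size the kernel reports afterwards.
    The kernel may clamp, round, or otherwise adjust the request, so the
    reported value is read back instead of being assumed.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_rcvbuf_bytes)
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted


if __name__ == "__main__":
    # Hypothetical size; the post gives no figure.
    sock, granted = open_udp_socket(64 * 1024)
    print(f"kernel reports a receive buffer of {granted} bytes")
    sock.close()
```

A bigger buffer only lets a longer burst queue up before datagrams are dropped. It does nothing for the rate at which the routing process drains the queue, which fits the remark above that the change was only delaying the inevitable.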