[comp.sys.proteon] Humongous RIP exchanges

kwe@bu-cs.BU.EDU (kwe@bu-it.bu.edu (Kent W. England)) (01/13/89)

	I tracked some local reachability problems back to a RIP
interaction between my cisco jvnc-net router and one of my local
p4200s.  One or two of my 18 local subnets kept coming and going in
the cisco routing table.
	My campus backbone is a Pronet-80 with p4200s on it.  The
jvnc-gw lives on one subnet and gets its other local subnet routes
from a backbone p4200.  This p4200 insisted it had routes to all my
valid subnets and all the other hosts on campus agreed.  Why were some
local subnet routes in my cisco jvnc-gw coming and going?

	My cisco jvnc-gw advertises, using RIP, about 325 networks
onto this subnet that are reachable thru jvncnet.  My p4200 advertises
all the other 17 local subnets, one other external link, and was
sending poison reverse entries for all the jvncnet nets (325 entries!)
back out onto this poor subnet.  Turns out I had the p4200 in question
configured to send net routes, which is unnecessary, but it set up
this interesting problem.  So here were a cisco and a p4200, each RIPing
over 300 entries every 30 seconds.  To make matters worse, the two
routers were synchronized, spewing forth RIP updates simultaneously.
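
To make the poison-reverse effect concrete, here is a minimal sketch of how a router might build its update for one interface.  It is not p4200 or cisco code: the route names, the interface names ("subnet-a", "backbone"), and the build_update helper are all made up for illustration.  Only the counts (325 learned routes, 19 others) come from this post.

```python
# Sketch only: route names, interface names, and build_update() are
# hypothetical. They do not reflect real p4200 or cisco configuration.
INFINITY = 16  # RIP metric meaning "unreachable"


def build_update(routes, out_iface, poison_reverse):
    """Return the (destination, metric) entries to advertise on out_iface."""
    entries = []
    for dest, (metric, learned_from) in routes.items():
        if learned_from == out_iface:
            # Route was learned over this interface.
            if poison_reverse:
                # Poison reverse: advertise it back as unreachable.
                entries.append((dest, INFINITY))
            # Plain split horizon: say nothing about it at all.
            continue
        entries.append((dest, metric))
    return entries


# 325 routes learned from the other router on the shared subnet,
# plus 19 routes that genuinely need to be advertised there.
routes = {f"ext-{i}": (2, "subnet-a") for i in range(325)}
routes.update({f"local-{i}": (1, "backbone") for i in range(19)})

with_poison = build_update(routes, "subnet-a", poison_reverse=True)
without_poison = build_update(routes, "subnet-a", poison_reverse=False)
print(len(with_poison), len(without_poison))
```

With poison reverse every learned route is echoed back at metric 16, so the update carries all of them.  With plain split horizon they are simply left out.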

	I am not sure whether the p4200 is able to process all the
cisco table entries.  There are 325 entries and I haven't done a
line-by-line comparison.  (A little more time with SNMP would help.)
However, I know the cisco wasn't able to process all the p4200
entries, because 18 subnet entries are easy to check by eye and one
or two of them kept dropping out of its table.  The Sun hosts on this
same subnet had no trouble keeping all
18 subnets in their tables, but they weren't trying to process 325
table entries at the same time they were trying to send 325 table
entries.
	I reconfigured the p4200 to stop sending all those poison
reverse routes back out to the jvnc-gw.  That dropped the update size
to 19 entries.  The cisco has no trouble now.
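
For a sense of scale, here is a rough calculation of what those update sizes mean on the wire.  It assumes standard RIP version 1 framing as described in RFC 1058 (at most 25 route entries per datagram, a 4-byte header, 20 bytes per entry).  The function names are just for illustration, and IP and UDP overhead is ignored.

```python
import math

# Assumed RIP version 1 framing (RFC 1058); not measured on the routers.
MAX_ENTRIES_PER_DATAGRAM = 25
HEADER_BYTES = 4
ENTRY_BYTES = 20


def rip_datagrams(entries):
    """Number of RIP datagrams needed to carry `entries` routes."""
    return math.ceil(entries / MAX_ENTRIES_PER_DATAGRAM)


def rip_payload_bytes(entries):
    """Total RIP payload bytes, excluding IP and UDP headers."""
    return rip_datagrams(entries) * HEADER_BYTES + entries * ENTRY_BYTES


for n in (325, 19):
    print(n, "entries:", rip_datagrams(n), "datagrams,",
          rip_payload_bytes(n), "bytes")
```

Under those assumptions a 325-entry update is a burst of 13 back-to-back datagrams, while a 19-entry update fits in one.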

	So long as only one router is sending mega RIP updates there
is no problem.  I fixed the situation by reconfiguring my p4200, but
what if this p4200 needed to advertise all the arpanet connectivity?
The regional router and the arpanet router would get sync'ed together,
get overloaded trying to send and receive at the same time, and
reachability would get flaky.
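
One mitigation that RFC 1058 itself mentions for synchronized updates is to offset each 30-second timer by a small random time.  The sketch below shows the idea.  Whether either router here does anything like it is not known, and the 5-second jitter value and the function names are assumptions chosen for illustration.

```python
import random

BASE_INTERVAL = 30.0  # nominal RIP update period in seconds


def next_update_delay(rng, jitter=5.0):
    """Delay until the next update: base period plus or minus `jitter`."""
    return BASE_INTERVAL + rng.uniform(-jitter, jitter)


def update_times(rng, count, jitter):
    """Absolute send times for `count` successive updates."""
    t, times = 0.0, []
    for _ in range(count):
        t += next_update_delay(rng, jitter)
        times.append(t)
    return times


# Two routers that start at the same instant. Without jitter they stay
# locked together; with jitter their send times drift apart.
locked_a = update_times(random.Random(1), 5, jitter=0.0)
locked_b = update_times(random.Random(2), 5, jitter=0.0)
drift_a = update_times(random.Random(1), 5, jitter=5.0)
drift_b = update_times(random.Random(2), 5, jitter=5.0)
print(locked_a == locked_b)
print([round(a - b, 1) for a, b in zip(drift_a, drift_b)])
```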

	Has anyone else seen symptoms like these trying to juggle 300
net entries between external well-connected routers using RIP?  How
are others dealing with this situation?  EGP?  Defaults?  Which router
is improperly synchronizing?

	Kent England, Boston University

fedor@PATTON.NYSER.NET (01/14/89)

	Kent,

	This is an old story, but...  While doing gated development
	back at Cornell, I noticed routes mysteriously oscillating.
	Routes were even being deleted every now and then.

	It took many "netstat -r -n | grep <network>", gated logs,
	"t 2's" on the proteon, and a bunch of windows on my 3B2 (gag here)
	to figure out that our VAX 750 gateways (yes, gag again) were
	dropping RIP packets off the end of the receive buffer queue.

	This was back at 100-150 networks.  The p4200's kept up fine.
	The Vaxen couldn't process the RIP packets fast enough.  I just
	increased the receive buffer size, but that was only delaying the
	inevitable.
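
	A toy model of that failure, for anyone who wants to play with
	the numbers.  It is not gated or kernel code.  The queue depth
	and service rate below are invented; only the burst of 6
	datagrams (150 networks at 25 entries each) ties back to the
	story above.

```python
def receive_burst(burst_size, queue_depth, arrivals_per_dequeue):
    """Model a burst of datagrams hitting a bounded receive queue.

    The host removes one datagram from the queue for every
    `arrivals_per_dequeue` arrivals, then drains whatever is still
    queued once the burst ends. Returns (processed, dropped).
    """
    queued = processed = dropped = 0
    for n in range(1, burst_size + 1):
        if queued < queue_depth:
            queued += 1          # room in the queue: keep the datagram
        else:
            dropped += 1         # queue full: datagram falls off the end
        if n % arrivals_per_dequeue == 0 and queued > 0:
            queued -= 1          # slow host finally services one datagram
            processed += 1
    processed += queued          # burst over: drain the remainder
    return processed, dropped


# Hypothetical figures: a host that services 1 datagram per 3 arrivals.
print(receive_burst(6, queue_depth=2, arrivals_per_dequeue=3))
print(receive_burst(6, queue_depth=8, arrivals_per_dequeue=3))
```

	A deeper queue absorbs this burst, but a bigger burst or a
	slower host brings the drops right back, which is why the
	buffer increase only delays the problem.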

	I'm sure that the ciscos and proteons have a limit also.

	Who would have ever thought that we would be RIPping 350+ networks
	around?  Sounds like a real-life Stephen King horror...

	Cheers,

	Mark