[comp.sys.proteon] P4200 problems.

JMWOBUS@SUVM.BITNET ("John M. Wobus") (06/02/89)

We've had trouble with our P4200s dropping routes, presumably because they
are discarding incoming RIP information.  Proteon suggested we spend more
money replacing what they just sold us with their newer equipment, which
we have done to some extent.  We have also eliminated all RIP information
about networks other than our own subnets from our network (using static
default routes instead).

Has anyone else had this sort of problem with P4200s?  If so, how did you
solve it?  It has bothered us to no end that we bought into this stuff,
then had to spend additional money buying new versions before we got a
working network (by "working network", I mean a network which would not
spontaneously disconnect telnet users several times a day).

Also, it strikes me that software can be written to do other things
besides drop routes when things get busy--we've never had such a problem
with our non-Proteon routers.  It seems to me that each of our remaining
P4200-10s is a time-bomb ready to start killing routes again when things
get busy, and that our P4200-31s would do the same at some busier level,
given the priorities of the software.

John Wobus
Syracuse University

leehm@ITSGW.RPI.EDU (Herb Lee) (06/02/89)

> We've had trouble with our P4200s dropping routes, presumably because they
> are discarding incoming RIP information...

We have seen similar problems here, but they have been sporadic and
therefore difficult to track down. In our case, however, the problem
did not seem to be related to unusually busy conditions - telnet calls
were cleared under both heavy and light load periods. Not all of our
P4200s are at rev 8 load, and so I have been biding my time until all the
routers were at the same rev before pursuing this. But can you
elaborate a bit on the recommended "new equipment"? Is Proteon's
position that they will not address the problem in "old" boxes?

We considered, but did not use, the static route solution, and for
the past few months (including end of semester crunch) have not had
the problem recur. I do share your concerns about a lurking problem,
and hope that we can identify a more proactive approach.

Herb Lee
Rensselaer Polytechnic Institute

kwe@BU-IT.BU.EDU (06/03/89)

In article <8906021509.AA21099@devvax.TN.CORNELL.EDU> you write:
>We've had trouble with our P4200s dropping routes, presumably because they
>are discarding incoming RIP information.  Proteon suggested we spend more
>money replacing what they just sold us with their newer equipment, which
>we have done to some extent.  We have also eliminated all RIP information
>about networks other than our own subnets from our network (using static
>default routes instead).
>
>Has anyone else had this sort of problem with P4200s? 

	Not exactly.  More info needed.  Are you using a default *net*
route?  What about a default subnet route?  Are the routes that are
getting dropped subnet routes or net routes?  

	If the problem is with net routes, how many net routes are
there?  All the NSFnet or all the arpanet or both?  It is possible
that some router is overloaded and is not sending out routing updates
sufficiently often.  This is particularly likely with gated routers,
since the RIP process is a user process and routing is in the kernel.
We have been bitten by this when the gated router gets clogged with
traffic.  Look to the default net router and its interaction with the
external net router.

	If it is subnet routes getting dropped, make sure it isn't
something simple.  Make sure that subnets are being advertised
everywhere ["enable sending subnet routes" on all your subnetted
interfaces.] and that there is no default subnet (at least until you
fix the problem).  I apologize if this is so trivial that you have
already done all these things.  If it isn't something obvious like
this, then I can say I have never seen anything like that with subnet
routes before.  :-)
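Kent's advice to remove any default subnet route while debugging follows from how a classful router falls back: if a subnet route disappears, traffic quietly goes to the default subnet gateway instead of failing visibly. The sketch below is a hypothetical Python model of that fallback, not Proteon's code. The gateway addresses are borrowed from the UNM config posted later in the thread; the lookup order, the table layout, and the names `next_hop`, `ROUTES`, and `SUBNETTED` are assumptions for illustration.

```python
import ipaddress

# Hypothetical classful lookup (not Proteon's implementation). Gateway
# addresses come from the UNM config in this thread; the order is assumed.
DEFAULT_NET_GW = "192.43.188.130"
SUBNETTED = {  # classful net -> (subnet prefix length, default subnet gateway)
    ipaddress.ip_network("129.24.0.0/16"): (21, "129.24.8.193"),
}
ROUTES = {}    # learned routes: ip_network -> next hop

def classful_net(addr):
    """Return the class A/B/C network containing addr (class D/E not handled)."""
    first_octet = int(addr) >> 24
    prefix = 8 if first_octet < 128 else 16 if first_octet < 192 else 24
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

def next_hop(dest):
    addr = ipaddress.ip_address(dest)
    net = classful_net(addr)
    if net in SUBNETTED:
        masklen, default_subnet_gw = SUBNETTED[net]
        subnet = ipaddress.ip_network(f"{addr}/{masklen}", strict=False)
        # A missing subnet route silently falls through to the default subnet.
        return ROUTES.get(subnet, default_subnet_gw)
    # Outside our subnetted nets: net route, else the default network gateway.
    return ROUTES.get(net, DEFAULT_NET_GW)
```

With a default subnet gateway configured, a dropped subnet route changes where packets go rather than producing an unreachable, which is exactly what makes the dropped route hard to notice.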

	BTW, whatever happened to the p4200 and info-proteon merger?
Is it like the CSnet/BITnet merger?  :-)

	Kent England, Boston University

medin@NSIPO.NASA.GOV ("Milo S. Medin", NASA ARC NSI Project Office) (06/03/89)

I've not seen this myself, and if you have relatively few nets bouncing
around your swamp, you shouldn't be seeing this problem.  We run p4200-10's
in a fairly large net with no problems.  Do you have a new ethernet controller?
Are you running DECNET?  How many routes are you dealing with?

As for not deleting a route when it times out: the RIP spec says you're
supposed to delete it.  Bellman-Ford algorithms have enough problems
as it is without 'hold-ups' causing more pain.  They might buy you something
in your case, but could mess up the net you are connected to.
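The timeout behaviour Milo refers to is spelled out in RFC 1058: a route not refreshed for 180 seconds is marked unreachable (metric 16), advertised that way for a further 120-second garbage-collection period, and then deleted. A minimal Python sketch of those timers follows; `refresh` is simplified and accepts any update unconditionally, which a real RIP implementation does not.

```python
from dataclasses import dataclass
from typing import Optional

# Timer values from RFC 1058 (RIP version 1).
UPDATE_INTERVAL = 30   # seconds between periodic updates from each neighbour
TIMEOUT = 180          # a route not refreshed for this long is expired
GARBAGE_COLLECT = 120  # an expired route is advertised at 16 this long, then deleted
INFINITY = 16

@dataclass
class Route:
    next_hop: str
    metric: int
    last_heard: float
    expired_at: Optional[float] = None

def refresh(table, dest, next_hop, metric, now):
    """Install or refresh a route (simplified: accepts every update)."""
    table[dest] = Route(next_hop, metric, now)

def age_routes(table, now):
    """Expire stale routes and delete those whose garbage-collection time is up."""
    for dest in list(table):
        route = table[dest]
        if route.expired_at is None:
            if now - route.last_heard >= TIMEOUT:
                route.metric = INFINITY
                route.expired_at = now
        elif now - route.expired_at >= GARBAGE_COLLECT:
            del table[dest]
```

Holding a route past `TIMEOUT` would keep traffic flowing toward a neighbour that may really be gone, which is the extra pain for the attached network that Milo warns about.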

					Thanks,
					  Milo

kurt@pprg.unm.edu (Kurt Zeilenga [LANL]) (06/03/89)

We are having similar problems....  We have a P4200 with two
56kbps lines, a T1, and two ethernets.  The routing information coming
in (via RIP) on one of the 56kbps lines is always getting timed out.  None of
the other interfaces on any of the four p4200s have this problem.

Our temporary solution was to use static routing...  (we already
had infinity static routes used to restrict outgoing info... so
just changed the 16 to a 6).  We don't have a real solution...
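Why lowering the static metric from 16 to 6 hides the problem can be shown with a small route-selection rule. This is a sketch under assumed semantics, not Proteon's documented behaviour: a live RIP route overrides the static one (compare the "Override static routes" flag in the config below), and a static route only serves as a fallback if its metric is below 16. The function name `effective_route` is hypothetical.

```python
INFINITY = 16  # RIP "unreachable" metric

def effective_route(static, rip):
    """Pick the route to use for one destination.

    static, rip: (next_hop, metric) tuples, or None if absent.
    Assumed rule: a live RIP route overrides the static one, and a static
    route is only a usable fallback if its metric is below INFINITY.
    """
    if rip is not None and rip[1] < INFINITY:
        return rip
    if static is not None and static[1] < INFINITY:
        return static
    return None  # destination unreachable
```

At 16 hops the static entry never carries traffic, so a timed-out RIP route leaves the destination unreachable; at 6 hops the static entry takes over until RIP relearns the route.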

We also went from using the "pseudo" IP addresses to using real
IP addresses and that didn't help.

Here is our config.... the problem is with intf 1 (192.43.188.193)
and the routes coming from 192.43.188.194.

Any help would be greatly appreciated....

	Kurt

UNM> Hostname: pro.unm.edu
UNM> Maximum packet size: [autoconfigured]
UNM> Maximum number of buffers: [autoconfigured]
UNM> Number of Restarts before a Reload/Dump: 64
UNM> Logging disposition: detached
UNM> Console inactivity timer (minutes): 30
UNM> Physical console login: enabled
UNM> Modem control: disabled
UNM> 
UNM> Configurable Protocols:
UNM>   0  DOD-IP
UNM>   3  Address Resolution
UNM>   9  Authentication
UNM>  10  Simple GW Monitoring
UNM> 
UNM> 3226 bytes of configuration memory free
UNM>
UNM> Portable MC68010 C Gateway pro.unm.edu S/N 1654 V8.0
UNM> Boot ROM version 5.1
UNM> 
UNM> 4 Protocols:
UNM> Num Nm  Protocol
UNM> 0   IP  DOD-IP
UNM> 3   ARP Address Resolution
UNM> 9   AP  Authentication
UNM> 10  GMP Simple GW Monitoring
UNM>
UNM> Ifc 0 (COM-2 High speed sync line): CSR 900500, CSR2 FF0000, vector 0
UNM> Ifc 1 (COM-2 High speed sync line): CSR 900540, CSR2 FF0000, vector 0
UNM> Ifc 2 (COM-2 High speed sync line): CSR A00500, CSR2 FF0010, vector 1
UNM> Ifc 3 (Proteon Ethernet): CSR FF4280, vector 2
UNM> Ifc 4 (Proteon Ethernet): CSR FF4380, vector 3
UNM>
UNM> Interface addresses
UNM> IP addresses for each interface:
UNM>      intf  0   192.43.188.65    Local wire broadcast, fill 1    
UNM>      intf  1   192.43.188.193   Local wire broadcast, fill 1    
UNM>      intf  2   192.43.188.129   Local wire broadcast, fill 1    
UNM>      intf  3   192.31.154.9     Network broadcast,    fill 1    
UNM>      intf  4   129.24.13.39     Network broadcast,    fill 1    
UNM> 
UNM> Routing
UNM> Default network gateway: 192.43.188.130, cost 14
UNM> 
UNM> Subnetted net  Subnet mask    Default gateway
UNM> 129.24.0.0     FFFFF800       129.24.8.193   
UNM> 192.43.188.0   FFFFFFFC       192.43.188.129 
UNM> 
UNM> route to 192.5.175.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.196.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.197.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.198.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.199.0 via 192.43.188.66, 16 hops
UNM> route to 192.5.200.0 via 192.43.188.66, 16 hops
UNM> route to 128.149.0.0 via 192.43.188.194, 6 hops
UNM> route to 192.12.18.0 via 192.43.188.194, 6 hops
UNM> route to 192.12.19.0 via 192.43.188.194, 6 hops
UNM> route to 192.12.20.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.43.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.92.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.93.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.237.0 via 192.43.188.194, 6 hops
UNM> route to 192.33.14.0 via 192.43.188.194, 6 hops
UNM> route to 192.33.17.0 via 192.43.188.194, 6 hops
UNM> route to 192.31.163.0 via 192.43.188.194, 6 hops
UNM> route to 192.41.208.0 via 192.43.188.194, 6 hops
UNM> route to 192.12.184.0 via 192.43.188.130, 16 hops
UNM> route to 192.41.163.0 via 192.43.188.194, 6 hops
UNM> route to 130.202.0.0 via 192.43.188.66, 16 hops
UNM> route to 131.215.0.0 via 192.43.188.194, 2 hops
UNM> 
UNM> Protocols
UNM> ARP Subnet routing: disabled
UNM> RIP: enabled
UNM>   Originate default on
UNM>   Per-interface address flags:
UNM>      intf  0   192.43.188.65    Send static routes
UNM>                                 Override static routes
UNM>      intf  1   192.43.188.193   Send static routes
UNM>                                 Override static routes
UNM>      intf  2   192.43.188.129   Send static routes
UNM>                                 Override static routes
UNM>      intf  3   192.31.154.9     Send static and default routes
UNM>                                 Received RIP packets are ignored.
UNM>                                 Override static routes
UNM>      intf  4   129.24.13.39     Send subnet and static routes
UNM>                                 Override static routes
UNM> EGP: disabled
UNM> Routing information interchange: off
UNM> Advertise RIP or static route hop count as EGP metric.
UNM> 
UNM> EGP neighbors:
UNM> [NONE]

I cannot seem to get to the other box at the moment.....  mmmmm....
well, you get the idea, I hope.
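The listing gives subnet masks in hex. Decoding them shows that FFFFFFFC makes each 192.43.188.x serial line its own /30, and that each point-to-point neighbour the static routes point at (.66, .194, .130) shares that /30 with the local interface. A small sanity check using Python's standard `ipaddress` module; the addresses are copied from the listing and the helper names are hypothetical.

```python
import ipaddress

def mask_from_hex(hex_mask: str) -> str:
    """Convert a hex subnet mask such as 'FFFFFFFC' to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(hex_mask, 16)))

def subnet_of(addr: str, hex_mask: str) -> ipaddress.IPv4Network:
    """Return the subnet that addr belongs to under the given hex mask."""
    return ipaddress.ip_network(f"{addr}/{mask_from_hex(hex_mask)}", strict=False)

# Serial interfaces on the UNM box and the gateways its routes point at.
links = [("192.43.188.65", "192.43.188.66"),
         ("192.43.188.193", "192.43.188.194"),
         ("192.43.188.129", "192.43.188.130")]

for local, peer in links:
    net = subnet_of(local, "FFFFFFFC")
    # prints e.g. "192.43.188.65 192.43.188.66 192.43.188.64/30 True"
    print(local, peer, net, ipaddress.ip_address(peer) in net)
```

The same check on 129.24.13.39 with FFFFF800 places intf 4 in 129.24.8.0/21, the subnet that also holds the default subnet gateway 129.24.8.193.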

kurt@pprg.unm.edu (Kurt Zeilenga [LANL]) (06/03/89)

More information....

The UNM proteon sees 480 routes... most come from a cisco via RIP on
one of the ethernets.  The rest come in via the serial lines (the
ones I have statics for).

We've checked everything simple... like broadcasting addresses (remember,
we do get occasional updates, but we miss more than we get).. etc.

The sending proteon has about 30 routes, most from the UNM proteon.
Others come from various sources... all regularly (including those from
UNM).  UNM just doesn't see most of the sending RIP packets from this
puppy.  Oh, they are running 8.1 now (but this problem existed when
we were running 8.0 and 7.whatever).

Errors/ovfl are minimal....
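Getting some updates but missing most of them fits RIP's timers: with updates every 30 seconds and a 180-second timeout, a route is only dropped after roughly six updates in a row are lost. The sketch below assumes, unrealistically, that each update is lost independently with the same probability `p_loss`, and computes the chance of at least one run of `run_len` consecutive losses within `n_updates` updates. Under that assumption an occasional loss almost never expires a route, while losing the majority makes at least one timeout over a day's 2880 updates practically certain.

```python
def prob_loss_run(p_loss: float, n_updates: int, run_len: int = 6) -> float:
    """Probability that n_updates independent updates, each lost with
    probability p_loss, contain at least one run of run_len consecutive losses."""
    # state[j]: probability that no full run has happened yet and the
    # current trailing run of losses has length j (0 <= j < run_len).
    state = [0.0] * run_len
    state[0] = 1.0
    hit = 0.0
    for _ in range(n_updates):
        nxt = [0.0] * run_len
        for j, pr in enumerate(state):
            nxt[0] += pr * (1.0 - p_loss)   # update received: run resets
            if j + 1 == run_len:
                hit += pr * p_loss          # run completes: route times out
            else:
                nxt[j + 1] += pr * p_loss   # another update lost
        state = nxt
    return hit

UPDATES_PER_DAY = 86400 // 30  # periodic updates from one neighbour per day
```

Real losses are bursty rather than independent, so this understates the risk from a link that goes deaf for minutes at a time; it only shows why "miss more than we get" turns into regular timeouts.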

	Kurt

kurt@WON-TON.DSPO.GOV (Kurt Zeilenga) (06/03/89)

I guess I should clarify, the net looks like this:

                       DSPO p4200
                            |
                            |
                            |
Caltech p4200 -------   UNM p4200 -------  ANL p4200
                        /      \
         ====PPRGnet====        ======UNMnet====
                        \      /                \
                      PPRG Sun GW                \
                                     UNM Cisco (Westnet)

This is a backdoor network... only "local" routing info is
exchanged.  Right now, only 20 nets or so are configured to be
passed between the p4200s.  The UNM p4200 is dropping routes
coming from Caltech (though not all updates).

The UNM p4200 does see about 500 nets coming from the UNM Cisco
(same as PPRG Sun does), but these routes are NOT exchanged via
the backdoors.  The other p4200s all pick up non-local routes,
but only on the order of 30-50 routes (because they do more
"default" routing than UNM does).

Thanks for the help.

    Kurt

kwe@BU-IT.BU.EDU (06/03/89)

	The problem is most likely between the unm p4200 and
the westnet cisco.
	The unm p4200 is hitting the westnet cisco with roughly 520
routes (the ~500 it hears from the cisco plus the ~20 from behind
unm-p4200), most of which are poison reverse (metric 16) routes.
	The cisco could be losing some of the 20 routes from
behind unm-p4200 in the mass of poison-routes.

	The p4200 should not be losing updates from the westnet
cisco.
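The 500 + 20 figure is what split horizon with poisoned reverse produces: every route learned from the cisco is advertised back onto UNMnet at metric 16. A minimal sketch, assuming the p4200 follows the RFC 1058 rule and ignoring its directly connected nets; destination names in the usage are placeholders and the counts mirror the numbers in this thread. RFC 1058 limits a RIP packet to 25 route entries.

```python
import math

INFINITY = 16
MAX_ENTRIES = 25  # RFC 1058: at most 25 route entries per 512-byte RIP packet

def build_update(table, out_ifc, poison_reverse=True):
    """Build the (dest, metric) entries advertised on out_ifc.

    table maps dest -> (metric, interface the route was learned on).
    Routes learned on out_ifc are advertised back at INFINITY (split
    horizon with poisoned reverse) or omitted (simple split horizon).
    """
    entries = []
    for dest, (metric, learned_on) in table.items():
        if learned_on == out_ifc:
            if not poison_reverse:
                continue
            metric = INFINITY
        entries.append((dest, metric))
    return entries

def packets_needed(entries):
    """Number of RIP packets needed to carry the entries."""
    return math.ceil(len(entries) / MAX_ENTRIES)
```

With 500 routes learned on UNMnet and 20 from the serial lines, each 30-second update onto UNMnet is 520 entries in 21 packets; with simple split horizon it would be 20 entries in a single packet.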

	You should have a look at the RIP exchanges going on
on UNMnet.  You could use the PPRG Sun GW and tcpdump.

	Then you should have a look at the debugging console
data on the westnet-cisco and watch your 20 local nets behind
unm-p4200 come and go.  Do the same on the unm-p4200 and
see if any interesting routes are coming and going.

	Also, the unm-p4200 and the PPRG-Sun-GW could be
sync'ed on RIP updates leading to ~1000 routes hitting
the cisco at the same time.  Tough on the old bird.  Likewise
the Sun and cisco could be pounding the p4200.  See what
tcpdump says.  You may have to turn off routing on the Sun-GW
to make sure that tcpdump gets enough cpu and all the packets
it can.
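To compare what tcpdump sees on UNMnet with what each router logs, the route entries inside each captured RIP packet have to be decoded. A minimal sketch of a RIP version 1 decoder following the RFC 1058 packet layout (4-byte header, then 20-byte entries carrying address family, address, and metric); how the UDP payloads are captured is not shown, and the function names are hypothetical.

```python
import socket
import struct

AF_IP = 2      # RIP address family identifier for IP (RIP uses UDP port 520)
INFINITY = 16

def parse_rip_v1(payload: bytes):
    """Decode a RIP version 1 UDP payload into (command, [(net, metric), ...])."""
    if len(payload) < 4 or (len(payload) - 4) % 20:
        raise ValueError("not a well-formed RIPv1 payload")
    command, version = struct.unpack("!BBxx", payload[:4])
    if version != 1:
        raise ValueError(f"unexpected RIP version {version}")
    entries = []
    for off in range(4, len(payload), 20):
        # family(2) zero(2) address(4) zero(8) metric(4)
        family, addr, metric = struct.unpack("!H2x4s8xI", payload[off:off + 20])
        if family == AF_IP:
            entries.append((socket.inet_ntoa(addr), metric))
    return command, entries

def summarize(entries):
    """Return (total entries, unreachable entries at metric 16) for one packet."""
    return len(entries), sum(1 for _, m in entries if m >= INFINITY)
```

Tallying entries and metric-16 entries per source address and timestamp makes both the poison-reverse volume and any synchronized bursts from the p4200 and the Sun easy to spot.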

	Good luck.

	Kent England, Boston University

dlw@VIOLET.BERKELEY.EDU (David Wasley) (06/03/89)

Depending on how old the software is in the cisco, it may be dropping RIP
packets. There was a bug whereby they could only queue at most 5 packets
for the RIP process before starting to drop them.
	David
(This was fixed about 9 months ago, as I recall.)
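A 5-packet limit on the RIP input queue interacts badly with the large, possibly synchronized updates discussed above. The toy model below feeds a bounded queue with a given number of packets per tick while the RIP process removes `served_per_tick` packets each tick. The tick length and service rate are arbitrary assumptions, not measurements of any cisco release. An update of about 520 routes at 25 routes per packet is 21 packets; arriving as one burst, 16 of them are lost, while spread out none are.

```python
def simulate_rip_queue(arrivals, queue_limit=5, served_per_tick=1):
    """Toy model of a bounded RIP input queue.

    arrivals: number of RIP packets arriving in each tick.
    Returns (processed, dropped). Packets that arrive while the queue is
    full are dropped; leftover queued packets are drained at the end.
    """
    queued = processed = dropped = 0
    for n in arrivals:
        accepted = min(n, queue_limit - queued)
        dropped += n - accepted
        queued += accepted
        served = min(queued, served_per_tick)
        queued -= served
        processed += served
    processed += queued  # drain whatever is still queued
    return processed, dropped
```

Because RIP repeats the full table every 30 seconds, which packets get dropped varies from update to update, matching the intermittent loss Kurt describes rather than a route that is never heard at all.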

rrv@uxc.cso.uiuc.edu (06/04/89)

Hi Kurt,

We struggled for a long time with problems that somewhat resemble yours.

It turned out to be some kind of hardware problem in the slow speed serial
boards of the p4200.  As I recall the receiver side of the USART chip went
deaf until it was reset as the result of a maintenance test failure.
The t 2 process offers some clue that the tests are failing.

While the chip was deaf, routes would be dropped and unreachables would
be issued.

If your Proteon person can't help, I'll put you in touch with mine.

Ross