[mod.protocols.tcp-ip] how to make use of redundant routes

hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (01/20/87)

We are just now beginning to look at making use of redundant routes,
to provide some extra reliability in our network.  For our internal
routing, we have dedicated gateways that handle most of the traffic.
(2 Cisco gateways, plus a home-brew gateway that is similar in
technology to Cisco's.)  There is no redundancy among those gateways.
However we also have various Unix machines with multiple interfaces.
If we set them all up to act as gateways, we could survive any gateway
being down.

However I'm somewhat unclear how to maintain our hosts'
routing tables.  Unix seems to be set up to get routing information
through both routed and routing redirects.  Unfortunately, it seems that
these two techniques interact unfavorably.  Initially, I had the idea
that redirects alone might do the job.  But that clearly can't work.
If the current gateway goes down, there's nobody to issue a redirect
to someplace else.  4.3 makes this somewhat better by killing the
current route when a TCP connection is about to time out.  But that
doesn't help us with NFS, which uses only UDP.  The same problem
applies to our current kludge of using proxy ARP's.  I am beginning to
come to the view that there is no real alternative to routed or
something like that.

However the vanilla routed still has potential
problems.  If a gateway issues a host redirect, the kernel makes an
entry in the routing table that routed doesn't know about.  Should the
gateway named in the redirect go down, routed will not know that it
should remove that host route.  The only complete solution I have seen
is Cornell's gated.  If you ignore its support for EGP and HELLO
(which are not relevant in our situation), it can be thought of as a
souped-up routed.  Unlike routed, it uses a raw socket which gets
copies of every redirect received by the kernel.  Thus it is able to
maintain a model of exactly what routes the kernel has.  Presumably
this would allow gated to remove routes that no longer apply.  Some
folks here are reluctant to use gated on every one of our
workstations.  We have an aversion to regularly activating daemons on
diskless Suns.  (Although we don't have any real problems with them,
swapping over the network provides a certain incentive to minimize the
number of programs that get swapped in at regular intervals.)  There
is also a feeling that gated is large enough that we are probably not
going to understand what it is doing.  However I'm beginning to think
that its approach is inevitable.

I have thought of one alternative,
but I'm not sure that it has enough advantages to be worth coding.
That is a program that monitors routed traffic and keeps track of what
gateways are up.  It would not do anything with the contents of the
packets -- just remember what gateways are currently sending them.
The program would guarantee that there is always a default route that
points to a gateway that is up.  It would periodically examine the
routes in the kernel (using code stolen from netstat, presumably), and
kill any that involved gateways that are no longer up.  One might also
look at the use count field, and get rid of routes that haven't been
used for a certain period of time.  Presumably this program would be
smaller than gated, and I think I would also be more likely to
understand exactly what it is doing.  But I'm not sure I want to add
yet another routing daemon.  (The only approach I can think of other
than monitoring the gateways' routing protocol is to ping each gateway
periodically.  That would work, of course, but it would create more
network traffic.)  I'm reasonably convinced that any system acting as
an actual gateway should run gated.
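The monitoring daemon proposed above can be sketched roughly as follows.  This is a hypothetical illustration, not existing code: the class and field names are invented, timestamps are passed in explicitly, the 180-second timeout is an assumed policy value, and the kernel route table is modelled as a plain list, since reading and changing the real table (the netstat-style code mentioned above) is left out.

```python
from dataclasses import dataclass, field

# Assumed policy value: a gateway counts as up if a routing packet from
# it was heard within this many seconds.
GATEWAY_TIMEOUT = 180.0


@dataclass
class Route:
    destination: str  # "default", or a host/net address
    gateway: str      # first-hop gateway address


@dataclass
class GatewayMonitor:
    timeout: float = GATEWAY_TIMEOUT
    last_heard: dict = field(default_factory=dict)  # gateway -> time last heard

    def heard_from(self, gateway: str, now: float) -> None:
        """Call whenever a routing packet arrives; only the sender matters."""
        self.last_heard[gateway] = now

    def live_gateways(self, now: float) -> set:
        return {gw for gw, t in self.last_heard.items() if now - t <= self.timeout}

    def scan(self, routes: list, now: float) -> list:
        """Return a new route list: routes through gateways not heard from
        recently are dropped (this assumes every gateway sends routing
        traffic), and a default route is added if none survives."""
        live = self.live_gateways(now)
        kept = [r for r in routes if r.gateway in live]
        has_default = any(r.destination == "default" for r in kept)
        if not has_default and live:
            # Arbitrary but deterministic choice among live gateways.
            kept.append(Route("default", min(live)))
        return kept
```

In a real daemon, scan() would run periodically against the kernel's table; the use-count aging mentioned above could be added as one more filter on the kept routes.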

I'd be curious to hear comments from places that have been using
dynamic routing for some time.  By the way, I am willing to assume
that all gateways participate in routing using routed.  (Cisco now
supports routed.)

DCP@QUABBIN.SCRC.SYMBOLICS.COM (David C. Plummer) (01/20/87)

    Date: Tue, 20 Jan 87 01:07:40 est
    From: hedrick@topaz.rutgers.edu (Charles Hedrick)

    I'd be curious to hear comments from places that have been using
    dynamic routing for some time.  By the way, I am willing to assume
    that all gateways participate in routing using routed.  (Cisco now
    supports routed.)

Users of the Chaosnet protocol have been using dynamic routing for
nearly 10 years now.  Chaosnet is MIT AI memo number 628.  I think it is
online at MIT someplace, but can't find it.  (Snit: it's always bothered
me that IP didn't address this issue from the start.)

A very brief description of Chaos routing follows, so those wishing to
type D(elete) now can go ahead.

Chaosnet only has 255 subnets (which is a problem for very large
configurations, and therefore this may not prove useful for IP, but I
had ideas back in '81 or so on how to extend this kind of routing to IP.
JNC may remember some of them.)  With only 256 subnets, keeping a full
subnet routing table was not hard.  Periodically (every 15 seconds) a
multiple interface bridge would broadcast a routing packet on each
interface.  The routing packet contained several pairs: a subnet number
and a "cost" to get to that subnet.  Periodically (every 4 seconds) each
machine would increment all costs in the table by 1.  When processing a
routing packet, an entry would get replaced if the cost in the routing
packet is smaller than the cost in the current entry.
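The update rule just described is small enough to sketch directly.  This is illustrative Python, not Chaosnet code: the names are hypothetical, timers are left to the caller (age() is meant to run every 4 seconds, while bridges broadcast every 15), and, following the description literally, an advertised cost is compared as-is, with no extra charge added for the hop to the bridge.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    cost: int
    bridge: str  # the bridge that advertised this subnet (next hop)


class ChaosRoutingTable:
    def __init__(self):
        self.entries = {}  # subnet number -> Entry

    def age(self):
        """Run periodically (every 4 seconds): every known cost rises by 1."""
        for entry in self.entries.values():
            entry.cost += 1

    def process_routing_packet(self, bridge, pairs):
        """pairs: (subnet, cost) tuples from one bridge's routing broadcast.
        An entry is replaced only if the advertised cost is smaller."""
        for subnet, cost in pairs:
            current = self.entries.get(subnet)
            if current is None or cost < current.cost:
                self.entries[subnet] = Entry(cost, bridge)

    def next_hop(self, subnet):
        entry = self.entries.get(subnet)
        return entry.bridge if entry else None
```

The aging step is what makes the scheme self-healing: if a bridge stops broadcasting, the costs of its routes keep climbing until an advertisement from any other bridge beats them.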

If you are interested in my extensions to IP / hierarchical addressing
schemes, I can try to dig them up.

JNC@XX.LCS.MIT.EDU.UUCP (01/21/87)

	For the 59th time, an RFC (RFC816) in the 'Internet Protocol
Implementors Guide' discusses exactly how to use Redirects, and how to
figure out that your gateway is dead. The issue was addressed years
ago in detail, but apparently nobody bothers to read the specs.
	I won't bother to waste my time pointing out that having hosts
listening to routing protocols is a terrible idea; nobody ever
believes me. Also, RIP is a piece of junk. It doesn't even work with
the current EGP, let alone with any followon. It's too bad that a) it
was around before any other IGP was, and b) it was in the Berkeley
system, because now we'll never get rid of it.

	Noel
-------

hedrick@TOPAZ.RUTGERS.EDU.UUCP (01/21/87)

Noel:

I appreciate your advice, but really, the people you should be flaming
at are not me, but Berkeley and the various vendors.  My problem is
very different than Berkeley's: I have to deliver reliable service
given products that actually exist. RFC816 says basically

  - somehow you keep track of what gateways are up
  - when you want to talk to somebody, try a gateway, and depend
	upon redirects to get the actual address
  - when a gateway goes down, just get rid of routes that use it.
	This will result in trying another random gateway, which
	will again tell you if there is a better one.
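The RFC816 strategy summarized above can be modelled as a tiny route cache.  This is a hypothetical sketch, not code from RFC816 or any Unix kernel: the class name is invented, the set of candidate gateways is assumed to be known in advance, and when the default gateway dies the next one in the list is taken, where the summary above says any other gateway will do.

```python
class RedirectRouteCache:
    """Host-side route cache: start with a default gateway, learn better
    first hops from redirects, drop routes whose gateway is declared dead."""

    def __init__(self, gateways):
        self.gateways = list(gateways)  # candidate first hops (assumed known)
        self.default = self.gateways[0] if self.gateways else None
        self.host_routes = {}           # destination -> gateway from a redirect

    def first_hop(self, dest):
        # Use a redirect-learned route if there is one, else the default.
        return self.host_routes.get(dest, self.default)

    def redirect(self, dest, new_gateway):
        self.host_routes[dest] = new_gateway

    def gateway_down(self, gw):
        # Forget every route through the dead gateway; that traffic falls
        # back to the default, which will redirect again if it knows better.
        self.host_routes = {d: g for d, g in self.host_routes.items() if g != gw}
        if gw in self.gateways:
            self.gateways.remove(gw)
        if self.default == gw:
            self.default = self.gateways[0] if self.gateways else None
```

Note that the cache never needs to know the topology: all the intelligence stays in the gateways, and the host only has to notice deaths.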

For keeping track of what gateways are up, RFC816 mentions

  - depend upon the network to tell you when a packet isn't
	delivered.
  - ping them regularly
  - depend upon the upper layers [for us, TCP and NFS] to tell
	you when a route no longer works.  When a route
	stops working, assume that its gateway is down.
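The third method, upper-layer advice, amounts to counting failures per gateway.  A hedged sketch: the class name is hypothetical, the threshold of three consecutive failures is an assumed policy, and the callers (TCP retransmission logic, or an NFS client's timeout handling) are assumed to report outcomes, which is exactly the feedback 4.2 does not provide.

```python
from collections import defaultdict

# Assumed policy: this many consecutive failure reports mark a gateway down.
FAILURE_THRESHOLD = 3


class UpperLayerAdvice:
    def __init__(self, threshold=FAILURE_THRESHOLD):
        self.threshold = threshold
        self.failures = defaultdict(int)  # gateway -> consecutive failures
        self.down = set()

    def report_success(self, gw):
        """Any sign of life through the gateway clears its record."""
        self.failures[gw] = 0
        self.down.discard(gw)

    def report_failure(self, gw):
        """Return True only on the report that tips the gateway to 'down',
        so the caller can purge its routes exactly once."""
        self.failures[gw] += 1
        if self.failures[gw] >= self.threshold and gw not in self.down:
            self.down.add(gw)
            return True
        return False
```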

Now, let's look at how much of that advice I can actually use.  I have
mostly Suns.  This means I have 4.2 networking.  I know that's
horrible, but there isn't a lot of real 4.3 in the marketplace yet.
I'm not in a position to implement my own, or even to port 4.3 to the
Sun.  In a 4.2 world, none of the suggestions in RFC816 look very
attractive.  Ethernet obstinately refuses to tell me that it is unable
to deliver a packet.  The one implementation that tried to ping all of
the gateways it knew about [TOPS-20] was roundly condemned by all.
And 4.2 has no feedback from the upper layers to the lower ones.
[Indeed even in 4.3, it's not clear whether the feedback is good
enough that you can really depend upon it in practice.  I'd like to
hear from anybody who has experience in this area.  If you just use
the route command to set up several default gateways, will 4.3 really
manage to keep up communications, using just redirects and hints from
TCP?  Note that for many of my users, the main application they are
using through the gateway is NFS, which is UDP-based.]

Since none of these methods seems very attractive, I proposed the next
best thing that I could think of: watching the routing traffic between
the gateways.  I don't propose to look in the packets.  I'm just
trying to keep track of what gateways are sending them.  If you hate
routed, imagine that I am watching Cisco's IGRP traffic (which is by
no means impossible).  Given this, I thought I was proposing something
that was very much in the spirit of RFC816.  I proposed a daemon that
would do the following:

  - keep track of what gateways are up, by listening to their
	routing traffic
  - periodically scan the routes in the kernel
  - when it finds a route that uses a dead gateway, remove the route.
	In Unix, this means that traffic will revert to the default
	gateway, which will then redirect it to a better one if there
	is any.  This is exactly what RFC816 says.
  - it would manage the default routing entry, to make sure that it
	always points to a gateway that is up.

What I asked was whether anyone has experience with such a thing, and
can advise me on whether it is really worth doing this instead of
just using routed.  I also mentioned some practical problems that
would have to be solved before we could really rely on routed.

I really have been trying to follow your advice.  I have tried to
avoid using routed.  I find that I am moving that way, because my
connections with NSFnet, and use of various random Unix machines as
gateways, provide little choice.  Cisco, who supplies our core
gateways, had tried to avoid routed as well, but has finally caved in
to the inevitable.  It would help a lot if there were a standard
defining something better.  When the current efforts in this direction
result in something, I'll be happy to push the vendors we deal with to
implement it.

I'd be happy to join you in a campaign lobbying vendors for
implementations that follow the RFC's.  But please don't yell at me.
I'm just trying to do something reasonable with the tools I have.

walsh@HARVARD.HARVARD.EDU.UUCP (01/28/87)

	to deliver a packet.  The one implementation that tried to ping all of
	the gateways it knew about [TOPS-20] was roundly condemned by all.

The BBN VAX networking code pinged gateways.  You don't need to ping X if
you're getting acks back for any connection actively using gateway X as a
first hop.  You also don't need to ping if you're not currently using the
gateway.
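These two rules reduce pinging to a small predicate.  A hedged sketch: the function and parameter names are hypothetical, and "recent" acks are defined by an assumed quiet_interval, since no figure is given here.

```python
def should_ping(gateway, now, in_use, last_ack, quiet_interval):
    """Decide whether a host needs to ping a first-hop gateway.

    in_use:         set of gateways currently carrying our traffic
    last_ack:       gateway -> time an ack last arrived on a connection
                    using that gateway as its first hop
    quiet_interval: assumed threshold; acks newer than this prove it is alive
    """
    if gateway not in in_use:
        return False  # not using it, so its state doesn't matter right now
    ack_time = last_ack.get(gateway)
    if ack_time is not None and now - ack_time <= quiet_interval:
        return False  # recent acks through it already show it works
    return True
```

Under these rules a host only generates ping traffic for gateways it depends on and has not heard through lately, which is far less than pinging every known gateway.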

	Since none of these methods seems very attractive, I proposed the next
	best thing that I could think of: watching the routing traffic between
	the gateways.

Using an Ethernet board in snoopy mode sounds awfully inefficient.

hedrick@TOPAZ.RUTGERS.EDU.UUCP (01/29/87)

Routed uses broadcasts, so snooping on the gateways doesn't require
promiscuous mode.
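To illustrate: an ordinary UDP socket bound to the RIP port receives the gateways' broadcasts without any special interface mode.  The sketch assumes RIP's UDP port 520 and its response command code 2; the function names are hypothetical, and it only records who sent each update, ignoring the routes inside.

```python
import socket
import time

RIP_PORT = 520     # UDP port routed (RIP) broadcasts on
RIP_RESPONSE = 2   # RIP command code for a routing update ("response")


def is_rip_response(data):
    # RIP header: command (1 byte), version (1 byte), 2 zero bytes.
    return len(data) >= 4 and data[0] == RIP_RESPONSE


def collect_gateways(sock, duration, seen=None):
    """Read datagrams from an already-bound UDP socket for `duration`
    seconds; for each sender of a RIP response, record when it was heard."""
    seen = {} if seen is None else seen
    deadline = time.monotonic() + duration
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            data, (addr, _port) = sock.recvfrom(1500)
        except socket.timeout:
            break
        if is_rip_response(data):
            seen[addr] = time.monotonic()
    return seen
```

On a real host the socket would be created with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and bound with bind(("", RIP_PORT)); that normally needs privileges and may fail if another process, such as routed itself, already holds the port.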