[comp.dcom.lans] routing table management

root@topaz.RUTGERS.EDU (Charles Hedrick) (12/06/86)

There are two issues raised in this discussion: routed as a
gateway-to-gateway routing protocol, and routed as a method by which
non-gateway hosts discover what gateway to use when they want to talk
to another net.  Potentially, one might use routed among the gateways,
and some other method for normal hosts to find a gateway.  Or vice
versa.
Thomas Narten's message is evidence that I did not distinguish these
two issues carefully enough.

For smallish networks connected by high-speed media, routed is
probably OK as a protocol for gateways to keep each other up to date
about the topology of the network.  My original assertion was that it
was not appropriate in the general case.  Unfortunately, I don't
recall all of the problems, but here is one.  It's called the
"counting to infinity" problem.  Consider the following set of
networks and gateways.  Letters are gateways.  Numbers are networks.
The numbers under the gateways are the hop count they show to get to
network 3.

      A    1     B    2    C    3
      X----------X---------X--------
      2          1         0

Now suppose that gateway C goes down.  For a while, A and B will
continue to believe, and advertise, the old routes.  But eventually
they will time out.  Here is what will happen when B realizes that
the route via C is no longer current.

      A    1     B    2    C    3
      X----------X--------- --------
      2          3         

Since A has continued to advertise a hop count of 2, B thinks it can
route via A.  At some point, A will realize that its old route is no
longer any good.  But B is now advertising a hop count of 3.  So we
get

      A    1     B    2    C    3
      X----------X--------- --------
      4          3         

And so on: A and B keep raising their hop counts in turn until they
reach the value the protocol treats as infinity.  While this is
happening, any packet destined for network 3 is bounced back and forth
between A and B until its time to live goes to zero.  There are other
similar problems that can happen if you have larger numbers of
gateways.  Some of them result in long delays before new routing
patterns stabilize.  Others result in no convergence at all.  Perhaps
someone who is more familiar with the theory of these algorithms can
recall some of the more complex failure modes.
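
The exchange between A and B described above can be traced with a short
simulation.  This is only a sketch: it assumes the two gateways update
strictly in turn, and that 16 is the metric treated as unreachable (the
value routed uses).  The function name and its parameters are made up
for illustration.

```python
INFINITY = 16  # assumed "unreachable" metric; routed treats 16 hops as infinity


def count_to_infinity(a=2, b=1, infinity=INFINITY):
    """Return the successive (A, B) hop counts for network 3 after C fails."""
    history = [(a, b)]
    # B's route via C times out, so B adopts A's stale advertisement plus one hop.
    b = min(a + 1, infinity)
    history.append((a, b))
    a_turn = True
    while a < infinity or b < infinity:
        if a_turn:
            # A routes via B, so it must accept B's latest metric plus one.
            a = min(b + 1, infinity)
        else:
            # B now routes via A, so it must accept A's latest metric plus one.
            b = min(a + 1, infinity)
        history.append((a, b))
        a_turn = not a_turn
    return history


if __name__ == "__main__":
    for a, b in count_to_infinity():
        print(f"A={a:2d}  B={b:2d}")
```

The first three states are (2, 1), (2, 3) and (4, 3), matching the
diagrams; the loop only ends once both gateways have reached 16.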

Here are some other problems.  Note that these often are not
significant for a network constructed entirely of Ethernets.  But for
heterogeneous networks, they are.  (Our network has links to the
Arpanet and NSFnet, and one of our internal links is very slow.)
  - routed uses hop counts.  This is only a good idea if all of
	the links are roughly the same speed.
  - it has no way to help coordinate load sharing among parallel
	routes of comparable goodness
  - it has no way to specify default gateways.  If you have a
	"generally competent" exterior gateway, e.g. an Arpanet
	gateway, you may not want to propagate
	the whole list of networks accessible from the Arpanet
	through your routed setup.  Sites with more than one
	Arpanet gateway are more likely to want their internal
	routing protocol just to find them the "nearest" Arpanet
	gateway.
Again, let me say that I do not object to routed.  I can imagine
a number of places where it would be perfectly reasonable.  I am
simply arguing that it is not general enough for someone who wants
to build a commercial gateway that will be useful in as many
situations as possible.

Let us now assume that our gateways have adopted some manner of
communication among themselves, so that they always know the topology
of the network.  This may even be routed, if the network is such that
routed doesn't result in problems.  The other issue is how a host that
wants to talk to another network should figure out which gateway to
talk to.  I claim that routed is not a good solution for this because
it exists only on Unix.  Since there are other methods that will work
on every host, one would think that such methods should be preferred
to ones that work only on Unix.  (I'm less concerned with this issue
for gateway to gateway protocols.  You might well decide that all of
your gateways are going to be BSD systems, and thus feel safe in using
a BSD protocol among them.  That doesn't bother me.  But I think it is
a bad idea to assume that all of your non-gateway hosts are going to
be running Unix.)  I think routing redirects are probably the best
method, if the implementation is done right.  I believe the one in 4.3
is.  Thomas Narten was concerned that redirects don't help if the gateway
goes down, since there is then nothing to issue the redirect.  First,
I think the more common situation (at least with our gateways) is for
communications facilities to go down.  In that case, the gateway is
still alive, and can issue redirects.  But even if a gateway goes down
and stays down, 4.3 should be able to handle it.  Should the current
gateway go down, the connection will begin to time out.  However,
before actually killing the connection, 4.3 will try recomputing
the route.  If I understand
the code correctly, this will go back and start the whole
route-finding process again (i.e. starting with the default gateway or
gateways), and should come up with a new route if one still exists.
This is not instant, but neither is a solution based on routed, since
you have to wait for the routing tables to get updated.
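
The host-side behaviour described above (follow redirects, and on a
timeout throw away what was learned and start again from the default
gateways) might look roughly like the sketch below.  It is not the 4.3
code: the class, its method names, and the policy of moving an
unresponsive default gateway to the back of the list are all
assumptions made for illustration.

```python
class HostRoutes:
    """Toy host routing state: default gateways plus routes learned from redirects."""

    def __init__(self, default_gateways):
        self.defaults = list(default_gateways)  # tried in order
        self.learned = {}                       # destination net -> gateway

    def lookup(self, net):
        # Prefer a gateway learned from a redirect, else the first default.
        return self.learned.get(net, self.defaults[0])

    def redirect(self, net, gateway):
        # A gateway told us that another gateway is a better first hop.
        self.learned[net] = gateway

    def gateway_timed_out(self, net, dead_gateway):
        # Forget what we learned and start over from the defaults.
        self.learned.pop(net, None)
        if dead_gateway in self.defaults and len(self.defaults) > 1:
            # Assumption: move an unresponsive default to the back of the list.
            self.defaults.remove(dead_gateway)
            self.defaults.append(dead_gateway)
        return self.lookup(net)


if __name__ == "__main__":
    host = HostRoutes(["gw-a", "gw-b"])    # hypothetical gateway names
    host.redirect("net3", "gw-c")          # gw-a redirects us to gw-c
    print(host.lookup("net3"))
    print(host.gateway_timed_out("net3", "gw-c"))
```

Nothing here needs a routing daemon on the host, which is the point of
preferring redirects for non-gateway machines.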

He commented that proxy ARP also has problems adjusting to gateways
that go down.  I agree, but note that I recommend proxy ARP primarily
for use with 4.2.  It is the only method I know to allow unmodified
4.2 systems to work with subnets.  It doesn't seem much worse than
anything else for more general route-finding on 4.2.  As far as I can
tell from the sources, routed on 4.2 isn't all that good at dynamic
route adjustment either.  Unfortunately, under 4.2 there is no ioctl
to change the gateway address for an existing route.  So routed
deletes the old route and creates a new one.  This means that old
connections still have references to the old route, and will time out.

I confess that I am somewhat biased by the fact that gateway crashes
are not an issue for me.  I am concerned primarily about handling
reconfigurations of the network.  We generally manage to do this in
such a way that the old gateway is still around for a while to issue
redirects to the new one.  If we had gateways going down a lot, then I
might find myself looking for new methods.  Because we use dedicated
gateways (Cisco and the original Stanford design on which the Cisco
gateways are based), we don't have to worry about Unix system crashes.
As far as I know, we have never had a crash of the older gateway; once
we got some initial problems out of the way [most of them due to
design problems in hardware from other vendors], we have not had any
crashes in the Cisco machines either.  Even if they did crash, they
would reboot fast enough to avoid breaking connections that went
through them.

gds@sri-spam.istc.sri.com (The lost Bostonian) (12/07/86)

Regarding load sharing, the BBN TCP/IP does load sharing among multiple
routes to a destination net or host.  This can cause problems if the
gateways do not agree that the path of least resistance should be taken.
The host does not listen to redirects until the load is sufficiently
high on the old path of least resistance.  (Some people on the tcp-ip
list are suggesting that loading factors should be figured into internet
gateway routing mechanisms, which is probably a good idea.)

Regarding changing the gateway to destinations in routing tables, it
seems like a relatively easy thing to do (in the code in
/sys/net/route.c, once you have hashed on the destination, you could
replace the gateway with the new gateway).  Why this hasn't been done is
a good question (maybe someone at Berkeley will answer it).  Some
route-specific information would need to be changed, such as the refcnt
and flags.
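
The difference between deleting and re-creating a route and changing
its gateway in place can be shown with a toy model.  This is a sketch
only and does not reflect the actual kernel structures in
/sys/net/route.c; the classes, fields and gateway names are invented
for illustration.

```python
class Route:
    def __init__(self, dest, gateway):
        self.dest = dest
        self.gateway = gateway
        self.refcnt = 0  # number of connections holding this route


class RouteTable:
    def __init__(self):
        self.routes = {}  # destination -> Route

    def add(self, dest, gateway):
        self.routes[dest] = Route(dest, gateway)

    def delete(self, dest):
        # Removes the table entry; connections keep their old Route object.
        self.routes.pop(dest, None)

    def change(self, dest, gateway):
        # In-place update: every holder of the Route sees the new gateway.
        self.routes[dest].gateway = gateway


class Connection:
    def __init__(self, table, dest):
        self.route = table.routes[dest]  # cached reference, as a connection would hold
        self.route.refcnt += 1

    def next_hop(self):
        return self.route.gateway


if __name__ == "__main__":
    table = RouteTable()
    table.add("net3", "old-gw")
    conn = Connection(table, "net3")
    table.delete("net3")
    table.add("net3", "new-gw")
    print(conn.next_hop())   # still the old gateway: the connection will time out

    table2 = RouteTable()
    table2.add("net3", "old-gw")
    conn2 = Connection(table2, "net3")
    table2.change("net3", "new-gw")
    print(conn2.next_hop())  # follows the change
```

With delete-and-add, the existing connection keeps pointing at a route
that is no longer in the table; with an in-place change it picks up
the new gateway without being torn down.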

--gregbo

ks@pur-ee.UUCP (Kirk Smith) (12/07/86)

In article <7541@topaz.RUTGERS.EDU> root@topaz.RUTGERS.EDU (Charles Hedrick) writes:
>Here are some other problems.  Note that these often are not
>significant for a network constructed entirely of Ethernets.  But for
>heterogenous networks, they are.  (Our network has links to the
>Arpanet, NSFnet, and one of our internal links is very slow.)
>  - routed uses hop counts.  This is only a good idea if all of
>	the links are roughly the same speed.

In 4.3 BSD, a per-interface metric can be set to increase the "hop count"
for slow interfaces.  This feature can be used when a high speed and a low
speed link are available.  Under normal operation, the high speed link would
be used.  When it is unavailable, the low speed link would be used.
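
As a rough illustration of the effect (not of the 4.3 implementation
itself), the sketch below adds a per-interface metric to the hop count
and picks the cheapest link that is up.  The Link type, its field
names, the link names and the metric value of 3 are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    hops: int
    metric: int = 0   # extra cost configured on the interface
    up: bool = True


def choose(links):
    """Pick the usable link with the lowest hop count plus interface metric."""
    usable = [link for link in links if link.up]
    if not usable:
        return None
    return min(usable, key=lambda link: link.hops + link.metric)


if __name__ == "__main__":
    fast = Link("ethernet", hops=1)
    slow = Link("serial-line", hops=1, metric=3)  # penalize the slow interface
    print(choose([slow, fast]).name)  # the fast link wins despite equal hop counts
    fast.up = False
    print(choose([slow, fast]).name)  # the slow link takes over as a fallback
```

Without the metric, both links would tie at one hop and the choice
between them would be arbitrary.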

>  - it has no way to help coordinate load sharing among parallel
>	routes of comparable goodness

True.  This would only be a problem, though, if the parallel routes were
both "low" speed.  A single "high" speed route is enough to support typical
network traffic, without sharing with another "high" speed route.

>  - it has no way to specify default gateways.  If you have a
>	"generally competent" exterior gateway, e.g. an Arpanet
>	gateway, you may not want to propagate
>	the whole list of networks accessible from the Arpanet
>	through your routed setup.  Sites with more than one
>	Arpanet gateway are more likely to want their internal
>	routing protocol just to find them the "nearest" Arpanet
>	gateway.

If routed is started with the "-g" flag, that machine is assumed
to be an "external" gateway.  It will advertise a default route.
If you have multiple Arpanet gateways, that is no problem.  Routed
will pick the "nearest" OPERATIONAL Arpanet gateway.

I am not trying to promote routed as the perfect solution to all problems,
but it seems to work quite well for us.  Many of the perceived problems with
routed are just not real.  Other solutions, such as proxy ARP and route
redirects, are not as complete as routed.  Gateways going down are real
problems that need to be dealt with.  If we used routing algorithms that
could not find a route when one exists, our users would be quite upset.
On the other hand, routed has not been implemented by all machines that
run TCP/IP.  But there is no reason that it couldn't be implemented.
And it certainly does not preclude those machines from running on the
same internet, with less than optimal routing capabilities.  At this time,
there seems to be no serious alternative for us.

					Kirk Smith
					Purdue Engineering

dricej@drilex.UUCP (Craig Jackson) (12/09/86)

In article <5003@pur-ee.UUCP> ks@pur-ee.UUCP (Kirk Smith) writes:
>In article <7541@topaz.RUTGERS.EDU> root@topaz.RUTGERS.EDU (Charles Hedrick) writes:
>>Here are some other problems.  Note that these often are not
>>significant for a network constructed entirely of Ethernets.  But for
>>heterogenous networks, they are.  (Our network has links to the
>>Arpanet, NSFnet, and one of our internal links is very slow.)
>
>>  - it has no way to help coordinate load sharing among parallel
>>	routes of comparable goodness
>
>True.  This would only be a problem, though, if the parallel routes were
>both "low" speed.  A single "high" speed route is enough to support typical
>network traffic, without sharing with another "high" speed route.

This assumes that you aren't involved in really large-scale computing.  We
are considering at least one application involving up to 1000 users.
These users would typically be terminal-io-bound (not much computation).
If those users come in over Ethernet (using terminal servers, for example),
a single Ethernet would be pushed to its limits.  Since we'd want another
one for redundancy, anyway, it would be nice to have a routing algorithm
which would share the load.  (This discussion originally started with
terminal servers, I believe.)  
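
One simple way such sharing could work, sketched here under the
assumption that new connections are handed out round-robin across the
routes that tie for the best metric.  This is not how routed or any
particular product behaves; the class and the gateway names are
hypothetical.

```python
import itertools


class EqualCostRoutes:
    """Rotate new connections across the routes that share the lowest metric."""

    def __init__(self, routes):
        # routes: list of (gateway, metric) pairs for one destination
        best = min(metric for _, metric in routes)
        self.gateways = [gw for gw, metric in routes if metric == best]
        self._cycle = itertools.cycle(self.gateways)

    def next_gateway(self):
        # Each call assigns the next connection to the next equal-cost gateway.
        return next(self._cycle)


if __name__ == "__main__":
    routes = EqualCostRoutes([("ether-a", 1), ("ether-b", 1), ("serial", 4)])
    for _ in range(4):
        print(routes.next_gateway())
```

The slower route is ignored while either Ethernet is available, and
the two Ethernets each carry about half of the connections.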

I know that terminal servers generally don't come with two Ethernet interfaces.
However, I know of one (non-TCP/IP) that does, and it would be nice if they
all did.

BTW, at least one manufacturer has proposed solving this application with
terminal servers, so this is a real example.

>					Kirk Smith
>					Purdue Engineering

-- 
Craig Jackson
UUCP: {harvard!axiom,linus!axiom,ll-xn}!drilex!dricej
BIX:  cjackson