[mod.protocols.tcp-ip] more

hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (01/14/86)

Since my posting yesterday, I have given a bit more thought to the
issue of keeping track of network topology.  I got several responses
acknowledging that the issue was an important and difficult one, but
none proposing any real solutions.  So it seemed worth putting a bit
more thought into the issue. While I haven't come up with any
startling innovations, I think I see a couple of approaches that would
work.  First, let me start by enumerating the possibilities that I
have seen.  We have several issues.  The first is how hosts keep track
of what gateways are up.  The second is how hosts keep track of
changes in gateway status.  The third is how hosts know what gateways
exist.  Of course these are not orthogonal.

Keeping track of what gateways are up:
  pinging - every host sends an echo request to every gateway that
	it knows about every 30 sec. or so.  Most people consider
	this unacceptable because it generates too much network
	traffic.  TOPS-20 does this, though with an interval of
	several minutes.  I believe it must be done every 30 sec.,
	because we have to be able to discover that a gateway is
	down in time to move to another one before connections start
	timing out or users start thinking that the system is down.
  gateway broadcast - every gateway sends a broadcast every 30 sec.
	For a network that supports broadcasts, this gives as good
	results as pinging, but the number of packets is far smaller.
	PUP and XNS gatewayinfo do this.  So does Unix routed.  The
	only disadvantage I can see is that it only works if the 
	network supports broadcasts, and that it may not be so good
	for single-process systems (e.g. IBM PC).  On an IBM PC, you
	can't just have a daemon sitting there keeping track of what
	networks are up.  Telnet might have to wait a minute or two
	gathering gateway information before starting to make the
	connection.
  host broadcast - when a host wants to make a connection, it sends
	a broadcast asking for any gateways to a certain host to
	respond.  This is effectively done now by ARP-hacking gateways.
	Since an ARP is needed anyway to initiate a connection, it
	adds no overhead.  This strategy is appropriate for single-
	process systems.  The only disadvantage I can think of is that
	it only works on media that support broadcast.  Note that in
	a complex network, this strategy requires that the gateways
	have some other way to keep track of each other.  They must
	arrange things so that only the preferred gateway will respond
	to an ARP.
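To make the first two strategies concrete, here is a minimal Python sketch of a host-side table of live gateways. It assumes that any evidence of life (an echo reply to our own ping, or a broadcast the gateway sends unprompted) refreshes the entry, and that a gateway is presumed down after three missed 30-second refreshes. The class name, the gateway names, and the 90-second threshold are hypothetical. A rough per-interval packet count for each scheme is included for comparison.

```python
from typing import Dict, List

# Hypothetical values: refresh every 30 sec. as the text suggests, and
# presume a gateway down after three refreshes go unanswered.
REFRESH_INTERVAL = 30.0
DOWN_AFTER = 3 * REFRESH_INTERVAL


class GatewayTable:
    """Host-side record of when each gateway was last heard from.

    Evidence of life may be an echo reply to our own ping or a broadcast
    the gateway sends unprompted; the table treats both the same way.
    """

    def __init__(self, down_after: float = DOWN_AFTER) -> None:
        self.down_after = down_after
        self.last_heard: Dict[str, float] = {}

    def heard_from(self, gateway: str, now: float) -> None:
        # Call for every echo reply or gateway broadcast received.
        self.last_heard[gateway] = now

    def is_up(self, gateway: str, now: float) -> bool:
        last = self.last_heard.get(gateway)
        return last is not None and now - last <= self.down_after

    def up_gateways(self, now: float) -> List[str]:
        return sorted(g for g in self.last_heard if self.is_up(g, now))


def packets_per_interval(hosts: int, gateways: int, scheme: str) -> int:
    """Rough packet count per refresh interval on one network."""
    if scheme == "ping":
        # Every host pings every gateway, and each ping draws a reply.
        return 2 * hosts * gateways
    if scheme == "broadcast":
        # Each gateway sends one broadcast that every host hears.
        return gateways
    raise ValueError(f"unknown scheme: {scheme}")


table = GatewayTable()
table.heard_from("gw-a", now=0.0)
table.heard_from("gw-b", now=0.0)
table.heard_from("gw-a", now=60.0)
print(table.up_gateways(now=100.0))           # prints ['gw-a']
print(packets_per_interval(256, 4, "ping"))   # prints 2048
```

With 256 hosts and 4 gateways, pinging costs 2048 packets every 30 sec. (a request and a reply per host/gateway pair), against 4 broadcasts, which is the difference in traffic described above.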

Keeping track of changes.  These techniques would normally be combined
with those above.  
  timeouts - when a connection times out, one has a good suspicion
	that some part of the current route is down.  What to do
	about it depends upon which of the above strategies one is
	using.  If you are using pinging or gateway broadcast, 
	strictly speaking you don't need to do anything about timeouts.
	4.3 uses timeouts because 4.3 establishes a route when a
	connection is opened.  Even if routed has figured out that
	the gateway involved is down, the connection will still try
	to use it.  A timeout triggers the system to reexamine the
	route, using its latest gateway information.  On TOPS-20,
	this is not needed, since the route is recomputed for each
	packet sent.  If you are depending upon a host broadcast
	(e.g. ARP), a timeout should cause the current route (in this
	case ARP table entry) to be removed, so that the host sends
	another broadcast to look for a new route.  Note that timeouts
	do not totally solve the problem of detecting down gateways,
	if we have traffic to some gateway (or for ARP-based schemes,
	host) that is not connection-oriented.  That is, UDP-based
	protocols may not have a concept of timeout, or may find it
	hard to feed back information about timeouts to lower levels
	of the system.
  ICMP redirect - depending upon the design, the system may not know
	when a better route has become available.  Again, TOPS-20
	always will, because it recomputes routes each time, and
	continually pings all gateways.  But 4.2 will not change
	routes during a connection.  And a system that depends upon
	the ARP hack probably doesn't have enough information to do
	so either.  So one can arrange for gateways to keep track of
	each other, and to issue an ICMP redirect if a better route
	becomes available.  Note that this does not necessarily
	require the host to keep track of gateway information.  If
	all of the gateways do the ARP hack, a host can process an
	ICMP redirect simply by removing the ARP table entry for
	the destination host involved.
  ARP table expiration - Unix expires entries in the ARP table after
	N minutes of non-use.  This is primarily intended to keep
	down the number of entries in the ARP table.  However in
	theory this could be used to keep routing up to date.  If
	we expired entries even when they are in use, it would
	force a new ARP request.  This would (we hope) come back
	with the latest routing, taking into account any gateways
	that have come up or gone down.  The problem with doing
	this is that it would increase the number of ARP requests.
	If we only use it to discover better routes, we could afford
	to do it fairly infrequently, say once every 30 min.  If we
	depend upon it to discover gateways that are down, we probably
	have to do it every 30 sec.  This is likely to cause results 
	that are about as bad as pinging.  It would also interfere
	with performance, since our experience shows that waiting
	for an ARP causes a noticeable pause in telnet.  Doing this
	once every 30 min is not likely to cause a significant
	load.  Suppose we have 256 hosts on a subnet, each talking
	to 4 other hosts at a time (this is probably a gross
	overestimate for any real network).  That is about 1000 ARP
	requests in 1800 sec., a packet rate of about 0.6 per
	second.  That should be tolerable.  However the
	requests are probably not going to be random.  There may be
	a tendency for them to cluster, due to the fact that all of
	the systems will have been rebooted at the same time (the
	last power failure).
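The load estimate in the ARP expiration item can be checked with a few lines of Python. This sketch assumes, as the estimate does, that requests are spread evenly over the refresh period; the clustering after a power failure mentioned above is exactly what such an average hides. The function name is illustrative.

```python
def arp_requests_per_second(hosts: int, peers_per_host: int,
                            refresh_seconds: float) -> float:
    """Average ARP request rate if every cache entry is forcibly refreshed
    once per refresh_seconds, assuming requests are spread evenly."""
    return hosts * peers_per_host / refresh_seconds


# Figures from the text: 256 hosts on a subnet, each talking to 4 others.
every_30_min = arp_requests_per_second(256, 4, 30 * 60)
every_30_sec = arp_requests_per_second(256, 4, 30)
print(f"30 min: {every_30_min:.2f}/s, 30 sec: {every_30_sec:.1f}/s")
# prints: 30 min: 0.57/s, 30 sec: 34.1/s
```

At a 30 sec. refresh the same subnet would see about 34 ARP requests per second, which is why expiring entries that often is about as bad as pinging.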

Knowledge of network topology.
  builtin tables - this is fairly common, but with a large network
	it becomes a pain to update all the tables.
  gateway broadcasts - the gateway broadcast strategy mentioned
	above also solves this problem, since it allows the host
	to discover what gateways exist simply by monitoring
	broadcasts.
  host broadcasts - the host broadcast strategy mentioned above
	also solves this problem, since the host no longer has to
	know the network topology.  When it needs to make a connection
	it broadcasts a request and the gateways have to figure out
	who should respond.  To track changes in topology, this should be
	combined with ICMP redirects when a better route becomes
	available.
  try a random gateway - TOPS-20 keeps a table with a small number
	of "prime" gateways.  When it wants to make a connection,
	and none of the currently known gateways is right for the
	job, it chooses a random prime gateway.  This gateway is
	expected to know about all of the others, and to issue an
	ICMP redirect to the right one.  However this only works
	if one knows which of the prime gateways are up.  TOPS-20
	uses pinging.  Any other solution to the problem of knowing
	which gateway is up will also solve the problem of knowing
	what gateways there are, so this strategy is probably not
	terribly useful.
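A minimal Python sketch of the "try a random gateway" strategy, under stated assumptions: the caller tells the router which prime gateways are down (TOPS-20 learns this by pinging), and gateways named in ICMP redirects are remembered per destination host. The class, method, and gateway names are hypothetical and do not describe TOPS-20's actual code.

```python
import random
from typing import Dict, List, Optional, Set


class PrimeGatewayRouter:
    """Sketch of "try a random gateway" (names are hypothetical).

    prime   -- a small configured list of gateways expected to know all others
    up      -- prime gateways currently believed up (TOPS-20 learns this by
               pinging; here the caller maintains it via gateway_down)
    learned -- destination host -> gateway named in an ICMP redirect
    """

    def __init__(self, prime: List[str],
                 rng: Optional[random.Random] = None) -> None:
        self.prime = list(prime)
        self.up: Set[str] = set(prime)
        self.learned: Dict[str, str] = {}
        self.rng = rng if rng is not None else random.Random()

    def next_hop(self, dest: str) -> Optional[str]:
        # Prefer a gateway a redirect told us about for this destination.
        if dest in self.learned:
            return self.learned[dest]
        candidates = [g for g in self.prime if g in self.up]
        if not candidates:
            return None  # no prime gateway is known to be up
        return self.rng.choice(candidates)

    def redirect(self, dest: str, better_gateway: str) -> None:
        # The prime gateway answered with an ICMP redirect to the right one.
        self.learned[dest] = better_gateway

    def gateway_down(self, gateway: str) -> None:
        self.up.discard(gateway)
        # Forget routes through the dead gateway so the next packet
        # falls back to a random prime gateway.
        self.learned = {d: g for d, g in self.learned.items() if g != gateway}


router = PrimeGatewayRouter(["prime-1", "prime-2"], random.Random(0))
router.gateway_down("prime-1")
print(router.next_hop("host-x"))   # prints prime-2
router.redirect("host-x", "gw-7")
print(router.next_hop("host-x"))   # prints gw-7
```

The random choice is only safe because the up set is kept accurate; as noted above, whatever mechanism keeps it current would also reveal which gateways exist.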

Some choices are clear:
  - we probably don't want pinging to be the primary method of
	keeping track of the network.
  - ARPs are probably the only reasonable way for single-process
	machines to find out about the network, since they can't
	be expected to have daemons that keep track of topology.
	This implies that all gateways should be expected to
	support the "ARP hack", even when subnetting is generally
	implemented.

Now the question is whether we also want the gateways to broadcast,
a la routed.  My initial reaction is that if we can come up with
a mechanism based on ARPs that will solve all of our problems, there
is no need to run routed or its equivalent on each host.  So first,
let's look at a design based on ARPs, and no protocol like routed.
  - connections are initially established by issuing an ARP
	request.  The gateways arrange to answer these in such a
	way as to give an optimal route.
  - when a connection times out, the ARP table entry for the host 
	involved is removed.  This forces a new ARP for the next
	packet to be sent.
  - when a better route becomes available, it would be helpful
	if the gateway currently being used issues an ICMP
	redirect.  Because ARP table entries eventually expire,
	this is not strictly necessary.
  - if non-connection-oriented protocols are being used (so that
	timeouts are not possible), or if it is not practical
	for gateways to issue ICMP redirects when a better route
	becomes available, ARP table entries must expire after
	30 minutes.
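The four rules above amount to a small host-side ARP cache. Below is a minimal Python sketch, assuming a hypothetical send_arp_request hook that broadcasts a request and an arp_reply call when the answer arrives (from the target host, or from a gateway doing the ARP hack). The 30-minute limit comes from the text; measuring it from when the entry was learned rather than when it was last used is how this sketch reads the suggestion to expire entries even while they are in use.

```python
from typing import Callable, Dict, Optional, Tuple

MAX_AGE = 30 * 60  # seconds; the 30-minute figure is from the text


class ArpCache:
    """Host-side cache for the ARP-only design (names are hypothetical)."""

    def __init__(self, send_arp_request: Callable[[str], None],
                 max_age: float = MAX_AGE) -> None:
        self.send_arp_request = send_arp_request
        self.max_age = max_age
        # address -> (hardware address, time the entry was learned)
        self.entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, addr: str, now: float) -> Optional[str]:
        entry = self.entries.get(addr)
        if entry is not None:
            hw, learned = entry
            # Age is counted from when the entry was learned, not last used,
            # so even a busy entry is refreshed and picks up route changes.
            if now - learned < self.max_age:
                return hw
            del self.entries[addr]
        self.send_arp_request(addr)
        return None  # caller holds the packet until arp_reply() arrives

    def arp_reply(self, addr: str, hw: str, now: float) -> None:
        self.entries[addr] = (hw, now)

    def connection_timed_out(self, addr: str) -> None:
        # The route is suspect: force a fresh ARP for the next packet.
        self.entries.pop(addr, None)

    def icmp_redirect(self, addr: str) -> None:
        # With ARP-hack gateways, a redirect is handled the same way.
        self.entries.pop(addr, None)


sent = []
cache = ArpCache(sent.append)
cache.resolve("host-a", now=0)               # miss: broadcasts a request
cache.arp_reply("host-a", "gw-1-hw", now=1)
print(cache.resolve("host-a", now=100))      # prints gw-1-hw
cache.connection_timed_out("host-a")
print(cache.resolve("host-a", now=200))      # prints None (new ARP sent)
```

Timeouts, redirects, and old age all reduce to the same action: drop the entry and let the next packet trigger a new broadcast.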


This mechanism is obviously not sufficient for hosts with more than
one Ethernet interface, since they have no way to choose which
interface to use.  ARPs don't help, since in general some other
gateway will probably be able to find a route to any host on any
subnet, so there will be responses to ARP requests on both interfaces.
However a host with more than one interface is effectively a gateway.
It should participate in whatever protocol is used among the gateways,
probably EGP or routed.

There are several reasons why one might prefer some other mechanism
for hosts that are capable of running daemons:
  - if UDP-based protocols are in heavy use, it may be impractical to
	detect down gateways by depending upon timeouts.  For Suns,
	the Network File System is critical, and that uses UDP.
	While NFS does have a concept of timeout, our experience
	shows that timeouts may indicate a number of conditions
	other than routing failures.  It is not clear whether it
	would be appropriate to clear ARP table entries when there
	is an NFS timeout.
  - one may believe that it is not practical to implement ICMP
	redirect in the gateways when a better route becomes
	available, and that the overhead of expiring ARP entries
	is unacceptable.

If one decides that some scheme other than having the host broadcast
requests is needed, it seems clear that the best alternative is
to have the gateway broadcast the fact that it is up.  In that case,
routed seems to make a lot of sense.  It is widely implemented, and
seems to do what needs to be done.  In a Unix implementation, one also
needs a way to force routes to be recomputed when there is a change in
the gateway table.  The method in 4.3 seems to depend upon timeouts.
I suspect it might be better to have an IOCTL that routed could issue to
invalidate routes (either all routes whenever a topology change
happens, or some slightly more selective method).
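A minimal Python sketch of the invalidation hook suggested above. The generation counter and both method names are hypothetical stand-ins for the proposed IOCTL and do not model the real 4.2 or 4.3 kernel interfaces. invalidate_all corresponds to dropping all routes on any topology change, and invalidate_via to one possible more selective method.

```python
from typing import Callable, Dict, Tuple


class RouteCache:
    """Per-destination route cache with a "topology changed" hook.

    lookup_route stands in for a full routing-table search; the
    invalidate_* methods stand in for the IOCTL routed would issue.
    """

    def __init__(self, lookup_route: Callable[[str], str]) -> None:
        self.lookup_route = lookup_route
        self.generation = 0
        # destination -> (gateway, generation when computed)
        self.cache: Dict[str, Tuple[str, int]] = {}

    def route_for(self, dest: str) -> str:
        cached = self.cache.get(dest)
        if cached is not None and cached[1] == self.generation:
            return cached[0]
        gw = self.lookup_route(dest)
        self.cache[dest] = (gw, self.generation)
        return gw

    def invalidate_all(self) -> None:
        # Bumping the generation makes every cached route stale at once;
        # each is recomputed lazily on its next use.
        self.generation += 1

    def invalidate_via(self, gateway: str) -> None:
        # More selective: drop only routes through one gateway.
        self.cache = {d: r for d, r in self.cache.items() if r[0] != gateway}


routes = {"host-a": "gw-1"}
rc = RouteCache(lambda dest: routes[dest])
print(rc.route_for("host-a"))   # prints gw-1
routes["host-a"] = "gw-2"       # routed learns of a change...
print(rc.route_for("host-a"))   # prints gw-1 (stale cached route)
rc.invalidate_all()             # ...and tells the kernel
print(rc.route_for("host-a"))   # prints gw-2
```

The generation counter makes a global invalidation cheap, since no cached route is touched until it is next used.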

Unfortunately, the problem I have to solve is not just picking the
combination of strategies that I like the best.  I also have to be
able to live with existing TCP/IP implementations.  Currently Rutgers
is using 4.2 (Sun, Pyramid, Celerity, Ultrix), TOPS-20, DG, Symbolics,
Bridge, ...  We only have source to some of these, and even where we
do have source, it may not be desirable to do major network development
work.  If we are unable to change the host implementation, then
the advice to a gateway designer is pretty much the obvious:

1) Do the best one can for hosts that will depend upon ARPs to
discover routing.  This means trying to coordinate gateways so
that only the best one responds.

2) Enough systems use code based on 4.2, and routed is a reasonable
enough way of doing things, that it probably makes sense for the
gateways to implement routed.

3) One should probably try to get gateways to issue ICMP redirects
whenever appropriate.  However it is not clear which existing
implementations this is going to help.  Certainly it would help
TOPS-20.  Existing systems that use ARP are pretending that all of
the hosts are directly connected, so an ICMP redirect is going to be
irrelevant to them.  For Unix systems, ICMP redirect doesn't add much
to what routed already provides (and indeed may even confuse it, if
routed thinks it is managing the gateway tables).  ICMP redirects
could be generated whenever a packet is sent to a gateway that knows
it is not the best route.  Len Bosack at Stanford
suggests that gateways should have a command that says we are about to
shut them down.  In that case, they can start issuing ICMP redirects
to an alternate.  (However one has to be careful to avoid loops.  If
the alternate doesn't know you are shutting down, and it is a less
preferred route, it may issue a redirect right back to you.)
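Points 1 and 3 can be sketched together from one gateway's point of view. This Python sketch assumes every gateway on a shared net holds the same table of hop counts for itself and its peers (learned, say, via routed) and breaks ties by name, so all gateways agree on a single preferred one and only that one answers an ARP. The names, the metric table, and the shutting_down set are hypothetical. The shutdown case follows Bosack's suggestion: a gateway about to go down stops counting itself, and the loop is avoided only if its peers are told as well.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Gateway:
    """One gateway's view of routes from a shared net (names hypothetical).

    metrics maps (gateway name, destination subnet) -> hop count.
    shutting_down holds gateways announced as about to go down.
    """
    name: str
    metrics: Dict[Tuple[str, str], int]
    shutting_down: Set[str] = field(default_factory=set)

    def best_gateway(self, subnet: str) -> Optional[str]:
        # Lowest hop count wins; ties go to the lower name, so every
        # gateway holding the same table picks the same one.
        candidates = [(m, gw) for (gw, net), m in self.metrics.items()
                      if net == subnet and gw not in self.shutting_down]
        return min(candidates)[1] if candidates else None

    def answer_arp(self, subnet: str) -> bool:
        # ARP hack: reply for a host on subnet only if we are preferred.
        return self.best_gateway(subnet) == self.name

    def redirect_target(self, subnet: str) -> Optional[str]:
        # For a packet we were asked to forward: the gateway to name in
        # an ICMP redirect, or None to simply forward it ourselves.
        best = self.best_gateway(subnet)
        return best if best not in (None, self.name) else None


metrics = {("gw1", "net2"): 1, ("gw2", "net2"): 2}
g1, g2 = Gateway("gw1", dict(metrics)), Gateway("gw2", dict(metrics))
print(g1.answer_arp("net2"), g2.answer_arp("net2"))  # prints True False
print(g2.redirect_target("net2"))                     # prints gw1
# gw1 is about to be shut down and tells gw2 as well.
for g in (g1, g2):
    g.shutting_down.add("gw1")
print(g1.redirect_target("net2"), g2.redirect_target("net2"))  # prints gw2 None
```

Had gw2 not been told about the shutdown, it would still prefer gw1 and redirect traffic straight back, which is the loop warned about above.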

JNC@MIT-XX.ARPA ("J. Noel Chiappa") (01/14/86)

Chuck:

	Some of the issues you raise (e.g. how do hosts find dead
gateways) are already covered in the RFCs Dave Clark wrote for the
Internet Implementor's Guide; the one you want is the one about fault
isolation. The Gateway committee is working on an RFC about Routing in
the Host IP Layer which talks about the others (e.g. finding gateways,
etc.) We want to insulate the hosts as much as possible from the
details of routing since we are going to be changing that, so having
routing tables sent to hosts is out. Remember also that whatever
mechanisms we use have to also work on nets that do not support
broadcast.

	Noel
-------