hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (01/14/86)
Since my posting yesterday, I have given a bit more thought to the issue of keeping track of network topology. I got several responses acknowledging that the issue was an important and difficult one, but none proposing any real solutions. So it seemed worth putting a bit more thought into the issue. While I haven't come up with any startling innovations, I think I see a couple of approaches that would work.

First, let me start by enumerating the possibilities that I have seen. We have several issues. The first is how hosts keep track of what gateways are up. The second is how hosts keep track of changes in gateway status. The third is how hosts know what gateways exist. Of course these are not orthogonal.

Keeping track of what gateways are up:

pinging - every host sends an echo request to every gateway that it knows about every 30 sec. or so. Most people consider this unacceptable because it generates too much network traffic. TOPS-20 does this, though with an interval of several minutes. I believe it must be done every 30 sec., because we have to be able to discover that a gateway is down in time to move to another one before connections start timing out or users start thinking that the system is down.

gateway broadcast - every gateway sends a broadcast every 30 sec. For a network that supports broadcasts, this gives as good results as pinging, but the number of packets is far smaller. PUP and XNS gatewayinfo do this. So does Unix routed. The disadvantages I can see are that it only works if the network supports broadcasts, and that it may not be so good for single-process systems (e.g. IBM PC). On an IBM PC, you can't just have a daemon sitting there keeping track of what networks are up. Telnet might have to wait a minute or two gathering gateway information before starting to make the connection.

host broadcast - when a host wants to make a connection, it sends a broadcast asking for any gateways to a certain host to respond.
This is effectively done now by ARP-hacking gateways. Since an ARP is needed anyway to initiate a connection, it adds no overhead. This strategy is appropriate for single-process systems. The only disadvantage I can think of is that it only works on media that support broadcast. Note that in a complex network, this strategy requires that the gateways have some other way to keep track of each other. They must arrange things so that only the preferred gateway will respond to an ARP.

Keeping track of changes. These techniques would normally be combined with those above.

timeouts - when a connection times out, one has a good suspicion that some part of the current route is down. What to do about it depends upon which of the above strategies one is using. If you are using pinging or gateway broadcast, strictly speaking you don't need to do anything about timeouts. 4.3 uses timeouts because 4.3 establishes a route when a connection is opened. Even if routed has figured out that the gateway involved is down, the connection will still try to use it. A timeout triggers the system to reexamine the route, using its latest gateway information. On TOPS-20, this is not needed, since the route is recomputed for each packet sent. If you are depending upon a host broadcast (e.g. ARP), a timeout should cause the current route (in this case the ARP table entry) to be removed, so that the host sends another broadcast to look for a new route. Note that timeouts do not totally solve the problem of detecting down gateways if we have traffic to some gateway (or, for ARP-based schemes, host) that is not connection-oriented. That is, UDP-based protocols may not have a concept of timeout, or may find it hard to feed back information about timeouts to lower levels of the system.

ICMP redirect - depending upon the design, the system may not know when a better route has become available. Again, TOPS-20 always will, because it recomputes routes each time, and continually pings all gateways.
But 4.2 will not change routes during a connection. And a system that depends upon the ARP hack probably doesn't have enough information to do so either. So one can arrange for gateways to keep track of each other, and to issue an ICMP redirect if a better route becomes available. Note that this does not necessarily require the host to keep track of gateway information. If all of the gateways do the ARP hack, a host can process an ICMP redirect simply by removing the ARP table entry for the destination host involved.

ARP table expiration - Unix expires entries in the ARP table after N minutes of non-use. This is primarily intended to keep down the number of entries in the ARP table. However in theory this could be used to keep routing up to date. If we expired entries even when they are in use, it would force a new ARP request. This would (we hope) come back with the latest routing, taking into account any gateways that have come up or gone down. The problem with doing this is that it would increase the number of ARP requests. If we only use it to discover better routes, we could afford to do it fairly infrequently, say once every 30 min. If we depend upon it to discover gateways that are down, we probably have to do it every 30 sec. This is likely to cause results that are about as bad as pinging. It would also interfere with performance, since our experience shows that waiting for an ARP causes a noticeable pause in telnet.

Doing this once every 30 min is not likely to cause a significant load. Suppose we have 256 hosts on a subnet, each talking to 4 other hosts at a time (this is probably a gross overestimate for any real network). That is about 1000 ARP requests in 1800 sec., a packet rate of roughly one every two seconds. That should be tolerable. However the requests are probably not going to be random. There may be a tendency for them to cluster, due to the fact that all of the systems will have been rebooted at the same time (the last power failure).
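
As a check on the load estimate above, here is a minimal sketch of the arithmetic. The function name and its parameters are only illustrative, and it assumes the requests are spread evenly over the expiry interval, which the clustering caveat says they may not be.

```python
def arp_request_rate(hosts, peers_per_host, expiry_seconds):
    """Average ARP requests per second if every in-use entry is
    re-requested once per expiry interval (evenly spread, an assumption)."""
    entries = hosts * peers_per_host
    return entries / expiry_seconds

# Figures from the message: 256 hosts, each talking to 4 others.
rate_30min = arp_request_rate(256, 4, 30 * 60)   # expiry every 30 minutes
rate_30sec = arp_request_rate(256, 4, 30)        # expiry every 30 seconds

print(f"{rate_30min:.2f} requests/sec with a 30 min expiry")
print(f"{rate_30sec:.1f} requests/sec with a 30 sec expiry")
```

With these numbers the 30 min expiry comes to well under one request per second on average, while a 30 sec expiry is sixty times that, which is why it looks about as bad as pinging.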
Knowledge of network topology.

builtin tables - this is fairly common, but with a large network it becomes a pain to update all the tables.

gateway broadcasts - the gateway broadcast strategy mentioned above also solves this problem, since it allows the host to discover what gateways exist simply by monitoring broadcasts.

host broadcasts - the host broadcast strategy mentioned above also solves this problem, since the host no longer has to know the network topology. When it needs to make a connection it broadcasts a request and the gateways have to figure out who should respond. To take advantage of changes in topology, this should be combined with ICMP redirects when a better route becomes available.

try a random gateway - TOPS-20 keeps a table with a small number of "prime" gateways. When it wants to make a connection, and none of the currently known gateways is right for the job, it chooses a random prime gateway. This gateway is expected to know about all of the others, and to issue an ICMP redirect to the right one. However this only works if one knows which of the prime gateways are up. TOPS-20 uses pinging. Any other solution to the problem of knowing which gateway is up will also solve the problem of knowing what gateways there are, so this strategy is probably not terribly useful.

Some choices are clear:

- we probably don't want pinging to be the primary method of keeping track of the network.

- ARPs are probably the only reasonable way for single-process machines to find out about the network, since they can't be expected to have daemons that keep track of topology. This implies that all gateways should be expected to support the "ARP hack", even when subnetting is generally implemented.

Now the question is whether we also want the gateways to broadcast, a la routed. My initial reaction is that if we can come up with a mechanism based on ARPs that will solve all of our problems, there is no need to run routed or its equivalent on each host.
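
For a host that does listen to gateway broadcasts, the bookkeeping is small. The following is a rough sketch only, not how routed or any other existing daemon does it: the class name, the gateway names, and the rule that a gateway is treated as down after three missed broadcasts are all assumptions made for illustration. Times are passed in explicitly so the example is deterministic.

```python
BROADCAST_INTERVAL = 30.0   # seconds; the interval suggested in the message
MISSED_BEFORE_DOWN = 3      # assumption: tolerate a couple of lost broadcasts

class GatewayTable:
    """Learns which gateways exist, and which are up, purely by listening."""

    def __init__(self, interval=BROADCAST_INTERVAL, missed=MISSED_BEFORE_DOWN):
        self.deadline = interval * missed
        self.last_heard = {}            # gateway name -> time of last broadcast

    def heard(self, gateway, now):
        # Called for every gateway broadcast received; also how new
        # gateways are discovered.
        self.last_heard[gateway] = now

    def live(self, now):
        # Gateways whose most recent broadcast is within the deadline.
        return sorted(g for g, t in self.last_heard.items()
                      if now - t <= self.deadline)

table = GatewayTable()
table.heard("gw-a", now=0.0)
table.heard("gw-b", now=0.0)
table.heard("gw-a", now=60.0)       # gw-b has gone quiet
print(table.live(now=100.0))        # only gw-a is still considered up
```

A single-process system cannot keep such a table current between connections, which is the objection raised above for the IBM PC.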
So first, let's look at a design based on ARPs, and no protocol like routed.

- connections are initially established by issuing an ARP request. The gateways arrange to answer these in such a way as to give an optimal route.

- when a connection times out, the ARP table entry for the host involved is removed. This forces a new ARP for the next packet to be sent.

- when a better route becomes available, it would be helpful if the gateway currently being used issues an ICMP redirect. Because of the timing out of ARP table entries, this is not completely necessary.

- if non-connection-oriented protocols are being used (so that timeouts are not possible), or if it is not practical for gateways to issue ICMP redirects when a better route becomes available, ARP table entries must expire after 30 minutes.

This mechanism is obviously not sufficient for hosts with more than one Ethernet interface, since they have no way to choose which interface to use. ARPs don't help, since in general some other gateway will probably be able to find a route to any host on any subnet, so there will be responses to ARP requests on both interfaces. However a host with more than one interface is effectively a gateway. It should participate in whatever protocol is used among the gateways, probably EGP or routed.

There are several reasons why one might prefer some other mechanism for hosts that are capable of running daemons:

- if UDP-based protocols are in heavy use, it may be impractical to detect down gateways by depending upon timeouts. For Suns, the Network File System is critical, and that uses UDP. While NFS does have a concept of timeout, our experience shows that timeouts may indicate a number of conditions other than routing failures. It is not clear whether it would be appropriate to clear ARP table entries when there is an NFS timeout.
- one may believe that it is not practical to implement ICMP redirect in the gateways when a better route becomes available, and that the overhead of expiring ARP entries is unacceptable.

If one decides that a scheme other than having the host broadcast requests is needed, it seems clear that the best alternative is to have the gateway broadcast the fact that it is up. In that case, routed seems to make a lot of sense. It is widely implemented, and seems to do what needs to be done. In a Unix implementation, one also needs a way to force routes to be recomputed when there is a change in the gateway table. The method in 4.3 seems to depend upon timeouts. I suspect it might be better to have an IOCTL that routed could do to invalidate routes (either all routes whenever a topology change happens, or some slightly more selective method).

Unfortunately, the problem I have to solve is not just picking the combination of strategies that I like the best. I also have to be able to live with existing TCP/IP implementations. Currently Rutgers is using 4.2 (Sun, Pyramid, Celerity, Ultrix), TOPS-20, DG, Symbolics, Bridge, ... We only have source to some of these, and even where we do have source, it may not be desirable to do major network development work. If we are unable to change the host implementation, then the advice to a gateway designer is pretty much the obvious:

1) Do the best one can for hosts that will depend upon ARPs to discover routing. This means trying to coordinate gateways so that only the best one responds.

2) Enough systems use code based on 4.2, and routed is a reasonable enough way of doing things, that it probably makes sense for the gateways to implement routed.

3) One should probably try to get gateways to issue ICMP redirects whenever appropriate. However it is not clear which existing implementations this is going to help. Certainly it would help TOPS-20.
Existing systems that use ARP are pretending that all of the hosts are directly connected, so an ICMP redirect is going to be irrelevant to them. For Unix systems, ICMP redirect doesn't add much to what routed already provides (and indeed may even confuse it, if routed thinks it is managing the gateway tables). An ICMP redirect can be generated whenever a packet is sent to a gateway that knows it is not the best route. Len Bosack at Stanford suggests that gateways should have a command that says we are about to shut them down. In that case, they can start issuing ICMP redirects to an alternate. (However one has to be careful to avoid loops. If the alternate doesn't know you are shutting down, and it is a less preferred route, it may issue a redirect right back to you.)
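
Returning to the ARP-only design outlined earlier in this message, the host side might be sketched roughly as follows. This is an illustration under stated assumptions, not code from any existing system: `ArpRouteCache`, `fake_arp`, and the host and gateway names are all hypothetical, and the `resolve` callable merely stands in for broadcasting an ARP request and taking whichever gateway answers.

```python
ENTRY_LIFETIME = 30 * 60    # seconds; the 30 minute expiry proposed in the message

class ArpRouteCache:
    def __init__(self, resolve, lifetime=ENTRY_LIFETIME):
        self.resolve = resolve      # hypothetical stand-in for an ARP broadcast
        self.lifetime = lifetime
        self.entries = {}           # destination -> (answer, time learned)

    def lookup(self, dest, now):
        entry = self.entries.get(dest)
        # Entries expire a fixed time after they were learned, even if in use.
        if entry is None or now - entry[1] >= self.lifetime:
            entry = (self.resolve(dest), now)
            self.entries[dest] = entry
        return entry[0]

    def connection_timed_out(self, dest):
        # Drop the entry so the next packet forces a fresh ARP.
        self.entries.pop(dest, None)

    def icmp_redirect(self, dest):
        # With ARP-hacking gateways, a redirect is handled the same way.
        self.entries.pop(dest, None)

# Hypothetical demonstration: gw-a answers first, then gw-b takes over.
answers = {"host-x": "gw-a"}
requests = []

def fake_arp(dest):
    requests.append(dest)
    return answers[dest]

cache = ArpRouteCache(fake_arp)
first = cache.lookup("host-x", now=0)        # sends an ARP request
again = cache.lookup("host-x", now=10)       # served from the table
answers["host-x"] = "gw-b"                   # gw-a goes down
cache.connection_timed_out("host-x")
after_timeout = cache.lookup("host-x", now=20)
print(first, again, after_timeout, len(requests))
```

Note that the host never learns what gateways exist; it only forgets an answer when it has reason to doubt it, and the gateways decide who replies next.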
JNC@MIT-XX.ARPA ("J. Noel Chiappa") (01/14/86)
Chuck:

Some of the issues you raise (e.g. how do hosts find dead gateways) are already covered in the RFCs Dave Clark wrote for the Internet Implementor's Guide; the one you want is the one about fault isolation. The Gateway committee is working on an RFC about Routing in the Host IP Layer which talks about the others (e.g. finding gateways). We want to insulate the hosts as much as possible from the details of routing since we are going to be changing that, so having routing tables sent to hosts is out. Remember also that whatever mechanisms we use have to also work on nets that do not support broadcast.

Noel
-------