mws%beta@LANL.GOV (Mitchel W Sukalski) (01/06/89)
I'm in a slight dilemma here, and I was wondering if anybody out in netland can help me? I'm looking for documentation (RFCs, IENs, etc.) that validates the behavior I'm going to describe.

I have a router with two network interfaces. I'm trying to connect from machine A, on one of the networks, to machine B, on the other network. The router does not have an ARP entry for machine B, and when it receives the first packet from machine A, it sends a "host unreachable" destination unreachable ICMP message to machine A. Simultaneously, it generates an ARP request for machine B.

Some of the folks I have talked to have stated that it is the duty of the receiving machine, machine A in this case, to ignore the host unreachable messages and use a timeout mechanism instead (reporting the error condition upon timeout, perhaps). As I see it, that makes sense for an active connection (to avoid reacting to a transient network problem), but upon opening a connection, an application, such as telnet, should take heed of the error immediately and let the user take the necessary recovery action.

In any case, I have been unable to find any documentation that says the above behavior is within specification. I have found ample statements to the effect that the packet can be dropped, and an ARP request generated; then, it is up to the transport protocol to resend the packet. If the ARP module has not gotten a reply, then the router could send a host unreachable message.

I'd appreciate any insights or references. Thanks in advance,

Mitch Sukalski
Communications Group, C-4
Los Alamos National Laboratory
hedrick@geneva.rutgers.edu (Charles Hedrick) (01/06/89)
I think it's very weird to say "host unreachable" because you don't have an ARP table entry. In my view unreachable should mean that you really can't get to the host, not that your internal cache needs refreshing.

There's no question that it is a bug for hosts to break a connection when they get an unreachable while a connection is open. There may be a transient routing problem that will clear up shortly. The recommended approach is to save the error message but keep retrying. If you end up timing out, then you print the error based on the most recent unreachable. Thus the unreachable does accomplish something: it gives you some information about what is wrong, which is useful in generating an intelligent error message. But you sure don't want to break the connection the first time you get an unreachable. (Doing so is a common bug in TCP/IP implementations, however.)

I've heard claims that when you are first setting up a connection, you should pay attention to unreachables. A user who is in the middle of a connection likely has time invested in his context and is willing to wait in hopes of continuing. But when you're first trying a connection, you don't, and it's silly to make people wait for a minute to time out of a connection attempt when you know that there's no route available to the destination. That sounds sort of plausible to me, but I think I might still wait for a timeout.

In my opinion, both the gateway and the host are behaving suboptimally in your example. I'm not sure they are flatly wrong, but I'd try to get them changed. I think the host should ignore the unreachables, or use them only as a basis for error messages. And I think the gateway shouldn't issue them for a missing ARP table entry. (Indeed probably it shouldn't issue them at all, because of the number of buggy host implementations.)
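
For illustration only, the "save the error but keep retrying" approach on the host side might be sketched as below. This is a toy model, not code from any real TCP implementation: the event labels, the `Result` type, and the `send` callback are all hypothetical names invented for the sketch, and a fixed attempt count stands in for the connection timeout.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical outcome labels for one transmission attempt.
ACK = "ack"                        # the peer answered
UNREACHABLE = "host-unreachable"   # an ICMP unreachable came back
SILENCE = "silence"                # nothing came back before the retry timer


@dataclass
class Result:
    connected: bool
    attempts: int
    error: Optional[str] = None


def attempt_connection(send: Callable[[int], str], max_attempts: int = 5) -> Result:
    """Retry until answered or out of attempts; unreachables are advisory only."""
    last_soft_error = None
    for attempt in range(1, max_attempts + 1):
        outcome = send(attempt)
        if outcome == ACK:
            return Result(True, attempt)
        if outcome == UNREACHABLE:
            # Remember the hint but do not abort: the problem may be transient.
            last_soft_error = UNREACHABLE
    # Gave up: report the most recent unreachable if one was seen.
    return Result(False, max_attempts, last_soft_error or "timed out")


def gateway_missing_arp_entry(attempt: int) -> str:
    """Models the gateway in question: unreachable at first, then it works."""
    return UNREACHABLE if attempt == 1 else ACK


print(attempt_connection(gateway_missing_arp_entry))
print(attempt_connection(lambda attempt: UNREACHABLE, max_attempts=3))
```

With a host written this way, the premature unreachable from the gateway costs one retransmission instead of a failed connection, and the saved error still shows up in the final message if the destination never answers.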
subbu@hpindda.HP.COM (MCV Subramaniam) (01/07/89)
Mitchel, I think an ICMP unreachable message should not be generated if the gateway can find a route to the destination. If there is no routing entry by which the gateway can access the destination, then the gateway should not ARP for the destination at all, but just send the Unreachable message to the source. Hope this helps. -Subbu
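
The rule suggested above can be written down as a small decision function. This is only a sketch under assumptions: the route table is a plain list of prefixes in modern CIDR notation (chosen for brevity, not something the thread uses), and the function name and return labels are made up for the example.

```python
import ipaddress
from typing import List


def gateway_decision(dest: str, routes: List[str]) -> str:
    """Decide what to do with a packet based only on the routing table."""
    addr = ipaddress.ip_address(dest)
    for prefix in routes:
        if addr in ipaddress.ip_network(prefix):
            # A route exists: resolve the next hop (ARP if needed) and forward;
            # no unreachable is generated at this point.
            return "resolve-and-forward"
    # No route at all: do not ARP, just tell the source.
    return "send-unreachable"


routes = ["192.0.2.0/24", "198.51.100.0/24"]
print(gateway_decision("192.0.2.7", routes))
print(gateway_decision("203.0.113.5", routes))
```

Note that the ARP cache does not appear anywhere in the decision; whether an entry happens to be cached only affects how the forwarding branch proceeds.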
slevy@UC.MSC.UMN.EDU ("Stuart Levy") (01/07/89)
A year or so ago on this list, someone suggested that "Host Unreachable" should be returned when there was no response after some number of ARP retries. On a network where there's no direct indication that a packet was received, this seems as good a method as any to detect dead hosts. I don't believe it would be a layering violation, any more than translating ARPAnet "Host Dead" status into "Host Unreachable" is -- it's just implementing a well-defined network layer service. On the other hand, to send "Host Unreachable" immediately just because there's no ARP entry in the cache is clearly broken. It should make some attempt to contact the host before saying this. Stuart Levy
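
A rough sketch of the gateway behavior described above, where "Host Unreachable" is sent only after some number of ARP tries go unanswered. The retry limit of 3 is an assumption (the thread gives no number), and `forward`, `arp_request`, and the action strings are hypothetical names used only for this example.

```python
from typing import Callable, Dict, List, Optional

ARP_MAX_TRIES = 3  # assumed retry limit; not taken from any specification


def forward(dest_ip: str,
            arp_cache: Dict[str, str],
            arp_request: Callable[[str], Optional[str]],
            max_tries: int = ARP_MAX_TRIES) -> List[str]:
    """Return the actions a gateway would take for one packet to dest_ip."""
    actions: List[str] = []
    mac = arp_cache.get(dest_ip)
    tries = 0
    # A cache miss alone is not an error: try to contact the host first.
    while mac is None and tries < max_tries:
        tries += 1
        actions.append(f"arp-request {dest_ip}")
        mac = arp_request(dest_ip)  # None models "no reply before timeout"
    if mac is None:
        # Only now, after repeated silence, report the host as unreachable.
        actions.append("icmp host-unreachable")
        return actions
    arp_cache[dest_ip] = mac
    actions.append(f"deliver to {mac}")
    return actions


cache: Dict[str, str] = {}
print(forward("10.0.0.2", cache, lambda ip: "aa:bb:cc:dd:ee:ff"))
print(forward("10.0.0.9", cache, lambda ip: None))
```

A real gateway would not block on ARP like this loop does; it would queue or drop the packet and let the transport layer retransmit. The sketch only shows the ordering: resolution attempts first, the ICMP error last.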