[mod.protocols.tcp-ip] UDP vs. ICMP dest unreachable messages

narten@PURDUE.EDU.UUCP (02/10/87)

This note is prompted by the observation that lots of nameservers
still contact usc-isib (10.3.0.52), even though the machine no longer
exists.

A related problem that is appearing a lot more now has to do with
reactions in general to ICMP dest unreachable messages. Now that name
server traffic is picking up, it really hurts to see UDP
implementations ignoring ICMP errors. It is not uncommon to see half a
dozen (or more) packets sent to some remote nameserver even though the
first packet causes an ICMP dest unreachable to be returned. In some
cases this is not too serious (but still undesirable), since the
message comes from a gateway one hop away with a LAN in between. Other
times, the message comes from some distant gateway on the other side
of the ARPANET.

The problem with letting UDP see these errors lies with the stateless
nature of datagram delivery at that level. With TCP, there has to be a
connection block that the error packet can be matched up against.
Hence, TCP can do something intelligent (but often doesn't). With UDP,
the packet gets sent with no guarantees about delivery.  Furthermore
the user process might well be sending data to several destinations
via the same "socket", and it is not clear how to return errors to the
user.  I see three basic approaches.

1) Do nothing, a favorite among current implementations.

2) Pass errors back to the user process. This is hard to do, since the
kernel may well have no idea of what process sent the packet. In some
cases, the kernel would have to keep a log of all UDP packets it sends
in order to pass back errors to the user.

3) Cache errors in the routing tables for short periods of time. This
can be done by adding a flag to route table entries that says "Not
really reachable".  That way, the user would not get an error on the
first packet, but the retransmission of that packet could cause the
route lookup to note that the destination is unreachable and the user
could be informed.  This would be a significant improvement because
now the user process could elect to use a different address as the
primary. Furthermore, the user process could elect not to use the "bad
route" for a long time (say 30 minutes or several hours), long after
the record of the unreachable message has been flushed from routing
tables. 

This has several desired effects:

1) Processes using UDP can get feedback about unreachable 
destinations.

2) It doesn't drastically change the semantics of the UDP interface.
E.g. the user is not notified asynchronously or forced to ask
explicitly whether a route works. On return from the sendpacket()
routine, a flag could be returned.  In addition, sending packets to an
unreachable destination doesn't have to mean that the packet didn't
get sent, it just means "I got a dest unreachable a while ago. It is
not likely that the packet will get there". The user can choose to
ignore this (though he/she really shouldn't).

3) The length of time dest unreachable messages are cached can be (and
probably needs to be) an adjustable parameter. It may well be that
caching such a message for 10-30 seconds would be sufficient to cut
down on the number of useless packets sent, yet would not keep users
from reaching hosts that were down but just came back up.

4) Programs don't have to rely on timeouts to decide that a host or
list of hosts is unreachable. This will often give users a
quicker response. 
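
As a rough illustration of approach 3, the sketch below keeps a
short-lived table of destinations that recently drew an ICMP dest
unreachable, and has the send routine return an advisory flag. The
names UnreachableCache, note_unreachable, send_packet and the
20-second default are assumptions made for this example, not taken
from any existing implementation; the table lives in user space here
only so the sketch stays runnable.

```python
import time


class UnreachableCache:
    """Remembers destinations that recently drew an ICMP dest unreachable."""

    def __init__(self, hold_seconds=20.0, clock=time.monotonic):
        self.hold_seconds = hold_seconds  # adjustable caching period (assumed default)
        self.clock = clock                # injectable so the cache can be tested
        self._expires = {}                # destination -> time the entry lapses

    def note_unreachable(self, dest):
        """Record that an ICMP dest unreachable just arrived for dest."""
        self._expires[dest] = self.clock() + self.hold_seconds

    def is_suspect(self, dest):
        """True while a recent unreachable for dest is still cached."""
        expiry = self._expires.get(dest)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[dest]       # entry has aged out; forget it
            return False
        return True


def send_packet(cache, dest, payload, transmit):
    """Hypothetical sendpacket(): always transmits, returns an advisory flag.

    A True result only means "a dest unreachable was seen a while ago";
    the packet is still handed to transmit(), and the caller may ignore
    the hint.
    """
    suspect = cache.is_suspect(dest)
    transmit(dest, payload)
    return suspect
```

A caller that sees the flag set could switch to an alternate
nameserver address at once rather than waiting for its own timeout.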

Comments?

Thomas

hedrick@TOPAZ.RUTGERS.EDU.UUCP (02/11/87)

You suggest that the kernel should remember destination unreachable
messages, and not bother to try again for some time.  The problem
with this is that there are often transient routing problems.  If
you try again, things might actually work.  Until the core gets
more reliable, I would rather retry. Indeed for a while we
intentionally broke our TCP code so that it would keep trying when
it got destination unreachable, instead of aborting the connection.
This helped us keep connections up to certain hosts.

brescia@CCV.BBN.COM.UUCP (02/11/87)

> (you can ignore an 'unreachable' because it may indicate a transient
>  routing problem) [paraphrased - mb]

You really should be advocating a look at the subcodes returned by ICMP
destination unreachable, because you can usually trust the 'host dead' type
returned from some gateways when trying to talk to hosts on arpanets.  Yes,
you would do well to ignore ICMP net unreachable if you suspect routing
flurries (often the case nowadays).  With UDP domain lookups however, could
you not use that as an indication to try another address, even if you keep
retransmitting to the original one?  You need not worry about "bothering" too
many servers in this case, because the 'unreachable' is a response which tells
you that you did not reach that server.

Also, you should be explicit in your reasoning about the 'port unreachable'
subcode.  Do you mean to try again because the server was too busy and had not
yet put another listen up, or to give up because there is not now nor will
there ever be a server at that host (because the service host changed)?

I think you should use the broadcast approach for connection setup, since you
supposedly don't care which of the equivalent servers you reach.  If, for
example, you try to contact one from the set { A, B, C }, and you get an
unreachable from A, try B next, and only forget A if the reply code was 'host
dead'.
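
One way to read the { A, B, C } suggestion is sketched below. The code
values 0, 1 and 3 are the net, host and port unreachable codes from
RFC 792. Treating 'host dead' as code 1 (host unreachable), and
treating port unreachable like net unreachable (move on, but keep the
server in the list), are assumptions of this sketch; the function name
next_server is hypothetical.

```python
# ICMP destination unreachable codes from RFC 792.
NET_UNREACHABLE = 0
HOST_UNREACHABLE = 1   # assumed here to be what the text calls 'host dead'
PORT_UNREACHABLE = 3


def next_server(servers, current, icmp_code):
    """Return (server to try next, updated candidate list).

    Only a host unreachable removes the current server from the list;
    any other code just advances to the next candidate, so the original
    server can still be retried later.
    """
    idx = servers.index(current)
    if icmp_code == HOST_UNREACHABLE:
        remaining = servers[:idx] + servers[idx + 1:]
        if not remaining:
            return None, remaining            # nobody left to ask
        return remaining[idx % len(remaining)], remaining
    # Possibly transient (e.g. a routing flurry): keep everyone, move on.
    return servers[(idx + 1) % len(servers)], list(servers)
```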

Of course, your implementation on the arpanet (AHIP) interface does recognize
arpanet host-dead messages, doesn't it?

mike

karels%okeeffe@UCBVAX.BERKELEY.EDU.UUCP (02/11/87)

ICMP unreachable messages are reported to users of UDP sockets in 4.3
if and only if the socket is "connected"; that is, that the remote
address is bound as well as the local address.  Otherwise, it is unreasonable
to report errors even though the local address matches that in an ICMP
error message.  The error may well refer to a datagram other than the most
recently sent, in which case it is likely to be confusing at best.
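
The "connected" UDP socket behavior can be observed from user code on
many present-day systems. The sketch below is not the 4.3 kernel code;
it uses Python's standard socket module, and the helper name
probe_udp_port is made up for the example. Whether the error actually
surfaces depends on the operating system and on the ICMP message being
delivered, so a timeout is handled as well.

```python
import socket


def probe_udp_port(host, port, timeout=1.0):
    """Send one datagram on a connected UDP socket and report the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.connect((host, port))      # fixes the remote address for this socket
        try:
            s.send(b"ping")
            s.recv(512)              # a pending ICMP error is reported here or on send
            return "reply"
        except ConnectionRefusedError:
            return "port unreachable"
        except socket.timeout:
            return "no answer"
```

Calling probe_udp_port("127.0.0.1", port) for a port with no listener
typically yields "port unreachable" where the system reports ICMP
errors on connected sockets, and "no answer" otherwise.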

This is used in the UNIX resolver code to detect the absence of a local
server; it depends on receiving the "port unreachable" error.  On the other
hand, the same binding causes late messages from one server to be discarded
after "connecting" to the next of a series of choices.  This isn't a problem
in the standard installation, with only one server choice (the server on the
same host).  The UNIX nameserver does not take advantage of ICMP error returns,
in part because it runs multi-threaded, processing other requests while
awaiting a reply to a recursive query.  However, recent additions to the BIND
server will enable it to measure response time of multiple servers for a domain.
It will then choose the fastest server, which will not include one that
was recently unreachable if there are alternatives.
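
The selection rule described above might look roughly like the
following. This is an illustrative sketch, not BIND's code: the
function name choose_server, the shape of the stats table and the
30-second window are all assumptions.

```python
def choose_server(stats, now, penalty_window=30.0):
    """Pick the fastest server, avoiding recently unreachable ones.

    stats maps server -> (measured_response_time_seconds,
                          time_of_last_unreachable or None).
    """
    def recently_unreachable(server):
        last = stats[server][1]
        return last is not None and now - last < penalty_window

    healthy = [s for s in stats if not recently_unreachable(s)]
    candidates = healthy or list(stats)   # fall back if there is no alternative
    return min(candidates, key=lambda s: stats[s][0])
```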

Recent questions about the ordering of root servers in BIND configuration
files are no longer interesting.  Current servers use the configuration
file to reach the root servers initially, which they then query about
the root domain.  That information is then used as long as it is valid.

		Mike

Mills@LOUIE.UDEL.EDU.UUCP (02/12/87)

Charlie,

I know the TOPS-20s have been sorting ICMP messages to the right processes
for years, since that's where I got the idea to do the same thing in the
fuzzballs. Having said that, it's too bad the users at the top of the
TOPS-20 protocol stack don't see the information itself - say in TELNET
or FTP.

Dave

brady@DCN9.ARPA.UUCP (02/12/87)

I came in on this discussion a little late, so pardon me if I'm 
a little off topic...

> The problem with this is that there are often transient routing 
> problems.  If you try again, things might actually work.  Until 
> the core gets more reliable, I would rather retry. Indeed for a 
> while we intentionally broke our TCP code so that it would keep 
> trying when it got destination unreachable, instead of aborting 
> the connection.  This helped us keep connections up to certain 
> hosts.

If you adopt this practice, you negate the purpose of the message. 
So why is it sent in the first place? In the long run, ignoring 
control messages like these could undermine any sort of development
on the internet, particularly in relation to gateway to gateway 
communications. It may seem that some benefit is gained in certain
instances from ignoring unreachable messages. But if there is to 
be a "standard" protocol, such a change would have to be beneficial
(or at least non-detrimental) to the majority of the cases. I believe
that in most cases, the control messages are a necessary factor in 
the control of needless congestion across an already strained internet.


							-Sean