[comp.dcom.sys.cisco] Dealing with remote routers

drs@bnlux0.bnl.gov (David R. Stampf) (08/16/90)

We are experiencing a tough problem in x-country communications. A connection
is set up between two unix hosts which are quite remote from each other.
Between them are several networks, some of which are connected by cisco
routers. The connection stays up for a long period of time, then
suddenly disappears, killing the session and annoying the users. This
occurs intermittently, between 1 and 20 times a day.

Some detective work with ping and traceroute shows that an intermediate
cisco router is (for no discernable reason) sending a "Host unreachable"
ICMP message at infrequent and unpredictable intervals. Querying the 
router does not indicate the loss of a route, nor the toggling of an 
interface.

I've discussed this on other mailing lists, and found someone who had a
similar problem and solved it by turning off the ICMP unreachable messages
on that interface. He did not understand why the messages appeared, but
it did solve his problem. When I discussed this solution with the owners
of the router, they didn't seem too enthusiastic about making such a change,
since it may break traceroute (that being more important than telnet ;-)).

So my questions are: 1) Does anybody know any other reasons for a cisco
router to generate this message, and 2) (the hard question) how do you
deal with a router many levels removed from you whose operation is having
a major impact on your operation?

	Thanks for any help or enlightenment.

	< dave

satz@cisco.com (Greg Satz) (08/17/90)

>> We are experiencing a tough problem in x-country communications. A connection
>> is set up between two unix hosts which are quite remote from each other.
>> Between them are several networks, some of which are connected by cisco
>> routers. The connection stays up for a long period of time, then
>> suddenly disappears, killing the session and annoying the users. This
>> occurs intermittently, between 1 and 20 times a day.
>> 
>> Some detective work with ping and traceroute shows that an intermediate
>> cisco router is (for no discernable reason) sending a "Host unreachable"
>> ICMP message at infrequent and unpredictable intervals. Querying the 
>> router does not indicate the loss of a route, nor the toggling of an 
>> interface.
>> 
>> I've discussed this on other mailing lists, and found someone who had a
>> similar problem and solved it by turning off the ICMP unreachable messages
>> on that interface. He did not understand why the messages appeared, but
>> it did solve his problem. When I discussed this solution with the owners
>> of the router, they didn't seem too enthusiastic about making such a change,
>> since it may break traceroute (that being more important than telnet ;-)).
>> 
>> So my questions are: 1) Does anybody know any other reasons for a cisco
>> router to generate this message, and 2) (the hard question) how do you
>> deal with a router many levels removed from you whose operation is having
>> a major impact on your operation?

cisco routers will return ICMP unreachables whenever there is no route in
the routing table for a packet's destination. The most common cause for this
is a flapping interface. Using the command debug ip-routing (or debug
routing on older software versions) will print out a message when routes
enter and leave the routing table. You can configure this debugging
information to be sent to a syslog daemon to watch it over the long term.
The software will also send unreachables when an access list would prevent
your packet from being forwarded. We will send an unreachable whenever we
would drop a packet for any reason. This includes input queue overflow and
output queue overflow (both are signs of congestion and are common when
going from a fast network such as Ethernet to a slow network such as a 56K
line; show interface will give you these counts). You can also enable
debug icmp to see all ICMP messages sent and received by the router.
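To make the list of triggers above concrete, here is a toy Python model of the forwarding decision as Greg describes it: a packet with no route, a packet denied by an access list, and a packet arriving at a full output queue are all dropped, and each drop is answered with an ICMP unreachable. This is only an illustrative sketch, not cisco's software; the ToyRouter class, its fields, and the two-packet queue limit are hypothetical, and input queue overflow is left out for brevity.

```python
from collections import deque
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network


@dataclass
class ToyRouter:
    # Hypothetical routing table: destination network -> outgoing interface name.
    routes: dict
    # Hypothetical access list: destination networks that may not be forwarded.
    denied: set = field(default_factory=set)
    # Hypothetical per-interface output queue limit (kept tiny for the demo).
    queue_limit: int = 2
    out_queues: dict = field(default_factory=dict)

    def lookup(self, dst):
        """Longest-prefix match; returns None when no route covers dst."""
        matches = [net for net in self.routes if dst in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda n: n.prefixlen)]

    def forward(self, dst_str):
        """Queue the packet, or return the reason an unreachable would be sent."""
        dst = IPv4Address(dst_str)
        iface = self.lookup(dst)
        if iface is None:
            return "unreachable: no route"          # e.g. route lost while an interface flaps
        if any(dst in net for net in self.denied):
            return "unreachable: access list"
        queue = self.out_queues.setdefault(iface, deque())
        if len(queue) >= self.queue_limit:
            return "unreachable: output queue full"  # congestion, e.g. Ethernet into a 56K line
        queue.append(dst)
        return f"queued on {iface}"


if __name__ == "__main__":
    r = ToyRouter(routes={IPv4Network("10.1.0.0/16"): "Serial0"},
                  denied={IPv4Network("10.1.99.0/24")})
    print(r.forward("192.168.1.1"))  # unreachable: no route
    print(r.forward("10.1.99.5"))    # unreachable: access list
    print(r.forward("10.1.2.3"))     # queued on Serial0
    print(r.forward("10.1.2.4"))     # queued on Serial0
    print(r.forward("10.1.2.5"))     # unreachable: output queue full
```

Nothing in the queue is ever drained here, so the third packet to Serial0 overflows; on a real router the 56K line drains slowly, which is why a burst arriving from Ethernet can produce the same effect.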

The real question here is why unreachables are having such a deleterious
effect on your operation? Those messages shouldn't cause any problem on a
properly functioning TCP/IP implementation. We have "fixed" the problematic
systems which reacted badly to ICMP unreachables here because we couldn't
live with that behavior.

If I can be of any further help, please drop a line to
customer-service@cisco.com.

Greg Satz
cisco

oberman@rogue.llnl.gov (08/17/90)

In article <24867@boulder.Colorado.EDU>, satz@cisco.com (Greg Satz) writes:
> The real question here is why unreachables are having such a deleterious
> effect on your operation? Those messages shouldn't cause any problem on a
> properly functioning TCP/IP implementation. We have "fixed" the problematic
> systems which reacted badly to ICMP unreachables here because we couldn't
> live with that behavior.

Greg, the problem is that under 4.3BSD (Tahoe), when an ICMP host/net
unreachable message is received, all sessions to the affected node are
terminated. And Tahoe implementations are VERY common around here. While I
don't know if it's our cisco, we have been having problems with this here.
Attempts by remote sites to get large files from Tahoe systems often fail with
reachability errors.

I'm told the Reno release "fixes" this by not making an unreachable a fatal
condition unless it persists. I hope to have my software patched to work this
way in a couple of weeks or so. But, in the meantime, it's a real pain.
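The difference between the Tahoe behaviour described above (and the SunOS 3.5 behaviour Dana Dawson reports below) and the Reno "fix" can be sketched as two policies for handling an incoming host/net unreachable. This is a simplified Python model, not the BSD kernel code: the Conn class, the function names, and the MAX_RETRANSMITS threshold are hypothetical. The hard policy aborts every connection to the named host at once; the soft policy only records the error, forgets it once data is acknowledged again, and reports it only if retransmissions keep failing.

```python
from dataclasses import dataclass
from typing import Optional

HOST_UNREACH = "host unreachable"
MAX_RETRANSMITS = 12  # hypothetical give-up threshold


@dataclass
class Conn:
    remote: str
    open: bool = True
    soft_error: Optional[str] = None
    failed_retransmits: int = 0


def hard_policy(conns, host, error):
    """Tahoe-style as described: every open connection to host is aborted at once."""
    for c in conns:
        if c.remote == host and c.open:
            c.open = False  # the error itself is not kept; the session simply dies


def soft_policy(conns, host, error):
    """Reno-style as described: remember the error but keep the connection open."""
    for c in conns:
        if c.remote == host and c.open:
            c.soft_error = error


def on_retransmit_timeout(c):
    """Give up only if the problem persists; then report the remembered error."""
    c.failed_retransmits += 1
    if c.failed_retransmits >= MAX_RETRANSMITS:
        c.open = False
        return c.soft_error or "timed out"
    return None


def on_ack(c):
    """Data got through again, so the unreachable was transient: forget it."""
    c.failed_retransmits = 0
    c.soft_error = None


if __name__ == "__main__":
    conns = [Conn("hostA"), Conn("hostA"), Conn("hostB")]
    hard_policy(conns, "hostA", HOST_UNREACH)
    print([c.open for c in conns])  # [False, False, True]

    conns = [Conn("hostA"), Conn("hostA"), Conn("hostB")]
    soft_policy(conns, "hostA", HOST_UNREACH)
    print([c.open for c in conns])  # [True, True, True]
```

Under the soft policy a momentary route loss, such as a single flap on an intermediate cisco, costs nothing as long as packets start flowing again before the retransmission limit is reached.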

					R. Kevin Oberman
					Lawrence Livermore National Laboratory
					Internet: oberman@icdc.llnl.gov
   					(415) 422-6955

morris@windom.UCAR.EDU (Don Morris) (08/17/90)

     >> effect on your operation? Those messages shouldn't cause any problem on a
     >> properly functioning TCP/IP implementation. We have "fixed" the problematic
     >> Greg Satz
     >> cisco
	
We have also encountered the ICMP Host Unreachable problem causing connections
to be aborted immediately. So, does anybody have a list of those
TCP/IP implementations that are "properly functioning" and those that
aren't?

-- Don --

dana@ferris.cray.com (Dana Dawson) (08/18/90)

I can say from direct experience that SunOS 3.5 systems will abort ALL
connections to/from a remote system if the local system receives a
"host unreachable" message for the remote host.  We've fixed this by
turning off the ICMP unreachables in a few of our routers, and everything
(including traceroute) seems to work just fine.

Dana Dawson
Cray Research, Inc.
dana@snoid.cray.com

stigall@ucs.indiana.edu (STIGALL ,JOHN ,BAC) (08/18/90)

I have had problems with sessions dropping on the DCA (Racal-Milgo) 355
TCP/IP gateway when a T-1 circuit has a single error, causing the cisco
serial line to report an "illegal frame address" error. I have set the 
Ethernet interface that has the DCA product on it to "no ip unreachables"
with some success. This is version 5.10 of DCA (Racal-Milgo), which is not
the current revision. I don't know yet if 5.20 has the bug fixed. That's 
the only implementation I know of that does not handle this correctly. I have
not found anything in the RFCs that addresses this scenario of the route being
only momentarily down, but it seems like you wouldn't want to close the
session at the first hint of trouble. :)

John Stigall