drs@bnlux0.bnl.gov (David R. Stampf) (08/16/90)
We are experiencing a tough problem in cross-country communications. A connection is set up between two unix hosts which are quite remote from each other. Between them are several networks, some of which are connected by cisco routers. This connection is in operation for a long period of time, then suddenly disappears, killing the session and annoying the users. This occurs intermittently, between 1 and 20 times a day.

Some detective work with ping and traceroute shows that an intermediate cisco router is (for no discernible reason) sending a "Host unreachable" icmp message at infrequent and unpredictable intervals. Querying the router does not indicate the loss of a route, nor the toggling of an interface.

I've discussed this on other mailing lists, and found someone who had a similar problem and solved it by turning off the icmp unreachable messages on that interface. He did not understand why the messages appeared, but it did solve his problem. When I discussed this solution with the owners of the router, they didn't seem too enthusiastic about making such a change in that it may break traceroute (that being more important than telnet ;-)).

So my questions are, 1) Does anybody know any other reasons for a cisco router to generate this message, and 2) (the hard question) how do you deal with a router many levels removed from you whose operation is having a major impact on your operation?

Thanks for any help or enlightenment.

< dave
satz@cisco.com (Greg Satz) (08/17/90)
>> We are experiencing a tough problem in x-country communications. A connection
>> is set up between two unix hosts which are quite remote from each other.
>> Between them are several networks, some of which are connected by cisco
>> routers. This connection is in operation for a long period of time, then
>> suddenly disappears, killing the session and annoying the users. This
>> occurs intermittently, >1 <20 times a day.
>>
>> Some detective work with ping and traceroute shows that an intermediate
>> cisco router is (for no discernible reason) sending a "Host unreachable"
>> icmp message at infrequent and unpredictable intervals. Querying the
>> router does not indicate the loss of a route, nor the toggling of an
>> interface.
>>
>> I've discussed this on other mailing lists, and found someone who had a
>> similar problem and solved it by turning off the icmp unreachable messages
>> on that interface. He did not understand why the messages appeared, but
>> it did solve his problem. When I discussed this solution with the owners
>> of the router, they didn't seem too enthusiastic about making such a change
>> in that it may break traceroute (that being more important than telnet ;-)).
>>
>> So my questions are, 1) Does anybody know any other reasons for a cisco
>> router to generate this message, and 2) (the hard question) how do you
>> deal with a router many levels removed from you whose operation is having
>> a major impact on your operation?

cisco routers will return ICMP unreachables whenever the routing table holds no route to a packet's destination. The most common cause of this is a flapping interface. Using the command debug ip-routing (or debug routing on older software versions) will print out a message when routes enter and leave the routing table. You can configure this debugging information to be sent to a syslog daemon to watch it over the long term.
The software will also send unreachables when an access list would prevent your packet from being forwarded. We will send an unreachable whenever we would drop a packet for any reason. This includes input queue overflow and output queue overflow (both are signs of congestion and are common when going from a fast network such as ethernet to a slow network such as a 56K line -- show interface will give you these counts). You can also enable debug icmp to see all ICMP messages sent and received by the router.

The real question here is why are unreachables having such a deleterious effect on your operation? Those messages shouldn't cause any problem on a properly functioning TCP/IP implementation. We have "fixed" the problematic systems which reacted badly to ICMP unreachables here because we couldn't live with that behavior.

If I can be of any further help, please drop a line to customer-service@cisco.com.

Greg Satz
cisco
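To make the message under discussion concrete, here is a minimal sketch in Python of how a captured ICMP message could be decoded to tell a "host unreachable" apart from the other Destination Unreachable codes. It assumes the bytes start at the ICMP header (the enclosing IP header already stripped) and uses only the type and code values from RFC 792. The function names are hypothetical and chosen for illustration; nothing here comes from cisco or BSD code.

```python
import struct

ICMP_DEST_UNREACHABLE = 3  # ICMP type for Destination Unreachable (RFC 792)

# Destination Unreachable codes defined in RFC 792.
UNREACHABLE_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed and DF set",
    5: "source route failed",
}


def describe_icmp(message: bytes) -> str:
    """Describe an ICMP message whose first byte is the ICMP type field."""
    if len(message) < 8:
        raise ValueError("an ICMP header is 8 bytes; message is too short")
    icmp_type, code, _checksum = struct.unpack("!BBH", message[:4])
    if icmp_type != ICMP_DEST_UNREACHABLE:
        return f"type {icmp_type}, code {code}"
    name = UNREACHABLE_CODES.get(code, f"code {code}")
    return f"destination unreachable: {name}"


def offending_destination(message: bytes) -> str:
    """Return the destination address of the datagram quoted in the error.

    After the 8-byte ICMP header, an unreachable carries the IP header of
    the datagram that could not be delivered; the destination address sits
    at bytes 16-19 of that quoted header.
    """
    quoted = message[8:]
    if len(quoted) < 20:
        raise ValueError("quoted IP header is truncated")
    return ".".join(str(octet) for octet in quoted[16:20])
```

Run against a packet capture taken near the affected host, something like this would show which destination each unreachable refers to and whether the code really is 1 (host) rather than 0 (net).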
oberman@rogue.llnl.gov (08/17/90)
In article <24867@boulder.Colorado.EDU>, satz@cisco.com (Greg Satz) writes:
> The real question here is why are unreachables having such a deleterious
> effect on your operation? Those messages shouldn't cause any problem on a
> properly functioning TCP/IP implementation. We have "fixed" the problematic
> systems which reacted badly to ICMP unreachables here because we couldn't
> live with that behavior.

Greg,

The problem is that under 4.3BSD (Tahoe), when an ICMP host/net unreachable message is received, all sessions to the affected node are terminated. And Tahoe implementations are VERY common around here.

While I don't know if it's our cisco, we have been having problems with this here. Attempts by remote sites to get large files from Tahoe systems often fail with reachability errors.

I'm told the Reno release "fixes" this by not making an unreachable a fatal condition unless it persists. I hope to have my software patched to work this way in a couple of weeks or so. But, in the meantime, it's a real pain.

R. Kevin Oberman
Lawrence Livermore National Laboratory
Internet: oberman@icdc.llnl.gov
(415) 422-6955
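The "not fatal unless it persists" behavior can be sketched roughly as below. This is an illustrative model in Python, not the 4.3BSD Reno code: the `Connection` class, its method names, and the retransmission limit are all assumptions made for the example. The split of codes follows RFC 1122 (section 4.2.3.9), which treats Destination Unreachable codes 0, 1 and 5 as soft errors that must not abort a TCP connection.

```python
from dataclasses import dataclass
from typing import Optional

# Codes treated as soft (transient) errors, per RFC 1122 section 4.2.3.9.
SOFT_UNREACHABLE_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    5: "source route failed",
}


@dataclass
class Connection:
    """Toy model of one TCP connection's reaction to ICMP unreachables."""

    max_retransmits: int = 12  # hypothetical limit, not taken from any real stack
    retransmits: int = 0
    soft_error: Optional[str] = None
    is_open: bool = True
    close_reason: Optional[str] = None

    def on_icmp_unreachable(self, code: int) -> None:
        # Remember a soft error, but keep the connection up.
        name = SOFT_UNREACHABLE_CODES.get(code)
        if name is not None:
            self.soft_error = name

    def on_ack(self) -> None:
        # Data got through, so the path works again: forget the error.
        self.retransmits = 0
        self.soft_error = None

    def on_retransmit_timeout(self) -> None:
        # Only give up once retransmissions are exhausted; the remembered
        # soft error then explains why the connection was dropped.
        self.retransmits += 1
        if self.retransmits > self.max_retransmits:
            self.is_open = False
            self.close_reason = self.soft_error or "timed out"
```

With this policy a single stray unreachable from an intermediate router leaves the session alone, while a destination that stays unreachable still ends the connection after the usual retransmission timeout, with "host unreachable" as the reported reason.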
morris@windom.UCAR.EDU (Don Morris) (08/17/90)
>> effect on your operation? Those messages shouldn't cause any problem on a
>> properly functioning TCP/IP implementation. We have "fixed" the problematic
>> Greg Satz
>> cisco

We have also seen ICMP Host Unreachable messages cause connections to be aborted immediately. So, does anybody have a list of those TCP/IP implementations that are "properly functioning" and those that aren't?

-- Don --
dana@ferris.cray.com (Dana Dawson) (08/18/90)
I can say from direct experience that SunOS 3.5 systems will abort ALL connections to/from a remote system if the local system receives a "host unreachable" message for the remote host. We've fixed this by turning off the ICMP unreachables in a few of our routers, and everything (including traceroute) seems to work just fine.

Dana Dawson
Cray Research, Inc.
dana@snoid.cray.com
stigall@ucs.indiana.edu (STIGALL ,JOHN ,BAC) (08/18/90)
I have had problems with sessions dropping on the DCA (Racal-Milgo) 355 TCP/IP gateway when a T-1 circuit has a single error, causing the cisco serial line to have an "illegal frame address" error. I have set the Ethernet interface that has the DCA product on it to "no ip unreachables" with some success. This is version 5.10 of DCA (Racal-Milgo), which is not the current revision. I don't know yet if 5.20 has the bug fixed. That's the only one that I know of that is not implemented correctly.

I have not found anything in the RFCs that addresses this scenario of the route being only momentarily down, but it seems like you wouldn't want to close the session at the first hint of trouble. :)

John Stigall