lid@cernvax.UUCP (lid) (09/11/87)
We are experiencing a lot of problems with TCP/IP here, and I would like to know if anyone else out there has seen or solved similar problems.

We have an Apollo ring with 2 TCP/IP gateways (Interlan NM10 + dsp[89]0) that connect to our Ethernet. All the machines on the Ethernet know about both gateways, just as all the non-gateway Apollos do. For no apparent reason, from time to time some Ethernet hosts become unreachable from the non-gateway Apollos (but they still work from the gateways); the message we get on the Apollos is "destination timed out". The routing information on the Apollos is OK, and some other Ethernet hosts are still reachable. We almost always have problems with Wollongong TCP/IP on VAX/VMS and with Wiscnet TCP/IP on IBM/VM systems, and almost never with Ultrix.

It seems to me that the Apollo side is OK. I monitored an "ftp vaxvms" and saw that:
a) the request went through the Apollo gateway,
b) it reached the VAX (found out with Wollongong's netstat),
c) the VAX knew about both Apollo gateways,
d) but the ftp'ing Apollo never got an answer.

TCP/IP on the VAX had been working just a couple of minutes before, and none of the system staff was working on it during that time. Note that the VAX was still able to reach other Ethernet hosts, including the Apollo gateway.

Does all this resemble something you've already seen? Any suggestions about what the problem could be?

It seems to me that things were working better when we had only 1 gateway, but I wouldn't bet on this: TCP/IP usage is steadily growing at our site, so maybe it was working better just because of the smaller amount of traffic. As far as the Apollo hosts are concerned, things work as you would expect: if the first gateway in the routing table of the Apollo does not answer, the second one is used, providing automatic fall back for Apollo users. This whole exercise was started to provide more reliable and continuous operation of TCP/IP!

Thanks to all who can give me some hints.

Achille Petrilli
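The fall back behaviour described above (try the first gateway in the routing table, use the second if the first does not answer) can be sketched roughly as follows. This is only an illustration in Python, not the Apollo implementation: `pick_gateway`, the `answers` callback and the gateway names `gw1`/`gw2` are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_gateway(gateways: Sequence[str],
                 answers: Callable[[str], bool]) -> Optional[str]:
    """Return the first gateway, in routing-table order, that answers."""
    for gw in gateways:
        if answers(gw):
            return gw
    return None  # no gateway is answering


# Hypothetical gateway names; "gw1" is assumed to be down in this example.
up = {"gw1": False, "gw2": True}
chosen = pick_gateway(["gw1", "gw2"], lambda gw: up[gw])
```

Note that this only covers the outbound choice made on the Apollo side; it says nothing about which route the Ethernet host uses for its reply.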
krowitz@mit-richter.UUCP (David Krowitz) (09/11/87)
I have seen the error message "destination timed out" before here at MIT. The cause in the past seemed to be that the routing tables on the Ethernet host were getting flushed periodically (probably because the host hadn't seen any info [or any correct info] from the RIP servers on the Apollos). What would happen is that the non-gateway Apollos would know to route the packets for the Ethernet host via the Apollo gateway, but the Ethernet host would lose the routing table entry that told it to reply to the non-gateway Apollos via the Apollo gateway. The Ethernet host can always talk to the Apollo gateway because it is on the same network, so no routing info is needed.

We solved this problem on our BSD 4.2 machines by forcing a static entry for the Apollo network into the routing tables of the Ethernet hosts, using the command "route add apollo-net apollo-gw 1" in the machine's startup file (/etc/rc.local). Unlike the info provided by the RIP server/routed programs, this entry does not get flushed if the Ethernet host doesn't hear any routing info from the Apollo gateway periodically.

-- David Krowitz
mit-erl!mit-kermit!krowitz@eddie.mit.edu
mit-erl!mit-kermit!krowitz@mit-eddie.arpa
krowitz@mit-mc.arpa
(in order of decreasing preference)
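The difference between a learned route that ages out and a static route that stays put can be modelled with a toy routing table. The sketch below is Python and is not how routed is actually written; the `RoutingTable` class, its method names and the 180-second timeout are assumptions chosen for illustration, and only the names "apollo-net" and "apollo-gw" come from the route command above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ROUTE_TIMEOUT = 180  # seconds; assumed value for illustration only


@dataclass
class Route:
    gateway: str
    static: bool
    last_heard: float  # time of the last update that confirmed this route


class RoutingTable:
    """Toy model: learned routes age out, static routes do not."""

    def __init__(self) -> None:
        self.routes: Dict[str, Route] = {}

    def add_static(self, net: str, gateway: str, now: float) -> None:
        self.routes[net] = Route(gateway, True, now)

    def learn(self, net: str, gateway: str, now: float) -> None:
        # In this model an update refreshes a learned entry
        # but never replaces a static one.
        existing = self.routes.get(net)
        if existing is not None and existing.static:
            return
        self.routes[net] = Route(gateway, False, now)

    def expire(self, now: float) -> None:
        # Drop learned routes that have not been refreshed recently.
        stale = [net for net, r in self.routes.items()
                 if not r.static and now - r.last_heard > ROUTE_TIMEOUT]
        for net in stale:
            del self.routes[net]

    def lookup(self, net: str, now: float) -> Optional[str]:
        self.expire(now)
        route = self.routes.get(net)
        return route.gateway if route else None


def demo() -> Dict[str, Optional[str]]:
    learned = RoutingTable()
    learned.learn("apollo-net", "apollo-gw", now=0)
    static = RoutingTable()
    static.add_static("apollo-net", "apollo-gw", now=0)
    return {
        "learned_fresh": learned.lookup("apollo-net", now=60),
        "learned_stale": learned.lookup("apollo-net", now=600),
        "static_stale": static.lookup("apollo-net", now=600),
    }
```

In this model the host with only a learned route has no path back to the Apollo network once updates stop arriving, which would match the symptom of requests reaching the host while replies never come back; the host with the static entry keeps replying via the gateway.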
giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (09/13/87)
I have two UCRs which cover this: 0d885985 and 0d885997. The answers to both of them say that this problem will be fixed with release 3.1 of TCP. 3.1 seems to be overdue (ready to be released any day). You may want to bother your salesman about it.

On my network, it is only the UNIX machines which get lost. I assume that the routed's are somehow not communicating correctly. I'd bet it is a bug in UNIX.