sdempsey@UCSD.EDU (Steve Dempsey) (01/19/91)
Warning! People prone to nervous breakdowns when dealing with network
problems are advised to stop reading this now!

-------------

Hardware:
    4D/340VGX with IO2 i/o controller
    4D/25TG

Software:
    all machines at 3.3.1

Situation:
    When I attempt to transmit a file from our 340 to a remote site with
    'ftp' or 'rcp', the transfer is either very slow or terminates with a
    lost connection. I then disconnect the 340 from the net by unplugging
    the drop line from the 340 I/O panel and plug the drop line into the
    I/O panel of the 4D/25TG, to ensure that both machines use the same
    physical net connection. Now I attempt the same file transfer from the
    4D/25TG and it goes without a hitch.

What I Know So Far:
    I have monitored our subnet to watch the packets. What I see is that
    packets from the 340 appear with varying frequency. The time between
    delayed packets will double from 1 to 2, then 4, 8, 16, 32, and finally
    64 seconds will elapse between packets. On occasion six consecutive
    packets will be delayed by 64 seconds, at which point the connection is
    summarily dropped. More frequently the timeout resets back to 1 second
    and then starts doubling again. When the 4D/25TG is used, all of the
    packets are sent within a second or two without any unusual delays.

    The only clue I have from the 340's point of view is that if I run
    'netstat -p tcp' while the transfer is in progress I see these counts
    incrementing:

        6241 retransmit timeouts
        11 connections dropped by rexmit timeout

    This problem was reported to the HOTLINE (call # H2774) on 10Jan91,
    but there are no answers yet. Our FE tried replacing the IO2 board but
    it had no effect. A 240GTX on the same subnet also experiences these
    delays, while several other PIs on the same subnet have no problems.

The Big Question:
    What is different about the 240/340 and 25 that would account for this
    behavior?

---------------------------------------------------------------------------
Steve Dempsey                                voice:    (619) 534-0208
Dept. of Chemistry Computer Facility, 0314   UUCP:     ucsd!sdempsey
University of Calif. at San Diego            BITNET:   sdempsey@ucsd
9500 Gilman Drive                            INTERNET: sdempsey@ucsd.edu
La Jolla, CA 92093-0314                      fax:      (619) 534-0058
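
The delay pattern reported in the post above (1, 2, 4, 8, 16, 32, then 64 seconds, with the connection dropped after six consecutive 64-second delays) looks like exponential backoff of the TCP retransmit timer. The sketch below only reproduces that observed schedule. The function name, the 1-second starting value, the 64-second cap, and the "give up after six intervals at the cap" rule are assumptions taken from the description in the post, not from the IRIX source.

```python
def backoff_schedule(initial=1, cap=64, max_at_cap=6):
    """Return the retransmit intervals (seconds) of one unbroken backoff run.

    Assumptions (from the behaviour described in the post, not from any
    kernel source): the interval doubles after every unacknowledged
    retransmission, is capped at `cap` seconds, and the connection is
    dropped once `max_at_cap` intervals at the cap have elapsed.
    """
    intervals = []
    rto = initial
    at_cap = 0
    while at_cap < max_at_cap:
        intervals.append(rto)
        if rto >= cap:
            at_cap += 1
        rto = min(rto * 2, cap)   # double, but never exceed the cap
    return intervals


if __name__ == "__main__":
    schedule = backoff_schedule()
    print(schedule)
    print("seconds without an ACK before the drop:", sum(schedule))
```

Under these assumptions a connection survives 447 seconds of silence before it is dropped. The more common case in the post, where the delay resets to 1 second, would correspond to an ACK finally arriving partway through the run.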
vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (01/19/91)
In article <9101182254.AA12177@chem.chem.ucsd.edu>, sdempsey@UCSD.EDU (Steve Dempsey) writes:
> ....
> The Big Question:
> What is different about the 240/340 and 25 that would account for this
> behavior?

They're very similar designs based on 7990's, as you no doubt noticed if
you examined the chips on the IO2 and the 4D25.

It would pay to check for errors with `netstat -i`, and for the tired old
ethernet complaints on the consoles (e.g. "late collisions" on the machines
in question or on any other IRIS on any ethernet between the ends of the
FTP transfer).

Perhaps there is some kind of grounding or other difference that makes one
of the machines unable to hear the ACK's from the remote machine. (E.g. the
frame grounding differences among Ethernet 1, 2, and 802.3 cables and
transceivers.) I also seem to recall some differences in the 802.3/e1/e2
transformers between the IO2 and the 4D25.

It might pay to switch cables and transceivers.

Vernon Schryver, vjs@sgi.com
sdempsey@UCSD.EDU (Steve Dempsey) (01/24/91)
Let me try asking a specific question since I'm not getting anywhere
with this problem. What causes "retransmit timeouts" on a 340 using the IO2
board as reported by 'netstat -p tcp'?:

tcp:
    2241271 packets sent
        1942725 data packets (1701239978 bytes)
        11970 data packets (14359783 bytes) retransmitted
        239995 ack-only packets (227522 delayed)
        278 URG only packets
        943 window probe packets
        41099 window update packets
        4261 control packets
    2017922 packets received
        1393024 acks (for 1701333400 bytes)
        20860 duplicate acks
        0 acks for unsent data
        887096 packets (221013020 bytes) received in-sequence
        1592 completely duplicate packets (85826 bytes)
        248 packets with some dup. data (993 bytes duped)
        7435 out-of-order packets (5046742 bytes)
        54 packets (5 bytes) of data after window
        5 window probes
        40083 window update packets
        27 packets received after close
        0 discarded for bad checksums
        0 discarded for bad header offset fields
        0 discarded because packet too short
    1553 connection requests
    1532 connection accepts
    3024 connections established (including accepts)
    3477 connections closed (including 26 drops)
    174 embryonic connections dropped
    1378640 segments updated rtt (of 1389598 attempts)
>>  7373 retransmit timeouts
>>      12 connections dropped by rexmit timeout
    354 persist timeouts
    334 keepalive timeouts
        30 keepalive probes sent
        0 connections dropped by keepalive

---------------------------------------------------------------------------
Steve Dempsey                                voice:    (619) 534-0208
Dept. of Chemistry Computer Facility, 0314   UUCP:     ucsd!sdempsey
University of Calif. at San Diego            BITNET:   sdempsey@ucsd
9500 Gilman Drive                            INTERNET: sdempsey@ucsd.edu
La Jolla, CA 92093-0314                      fax:      (619) 534-0058
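
To put the counters in the post above in proportion, it can help to express retransmissions as a fraction of the data packets sent. The following is a rough sketch, not a general 'netstat' parser: the patterns match only the counter wording quoted in the post, the names `PATTERNS`, `parse_counters`, and `retransmit_ratio` are made up for illustration, and the indentation of the sample text is an assumption.

```python
import re

# A few counters copied from the post; the layout here is an assumption.
SAMPLE = """\
tcp:
    2241271 packets sent
        1942725 data packets (1701239978 bytes)
        11970 data packets (14359783 bytes) retransmitted
>>  7373 retransmit timeouts
>>      12 connections dropped by rexmit timeout
"""

# Hypothetical patterns keyed on the wording shown in the post.
PATTERNS = {
    "data_packets": r"(\d+) data packets \(\d+ bytes\)",
    "retransmitted": r"(\d+) data packets \(\d+ bytes\) retransmitted",
    "rexmit_timeouts": r"(\d+) retransmit timeouts",
    "dropped": r"(\d+) connections dropped by rexmit timeout",
}


def parse_counters(text):
    """Extract the counters named in PATTERNS from netstat-style text."""
    counters = {}
    for raw in text.splitlines():
        # Drop indentation and the '>>' markers used in the post.
        line = raw.strip().lstrip("> ").strip()
        for name, pattern in PATTERNS.items():
            match = re.fullmatch(pattern, line)
            if match:
                counters[name] = int(match.group(1))
    return counters


def retransmit_ratio(counters):
    """Fraction of data packets that had to be retransmitted."""
    return counters["retransmitted"] / counters["data_packets"]


if __name__ == "__main__":
    c = parse_counters(SAMPLE)
    print(c)
    print("retransmitted fraction: {:.4%}".format(retransmit_ratio(c)))
```

With the figures from the post, roughly 0.6% of data packets were retransmitted since boot. These are cumulative counters for all connections, so the rate during the failing transfers alone may be much higher; taking two snapshots around a single transfer and subtracting them would isolate it.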
vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (01/24/91)
In article <9101232123.AA17793@chem.chem.ucsd.edu>, sdempsey@UCSD.EDU (Steve Dempsey) writes:
> Let me try asking a specific question since I'm not getting anywhere
> with this problem. What causes "retransmit timeouts" on a 340 using the IO2
> board as reported by 'netstat -p tcp'?:
> ...

Not receiving ack's from the other end.

Sorry, but that's the most that can be inferred from the symptom. Possible
reasons for not receiving stuff from the other end are many. They include
wild backhoes, broken wires, people pushing reset buttons, improperly
installed ethernets, and broken hardware or software.

Vernon Schryver, vjs@sgi.com