sdempsey@UCSD.EDU (Steve Dempsey) (01/19/91)
Warning!
People prone to nervous breakdowns when dealing with network problems
are advised to stop reading this now!
-------------
Hardware:
4D/340VGX with IO2 i/o controller
4D/25TG
Software:
all machines at 3.3.1
Situation:
When I attempt to transmit a file from our 340 to a remote site with
'ftp' or 'rcp' the transfer is either very slow or terminates with a lost
connection. I then disconnect the 340 from the net by unplugging the
drop line from the 340 I/O panel and plug the drop line into the I/O
panel of the 4D/25TG, to ensure that both machines use the same physical
net connection. Now I attempt the same file transfer from the 4D/25TG
and it goes without a hitch.
What I Know So Far:
I have monitored our subnet to watch the packets. What I see is that
packets from the 340 appear with varying frequency. The time between
delayed packets will double from 1 to 2, then 4, 8, 16, 32, and finally
64 seconds will elapse between packets. On occasion six consecutive
packets will be delayed by 64 seconds, at which point the connection is
summarily dropped. More frequently the timeout resets back to 1 second
and then starts doubling again. When the 4D/25TG is used all of the
packets are sent within a second or two without any unusual delays.
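
The doubling described above matches exponential retransmit backoff. A minimal Python sketch of that schedule follows, assuming the figures observed on the wire: a 1 second starting delay, a 64 second ceiling, and a drop after six consecutive delays at the ceiling. The function name and its parameters are hypothetical and are not taken from the IRIX TCP code.

```python
def backoff_schedule(initial=1, cap=64, tries_at_cap=6):
    """Return the list of delays (seconds) between retransmissions.

    Assumed model, based only on the delays observed on the subnet:
    the delay doubles from `initial` until it reaches `cap`, then
    repeats at `cap` until `tries_at_cap` attempts have been made,
    after which the connection is dropped.
    """
    delays = []
    delay = initial
    while delay < cap:
        delays.append(delay)
        delay *= 2
    delays.extend([cap] * tries_at_cap)
    return delays


schedule = backoff_schedule()
print("delays between packets (s):", schedule)
print("seconds of silence before the drop:", sum(schedule))
```

Under these assumptions a connection that never hears an ACK sits idle for several minutes before it is dropped, which is consistent with transfers that appear to hang and then end with a lost connection.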
The only clue I have from the 340's point of view is that if I run
'netstat -p tcp' while the transfer is in progress I see these counts
incrementing:
6241 retransmit timeouts
11 connections dropped by rexmit timeout
This problem has been reported to the HOTLINE (call # H2774) on 10Jan91
but no answers yet. Our FE tried replacing the IO2 board but it had no
effect.
A 240GTX on the same subnet also experiences these delays, while several
other PIs on the same subnet have no problems.
The Big Question:
What is different about the 240/340 and 25 that would account for this
behavior?
---------------------------------------------------------------------------
Steve Dempsey voice: (619) 534-0208
Dept. of Chemistry Computer Facility, 0314 UUCP: ucsd!sdempsey
University of Calif. at San Diego BITNET: sdempsey@ucsd
9500 Gilman Drive INTERNET: sdempsey@ucsd.edu
La Jolla, CA 92093-0314 fax: (619) 534-0058

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (01/19/91)
In article <9101182254.AA12177@chem.chem.ucsd.edu>, sdempsey@UCSD.EDU (Steve Dempsey) writes:
> ....
> The Big Question:
> What is different about the 240/340 and 25 that would account for this
> behavior?

They're very similar designs based on 7990's, as you no doubt noticed if
you examined the chips on the IO2 and the 4D25.

It would pay to check for errors with `netstat -i`, and for the tired old
ethernet complaints on the consoles (e.g. "late collisions" on the
machines in question or on any other IRIS on any ethernet between the
ends of the FTP transfer).

Perhaps there is some kind of grounding or other difference that makes
one of the machines unable to hear the ACK's from the remote machine.
(E.g. the frame grounding differences among Ethernet 1, 2, and 802.3
cables and transceivers.) I also seem to recall some differences in the
802.3/e1/e2 transformers between the IO2 and the 4D25.

It might pay to switch cables and transceivers.

Vernon Schryver, vjs@sgi.com

sdempsey@UCSD.EDU (Steve Dempsey) (01/24/91)
Let me try asking a specific question since I'm not getting anywhere
with this problem. What causes "retransmit timeouts" on a 340 using the IO2
board as reported by 'netstat -p tcp'?:
tcp:
    2241271 packets sent
        1942725 data packets (1701239978 bytes)
        11970 data packets (14359783 bytes) retransmitted
        239995 ack-only packets (227522 delayed)
        278 URG only packets
        943 window probe packets
        41099 window update packets
        4261 control packets
    2017922 packets received
        1393024 acks (for 1701333400 bytes)
        20860 duplicate acks
        0 acks for unsent data
        887096 packets (221013020 bytes) received in-sequence
        1592 completely duplicate packets (85826 bytes)
        248 packets with some dup. data (993 bytes duped)
        7435 out-of-order packets (5046742 bytes)
        54 packets (5 bytes) of data after window
        5 window probes
        40083 window update packets
        27 packets received after close
        0 discarded for bad checksums
        0 discarded for bad header offset fields
        0 discarded because packet too short
    1553 connection requests
    1532 connection accepts
    3024 connections established (including accepts)
    3477 connections closed (including 26 drops)
    174 embryonic connections dropped
    1378640 segments updated rtt (of 1389598 attempts)
>>  7373 retransmit timeouts
>>      12 connections dropped by rexmit timeout
    354 persist timeouts
    334 keepalive timeouts
        30 keepalive probes sent
        0 connections dropped by keepalive
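
To put the flagged counters in proportion, here is a small Python sketch that turns a few of the figures above into percentages. The dictionary keys and the `percent` helper are hypothetical names chosen for illustration; the numbers are copied from the listing. The counters are system-wide totals for all connections, so these ratios probably understate how bad things are on the one path to the remote site.

```python
# Counters copied by hand from the 'netstat -p tcp' listing above.
# The key names are illustrative only; netstat does not emit them.
counters = {
    "data_packets_sent": 1942725,
    "data_packets_retransmitted": 11970,
    "acks_received": 1393024,
    "duplicate_acks": 20860,
    "connections_closed": 3477,
    "dropped_by_rexmit_timeout": 12,
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


retransmit_pct = percent(counters["data_packets_retransmitted"],
                         counters["data_packets_sent"])
dup_ack_pct = percent(counters["duplicate_acks"],
                      counters["acks_received"])
rexmit_drop_pct = percent(counters["dropped_by_rexmit_timeout"],
                          counters["connections_closed"])

print("data packets retransmitted: %.3f%%" % retransmit_pct)
print("duplicate acks:             %.3f%%" % dup_ack_pct)
print("closed via rexmit timeout:  %.3f%%" % rexmit_drop_pct)
```

Sampling the same counters before and after a single failing transfer, and differencing them, would isolate that transfer from the rest of the machine's traffic.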
---------------------------------------------------------------------------
Steve Dempsey voice: (619) 534-0208
Dept. of Chemistry Computer Facility, 0314 UUCP: ucsd!sdempsey
University of Calif. at San Diego BITNET: sdempsey@ucsd
9500 Gilman Drive INTERNET: sdempsey@ucsd.edu
La Jolla, CA 92093-0314 fax: (619) 534-0058

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (01/24/91)
In article <9101232123.AA17793@chem.chem.ucsd.edu>, sdempsey@UCSD.EDU (Steve Dempsey) writes:
> Let me try asking a specific question since I'm not getting anywhere
> with this problem. What causes "retransmit timeouts" on a 340 using the IO2
> board as reported by 'netstat -p tcp'?:
> ...

Not receiving ack's from the other end.

Sorry, but that's the most that can be inferred from the symptom.
Possible reasons for not receiving stuff from the other end are many.
They include wild backhoes, broken wires, people pushing reset buttons,
improperly installed ethernets, and broken hardware or software.

Vernon Schryver, vjs@sgi.com