wisner@hayes.fai.alaska.edu (Bill Wisner) (11/30/89)
I recently sent a message to this august assemblage complaining that FTP on our entire campus has an unpleasant habit of hanging in the middle of transfers. It now turns out that the problem is much more general: any TCP connection may hang, under circumstances we have not yet identified.

Background: we have two subnets. The main subnet is populated by a VMS machine (with TWG WINS TCP) and many PCs and Macs running NCSA Telnet. It is connected by a Proteon router to the world at large. The other subnet has another VAX (with TWG) and a slew of Sun workstations; it is connected by a second Proteon router to the main subnet.

TCP links from any campus machine to any machine off campus are prone to failure. (The same is true of connections from off-campus to on-.)

I've used SunOS's etherfind utility to try to find the problem. I have hex dumps of an rlogin connection and an FTP transfer, both of which hung. (A certain message in my mailbox on a remote machine will cause my connection to hang every time if I merely read it. It contains no obvious clues -- it's just another normal-looking message.) I could find nothing in the hex dumps. No packets were truncated; one packet arrived and the next one didn't.

The hung connections seem to be data-specific. If I try transferring the same file twice, both FTP connections are likely to hang at the same point in the file. As an exercise, I took a file and split it into several small chunks, then tried transferring it. The connection still hung at the same point in the fragmented file.

I have come to believe that the problem resides in the Proteon box that connects us to the Internet-at-large. It is, after all, the one common factor shared by all campus machines. But I am at a loss to figure out just what the problem might be.

Any ideas or suggestions would be greatly appreciated.

<wisner@hayes.fai.alaska.edu>
meggers@orion.oac.uci.edu (mark eggers) (12/01/89)
Bill,

One thing that may be biting you is a particular bit pattern. If you have marginal connections to the rest of the world, or if your CSU/DSUs are a bit flaky, they can mangle certain bit patterns.

We had this happen with a connection on CERFnet from here at UCI to SWRL (Cal State University net). PacBell and Brian Roode (another member of our network team) found the flaky link between Seal Beach and Long Beach. I think that they found it by running extensive BERTs (bit error rate tests).

Since you seem to be on the track of this problem, you might try to artificially generate the bit pattern in a file (a quick and dirty C program), and then do a binary FTP to another system. If the FTP hangs, then you have some cause to suspect that bit pattern.

You might then want the circuit provider to run a BERT using the suspected pattern and watch for errors. At that point, you can start replacing things to bring the error rate down.

Of course, this is just a guess (with 2 hours of sleep at that ;-) ).

Good luck -

Mark Eggers, Network Communications Analyst
University of California, Irvine
email: meggers@uci.edu
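
The "quick and dirty C program" suggested above might look something like the following. This is only a sketch under assumptions: the function name write_pattern_file, the 4096-byte block size, and the output path are all made up for illustration, and the suspect bytes are left for the caller to supply, since the thread never identifies the actual pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write `total` bytes to `path` by repeating `pattern` end to end.
 * The pattern bytes come from the caller; nothing here knows which
 * bit pattern (if any) actually upsets the link.
 * Returns 0 on success, -1 on any I/O error. */
int write_pattern_file(const char *path,
                       const unsigned char *pattern, size_t pattern_len,
                       size_t total)
{
    unsigned char block[4096];   /* arbitrary buffer size */
    size_t written = 0;
    FILE *fp;

    assert(pattern != NULL && pattern_len > 0);

    fp = fopen(path, "wb");      /* binary mode: no newline translation */
    if (fp == NULL)
        return -1;

    while (written < total) {
        size_t n = total - written;
        size_t i;

        if (n > sizeof block)
            n = sizeof block;

        /* Index by absolute file offset so the pattern stays in phase
         * across block boundaries. */
        for (i = 0; i < n; i++)
            block[i] = pattern[(written + i) % pattern_len];

        if (fwrite(block, 1, n, fp) != n) {
            fclose(fp);
            return -1;
        }
        written += n;
    }

    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling write_pattern_file("suspect.bin", bytes, nbytes, 65536) with the bytes copied from the point where a transfer stalls would give a 64 KB test file to send with FTP in binary mode. If a file made only of the suspect bytes hangs and files built from other patterns go through, that is some evidence for the bit-pattern theory.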
henry@utzoo.uucp (Henry Spencer) (12/03/89)
In article <8911301020.AA14662@hayes.fai.alaska.edu> wisner@hayes.fai.alaska.edu (Bill Wisner) writes:
>The hung connections seem to be data-specific. If I try transferring the
>same file twice, both FTP connections are likely to hang at the same
>point in the file. As an exercise, I took a file and split it into several
>small chunks, then tried transferring it. The connection still hung at the
>same point in the fragmented file.

You should probably pursue this further to pin down the exact data pattern that causes the problem. This might yield some insight.
-- 
Mars can wait: we've barely | Henry Spencer at U of Toronto Zoology
started exploring the Moon. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
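
One possible way to start pinning down the pattern is to compare the original file with whatever partial copy the receiver kept after the hang, then dump the bytes of the original that come right after the point where the two stop agreeing. The sketch below assumes both files have already been read into memory and that the receiver really does keep its partial file. The names common_prefix_len and hex_window are invented for this example. Because of buffering on both ends, the stall point found this way is only approximate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Length of the leading run of bytes on which a and b agree.  With
 * a = original file and b = partial copy left by a hung transfer,
 * this is roughly the offset at which the data stopped arriving. */
size_t common_prefix_len(const unsigned char *a, size_t alen,
                         const unsigned char *b, size_t blen)
{
    size_t limit = alen < blen ? alen : blen;
    size_t i = 0;

    assert(a != NULL && b != NULL);

    while (i < limit && a[i] == b[i])
        i++;
    return i;
}

/* Print up to `count` bytes of `data` starting at offset `start`,
 * 16 bytes per line, each line prefixed with its offset in hex.
 * Returns the number of bytes actually printed. */
size_t hex_window(FILE *out, const unsigned char *data, size_t len,
                  size_t start, size_t count)
{
    size_t i;

    if (start >= len)
        return 0;
    if (count > len - start)
        count = len - start;     /* clamp to the end of the data */

    for (i = 0; i < count; i++) {
        if (i % 16 == 0)
            fprintf(out, "%s%08zx:", i ? "\n" : "", start + i);
        fprintf(out, " %02x", data[start + i]);
    }
    if (count > 0)
        fputc('\n', out);
    return count;
}
```

Running hex_window on the original, starting at the offset common_prefix_len returns, shows the bytes that never made it across. Comparing those windows across several hung transfers may reveal a byte sequence they have in common.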