dplatt@teknowledge-vaxc.ARPA (Dave Platt) (05/12/87)
I've run into some problems on a Sun 3/52 workstation running SunOS 3.2 that I've been told may involve IP packet fragmentation. The primary symptom is that SMTP mail deliveries "hang up" and abort with a read timeout.

Background: my Sun is sitting on a 10 Mbit Ethernet with the default ifconfig for the Ethernet board; the MTU for the Ethernet interface is 1500 bytes. The system is configured so that packets destined for IP addresses not on our net are sent to our Vax 8650 (Ultrix 1.2), which ipforwards them to the Internet TIP. The MTU for the Vax's "imp0" interface is 1006 bytes.

Problem: if a process on the Sun establishes a TCP connection with a peer running on a host somewhere on the Internet (e.g. an SMTP server), and then sends a large burst of data, the Sun will typically queue up about 4k of data in the TCP buffers at one time. This apparently results in the sending of an IP packet that approaches the Sun's 1500-byte MTU; when the packet passes through the Vax on its way to the IMP, it is apparently fragmented. Some system or gateway seems to drop the fragmented IP packet on the floor. The Sun's TCP never receives an acknowledgement for the TCP segment, retries the transmission periodically, and eventually aborts the connection.

The problem typically occurs in the later stages of an SMTP session. The Sun's SMTP mailer is able to connect with its peer on another Internet host, go through the "MAIL FROM" and "RCPT TO" steps, and receive permission to send the message body. If the message is short (< 1k bytes), everything works fine; if it's too long, then the timeout occurs.

This problem appears to occur only when the host I'm trying to connect with lies on a local-area net... and not all LANs are affected. I've been told that certain gateways are incapable of reassembling fragmented IP packets; other gateways seem to work just fine.

Question for the gurus: is there any way to reconfigure my Sun's le0 interface so that its MTU doesn't exceed that of the 8650? If so, how do I do it? Or, is there a better solution to the problem? Or, finally, have I totally misunderstood the problem?

advTHANKSance,
Dave Platt
Internet: dplatt@teknowledge-vaxc.arpa
Usenet: {hplabs|sun|ucbvax|seismo|uw-beaver|decwrl}!teknowledge-vaxc.arpa!dplatt
Voice: (415) 424-0500
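To make the fragmentation described in this post concrete, here is a minimal sketch in C of how a datagram would be cut up when it crosses a link with a smaller MTU. It is an illustration only: the function names are hypothetical, the IP header is assumed to be 20 bytes with no options, and the gateway is assumed to cut every fragment except the last to the same size, rounded down to a multiple of 8 bytes as the IP fragment-offset field requires.

```c
#include <assert.h>
#include <stdio.h>

#define IP_HDR_LEN 20   /* assumption: IP header without options */

/* Largest data payload a non-final fragment can carry on a link with
 * this MTU. The IP fragment-offset field counts 8-byte units, so the
 * payload is rounded down to a multiple of 8. */
static int frag_payload(int mtu)
{
    assert(mtu >= IP_HDR_LEN + 8);
    return (mtu - IP_HDR_LEN) & ~7;
}

/* Hypothetical helper: print the fragments produced when a datagram of
 * total length dgram_len (IP header included) crosses a link with the
 * given MTU. Returns the number of packets sent on that link. */
static int show_fragments(int dgram_len, int mtu)
{
    int data = dgram_len - IP_HDR_LEN;   /* bytes after the IP header */
    int per_frag = frag_payload(mtu);
    int offset = 0, count = 0;

    assert(dgram_len > IP_HDR_LEN);
    if (dgram_len <= mtu) {
        printf("%d bytes fits in MTU %d: no fragmentation\n", dgram_len, mtu);
        return 1;
    }
    while (offset < data) {
        int len = (data - offset < per_frag) ? data - offset : per_frag;
        int more = (offset + len < data);   /* "more fragments" flag */

        printf("fragment %d: offset %d, %d data bytes, MF=%d\n",
               count + 1, offset, len, more);
        offset += len;
        count++;
    }
    return count;
}
```

Under these assumptions, calling show_fragments(1500, 1006) reports two fragments carrying 984 and 496 data bytes; if either one is lost or cannot be reassembled, the whole TCP segment has to be retransmitted.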
hedrick@topaz.RUTGERS.EDU (Charles Hedrick) (05/13/87)
Now and then we run into machines that can't reassemble. Note that the 1006 limit on imp0 isn't a problem with the VAX. It is the limit allowed by the Arpanet.

There are more elegant solutions, but if you don't have source, here is a program that will let you change the MTU on the fly. We have used it on both Pyramid and Sun, changing only the name of the kernel variable. I.e. the string "_il_softc", which is the name appropriate for il0 on the Pyramid. I just checked and it looks like _le_softc will work for a Sun 3/50. At least this will let you see whether your problem is really a reassembly problem. You should try "mtu 1006" or maybe some slightly smaller number. (We typically use 900 for testing.)

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <a.out.h>
    #include <stdio.h>

    struct nlist nl[2];
    short mtu;
    int kmem;
    struct stat statblock;
    char *kernelfile;

    main(argc,argv)
    char *argv[];
    {
        if (argc < 2) {
            fprintf(stderr,"usage: mtu <n> {<kernelfile>}\n");
            exit(2);
        }
        if ((kmem = open("/dev/kmem",2))<0) {
            perror("open /dev/kmem");
            exit(1);
        }
        if (argc > 2) {
            kernelfile = argv[2];
        } else {
            kernelfile = "/vmunix";
        }
        if (stat(kernelfile,&statblock)) {
            fprintf(stderr,"%s not found.\n",kernelfile);
            exit(1);
        }
        initnlistvars(atoi(argv[1]));
        exit(0);
    }

    initnlistvars(on)
    register int on;
    {
        nl[0].n_un.n_name = "_il_softc";
        nl[1].n_un.n_name = "";
        nlist(kernelfile,nl);
        if (nl[0].n_type == 0) {
            fprintf(stderr, "%s: No namelist\n", kernelfile);
            exit(4);
        }
        (void) lseek(kmem,(nl[0].n_value)+6,0);
        if (read(kmem,&mtu,2) != 2) {
            perror("read kmem");
            exit(5);
        }
        fprintf(stderr,"mtu was: %d is now: %d\n",mtu,on);
        (void) lseek(kmem,(nl[0].n_value)+6,0);
        mtu = on;
        if (write(kmem,&mtu,2) != 2) {
            perror("write kmem");
            exit(6);
        }
    }
jonab@CAM.UNISYS.COM (Jonathan P. Biggar) (05/14/87)
Don't change the MTU on your network interface. What you want to do is change tcp to never send segments that are larger than the mtu of the Arpanet. If you change the MTU on your interface, you will mess up any ND or NFS access you may have. Jon Biggar jonab@cam.unisys.com
jas@MONK.PROTEON.COM (John A. Shriver) (05/14/87)
The SunOS TCP will choose to put 1024 bytes of data in each packet unless the socket receive high water mark is lower (so_rcv.sb_hiwat). This is straight out of the 4.2BSD VAX code, without any change. (At least as of SunOS 3.2.) Indeed, this will result in the IP packets being fragmented on the ARPANET, which is a lose. IP fragment reassembly is far less robust than TCP reassembly.

This code is fixed in 4.3BSD, where it sends large packets only to hosts on the same net (LAN), and otherwise limits itself to 576 byte packets. The same code also allows the data to open up beyond 1024 bytes if you have a LAN with large MTU. This can dramatically increase local TCP performance.

Bother your Sun technical support contact to encourage them to fix this. It involves adding one subroutine (tcp_mss()), and tweaking tcp_output().

As for tweaking the MTU, I don't think that it will hurt NFS, as it is already sending 8192 byte UDP packets that are being fragmented by the IP layer. I have no idea what effect it will have on ND, since ND is proprietary. However, better to fix the problem (TCP) than to have to crock around it (MTU).
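The segment-size policy described in this post can be sketched in a few lines of C. This is not the 4.3BSD tcp_mss() routine itself, only a hedged illustration of the idea: the function name, its parameters, and the two constants are assumptions made for the example (40 bytes of IP plus TCP header with no options, and a 512-byte segment for non-local peers so that the whole packet stays within 576 bytes).

```c
#include <assert.h>

#define TCPIP_HDR_LEN 40    /* assumption: 20-byte IP + 20-byte TCP header, no options */
#define DEFAULT_MSS   512   /* assumption: conservative segment size for non-local peers */

/* Hypothetical helper, not the real tcp_mss(): pick a TCP segment size
 * from the outgoing interface MTU and from whether the peer is on the
 * same (local) network. */
static int choose_mss(int if_mtu, int peer_is_local)
{
    int mss;

    assert(if_mtu > TCPIP_HDR_LEN);
    mss = if_mtu - TCPIP_HDR_LEN;    /* largest segment the interface can carry */

    if (peer_is_local)
        return mss;                  /* LAN peer: use the full interface MTU */

    /* Peer is reached through a gateway: stay small enough that the
     * packet is unlikely to be fragmented along the way. */
    return (mss < DEFAULT_MSS) ? mss : DEFAULT_MSS;
}
```

With a 1500-byte Ethernet MTU this gives 1460 bytes per segment for a local peer and 512 bytes for a peer beyond a gateway, so a packet of 552 bytes crosses a 1006-byte link without being fragmented.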
rick@SEISMO.CSS.GOV (Rick Adams) (05/15/87)
I can provide you with the source to the 4.3BSD tcp as hacked to run with the Sun 4.2 IP. It makes a tremendous difference in performance. It often is the difference between making a connection or not being able to connect at all.

Based on the following, I am assuming that you don't even need a source license. (Right Mike?)

---rick

From: karels@monet.berkeley.edu (Mike Karels)
Message-Id: <8605142343.AA09396@monet.Berkeley.EDU>
To: CERF@usc-isi.arpa
Cc: tcp-ip@sri-nic.arpa
Subject: Re: C implementations of TCP/IP
In-Reply-To: Your message of 13 May 86 22:13:00 EDT.
Date: Wed, 14 May 86 16:43:02 PDT

The Berkeley 4.2/4.3BSD TCP/IP code is written in C. It's not quite public domain (it is copyright by the university), but the only restriction on its use is that the University of California be credited.

Mike
brady@MACOM4.ARPA (Sean Brady) (05/15/87)
>I can provide you with the source to the 4.3BSD tcp as hacked to run with
>the Sun 4.2 IP. It makes a tremendous difference in performance.
>It often is the difference between making a connection or not being able to
>connect at all.

If you do have the source, would you be so kind as to allow me to use it? I am currently in need of doing some tcp work on a 4.2 Sun, and I am having the usual difficulties. A copy of an improved tcp would be most appreciated.

Sean
swb@DEVVAX.TN.CORNELL.EDU (Scott Brim) (05/17/87)
There's one other thing to check, which is rather simple. What you describe sounds exactly like the symptoms we used to get with hosts trying to send IP trailers through gateways. Be sure you have "-trailers" in your ifconfig. Scott
dplatt@teknowledge-vaxc.ARPA (Dave Platt) (05/29/87)
About three weeks ago I posted a query concerning an IP-fragmentation problem that I had encountered on my Sun workstation. I've received a really astounding amount of assistance from folks on the net, and have been able to zap the problem. Several people have asked me to summarize my findings and the answers I received from informed netfolks... so, here goes.

- The original symptom of the problem was that SMTP connections would hang, and then abort with a network-read timeout, while sending large messages to a few hosts on the Internet. Other hosts (including those of the same type as the affected systems) were not affected.

- Several people suggested that I check to ensure that my Ethernet interface was configured with the -trailers option (it is).

- The problem was triggered by the fact that the MTU of my Sun's Ethernet interface (1500 bytes) was greater than the MTU of our ARPANET gateway's IMP interface (1006 bytes). This situation caused the TCP/IP packets sent by my Sun to be fragmented as they passed through the gateway.

- The fragmented packets would occasionally fail to be reassembled upon reception. Some hosts apparently don't implement IP-packet reassembly (or don't do it reliably). Also, I'm told that there is a bug in BSD 4.2 UNIX (and possibly in 4.3 as well) that prevents BSD systems from successfully fragmenting an already-fragmented IP packet. Thus, if a 1006-byte fragment from our net's gateway had to be refragmented to fit within the MTU of the destination host's network, the new fragments would be malformed and could not be successfully reassembled.

- One method for working around the problem is to reduce the Sun's Ethernet MTU to <= 1006 bytes, so that our gateway won't have to fragment the packets. I was able to locate the constant 1500 in the "ether_attach()" function in /vmunix, and patch it down to 1000 bytes with adb; booting with the patched /vmunix resolved the problem. Charles Hedrick posted the source for a small program that can change the MTU of the interface "on the fly", and it also works like a charm; it's the method I'm now using. Reducing the Ethernet MTU increases the number of packets needed to complete NFS RPCs, and thus increases the overhead; NFS continues to work just fine. I've been warned that decreasing the MTU will probably break ND, but as I don't use it I don't really care.

- Another method for fixing the problem is persuading TCP to use a smaller segment size, so that the packets that it sends will not exceed the 1006-byte limit. I tried patching the 1024-byte MSS in tcp_output() to a smaller size (512 bytes), but this did not appear to work. I'm not sure why, as I have no sources for the SunOS 3.2 version of BSD 4.2 TCP. Many people have pointed out that BSD 4.3 TCP makes a better choice of MSS, based on the MTU of the interface and on whether the packets will be routed through a gateway (a 512-byte MSS is used if the packets are sent to any non-local destination). The BSD 4.3 enhancements have been incorporated into SunOS 3.4, which is due to be shipped Real Soon Now according to our Sun sales-rep. I FTP'ed the BSD 4.3 source for TCP from seismo (thanks, rick!) and can see the additional logic; I haven't tried to retrofit the new TCP into SunOS 3.2 or patch in equivalent code due to lack of time and lack of urgency.

So... I've got a good workaround for the problem (reducing the MTU), and the problem will go away once I install SunOS 3.4 with the BSD 4.3 enhancements to TCP. Happy ending.

MANY thanks to all of the people on the net who have sent suggestions, hints, and reports of similar problems elsewhere!
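The extra NFS overhead from a smaller Ethernet MTU can be estimated with a short calculation, sketched here in C. The numbers are assumptions for illustration: an 8192-byte UDP payload (the NFS transfer size mentioned earlier in the thread), an 8-byte UDP header, a 20-byte IP header with no options, and RPC headers ignored. The function name is hypothetical, and it assumes the sender cuts every fragment except the last to the same size, rounded down to a multiple of 8 bytes.

```c
#include <assert.h>

#define IP_HDR_LEN  20   /* assumption: IP header without options */
#define UDP_HDR_LEN 8

/* Hypothetical helper: number of IP packets needed to carry one UDP
 * datagram with udp_data_len bytes of payload over a link with the
 * given MTU. */
static int packets_for_udp(int udp_data_len, int mtu)
{
    int ip_payload = udp_data_len + UDP_HDR_LEN;   /* bytes after the IP header */
    int per_frag;

    assert(udp_data_len >= 0 && mtu >= IP_HDR_LEN + 8);
    if (ip_payload + IP_HDR_LEN <= mtu)
        return 1;                                  /* fits unfragmented */

    per_frag = (mtu - IP_HDR_LEN) & ~7;            /* multiple of 8 bytes */
    return (ip_payload + per_frag - 1) / per_frag; /* round up */
}
```

Under these assumptions an 8192-byte transfer takes 6 packets at an MTU of 1500 and 9 packets at an MTU of 1000, which gives a rough idea of the added per-RPC overhead.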