dplatt@teknowledge-vaxc.ARPA (Dave Platt) (05/12/87)
I've run into some problems on a Sun 3/52 workstation running SunOS 3.2 that I've been told may involve IP packet fragmentation. The primary symptom is that SMTP mail deliveries "hang up" and abort with a read timeout.

Background: my Sun is sitting on a 10 Mbit Ethernet with the default ifconfig for the Ethernet board; the MTU for the Ethernet interface is 1500 bytes. The system is configured so that packets destined for IP addresses not on our net are sent to our Vax 8650 (Ultrix 1.2), which ipforwards them to the Internet TIP. The MTU for the Vax's "imp0" interface is 1006 bytes.

Problem: if a process on the Sun establishes a TCP connection with a peer running on a host somewhere on the Internet (e.g. an SMTP server), and then sends a large burst of data, the Sun will typically queue up about 4k of data in the TCP buffers at one time. This apparently results in the sending of an IP packet that approaches the Sun's 1500-byte MTU; when the packet passes through the Vax on its way to the IMP, it is apparently fragmented. Some system or gateway seems to drop the fragmented IP packet on the floor. The Sun's TCP never receives an acknowledgement for the TCP segment, retries the transmission periodically, and eventually aborts the connection.

The problem typically occurs in the later stages of an SMTP session. The Sun's SMTP mailer is able to connect with its peer on another Internet host, go through the "MAIL FROM" and "RCPT TO" steps, and receive permission to send the message body. If the message is short (< 1k bytes), everything works fine; if it's too long, then the timeout occurs.

This problem appears to occur only when the host I'm trying to connect with lies on a local-area net... and not all LANs are affected. I've been told that certain gateways are incapable of reassembling fragmented IP packets; other gateways seem to work just fine.
Question for the gurus: is there any way to reconfigure my Sun's le0 interface so that its MTU doesn't exceed that of the 8650? If so, how do I do it? Or, is there a better solution to the problem? Or, finally, have I totally misunderstood the problem?

advTHANKSance,
Dave Platt
Internet: dplatt@teknowledge-vaxc.arpa
Usenet: {hplabs|sun|ucbvax|seismo|uw-beaver|decwrl}!teknowledge-vaxc.arpa!dplatt
Voice: (415) 424-0500
hedrick@topaz.RUTGERS.EDU (Charles Hedrick) (05/13/87)
Now and then we run into machines that can't reassemble. Note that the 1006 limit on imp0 isn't a problem with the VAX. It is the limit allowed by the Arpanet.

There are more elegant solutions, but if you don't have source, here is a program that will let you change the MTU on the fly. We have used it on both Pyramid and Sun, changing only the name of the kernel variable, i.e. the string "_il_softc", which is the name appropriate for il0 on the Pyramid. I just checked and it looks like _le_softc will work for a Sun 3/50. At least this will let you see whether your problem is really a reassembly problem. You should try "mtu 1006" or maybe some slightly smaller number. (We typically use 900 for testing.)

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <a.out.h>
    #include <stdio.h>

    struct nlist nl[2];
    short mtu;
    int kmem;
    struct stat statblock;
    char *kernelfile;

    main(argc,argv)
    char *argv[];
    {
        if (argc < 2) {
            fprintf(stderr,"usage: mtu <n> {<kernelfile>}\n");
            exit(2);
        }
        /* open kernel memory for reading and writing */
        if ((kmem = open("/dev/kmem",2))<0) {
            perror("open /dev/kmem");
            exit(1);
        }
        if (argc > 2) {
            kernelfile = argv[2];
        } else {
            kernelfile = "/vmunix";
        }
        if (stat(kernelfile,&statblock)) {
            fprintf(stderr,"%s not found.\n",kernelfile);
            exit(1);
        }
        initnlistvars(atoi(argv[1]));
        exit(0);
    }

    initnlistvars(on)
    register int on;
    {
        /* kernel symbol for the interface; use "_le_softc" on a Sun 3/50 */
        nl[0].n_un.n_name = "_il_softc";
        nl[1].n_un.n_name = "";
        nlist(kernelfile,nl);
        if (nl[0].n_type == 0) {
            fprintf(stderr, "%s: No namelist\n", kernelfile);
            exit(4);
        }
        /* the MTU is read as a 2-byte short at offset 6 from the symbol */
        (void) lseek(kmem,(nl[0].n_value)+6,0);
        if (read(kmem,&mtu,2) != 2) {
            perror("read kmem");
            exit(5);
        }
        fprintf(stderr,"mtu was: %d is now: %d\n",mtu,on);
        /* seek back and overwrite it with the new value */
        (void) lseek(kmem,(nl[0].n_value)+6,0);
        mtu = on;
        if (write(kmem,&mtu,2) != 2) {
            perror("write kmem");
            exit(6);
        }
    }
mike@BRL.ARPA (Mike Muuss) (05/18/87)
The problem is fixed by a change in the TCP Max Seg Size used on the connection. The algorithm for computing this in 4.2 BSD (and thus the SUN OS's) is rather simplistic, resulting in exactly the sort of difficulties you reported. A long time ago, I posted a few lines of code that fix this problem. Mike Karels improved them some more, and the correct behavior is now standard in 4.3 BSD UNIX. I'm certain it is a matter of time until SUN "integrates" this code into their product.

-Mike
dplatt@teknowledge-vaxc.ARPA (Dave Platt) (05/29/87)
About three weeks ago I posted a query concerning an IP-fragmentation problem that I had encountered on my Sun workstation. I've received a really astounding amount of assistance from folks on the net, and have been able to zap the problem. Several people have asked me to summarize my findings and the answers I received from informed netfolks... so, here goes.

- The original symptom of the problem was that SMTP connections would hang, and then abort with a network-read timeout, while sending large messages to a few hosts on the Internet. Other hosts (including those of the same type as the affected systems) were not affected.

- Several people suggested that I check to ensure that my Ethernet interface was configured with the -trailers option (it is).

- The problem was triggered by the fact that the MTU of my Sun's Ethernet interface (1500 bytes) was greater than the MTU of our ARPANET gateway's IMP interface (1006 bytes). This situation caused the TCP/IP packets sent by my Sun to be fragmented as they passed through the gateway.

- The fragmented packets would occasionally fail to be reassembled upon reception. Some hosts apparently don't implement IP-packet reassembly (or don't do it reliably). Also, I'm told that there is a bug in BSD 4.2 UNIX (and possibly in 4.3 as well) that prevents BSD systems from successfully fragmenting an already-fragmented IP packet. Thus, if a 1006-byte fragment from our net's gateway had to be refragmented to fit within the MTU of the destination host's network, the new fragments would be malformed and could not be successfully reassembled.

- One method for working around the problem is to reduce the Sun's Ethernet MTU to <= 1006 bytes, so that our gateway won't have to fragment the packets. I was able to locate the constant 1500 in the "ether_attach()" function in /vmunix, and patch it down to 1000 bytes with adb; booting with the patched /vmunix resolved the problem. Charles Hedrick posted the source for a small program that can change the MTU of the interface "on the fly", and it also works like a charm; it's the method I'm now using. Reducing the Ethernet MTU increases the number of packets needed to complete NFS RPCs, and thus increases the overhead, but NFS continues to work just fine. I've been warned that decreasing the MTU will probably break ND, but as I don't use it I don't really care.

- Another method for fixing the problem is persuading TCP to use a smaller segment size, so that the packets that it sends will not exceed the 1006-byte limit. I tried patching the 1024-byte MSS in tcp_output() to a smaller size (512 bytes), but this did not appear to work. I'm not sure why, as I have no sources for the SunOS 3.2 version of BSD 4.2 TCP. Many people have pointed out that BSD 4.3 TCP makes a better choice of MSS, based on the MTU of the interface and on whether the packets will be routed through a gateway (a 512-byte MSS is used if the packets are sent to any non-local destination). The BSD 4.3 enhancements have been incorporated into SunOS 3.4, which is due to be shipped Real Soon Now according to our Sun sales-rep. I FTP'ed the BSD 4.3 source for TCP from seismo (thanks, rick!) and can see the additional logic; I haven't tried to retrofit the new TCP into SunOS 3.2 or patch in equivalent code due to lack of time and lack of urgency.

So... I've got a good workaround for the problem (reducing the MTU), and the problem will go away once I install SunOS 3.4 with the BSD 4.3 enhancements to TCP. Happy ending.

MANY thanks to all of the people on the net who have sent suggestions, hints, and reports of similar problems elsewhere!