ORCHARD/BRUC@SCARECROW.WAISMAN.WISC.EDU (Bruce Orchard) (11/06/87)
Consider a TCP connection passing through three networks: an Ethernet, the ARPANET, and another Ethernet. There are gateways between each Ethernet and the ARPANET.

At connection time, each host must choose a maximum segment size for the TCP segments it transmits. The selection algorithm is generally to take the minimum of the maximum size allowed by the receiver (given in the TCP maximum segment size option), the maximum size allowed by the network the host is connected to (in this case, an Ethernet), and the maximum the transmitting host can handle. Allowing integral multiples of the local network maximum would be reasonable too. All of this has to leave enough room for headers.

Now, the maximum size allowed by an Ethernet is 1500 bytes, but the maximum allowed by the ARPANET is 1007 bytes. If the transmitting end picks a maximum segment size greater than 1007, the gateway going into the ARPANET will fragment the message. A segment of 1500 bytes would be split into one fragment of about 1000 bytes and another of about 500 bytes. One particularly poor choice I have seen used is a maximum segment size of 1024 bytes. Since the 1024 bytes excludes headers, this results in one fragment of about 1007 bytes and another of about 77 bytes.

The real problem is that the transmitting end has no knowledge of the limits of networks it is not connected to. Therefore, I propose adding a new option to the IP header. This option would give the minimum of the maximum transmission units of any network that handled the packet. The originating end would set it to a large value. Each node that transmitted the packet would compare the value given in the option to the maximum transmission unit of the outgoing network. If the network value were less, the value in the option would be reduced to the network value. This IP header option would be used by TCP on the packet that includes the TCP maximum segment size option.
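The selection rule and the fragmentation arithmetic above can be sketched as follows. This is a minimal sketch, assuming 20-byte IP and TCP headers with no options, the 1500- and 1007-byte MTUs given above, and the IP rule that every fragment except the last carries a multiple of 8 data bytes; the function names and the 4096-byte host limit are hypothetical. Because of that 8-byte rule, the 1024-byte case comes out as fragments of 1004 and 80 bytes rather than exactly 1007 and 77.

```python
IP_HEADER = 20   # assumed: minimal IP header, no options
TCP_HEADER = 20  # assumed: minimal TCP header, no options


def choose_send_mss(peer_mss: int, local_mtu: int, host_limit: int) -> int:
    """Min-of-three rule: the peer's advertised MSS, what the local network
    can carry after headers, and what this host can handle (data bytes)."""
    local_limit = local_mtu - IP_HEADER - TCP_HEADER
    return min(peer_mss, local_limit, host_limit)


def fragment_sizes(datagram_len: int, mtu: int, header_len: int = IP_HEADER) -> list[int]:
    """Total sizes of the fragments a gateway produces when it forwards a
    datagram of datagram_len bytes onto a network with the given MTU."""
    if datagram_len <= mtu:
        return [datagram_len]
    payload = datagram_len - header_len
    # Every fragment but the last must carry a multiple of 8 data bytes.
    chunk_max = (mtu - header_len) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(payload, chunk_max)
        sizes.append(header_len + chunk)
        payload -= chunk
    return sizes


ETHERNET_MTU = 1500
ARPANET_MTU = 1007

# A peer advertising 1024 passes the local (Ethernet) check...
mss = choose_send_mss(peer_mss=1024, local_mtu=ETHERNET_MTU, host_limit=4096)
datagram = mss + TCP_HEADER + IP_HEADER
print(mss, datagram)                          # 1024 1064
# ...but the gateway into the ARPANET still has to split the datagram.
print(fragment_sizes(datagram, ARPANET_MTU))  # [1004, 80]
# A full 1500-byte Ethernet datagram splits roughly 1000 / 500.
print(fragment_sizes(1500, ARPANET_MTU))      # [1004, 516]
```

Only knowing the ARPANET limit in advance (1007 - 40 = 967 data bytes) would avoid the split, which is exactly the knowledge the sender lacks.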
The receiving TCP would consider both the maximum allowed by its peer TCP (in the TCP option) and the maximum allowed by any network along the way. It would probably take the lesser of the two.

One limitation of this proposal is that not all packets of a TCP connection necessarily pass through the same networks. Actually, given the way the networks are connected, all packets usually do go through the same networks. Also, if one packet takes a different route from an earlier packet, the second route could be on the same kind of network (for example, two parallel Ethernets). Regardless, the consequence of a poor choice is reduced throughput, not failure.

Bruce Orchard
University of Wisconsin-Madison

P.S. Is the MTU of SATNET really 256 bytes, as given in IEN 192?
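The proposed option and the receiving TCP's choice can be modelled in a few lines. This is a minimal sketch: the option is treated as a plain integer rather than any real wire format, the "large value" the originator starts with is assumed to be 65535, and `path_mtu_option` and `receiver_send_mss` are hypothetical names. Each transmitting node, the originating host included, lowers the value to its outgoing network's MTU when that MTU is smaller.

```python
LARGE_VALUE = 65535  # assumed starting value set by the originating host
HEADERS = 40         # assumed: 20-byte IP + 20-byte TCP headers, no options


def path_mtu_option(outgoing_mtus: list[int], start: int = LARGE_VALUE) -> int:
    """Value of the hypothetical option after every transmitting node has
    compared it with the MTU of the network it sends the packet onto."""
    value = start
    for mtu in outgoing_mtus:
        if mtu < value:  # network value is less: reduce the option to it
            value = mtu
    return value


def receiver_send_mss(peer_mss_option: int, reported_path_mtu: int) -> int:
    """The receiving TCP takes the lesser of the peer's MSS option and what
    the networks along the way allow, after subtracting headers."""
    return min(peer_mss_option, reported_path_mtu - HEADERS)


# Sending host onto its Ethernet, gateway onto the ARPANET, gateway onto the far Ethernet.
reported = path_mtu_option([1500, 1007, 1500])
print(reported)                           # 1007
print(receiver_send_mss(1460, reported))  # 967
```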
JBVB@AI.AI.MIT.EDU ("James B. VanBokkelen") (11/10/87)
The lower the performance of your network interface, the more trouble *any* fragmentation means to you. On PCs, we try to eliminate fragmentation by specifying a small MSS when routing via any gateway (subnets-are-local would be nice to do, but we haven't yet). The IP option you propose would help, but not until all gateways handled it properly.

If gateway gurus saw their way clear to do so, they might help some fraction of the world by arranging that IP fragments aren't transmitted consecutively (if there is other traffic to handle) or by inserting a little time delay if the Ether or other non-serial media is idle. Presently, fragmenting an IP datagram is the best simple way I know to determine how close together a given hardware/software combination can send packets. If the gateway goes faster than the host can handle, suddenly it is time for a TCP retransmit...

I can't say when or where I heard this, but I always thought that SATNET had an MTU of 128 bytes.

jbvb
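The failure described above can be illustrated with a toy model of an interface that holds only a few received frames while the host copies them out one at a time. This is a minimal sketch, not a model of any particular card: times are in milliseconds, the 1.2 ms spacing approximates two back-to-back 1500-byte frames on a 10 Mb/s Ethernet, and the 2 ms per-frame host copy time is a hypothetical figure. A frame that arrives while every on-card buffer is still occupied is lost, and that loss is what later forces the TCP retransmit.

```python
def count_drops(arrivals: list[float], buffers: int, service_time: float) -> int:
    """Frames lost by an interface with `buffers` receive buffers, where the
    host copies frames out one at a time, each copy taking service_time."""
    held: list[float] = []  # finish times of frames still occupying a buffer
    last_finish = 0.0       # when the host finishes its most recent copy
    drops = 0
    for t in sorted(arrivals):
        # A buffer is free again once the host has copied its frame out.
        held = [finish for finish in held if finish > t]
        if len(held) >= buffers:
            drops += 1      # no free buffer: the frame is lost
            continue
        finish = max(t, last_finish) + service_time
        held.append(finish)
        last_finish = finish
    return drops


COPY_MS = 2.0  # hypothetical per-frame host copy time

# Two fragments sent back to back by a gateway, one receive buffer.
print(count_drops([0.0, 1.2], buffers=1, service_time=COPY_MS))  # 1
# The same pair with a small gap inserted by the gateway.
print(count_drops([0.0, 2.5], buffers=1, service_time=COPY_MS))  # 0
# Back to back again, but with two buffers on the card.
print(count_drops([0.0, 1.2], buffers=2, service_time=COPY_MS))  # 0
```

The last two cases correspond to the two remedies argued over in this thread: a gateway that spaces fragments out, and an interface with more than one receive buffer.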
PAP4@AI.AI.MIT.EDU ("Philip A. Prindeville") (11/11/87)
My apologies if someone has already thought of this, but mail to my site is being delayed by up to 5 days, and seems to arrive in random order. But here goes anyway:

> Date: Thu, 05 Nov 87 21:31:54 CST
> From: Bruce Orchard <ORCHARD/BRUC@scarecrow.waisman.wisc.edu>
> Subject: TCP maximum segment size determination
>
> [ ... ] large value. Each node that transmitted the packet would compare the value given in the option to the maximum transmission unit of the outgoing network. If the network value were less, the value in the option would be reduced to the network value.
> [ ... ]
> One limitation of this proposal is that all packets of a TCP connection do not necessarily pass through the same networks. Actually, given the way the networks are connected, all packets usually go through the same networks. Also, if one packet takes a different route from an earlier packet, the second route could be on the same kind of network (for example, two parallel Ethernets). Regardless, the consequence of a poor choice is reduced throughput, not failure.

What about the required overhead for gateways and routers to have to further inspect each packet? It could be optimized so that only TCP packets are inspected, but still, that would seem to add to the burden of possibly compute-bound gateways...

> P.S. Is the MTU of SATNET really 256 bytes, as given in IEN 192?

Could be worse, could be the 128-byte MTU of most X.75 implementations...

-Philip
chris@GYRE.UMD.EDU.UUCP (11/12/87)
There is a good standard argument against setting the MSS via an IP
option, and that is that the route the SYN packet takes is not
necessarily the same as the route that other packets will take. (In
practice, I think we see a fair number of routes that,
diagrammatically, look like this:
             net1
      /------>f1------------>f2-----\
      |                             v
      X                             Y
      ^                             |
      \-------g1<------------g2<----/
             net2
where the return path is consistently different from the originating
path. And of course, since the Internet does not rely on virtual
circuits, it can reroute dynamically, invalidating MSSes on the fly.)
4.3BSD sets the MSS to 576 (which becomes 536 data bytes) when
crossing a gateway. This is not necessarily ideal but is the
official recommended practice.
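The rule described above can be sketched as below. This is a minimal sketch, assuming that the 576 figure is the whole IP datagram (so 40 bytes of IP and TCP headers leave 536 data bytes) and that the host already knows whether the destination is on a directly attached network; `default_mss` is a hypothetical name, not the 4.3BSD code.

```python
DEFAULT_DATAGRAM = 576  # datagram size every IP destination must be able to receive
HEADERS = 40            # assumed: 20-byte IP + 20-byte TCP headers, no options


def default_mss(destination_is_local: bool, local_mtu: int) -> int:
    """MSS to use: the full local MTU for on-net peers, and the conservative
    576-byte datagram (or less, if the local net is smaller) via a gateway."""
    if destination_is_local:
        mtu = local_mtu
    else:
        mtu = min(local_mtu, DEFAULT_DATAGRAM)
    return mtu - HEADERS


print(default_mss(True, 1500))   # 1460 on the same Ethernet
print(default_mss(False, 1500))  # 536 through a gateway
```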
Chris
mogul@DECWRL.DEC.COM (Jeffrey Mogul) (11/14/87)
ORCHARD/BRUC@SCARECROW.WAISMAN.WISC.EDU (Bruce Orchard) writes:
> I propose adding a new option to the IP header. This option would
> give the minimum of the maximum transmission units of any network
> that handled the packet. The originating end would set it to a
> large value. Each node that transmitted the packet would compare
> the value given in the option to the maximum transmission unit of
> the outgoing network. If the network value were less, the value
> in the option would be reduced to the network value.
This is one of the several options proposed in "Fragmentation
Considered Harmful", by Chris Kent and myself, presented at the
SIGCOMM '87 Workshop this past August. I understood that the proceedings
would be distributed to members of SIGCOMM, but so far I have not
seen anything except the unpaginated version distributed at the
workshop. Chris and I are preparing a slightly expanded version
which should be available as a tech report sometime in the next few
months.
Although I think this is a great idea (and some day we'll take the
proposal given in the paper and turn it into an RFC), it's not very
likely to be practical in the IP Internet, given how hard it would be
to change enough hosts and gateways to make it work. Instead, we
recommend simply biting the bullet and using 576 bytes whenever you're
not absolutely sure where a packet is going. 576 bytes isn't always
fragmentation-proof, but it's a reasonable compromise.
> P.S. Is the MTU of SATNET really 256 bytes, as given in IEN 192?
It's probably slightly less. I've had a hard time discovering the
exact value.
-Jeff
craig@NNSC.NSF.NET (Craig Partridge) (11/16/87)
Bruce,

Last week I volunteered at the IETF meeting to write a proposal for just such an IP option. You should see it within a couple of weeks (seeing it implemented may take a while longer...).

By the way, the scheme is sound even if the path changes, provided you treat the IP option and the TCP MSS values as distinct. That is, in the TCP MSS you should advertise the maximum segment size you wish to accept, and the remote end should keep this value and separately keep track of what IP reports. You would use the minimum of the two MSSs reported when sending. Then if you get an indication that the route has changed (such as an ICMP redirect), you can send the IP option again and update the effective MSS (up or down).

There's still the problem of packets following different paths; this may have a solution, but I'm still looking for something that doesn't feel like a kludge of a three-way handshake.

Craig
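Keeping the two limits separate, as suggested above, can be sketched as a small per-connection record. This is a minimal sketch, assuming 40 bytes of IP and TCP headers and that some mechanism (the proposed IP option, re-sent after something like an ICMP redirect) delivers a fresh path MTU; the class and method names are hypothetical, invented for illustration.

```python
from dataclasses import dataclass

HEADERS = 40  # assumed: 20-byte IP + 20-byte TCP headers, no options


@dataclass
class SendLimits:
    peer_mss: int  # from the peer's TCP MSS option; fixed for the connection
    path_mss: int  # from the latest IP-option report; may change

    def effective_mss(self) -> int:
        # Send no more than either limit allows.
        return min(self.peer_mss, self.path_mss)

    def on_path_report(self, reported_mtu: int) -> None:
        # Replace, rather than only lower, the path value, so the effective
        # MSS can move up as well as down when the route changes.
        self.path_mss = reported_mtu - HEADERS


conn = SendLimits(peer_mss=1460, path_mss=1500 - HEADERS)
print(conn.effective_mss())  # 1460
conn.on_path_report(1007)    # e.g. after a redirect, the path now crosses the ARPANET
print(conn.effective_mss())  # 967
conn.on_path_report(1500)    # the route changes back to all-Ethernet
print(conn.effective_mss())  # 1460
```

Leaving `peer_mss` untouched is the point of the design: a route change can raise the path value again, which would be impossible if both limits had been folded into one number at connection time.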
gnu@hoptoad.uucp (John Gilmore) (11/22/87)
JBVB@AI.AI.MIT.EDU ("James B. VanBokkelen") wrote:
> If gateway gurus saw their way clear to do so, they might help some fraction
> of the world by arranging that IP fragments aren't transmitted consecutively
> (if there is other traffic to handle) or by inserting a little time delay
> if the Ether or other non-serial media is idle.

It's hard to believe that in this age of utterly cheap dense RAM, otherwise sane people are proposing inserting artificial delays between Ethernet packets because a lowball vendor wouldn't put, say, TWO buffers on their card! The free market has a clear solution to this problem...

[I admit I'm prejudiced, since I worked on Suns, which currently seem to have the highest Ethernet thruput, but they were built out of standard Ethernet chips and DRAMs available to everyone. You too can handle infinite back to back packets, if you just design with that in mind as Sun did.]

--
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu gnu@toad.com
Love your country but never trust its government.
-- from a hand-painted road sign in central Pennsylvania
cetron@CS.UTAH.EDU (Edward J Cetron) (11/24/87)
In article <3370@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>standard Ethernet chips and DRAMs available to everyone. You too can
>handle infinite back to back packets, if you just design with that in
>mind as Sun did.]

Is this the same Sun that, when it receives two or more back-to-back RARP packets, simply discards all but the last??? I guess the hardware can catch the packets; the software just can't keep up....

-ed cetron
cetron@cs.utah.edu
backman@interlan.UUCP (Larry Backman) (11/25/87)
>It's hard to believe that in this age of utterly cheap dense RAM,
>otherwise sane people are proposing inserting artificial delays between
>Ethernet packets because a lowball vendor wouldn't put, say, TWO
>buffers on their card!
>
>[I admit I'm prejudiced, since I worked on Suns, which currently seem
>to have the highest Ethernet thruput, but they were built out of
>standard Ethernet chips and DRAMs available to everyone. You too can
>handle infinite back to back packets, if you just design with that in
>mind as Sun did.]
>

But whhhat about all those old, old Suns, the ones without back-to-back capacity? What about the tens of thousands of NI5010s or 3C501s? People bought them, have them, and are working with them daily even though no double buffering is available. It's not just a question of being a lowball vendor; the reality is that old, outmoded hardware is out there and in use! Software must be cognizant that it will not always run in ideal situations. In fact, I define good software as that which can perform adequately under less than ideal environmental conditions.

Larry Backman
Micom - Interlan