[comp.protocols.tcp-ip] TCP maximum segment size determination

ORCHARD/BRUC@SCARECROW.WAISMAN.WISC.EDU (Bruce Orchard) (11/06/87)

Consider a TCP connection passing through 3 networks: An Ethernet,
the ARPANET, and another Ethernet.  There are gateways between
each Ethernet and the ARPANET.  At connection time, each host must
choose a maximum segment size for the TCP segments it transmits.
The selection algorithm is generally to take the minimum of the
maximum size allowed by the receiver (given in the TCP maximum
segment size option), the maximum size allowed by the network the
host is connected to (in this case, an Ethernet), and the maximum
the transmitting host can handle.  Allowing integral multiples of
the local network maximum would be reasonable too.  All this has
to allow enough space for headers.
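
The selection rule described above can be sketched as follows.  This is
only an illustration: it assumes 20-byte IP and TCP headers with no
options, and the function and parameter names are invented for the
example rather than taken from any real TCP implementation.

```python
IP_HEADER = 20   # bytes; assumes no IP options
TCP_HEADER = 20  # bytes; assumes no TCP options


def choose_send_mss(peer_mss, local_mtu, host_max_segment):
    """Pick the largest TCP data size to send in one segment.

    peer_mss         -- value from the peer's TCP maximum segment size option
    local_mtu        -- MTU of the directly attached network
    host_max_segment -- largest segment the sending host can handle
    """
    # Space left for TCP data once both headers fit in one local packet.
    local_limit = local_mtu - IP_HEADER - TCP_HEADER
    return min(peer_mss, local_limit, host_max_segment)


if __name__ == "__main__":
    # Host on an Ethernet (MTU 1500) whose peer advertised 1024.
    print(choose_send_mss(peer_mss=1024, local_mtu=1500, host_max_segment=4096))
```

Note that nothing in this calculation knows about the networks between
the two Ethernets.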

Now the maximum size allowed by an Ethernet is 1500 bytes, but the
maximum allowed by the ARPANET is 1007 bytes.  If a maximum
segment size greater than 1007 is picked by the transmitting end,
the gateway going into the ARPANET will fragment the message.  A
segment of 1500 bytes would get split into one of about 1000 bytes
and another of about 500 bytes.  One particularly poor choice I
have seen used is a maximum segment size of 1024 bytes.  Since the
1024 bytes excludes headers, this results in one fragment of 1007
bytes and another of 77 bytes.
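
The fragment sizes can be worked out with a short sketch.  It assumes
20-byte IP and TCP headers without options and takes the 1007-byte
ARPANET limit from the text above; the function name is made up for
the example.  Because every fragment except the last must carry a
multiple of 8 data bytes, the exact sizes come out a few bytes away
from the 1007 and 77 quoted above, but the conclusion is the same: one
nearly full packet followed by a tiny one.

```python
IP_HEADER = 20   # bytes; assumes no IP options
TCP_HEADER = 20  # bytes; assumes no TCP options


def fragment_sizes(mss, mtu):
    """Return the size of each IP fragment, IP header included."""
    max_payload = mtu - IP_HEADER
    if max_payload < 8:
        raise ValueError("MTU too small to carry any fragment data")
    payload = mss + TCP_HEADER  # IP payload of the unfragmented datagram
    sizes = []
    # Non-final fragments carry the largest multiple of 8 bytes that fits.
    while payload > max_payload:
        chunk = max_payload // 8 * 8
        sizes.append(chunk + IP_HEADER)
        payload -= chunk
    sizes.append(payload + IP_HEADER)  # final (or only) fragment
    return sizes


if __name__ == "__main__":
    for mss in (1460, 1024, 967):
        print(mss, fragment_sizes(mss, mtu=1007))
```

With these assumptions an MSS of 1024 gives fragments of 1004 and 80
bytes, while an MSS of 967 fits in a single 1007-byte packet.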

The real problem is that the transmitting end has no knowledge of
the limits of networks it is not connected to.  Therefore, I
propose adding a new option to the IP header.  This option would
give the minimum of the maximum transmission units of any network
that handled the packet.  The originating end would set it to a
large value.  Each node that transmitted the packet would compare
the value given in the option to the maximum transmission unit of
the outgoing network.  If the network value were less, the value
in the option would be reduced to the network value.
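
A minimal sketch of how the proposed option would behave along a path.
No such option is defined here, so the initial value of 65535 and all
names are assumptions chosen for illustration.

```python
INITIAL_OPTION_VALUE = 65535  # the "large value" set by the originator (assumed)


def forward(option_value, outgoing_mtu):
    """What one node does: lower the option if its outgoing network is smaller."""
    return min(option_value, outgoing_mtu)


def minimum_mtu_seen(path_mtus, initial=INITIAL_OPTION_VALUE):
    """Carry the option across every network on the path, in order."""
    value = initial
    for mtu in path_mtus:
        value = forward(value, mtu)
    return value


if __name__ == "__main__":
    # Ethernet, ARPANET, Ethernet, using the figures from the text above.
    print(minimum_mtu_seen([1500, 1007, 1500]))
```

The receiver ends up with the smallest MTU of any network the packet
actually crossed, which for the three-network example is the ARPANET
figure.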

This IP header option would be used by TCP on the packet that
includes the TCP maximum segment size option.  The receiving TCP
would consider both the maximum allowed by its peer TCP (in the
TCP option) and the maximum allowed by any network along the way.
It would probably take the lesser of the two.

One limitation of this proposal is that all packets of a TCP
connection do not necessarily pass through the same networks.
Actually, given the way the networks are connected, all packets
usually go through the same networks.  Also, if one packet takes a
different route from an earlier packet, the second route could be
on the same kind of network (for example, two parallel Ethernets).
Regardless, the consequence of a poor choice is reduced
throughput, not failure.

Bruce Orchard
University of Wisconsin-Madison

P.S.  Is the MTU of SATNET really 256 bytes, as given in IEN 192?

JBVB@AI.AI.MIT.EDU ("James B. VanBokkelen") (11/10/87)

The lower the performance of your network interface, the more trouble *any*
fragmentation causes you.  On PCs, we try to eliminate fragmentation by
specifying a small MSS when routing via any gateway (subnets-are-local
would be nice to do, but we haven't yet).  The IP option you propose would
help, but not until all gateways handled it properly.

If gateway gurus saw their way clear to do so, they might help some fraction
of the world by arranging that IP fragments aren't transmitted consecutively
(if there is other traffic to handle) or by inserting a little time delay
if the Ether or other non-serial media is idle.  Presently, fragmenting an
IP datagram is the best simple way I know to determine how close together a
given hardware/software combination can send packets.  If the gateway goes
faster than the host can handle, suddenly it is time for a TCP retransmit...

I can't say when/where I heard this, but I always thought that SATNET had an
MTU of 128 bytes.

jbvb

PAP4@AI.AI.MIT.EDU ("Philip A. Prindeville") (11/11/87)

My apologies if someone has already thought of this, but mail to my site is
being delayed by up to 5 days, and seems to arrive in random order.  But,
here goes anyway:

    Date:  Thu, 05 Nov 87 21:31:54 CST
    From:  Bruce Orchard <ORCHARD/BRUC@scarecrow.waisman.wisc.edu>
    Subject: TCP maximum segment size determination

    [ ... ]
    large value.  Each node that transmitted the packet would compare
    the value given in the option to the maximum transmission unit of
    the outgoing network.  If the network value were less, the value
    in the option would be reduced to the network value.

    [ ... ]
    One limitation of this proposal is that all packets of a TCP
    connection do not necessarily pass through the same networks.
    Actually, given the way the networks are connected, all packets
    usually go through the same networks.  Also, if one packet takes a
    different route from an earlier packet, the second route could be
    on the same kind of network (for example, two parallel Ethernets).
    Regardless, the consequence of a poor choice is reduced
    throughput, not failure.

What about the required overhead for gateways and routers to have
to further inspect each packet?  It could be optimized so that only TCP
packets are inspected, but still, that would seem to add to the burden of
possibly compute-bound gateways...

    P.S.  Is the MTU of SATNET really 256 bytes, as given in IEN 192?

Could be worse, could be the 128 byte MTU of most X.75 implementations...

-Philip

chris@GYRE.UMD.EDU.UUCP (11/12/87)

There is a good standard argument against setting the MSS via an IP
option, and that is that the route the SYN packet takes is not
necessarily the same as the route that other packets will take.  (In
practice, I think we see a fair number of routes that,
diagrammatically, look like this:

			net1
	 /------>f1------------>f2-----\
	|				v
	X				Y
	^				|
	 \-------g1<------------g2<----/
			net2

where the return path is consistently different from the originating
path.  And of course, since the Internet does not rely on virtual
circuits, it can reroute dynamically, invalidating MSSes on the fly.)

4.3BSD sets the MSS to 576 (which becomes 536 data bytes) when
crossing a gateway.  This is not necessarily ideal but is the
officially recommended practice.
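
The rule can be sketched roughly as below.  This is not the 4.3BSD
code; it is an illustration that assumes 40 bytes of IP plus TCP
headers and uses CIDR notation and invented names purely for
convenience.

```python
import ipaddress

DEFAULT_DATAGRAM = 576  # datagram size used when crossing a gateway
HEADERS = 40            # IP + TCP headers; assumes no options


def send_mss(destination, local_network, local_mtu):
    """Use the local MTU on the attached network, else fall back to 576."""
    on_local_net = (
        ipaddress.ip_address(destination) in ipaddress.ip_network(local_network)
    )
    if on_local_net:
        datagram = local_mtu
    else:
        datagram = min(local_mtu, DEFAULT_DATAGRAM)
    return datagram - HEADERS


if __name__ == "__main__":
    print(send_mss("10.0.0.7", "10.0.0.0/24", 1500))   # same Ethernet
    print(send_mss("192.0.2.1", "10.0.0.0/24", 1500))  # via a gateway
```

The second call yields the 536 data bytes mentioned above.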

Chris

mogul@DECWRL.DEC.COM (Jeffrey Mogul) (11/14/87)

ORCHARD/BRUC@SCARECROW.WAISMAN.WISC.EDU (Bruce Orchard) writes:
    I propose adding a new option to the IP header.  This option would
    give the minimum of the maximum transmission units of any network
    that handled the packet.  The originating end would set it to a
    large value.  Each node that transmitted the packet would compare
    the value given in the option to the maximum transmission unit of
    the outgoing network.  If the network value were less, the value
    in the option would be reduced to the network value.

This is one of the several options proposed in "Fragmentation
Considered Harmful", by Chris Kent and myself, presented at the
SIGCOMM '87 Workshop this past August.  I understood that the proceedings
would be distributed to members of SIGCOMM, but so far I have not
seen anything except the unpaginated version distributed at the
workshop.  Chris and I are preparing a slightly expanded version
which should be available as a tech report sometime in the next few
months.

Although I think this is a great idea (and some day we'll take the
proposal given in the paper and turn it into an RFC) it's not real
likely that it is practical in the IP Internet, given the low likelihood
of changing enough hosts and gateways to make it work.  Instead, we
recommend simply biting the bullet and using 576 bytes whenever you're
not absolutely sure where a packet is going.  576 bytes isn't always
fragmentation-proof, but it's a reasonable compromise.

    P.S.  Is the MTU of SATNET really 256 bytes, as given in IEN 192?

It's probably slightly less.  I've had a hard time discovering the
exact value.

-Jeff

craig@NNSC.NSF.NET (Craig Partridge) (11/16/87)

Bruce,

    Last week I volunteered at the IETF meeting to write a proposal for
just such an IP option.  You should see it within a couple of weeks
(seeing it implemented may take a while longer...)

    By the way, the scheme is sound even if the path changes, provided
you treat the IP option and the TCP MSS values as distinct.

    I.e. in the TCP MSS you should advertise the maximum segment
size you wish to accept and the remote end should keep this value
and separately keep track of what IP reports.  You would use the minimum
of the two MSS's reported when sending. Then if you get an indication
that the route has changed (such as an ICMP redirect), you can send the
IP option again, and update the effective MSS (up or down).  There's
still the problem of packets following different paths -- this may
have a solution but I'm still looking for something that doesn't feel
like a kludge of a three way handshake.
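
A rough sketch of keeping the two values separate.  The class and
method names are invented, the 40-byte header allowance assumes no IP
or TCP options, and falling back to the peer's MSS before any IP
report arrives is an assumption of the example, not part of the
scheme as described.

```python
HEADERS = 40  # IP + TCP headers; assumes no options


class SendSizeState:
    """Per-connection record of the peer's MSS and the reported path MTU."""

    def __init__(self, peer_mss):
        self.peer_mss = peer_mss  # from the TCP MSS option; never overwritten
        self.path_mtu = None      # unknown until the IP option reports a value

    def on_path_report(self, path_mtu):
        # Replace rather than take a running minimum, so a new route
        # can move the effective MSS up as well as down.
        self.path_mtu = path_mtu

    def effective_mss(self):
        if self.path_mtu is None:
            return self.peer_mss  # assumption: no report yet
        return min(self.peer_mss, self.path_mtu - HEADERS)


if __name__ == "__main__":
    state = SendSizeState(peer_mss=1460)
    state.on_path_report(1007)   # first route crosses the ARPANET
    print(state.effective_mss())
    state.on_path_report(1500)   # route changed; option sent again
    print(state.effective_mss())
```

Because the peer's advertised MSS is stored untouched, a later report
of a larger path MTU can raise the effective value again without
exceeding what the peer agreed to accept.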

Craig

gnu@hoptoad.uucp (John Gilmore) (11/22/87)

JBVB@AI.AI.MIT.EDU ("James B. VanBokkelen") wrote:
> If gateway gurus saw their way clear to do so, they might help some fraction
> of the world by arranging that IP fragments aren't transmitted consecutively
> (if there is other traffic to handle) or by inserting a little time delay
> if the Ether or other non-serial media is idle.

It's hard to believe that in this age of utterly cheap dense RAM,
otherwise sane people are proposing inserting artificial delays between
Ethernet packets because a lowball vendor wouldn't put, say, TWO
buffers on their card!

The free market has a clear solution to this problem...

[I admit I'm prejudiced, since I worked on Suns, which currently seem
to have the highest Ethernet thruput, but they were built out of
standard Ethernet chips and DRAMs available to everyone.  You too can
handle infinite back to back packets, if you just design with that in
mind as Sun did.]
-- 
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu			  gnu@toad.com
Love your country but never trust its government.
		      -- from a hand-painted road sign in central Pennsylvania

cetron@CS.UTAH.EDU (Edward J Cetron) (11/24/87)

In article <3370@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>standard Ethernet chips and DRAMs available to everyone.  You too can
>handle infinite back to back packets, if you just design with that in
>mind as Sun did.]

	Is this the same Sun that, when it receives two or more back to
back RARP packets, simply discards all but the last???  I guess the hardware
can catch the packets, the software just can't keep up....

-ed cetron
cetron@cs.utah.edu

backman@interlan.UUCP (Larry Backman) (11/25/87)

>It's hard to believe that in this age of utterly cheap dense RAM,
>otherwise sane people are proposing inserting artificial delays between
>Ethernet packets because a lowball vendor wouldn't put, say, TWO
>buffers on their card!
>
>[I admit I'm prejudiced, since I worked on Suns, which currently seem
>to have the highest Ethernet thruput, but they were built out of
>standard Ethernet chips and DRAMs available to everyone.  You too can
>handle infinite back to back packets, if you just design with that in
>mind as Sun did.]
>

	But what about all those old, old Suns, the ones without back to
	back capacity?  What about the tens of thousands of NI5010's or
	3C501's?  People bought them, have them, are working with them
	daily even though no double buffering is available.  It's not just
	a question of being a lowball vendor; reality is the fact that
	old, outmoded hardware is out there and used!  Software must be
	cognizant that it will not always run in ideal situations.  In
	fact, I define good software as that which can perform adequately
	under less than ideal environmental situations.

						Larry Backman
						Micom - Interlan