[comp.protocols.tcp-ip] IP Bandwidth limits

cjohnson@somni.wpd.sgi.com (Chris Johnson) (01/11/91)

	Well, there is a data rate limit for TCP/IP,
	but it isn't window size dependent.  The
	sixteen bit IP id field and the 16 bit max
	packet length limit a particular connection
	to 4GB/255 seconds or about 16MB/sec.
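
Spelling out the arithmetic (assuming the 16-bit id has to stay unique for a
worst-case 255-second packet lifetime and that every packet carries the
maximum 16-bit length):

    ids = 2 ** 16                # distinct values of the 16-bit IP id field
    max_packet = 2 ** 16 - 1     # 16-bit total-length field, in bytes
    lifetime = 255               # seconds an id must stay unique (max TTL)

    print(ids * max_packet / lifetime / 1e6)   # ~16.8 MB/sec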

					cj*

thomas@uppsala.telesoft.se (Thomas Tornblom) (01/11/91)

In article <80719@sgi.sgi.com> cjohnson@somni.wpd.sgi.com (Chris Johnson) writes:

	   Well, there is a data rate limit for TCP/IP,
	   but it isn't window size dependent.  The
	   sixteen bit IP id field and the 16 bit max
	   packet length limit a particular connection
	   to 4GB/255 seconds or about 16MB/sec.


Sorry if I'm being overly ignorant, but I can't make sense out of this.
Where does the time come into this as a limiting factor other than the
available bandwidth of the underlying hardware?

Thomas
-- 
Real life:      Thomas Tornblom             Email:  thomas@uppsala.telesoft.se
Snail mail:     Telesoft Uppsala AB         Phone:  +46 18 189406
                Box 1218                    Fax:    +46 18 132039
                S - 751 42 Uppsala, Sweden

zweig@cs.uiuc.edu (Johnny Zweig) (01/12/91)

cjohnson@somni.wpd.sgi.com (Chris Johnson) writes:
>	Well, there is a data rate limit for TCP/IP,
>	but it isn't window size dependent.  The
>	sixteen bit IP id field and the 16 bit max
>	packet length limit a particular connection
>	to 4GB/255 seconds or about 16MB/sec.

For one thing, 255 seconds seems like a long TTL but whatever. There is _not_
a requirement for IP that ID numbers be unique over the time frame of one TTL
(or 2*TTL or whatever). ID numbers are for fragment reassembly -- no fragments
means no bandwidth limitation (so long as I guarantee my super-zippy net
has an MTU of 65536 octets I am okay).

I admit this is dangerous, in the sense that if some moron comes along and
starts fragmenting in a bizarre way such that fragments aren't reassembled
soon enough and their IDs get confused (a multi-megabyte LIFO buffer?) things
will break, but I think it is important to keep in mind that just because IP
was designed around very general interoperability requirements doesn't mean
it is broken.

-Johnny IP

rpw3@rigden.wpd.sgi.com (Rob Warnock) (01/14/91)

In article <80719@sgi.sgi.com> cjohnson@pei.com (Chris Johnson) writes:
+---------------
| Well, there is a data rate limit for TCP/IP,
| but it isn't window size dependent.
+---------------

...in a zero-delay network. But in the presence of any round-trip delay,
TCP *is* rate-limited to approximately one window-size per round trip.
For example, on a maximally-configured FDDI network, the idle-net token
rotation time is 1.6 milliseconds, or "20,000 bytes". Under some reasonable
assumptions (e.g., hosts are very fast, but not infinitely so), one can
show that this will limit TCP speed to about (16/(16+20))*12.5 = 6.9 MByte/s
for a single connection between a pair of hosts (assuming all other hosts
are idle).
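
A crude sketch of that one-window-per-round-trip bound, under simplified
assumptions (a full 64K rfc793 window, the 1.6 ms idle-ring token rotation
as the only added delay, infinitely fast hosts); it reproduces the
20,000-byte figure, though the rate it predicts depends heavily on the ACK
timing discussed below:

    line_rate = 12.5e6           # FDDI payload rate, bytes/sec (100 Mbit/s)
    rotation  = 1.6e-3           # idle-net token rotation time, seconds
    window    = 65535            # maximum rfc793 TCP window, bytes

    print(line_rate * rotation)            # ~20,000 bytes "in flight" per rotation
    serialize = window / line_rate         # time to clock one window onto the ring
    print(window / (serialize + rotation) / 1e6)   # ~9.6 MB/s under this crude model;
                                                   # the real number depends on ACK timing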

By sending more than the usual two ACKs per window (the standard anti-"silly window"
strategy), one can get ACKs for data close to the "end" of the window sent
with the same token rotation as the data itself, and data rates closer to
10-12 MB/s could be obtained. (Even with very fast hosts and FDDI interfaces,
I assume a few packets near the end of the transmit burst will not get ACKed
in time to go out with the "current" token rotation.)

Of course, most FDDI rings will not be "maximally" configured, and will
have smaller idle-ring token delays. But there are other media with substantial
delay-bandwidth products (e.g. long-haul lines, satellites), and this is
why RFC 1072 was promulgated.

+---------------
| The sixteen bit IP id field and the 16 bit max
| packet length limit a particular connection
| to 4GB/255 seconds or about 16MB/sec.  | cj*
+---------------

...iff all packets have an initial TTL of 255. If one knows (somehow) that
the true needed TTL is lower (it's usually *much* lower!), the TTL rate
limit is higher (usually *much* higher). For example, a TTL of 15 seconds
yields a TTL-limited rate of ~286 MB/s.


-Rob

-----
Rob Warnock, MS-9U/515		rpw3@sgi.com		rpw3@pei.com
Silicon Graphics, Inc.		(415)335-1673		Protocol Engines, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA  94039-7311

dab@BERSERKLY.CRAY.COM (David Borman) (01/15/91)

> From tcp-ip-RELAY@NIC.DDN.MIL Fri Jan 11 23:00:40 1991
> Date: 10 Jan 91 23:17:18 GMT
> From: sgi!cjohnson%somni.wpd.sgi.com@ucbvax.Berkeley.EDU  (Chris Johnson)
> Organization: Silicon Graphics, Inc., Mountain View, CA
> Subject: IP Bandwidth limits (was Re: TCP window size restriction)
> References: <9101091020.AA08870@techops.cray.com>, <THOMAS.91Jan10103915@uplog.uppsala.telesoft.se>
> Sender: tcp-ip-relay@nic.ddn.mil
> To: tcp-ip@nic.ddn.mil
> 
> 
> 	Well, there is a data rate limit for TCP/IP,
> 	but it isn't window size dependent.  The
> 	sixteen bit IP id field and the 16 bit max
> 	packet length limit a particular connection
> 	to 4GB/255 seconds or about 16MB/sec.
> 
> 					cj*
> 

Gosh, thanks.  I guess I shouldn't believe my memory to memory TCP
tests (through the software loopback driver on a Cray YMP computer) that
show that I've run a TCP stream at 795 mbits/second..., and over 360
mbits/second between machines, across an 800 mbit/second channel.  (See
the article "High Speed Networking at Cray Research" in the next issue
of CCR for more info.)

There is no theoretical limit to how fast TCP can run.  Period.  End
of discussion.  However, there are physical limiting factors on how
fast a specific TCP/IP connection can run:

1) You can't go any faster than the speed of the slowest link
   in the path.  (pretty obvious...)

2) You can't go any faster than the memory bandwidth of the slowest
   machine involved  (assuming you have a highly tuned implementation
   that only requires one pass over the data, slower if you don't).

3) You can't go any faster than the maximum TCP window offered by
   the receiver divided by the round-trip-time, because you can never
   send more than one entire TCP window per RTT.  (Once you've sent
   the entire window, you've got to wait for that ACK...)  The maximum
   TCP window is 64K-1 bytes.  With the expanded TCP window option
   (RFC 1072), the maximum TCP window is about 1.07 gigabytes
   ((64K-1)*(2^14) bytes).
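
A quick sketch that puts the three factors together; the link rate, memory
bandwidth, and RTT below are illustrative assumptions, not measurements:

    # Illustrative numbers only (assumed): a T3 link, a host with 100 MB/s of
    # memory bandwidth, and a geosynchronous-satellite round trip.
    link_rate = 45e6 / 8             # 1) slowest link, bytes/sec
    mem_bw    = 100e6                # 2) slowest host's memory bandwidth, bytes/sec
    window    = 65535                # 3) rfc793 maximum window, bytes
    rtt       = 0.5                  #    round-trip time, seconds

    print(min(link_rate, mem_bw, window / rtt))          # ~131 KB/s: window/RTT dominates
    print(min(link_rate, mem_bw, window * 2**14 / rtt))  # with rfc1072 scaling, the link dominates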

Now, can we stop all these erroneous messages about limits on the speed
of TCP?

			-David Borman, dab@cray.com

BILLW@MATHOM.CISCO.COM (William "Chops" Westfield) (01/16/91)

Hmm.  If I understand the arguments correctly, the ident/sequence
number limitations are only theoretical - if your network hangs on to
fragments or packets for long periods of time, and then delivers them,
they could lead to all sorts of interesting failures - fragments from
different packets getting reassembled together, or old segments being
interpreted as recent.

The latter case sounds pretty catastrophic, but the fragmentation
problem seems less serious - putting together the wrong fragments
should simply result in a TCP checksum error.

Note that these problems don't actually limit the throughput of TCP,
they just limit the throughput below which you are assured reliable
transport in spite of arbitrary network failure modes.

I don't know what the impact of the fact that the IP TTL is rarely used
as a time would be on this whole mess, but I suspect that it isn't
good.  (E.g., setting the TTL to 255 doesn't guarantee your packet will
be dead 255 seconds from now in today's networks...)

BillW
-------

mni@techops.cray.com (Michael Nittmann) (01/17/91)

Mea culpa!

Please!

RTT is under no control whatsoever by either the sender
or the receiver.
This holds also for the trivial "net" of two hosts.
Why:

Satellite: RTT is totally determined by the signal propagation.
Stationary orbit means about 80,000 km of travel. With electronic
delays this is then about 1 sec. ONLY THEN could you (but it
makes no sense for quantification!) say: 1 max window in the
net. But even that's no limitation if the receiver acks
early so that the window slides rather than concatenating end to end
(try to explain queuing theory in plain language).

Two hosts back to back. Take a Sun workstation: the ethernet card
of a Sun 3 gives about 4 Mbit/s throughput. BUT: you do not
control memory accessibility, processor activity, etc.
of the receiver. RTT may have a very big variance, big
enough to make talking of an "average" senseless (although
you can calculate one), since the average time may be
far from the most frequent time lag occurring.
Take a disk at the receiver. The data might be scattered
across free sectors. Then not the PING RTT but the receiver's I/O
determines RTT. RTT cannot be determined by taking some
TTL counts in seconds (who brings that up? TTL in
TCP/IP is hops, and should be the internet diameter times
two to account for enough hops for the packet to
propagate through the net on a fairly direct path).
And RTT does not determine or limit throughput
(early ack, e.g.).
The case of one max. window in propagation on the
net is an interesting special case.

By the way, I had to reply to the net
because the original author was not reachable,
since his path was not listed in the nodes my mail
passes through. Next time I will not reply to the net
if some obviously wrong posting is there.


Could we please take this now off the list?
Please mail to me, not to the list.


Michael




And the disclaimer!!!!



henry@zoo.toronto.edu (Henry Spencer) (01/17/91)

In article <12654151296.12.BILLW@mathom.cisco.com> BILLW@MATHOM.CISCO.COM (William "Chops" Westfield) writes:
>The latter case sound pretty catastrophic, but the fragmentation
>problem seems less serious - putting together the wrong fragments
>should simply result in a TCP checksum error.

Well, do remember that the TCP checksum is only 16 bits, so a mis-reassembled
packet has about one chance in 65536 of getting through undetected.  This
becomes non-trivial if
you are exchanging millions of packets and fragmentation rears its ugly
head a lot.  (Which probably means you have other problems, but...)
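
A quick sketch of the scale involved (the count of mis-reassembled packets
is an assumption, purely for illustration):

    # One 16-bit checksum lets roughly 1 in 65536 corrupted packets through.
    bad_reassemblies = 1_000_000          # assumed count of mis-reassembled packets
    print(bad_reassemblies / 2 ** 16)     # ~15 expected to reach the application undetected
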
-- 
If the Space Shuttle was the answer,   | Henry Spencer at U of Toronto Zoology
what was the question?                 |  henry@zoo.toronto.edu   utzoo!henry

cjohnson@somni.wpd.sgi.com (Chris Johnson) (01/17/91)

It is always interesting to see the level of response two sentences
can generate.  I got a mailbox full from many Internet mavens and
maven wannabes.

I said (paraphrased slightly): 
	The data rate limit for TCP/IP isn't window size dependent.

The responses to this generally said "there is a window size bandwidth
limit", which in turn caused a flurry of "no there isn't a limit" notes.

I should have said:

	There are performance bottlenecks in TCP as specified in
	rfc793, but TCP options have been developed that address
	these problems.  As rfc1072 and rfc1185 point out, data
	rates over certain thresholds *require* extensions to
	rfc793.  To this extent, interoperability among vendors at
	data rates beyond the FDDI range is only possible if these
	TCP "options" become requirements, or if other mechanisms
	are developed and standardized that address the problems
	discussed in rfc1072 and  rfc1185.

	Note that these options are not mentioned in the host
	requirements doc, so as of today TCP has bandwidth limits
	in the FDDI range.  Certain implementations exceed these
	limits using the optional extensions to the base protocol.

Of course this will bother the mavens, too, but it is accurate.
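
One of the thresholds rfc1185 is concerned with can be sketched directly:
the time it takes the 32-bit TCP sequence space to wrap at a given rate
(the link rates below are only illustrative):

    # Rough sketch: once the wrap time falls toward the maximum segment
    # lifetime (rfc793 suggests 2 minutes), old segments can masquerade as
    # new ones -- the problem the rfc1185 checks address.
    msl = 120                                    # seconds
    for rate in (1.5e6 / 8, 12.5e6, 125e6):      # T1, FDDI, gigabit -- bytes/sec
        wrap = 2 ** 32 / rate                    # seconds until sequence numbers repeat
        print(rate, wrap, wrap < msl)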

I also said:
	The sixteen bit IP id field and the 16 bit max packet
	length limit a particular connection to 4GB/255 seconds
	or about 16MB/sec.

The responses to this were more entertaining (all paraphrased):

	Some suggested data to the contrary:
		> What about Famous Person at Acme Data Co who got
		> umpteen gigaunits per picoblip?

The point here is that ip ids in svr4 and BSD come from
	ip->ip_id = htons(ip_id++);
which is incorrect according to the IP spec.  So the high data rates
probably come from non-conforming IP implementations.  Perhaps the
designers decided that a subset of IP was all that people needed,
but that design decision never seems to get mentioned.  As data rates
go toward terabytes/sec, the above bug will get more severe.
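
To see how quickly that simple counter wraps at these rates, a small sketch
(the packet size and data rate are assumptions, for illustration only):

    # How long before the 16-bit ip_id counter wraps and ids start repeating.
    avg_packet = 1500                 # bytes per packet (assumed, Ethernet-sized)
    rate = 16.8e6                     # bytes/sec (the ~16 MB/s figure)

    print(2 ** 16 / (rate / avg_packet))   # ~5.9 seconds, far less than a 255-second lifetime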

	Some argued my choice of parameters:
		> No one uses a ttl of 255

Of course, using a smaller ttl will move the bottleneck, but that
has its limit.  Also, in reality very few media support 64k packets,
so the "real world" cases modify both the numerator and denominator
of the ratio.  Do your own calculations for your favorite numbers.
FDDI (4KB packets)  with a ttl of 30 yields a maximum of 8.9MB/sec.
Even less if you can't fill every packet to the brim.
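
The general form of the calculation, as a small sketch to plug your own
numbers into:

    def id_rate_limit(packet_size, lifetime):
        """Max bytes/sec before 16-bit ids must repeat within `lifetime` seconds."""
        return 2 ** 16 * packet_size / lifetime

    print(id_rate_limit(65535, 255) / 1e6)   # ~16.8 MB/s, the original figure
    print(id_rate_limit(4096, 30) / 1e6)     # ~8.9 MB/s, the FDDI case above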

	Higher level can detect it:
		> So what?  Mis-reassembled IP packets will be
		> detected by checksum failure, window range tests
		> or rfc1185 sequence wrap check.

The suggestion that TCP will detect the problem relaxes the IP
specification from general purpose routing and fragmentation, to
routing and fragmentation in the presence of higher layer fragment
assembly validation checks.  Of course IP implementations are
still broken, just not too broken for TCP.

	TCP should do routing:
		> TCP should use path mtu discovery and then
		> fragging is irrelevant.

This says that IP isn't broken because TCP should know about
routing/fragmentation tasks.  This is an appealing argument because
it addresses the actual flaw (that IP fragging is brain dead) but it
violates layering in a particularly violent manner.  And again, it
ignores the fact that other layers may be using IP's services.

	And the last category, the INET Jihad:
		> How dare you complain about items designed by
		> your betters.
		>
		> Don't you realize that tcp-ip is fighting the
		> forces of darkness and these petty complaints
		> just help OSI.
		>
		> These issues shouldn't be discussed in public
		> forums because naive users get confused.

The religious arguments were the most entertaining, but had the
least content.  

Here's what I should have said in my first message:
	There is another bottleneck at the IP layer that is
	unresolved as yet.  The spec in rfc791 *requires* that
	the (IPid, protocol, src host, dest host) quadruple be
	unique for an MPL (maximum packet lifetime).  At this time,
	most reference (SVR4 and
	BSD-reno) IP implementations DO NOT enforce this
	restriction, which may result in data corruption at the IP
	layer in rare cases at sufficiently high data rates.
	Some of these errors may be detected at the transport
	layer, but scenarios can be defined in which application
	layers will receive stale or mangled data.
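
As a rough illustration of what enforcing that uniqueness would even take,
here is a hypothetical and deliberately naive allocator that refuses to
reuse an id within one MPL for a given (protocol, src, dst) key -- not how
any real stack does it:

    import time

    MPL = 255.0                       # worst-case packet lifetime, seconds
    _issued = {}                      # (key, id) -> time the id was handed out

    def next_ip_id(key):
        """Hypothetical: lowest id not used for `key` within the last MPL seconds."""
        now = time.monotonic()
        for candidate in range(2 ** 16):     # naive linear scan, purely illustrative
            when = _issued.get((key, candidate))
            if when is None or now - when >= MPL:
                _issued[(key, candidate)] = now
                return candidate
        raise RuntimeError("id space exhausted within one MPL -- sender must slow down")

    # Example: ids for one (protocol, src, dst) flow.
    print(next_ip_id(("tcp", "10.0.0.1", "10.0.0.2")))   # 0
    print(next_ip_id(("tcp", "10.0.0.1", "10.0.0.2")))   # 1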

Finally, I think this discussion is overkill on an issue that
falls out of simple math from the protocol definition.  But
for some reason a single sentence was inadequate, so here is
a longer analysis.  As the summary says, facts is facts, so IP
has a data limit even if your implementation doesn't.  In
particular, if your IP implementation doesn't have a rate
limit, then it isn't 100% compliant with rfc791.

Keep those cards and letters coming in,
					cj*