[comp.protocols.tcp-ip] Summary of responses to my tcp-ip performance query

retrac@titan.rice.edu (John Carter) (02/17/88)

Hello,

    Back in late November I posted the following request for information
to the network.  I had intended to post this summary of the responses that
I received, but I forgot.  What follows is my original posting followed by
the responses that I received.  Hope people find this to be useful!  Many
thanks to all of you who answered my query!!!

==============================================================================

>     I'm a fairly new reader of this newsgroup, so I apologize if this has
> already been discussed.  I would like to know what the best performance
> figures are for large memory to memory transfers using TCP-IP.  More
> specifically, what are the fastest reported average transfer times for
> transferring 10 Mbytes over a 10 Mbit/sec ethernet?  (or) What is the
> highest reported throughput of DATA across a 10 Mbit/sec ethernet using
> TCP-IP?
> 
>     Def.:  Memory to memory above means, the client generates the data
>            out of thin air and the server puts them all in one buffer
>	     (the "best case" situation).  I'm interested in raw transfer
> 	     rates and the cost of TCP-IP overhead on performance.
> 
>     I have seen performance figures for Van Jacobson's modifications to
> Berkeley 4.3 TCP-IP which gave measurements of 23.3 secs for 10 MB over
> a 10 Mbit/sec ethernet (effective throughput of 3.4 Mbit/sec).  Are there
> any better?
> 
> John Carter
> Dept. of Computer Science, Rice University
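
For concreteness, here is a rough, untested sketch of the sender half of
such a memory-to-memory test, in the spirit of Mike Muuss's TTCP program
mentioned below.  The receiver is the mirror image: accept a connection,
then read into a single buffer until EOF, discarding the data.  Host,
port, and buffer sizes here are arbitrary choices.

	/* Memory-to-memory TCP throughput test, sender half (sketch). */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/time.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <netdb.h>

	#define TOTAL	(10*1024*1024)	/* 10 Mbytes, as in the query */
	#define CHUNK	8192		/* bytes per write() */

	int main(int argc, char **argv)
	{
		static char buf[CHUNK];		/* data "out of thin air" */
		struct sockaddr_in sin;
		struct hostent *hp;
		struct timeval t0, t1;
		long left = TOTAL;
		double secs;
		int s, n;

		if (argc != 3 || (hp = gethostbyname(argv[1])) == NULL) {
			fprintf(stderr, "usage: %s host port\n", argv[0]);
			exit(1);
		}
		memset(&sin, 0, sizeof sin);
		sin.sin_family = AF_INET;
		memcpy(&sin.sin_addr, hp->h_addr, hp->h_length);
		sin.sin_port = htons(atoi(argv[2]));

		if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
		    connect(s, (struct sockaddr *)&sin, sizeof sin) < 0) {
			perror("connect");
			exit(1);
		}
		gettimeofday(&t0, (struct timezone *)0);
		while (left > 0) {
			n = write(s, buf, left > CHUNK ? CHUNK : (int)left);
			if (n <= 0) {
				perror("write");
				exit(1);
			}
			left -= n;
		}
		/* Caveat (shared with ttcp): the last window of data may
		 * still be in flight when the clock stops. */
		gettimeofday(&t1, (struct timezone *)0);
		close(s);
		secs = (t1.tv_sec - t0.tv_sec) +
		       (t1.tv_usec - t0.tv_usec) / 1e6;
		printf("%d bytes in %.2f sec = %.2f Mbit/sec\n",
		    TOTAL, secs, TOTAL * 8.0 / secs / 1e6);
		return 0;
	}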

==============================================================================

From: David Robinson <david@elroy.jpl.nasa.gov>
To: retrac@rice.edu
Subject: TCP performance


The best I have personally seen between two Sun-3/260's doing
memory-memory transfers is 3.2 Mbits/sec.  Their UDP topped out
at 5 Mbits/sec.  This was on a fairly quiet net, with rwhod and routed
traffic only.  In my experience, Excelan boards have been the
worst, slower than my lowly Sun-2, but what can you expect from
an 80186?

I do not know what the limiting factor on the Suns is, but I suspect
that the transfer is CPU bound; the Ethernet controllers each have
large memory buffers (256K?).

	-David Robinson
	david@elroy.jpl.nasa.gov

[Wishing for a pure-hardware IP chip!]

---------------------------------------------------------------------

Subject: Re: What's the "best" TCP/IP throughput?
Date: 26 Oct 87 07:37:25 PST (Mon)
From: lekash@orville.nas.nasa.gov

Better performance over Ethernet is not that likely until someone
builds a better interface card.  That's the current bottleneck.
If you go to other media, say ProNET-80 or Hyperchannel, you
can get much higher rates.  We were seeing up to 17 Mbits/sec over
Hyperchannel, and Proteon claims over 7 Mbits/sec for their ring.  I
would guess that with performance tuning those numbers will
increase.  (We might even do some here.)

					john
---------------------------------------------------------------

From: nowicki%rose@sun.com (Bill Nowicki)
Message-Id: <8710272303.AA03617@rose.sun.com>
To: retrac@rice.edu
Subject: Re: What's the "best" TCP/IP throughput?

Disclaimer: this is NOT an official number, just my latest test in an
uncontrolled environment with an unannounced software configuration.
But between a pair of Sun-4/260s I am able to transfer with TCP over an
Ethernet at 5.0 Mbits/second.  

	-- WIN

---------------------------------------------------------------------

From: Richard Fox <rfox@ames-nas.arpa>
Organization: NASA Ames Research Center, Mountain View, CA


I have just spent some time gathering stats on the FTP protocol running
over TCP/IP on an Ethernet.  As you are probably aware, the results at
this point have been pretty disappointing.

I would like to start investigating new and different protocols, so
could you please forward any info you get?  Also, if you have other
protocols that need testing I would be glad to help.  We have Ethernets,
Hyperchannels, satellites, etc., and a strong interest in finding a more
efficient protocol than the current TCP/IP.

By the way, we have just received a new TCP implementation that is
supposed to have rate-control. If you are interested I will send you
the results after I have some time to play with it.


rich fox
(415)694-4358

-------------------------------------------------------------------------

From: erikn@sics.se (Erik Nordmark)
Summary: 1.8 Mbps between VAXstation II's, about 4 Mbps between Sun-3's


When I was at Stanford I was working on David Cheriton's VMTP protocol.
There was already an implementation in the V distributed system and I 
did one in the 4.3BSD kernel.

These are the numbers we got:

Memory-to-memory bulk data transfer between 2 VAXstation II's on a 10 Mbps
Ethernet running 4.3BSD Unix:
	1.8 Mbps

Short request-response interaction: a 32 byte request and a 32 byte response
message (same system):
	send -----> recv
	     <----- reply

	8.6 milliseconds 
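
[Digest note: round-trip figures like the 8.6 ms above can be
approximated with a small echo-loop benchmark.  What follows is a rough,
untested sketch over TCP (not VMTP), assuming the standard inetd "echo"
service on port 7 at the far end; message size and trial count are
arbitrary choices.]

	/* Request-response latency microbenchmark (sketch): time many
	 * 32-byte round trips through a TCP echo server and average. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/time.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <netdb.h>

	#define MSGSIZE	32
	#define TRIALS	1000

	int main(int argc, char **argv)
	{
		char buf[MSGSIZE];
		struct sockaddr_in sin;
		struct hostent *hp;
		struct timeval t0, t1;
		int s, i, n, got;
		double ms;

		if (argc != 2 || (hp = gethostbyname(argv[1])) == NULL) {
			fprintf(stderr, "usage: %s host\n", argv[0]);
			exit(1);
		}
		memset(buf, 'x', sizeof buf);
		memset(&sin, 0, sizeof sin);
		sin.sin_family = AF_INET;
		memcpy(&sin.sin_addr, hp->h_addr, hp->h_length);
		sin.sin_port = htons(7);	/* inetd "echo" service */

		if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
		    connect(s, (struct sockaddr *)&sin, sizeof sin) < 0) {
			perror("connect");
			exit(1);
		}
		gettimeofday(&t0, (struct timezone *)0);
		for (i = 0; i < TRIALS; i++) {
			if (write(s, buf, MSGSIZE) != MSGSIZE) {
				perror("write");
				exit(1);
			}
			/* reassemble the reply; TCP may deliver it in pieces */
			for (got = 0; got < MSGSIZE; got += n) {
				n = read(s, buf + got, MSGSIZE - got);
				if (n <= 0) {
					perror("read");
					exit(1);
				}
			}
		}
		gettimeofday(&t1, (struct timezone *)0);
		close(s);
		ms = ((t1.tv_sec - t0.tv_sec) * 1e6 +
		      (t1.tv_usec - t0.tv_usec)) / (1e3 * TRIALS);
		printf("%d-byte round trip: %.2f ms average\n", MSGSIZE, ms);
		return 0;
	}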

The implementation in V achieves about 4 Mbps and 2.3 ms between two
Sun-3's on a 10 Mbps Ethernet.

Note: The Unix implementation uses IP for datagram delivery whereas
the V implementation has its own mechanisms for delivery, routing
etc. on the local net.  An optimized version of the Unix
implementation that uses "raw" ethernet for packets on the local net
(and IP for internet packets) achieves about 2.1 Mbps between the two
microVAXes.


About VMTP:

The Versatile Message Transaction Protocol is a reliable transport
protocol based on the transaction style of communication. A
transaction consists of a request and a response message which are
limited in size to 1 Mbyte.  Current implementations limit the message
size to 16 kbytes, so there is room for performance improvements in
the implementations.

The protocol has:
	better naming than TCP/IP  (stable, location-independent identifiers)
	support for real-time communication  
	support for security
	multicasting on LAN as well as WAN (latter relying on the Internet 
			Group Management Protocol and extensions to IP)
	solves the speed mismatch problem on the local net by using rate
			based flow control
	etc.


David Cheriton 		(cheriton@pescadero.stanford.edu)
"VMTP: a Transport Protocol for next generation ..."
Proc. ACM SIGCOMM '86 (Communications Architectures and Protocols)
Aug. 1986 (ACM)

Steve Deering
"Host extensions for IP multicasting"
RFC 988

Karen C. Lam
"4.3bsd Internet Multicast {Implementation Notes,Installation and Usage Notes}"
BBN Laboratories Inc, 10 Moulton Street, Cambridge MA


** Erik Nordmark **
Swedish Institute of Computer Science, Box 1263, S-163 13  SPANGA, Sweden
Phone: +46 8 750 79 70	Ttx: 812 61 54 SICS S	Fax: +46 8 751 72 30

uucp:	erikn@sics.UUCP or {seismo,mcvax}!enea!sics!erikn
Domain: erikn@sics.se

-------------------------------------------------------------------------

From: Jack Jansen <mcvax!cwi.nl!jack@uunet.uu.net>

The protocol used in the Amoeba distributed OS is probably the fastest
around currently (at least, that's what we like to think:-).

We do 420 Kb/sec between two MicroVAXen, and 600+ Kb/sec between two
68020 systems.  This is all running the protocol between machines
running Amoeba.  I once got up to 250 Kb/sec between two uVAXen running
Ultrix 1.2.

References to amoeba should be easy to find, there's quite a bit
published, mainly written by Andrew S. Tanenbaum and Sape J. Mullender.
If you cannot find anything, drop me a line and I'll send you
a list of references.

Another protocol to look at might be the one used in David Cheriton's
V operating system.  His performance is almost as good as ours.
--
	Jack Jansen, jack@cwi.nl (or jack@mcvax.uucp)
	The shell is my oyster.

-------------------------------------------------------------------------

From: dmc%tv.tv.tek.com@relay.cs.net
Subject: Bulk data transfer protocol timings

We are running Stanford's V-system Version 6 kernels in
Tektronix 4405 workstations, which have 16.6 MHz 68020 processors.
The ethernet interface used is the AMD LANCE, with 64 receive
packet buffers and 8 transmit packet buffers of 1518 bytes.

The Inter-Kernel measurement program `timeipc' gives us the following
figures for segment transfers between the user process memory of
two 4405's.  The program runs at a real-time scheduling priority,
and normal process execution is essentially suspended while the test
is in progress.  The protocol is an early version of VMTP.

Send-Receive-ReplyWithSegment (5 trial average):
Size (bytes)	elapsed time/100 transactions	effective bit rate
0		.20 seconds
1024		.34 seconds			2.409 Mbit/sec.

Send-Receive-MoveTo-Reply (5 trial average):
Size (bytes)	elapsed time/100 transactions	effective bit rate
2048		.66 seconds			2.482 Mbit/sec.
4096		.99 seconds			3.310 Mbit/sec.
8192		1.40 seconds			4.681 Mbit/sec.
16384		2.32 seconds			5.650 Mbit/sec.
32768		4.18 seconds			6.271 Mbit/sec.
65536		7.91 seconds			6.628 Mbit/sec.
131072		15.30 seconds			6.853 Mbit/sec.

Send-ReceiveWithSegment-Reply (5 trial average):
Size (bytes)	elapsed time/100 transactions	effective bit rate
1024		.35 seconds			2.341 Mbit/sec.

Send-Receive-MoveFrom-Reply (5 trial average):
Size (bytes)	elapsed time/100 transactions	effective bit rate
2048		.78 seconds			2.101 Mbit/sec.
4096		1.01 seconds			3.244 Mbit/sec.
8192		1.58 seconds			4.148 Mbit/sec.
16384		2.62 seconds			5.003 Mbit/sec.
32768		4.57 seconds			5.736 Mbit/sec.
65536		7.87 seconds			6.662 Mbit/sec.
131072		15.2 seconds			6.899 Mbit/sec.
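
[Digest note: the effective bit rate column is just (size in bytes * 8
bits * 100 transactions) / elapsed time; the last row above, for
example, works out to 131072 * 8 * 100 / 15.2 sec = 6.899 Mbit/sec.
The 0-byte row in the first table isolates the fixed per-transaction
overhead: .20 sec / 100 = 2 ms per transaction.]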

Don Craig
Tektronix Television Systems

------------------------------------------------------------------------------

From: Mike Muuss <mike@brl.arpa>

A pair of Sun-3/50 machines running SunOS 3.3 with tcp_sndspace and
tcp_rcvspace (or whatever they are called) increased to 16K (i.e.,
increased offered windows).  The test is typically 1 Mbyte memory to
memory using the TTCP program (copies on request).  Typical data rate is
3 Mbits/sec.  For two pairs, I typically see 6 Mbits/sec total for both
connections.  I never bothered to do three pairs.  Trailers were off.

6 Mbits/sec is fairly close to the maximum usable bandwidth of an Ethernet.
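
[Digest note: on systems that support it, the same larger offered
windows can be had per connection, without patching kernel variables, by
enlarging the socket buffers before connecting.  A rough, untested
sketch; set_window is a hypothetical helper name:]

	/* Enlarge a TCP socket's send/receive buffers, and hence the
	 * offered window; 16384 matches the value quoted above. */
	#include <sys/types.h>
	#include <sys/socket.h>

	int set_window(int s)
	{
		int bufsize = 16384;

		if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
		    (char *)&bufsize, sizeof bufsize) < 0)
			return -1;
		return setsockopt(s, SOL_SOCKET, SO_RCVBUF,
		    (char *)&bufsize, sizeof bufsize);
	}

[Call it on both ends, before connect() and accept(), so the larger
window is offered from the start of the connection.]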

On an NSC Hyperchannel, between a Gould PN9080 running UTX 2.0, using a
PI32 to access an A400, with an otherwise idle trunk to an A130 adaptor
connected to a Cray XMP48 running UNICOS 2.0 (at the time), I was able
to achieve 11 Mbits/sec aggregate, using MTU of 4144 and Cray-IP
encapsulation.  This was not using TCP at all, but merely IP/ICMP_Echo
request/response packets, in a "flood ping" test.
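
[Digest note: a "flood ping" just fires ICMP echo requests back to back
instead of once per second.  Below is a crude, untested sketch of the
sending side only; raw sockets need root, and the 1024-byte packet size
is an arbitrary choice (Mike used the 4144-byte Hyperchannel MTU).  A
real tool would also select() for the echo replies and report loss.]

	/* Crude flood-ping sender (sketch). */
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <netinet/in_systm.h>
	#include <netinet/ip.h>
	#include <netinet/ip_icmp.h>
	#include <arpa/inet.h>

	#define PKTSIZE	1024		/* ICMP header + payload */
	#define NPKTS	10000

	static u_short in_cksum(u_short *p, int len)
	{
		long sum = 0;		/* standard ones-complement sum */

		while (len > 1) { sum += *p++; len -= 2; }
		if (len == 1) sum += *(u_char *)p;
		sum = (sum >> 16) + (sum & 0xffff);
		sum += (sum >> 16);
		return (u_short)~sum;
	}

	int main(int argc, char **argv)
	{
		static char pkt[PKTSIZE];
		struct icmp *icp = (struct icmp *)pkt;
		struct sockaddr_in to;
		int s, seq;

		if (argc != 2) {
			fprintf(stderr, "usage: %s ipaddr\n", argv[0]);
			return 1;
		}
		memset(&to, 0, sizeof to);
		to.sin_family = AF_INET;
		to.sin_addr.s_addr = inet_addr(argv[1]);

		if ((s = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)) < 0) {
			perror("socket (root needed)");
			return 1;
		}
		for (seq = 0; seq < NPKTS; seq++) {
			memset(pkt, 0, sizeof pkt);
			icp->icmp_type = ICMP_ECHO;
			icp->icmp_code = 0;
			icp->icmp_id = getpid() & 0xffff;
			icp->icmp_seq = seq;
			icp->icmp_cksum = in_cksum((u_short *)pkt, sizeof pkt);
			if (sendto(s, pkt, sizeof pkt, 0,
			    (struct sockaddr *)&to, sizeof to) < 0)
				perror("sendto");
		}
		return 0;
	}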

	Best,
	 -Mike

-----------------------------------------------------------------------

From: aeh@j.cc.purdue.edu (Dale Talcott)

Re your query about high speed network protocols:  Several of our
mainframes here at Purdue are connected using Control Data's LCN
(loosely coupled network).  This is not a 10 Mbs network, but it may
provide some insight into limiting factors.

The LCN is somewhat Ethernet-like in that it is tapped-trunk using
coaxial cable, but it runs at 50Mbs instead of 10Mbs and uses 3/4 inch
coax.  There is carrier-sense, but collisions are avoided by providing
fixed time slots for each host to start a transmission.  In practice,
there are occasional collisions.  The maximum trunk length is limited
to about 2000 feet.  The typical number of taps on a trunk is small
(5 - 10).  The hardware level protocol is point-to-point, with no
broadcasts.  The hardware level protocol packet limit is 65535 16-bit
words.  The software packet size is 4096 bytes of data with a 12 byte
software header and 21 bytes of hardware framing, addressing, crc, etc.
However, there is a mode called "streaming" in which a sender can
"grab" the trunk and keep it for as long as it wants by holding the
carrier asserted between packets.

The hosts do not connect directly to the LCN.  Rather, there are
specialized minicomputers to do the connection.  These are called NADs
(network access devices).  There are several kinds of NADs, according
to the device the NAD is connecting to the LCN.  We have NADs for tape
controllers, disk controllers, VAXes, and various CDC Cyber mainframes.

Simplified network:

 +-----+                +-----+
 |host |                |host |
 +-----+                +-----+
    |                      |
  +---+                  +---+
  |nad|                  |nad|
  +---+                  +---+
    |                      |
===============================================  trunk
                |
              +---+
              |nad|
              +---+
                |
             +-----+   +---+    +---+   +----+
             |host |---|nad|====|nad|---|disk|
             +-----+   +---+    +---+   +----+

There are different protocols used, according to the devices being
connected.  When a host talks to another host, the protocol is called
RHF (remote host facility) and is more-or-less ISO seven layer in
philosophy.  In practice, there is much mingling of layers.

When a NAD pair with dedicated trunk is being used to connect a host to
its disks (as in the bottom example), the NADs use a simplified
protocol, idiosyncratic to the device and host operating system.

---
Now, for the numbers:

("Mbs" = "megabits per second", throughout.)

A Cyber 205 host to CDC 819 disk drives, over a dedicated trunk
achieved a sustained transfer rate of 30Mbs.  The instantaneous rate at
which the drives read is 36Mbs.  The 205 usually reads/writes data in
65536 byte chunks (memory small page size), and often reads .5Mbyte
chunks (memory large page).  The disk NADs use streaming mode between
themselves.  I do not remember the actual size of the file used to
determine this transfer rate, but it was at least 8 Mbytes.

--
Transfers among hosts using the RHF protocol, exclusive of the time
needed to build a connection:

CDC 6600 to CDC 6500, no other load on the network, 6 Mbit of
fabricated data at sending end, discarded at receiving end, packet size
of 3840 bytes:						6.5Mbs.

CDC 6600 to Cyber 205, disk to disk, both systems idle, 16 Mbytes of
data:							2.6Mbs.

Same file, opposite direction:				2.0Mbs.

VAX 11/780 (dual processor) running 4.3BSD to Cyber 205.  VAX
multiuser, but idle.  Cyber running typical middle-of-the-night, CPU
intensive, low priority workload.  Transfer rate for 1 Mbyte, "holey"
file on VAX to 205 disk file:				1.3Mbs.

(A holey Unix file does not require disk accesses to read the holes:
the system just fabricates a chunk of zeros.)  Same file, only real and
residing on Fuji Eagle disk:				1.1Mbs.

Statistics for transfers during normal production from 205 to 780 give
numbers in the .5Mbs range for moderately large files (~1Mbyte) when the
780 has to translate each '\037' character into '\n' (done with the VAX
movtc instruction).  Highest rate noticed was .8Mbs for ~.25Mbyte
transfer.
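
[Digest note: the VAX movtc instruction maps a buffer through a
256-entry translation table in a single instruction.  The C loop it
replaces looks roughly like this untested sketch:]

	/* Translate a buffer in place through a 256-entry table,
	 * mapping the Cyber record separator '\037' to Unix '\n'. */
	void translate(char *buf, int len)
	{
		static unsigned char tab[256];
		static int init = 0;
		int i;

		if (!init) {		/* identity map, one entry changed */
			for (i = 0; i < 256; i++)
				tab[i] = i;
			tab['\037'] = '\n';
			init = 1;
		}
		for (i = 0; i < len; i++)
			buf[i] = tab[(unsigned char)buf[i]];
	}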

For a transfer from CDC Cyber 720 to VAX 780, no character translation,
disk to disk, the transfer rate was .96Mbs.  This was a while ago, and
I don't remember the loads on the two systems, but I suspect both were
idle.

For Cyber 205 to VAX 8600 running Ultrix 2.?, both systems in normal
daytime production (but the VAX still lightly loaded), a .5Mbyte file
transferred disk to disk with character translation at	1.05Mbs.
Normal rates are 20 - 30% better than the 780.  (Note: the 8600 is
at JVNC at Princeton, not at Purdue.)

---
Notice the huge disparity between the data rate the hardware (NADs and
LCN trunk) is capable of and that actually obtained once several layers
of host resident protocol are placed on top.  In developing the RHF
implementation for the VAXes, we noticed that every optimization in
host code (placing data blocks on page boundaries, using the movtc
instead of a C loop, etc) showed up in the transfer statistics.  (Our
first cut got only 24 kbs!)  We have an open project to find out why it
is still so slow.

Dale Talcott                            Systems programmer
ARPANET: aeh@j.cc.Purdue.EDU            Purdue University Computing Center
 BITNET: AEH@PURCCVM                    Mathematical Sciences Bldg.
  Phone: (317) 494-1787                 West Lafayette, IN 47907

------------------------------------------------------------------------------

From ogud@sdag.cs.umd.edu Tue Nov 24 00:05:11 1987

Sorry I did not read your mail earlier, but I am trying to finish my
thesis on the behavior of Ethernet here in the CS Dept.
One part of my study was to look at bandwidth between Suns and other machines.
Another part was to examine the behavior of the net under overload conditions.

The results, in short (all load figures include headers):
	sun3/50 to sun3/50: transfer rate of 2 Mbits/sec for max-size TCP packets;
			    max 500 packets per sec of min-size TCP packets

	sun3/160 to sun3/160: multiply by 1.2

I had a sun3/160 and a sun3/50 listening to all the traffic using a
modified Etherspy program, with all unneeded UNIX processes killed off.
The sun3/50 started to report dropped packets at around 25% load (2.5 Mb).
Both machines were swamped (dropping lots of packets) at 40% load and
started showing erratic behavior.


How does this affect the transfer rate?
Well, no protocol using TCP will get any better rate, because the Suns
are the bottleneck, not the net.  When running FTP here at night on
large files I see at most 140 Kbytes per sec (140 * 1090 * 8 = 1.2 Mb
on the wire; the 1090 counts headers as well as the 1024 data bytes).
The highest number I have heard of is 190 Kbytes/sec (1.6 Mb).  That
number is probably for a Sun3/260 and scales well down to the 140 for
a 3/50.

If you want more info, send me questions and hopefully I will be able to
answer them for you.

Olafur Gudmundsson  Dept. of Computer Science University of Maryland
ARPA: ogud@brillig.edu
UUCP: {...}!seismo!umcp-cs!brillig!ogud   Tel: (301)-454-6153 (w)
UPS: College Park MD. 20742               ATT: (301)-595-4154 (h)

------------------------------------------------------------------------------

From mangler@csvax.caltech.edu Tue Nov 24 04:22:59 1987
>   From: David Robinson <david@elroy.jpl.nasa.gov>
>
>   The best I have personally seen is between two Sun-3/260's doing
>   memory-memory transfers is 3.2Mbits/sec.

What wasn't mentioned is that this was on SunOS 3.2.  Prior and later
versions of SunOS aren't nearly as fast at TCP.  SunOS 3.4 is 2X slower!

Don Speck   speck@vlsi.caltech.edu  {amdahl,scgvaxd}!cit-vax!speck

==============================================================================
         ___
        /   \             John Carter
=======O     \            Rice University
|  _  |       \           Houston, Texas
| (_) |    (( ))
|__#__|    O/  \\O        ARPA/CSNET:  retrac@rice.edu
  | |     /+     -+.      UUCP: {Backbone or Internet site}!rice!retrac
  [N]     =       |
  [B]    //      //       Rockets record with me at the game: 13 -  0
  [A]    ))      \\       Rockets record w/o  me at the game:  4 -  4  Home
  |_|            =/                                           15 - 18  Total
 /___\                    ^^^ ME, *superstitious*?!? ^^^  No way!  :-)