[net.lan] ethernet performance

lindberg@chalmers.UUCP (Gunnar Lindberg) (10/15/84)

Recently, there has been much discussion about Ethernet performance,
with several suggestions of where to find the bottleneck(s). However,
I have not seen any description of what REALLY happens inside UNIX
when TCP/IP/Ethernet is used. Therefore I tried the following:

    Hosts:    2 VAX 11/780, running UNIX 4.2 BSD, single user mode.
    Ethernet: 3COM, with standard 4.2 BSD driver. No other activity
	      on the network.

    A program in one host sent 2 Mbytes of data via TCP/IP/Ethernet
    to a program in the other host. Both UNIX systems were analyzed
    using the kernel profiler (KGMON, GPROF).

From what I've heard before, I expected to find the system spending
lots of time inside the TCP/IP/Ethernet code. However, the systems
were in IDLE most of the time:

    SEND        % of total time      RECV        % of total time
    "idle"      40 %                 "idle"      35 %
    ecput       15 %                 ecget       25 %
    in_cksum     4 %                 in_cksum     4 %

The problem is not the amount of TCP/IP code that has to be run,
but rather end-to-end flow control and a lack of parallelism.
The scenario is as follows:

    SEND: write(2Mbytes);		    RECV: loop until 2Mbytes
						     read(2Mbytes);

    SEND fills buffers up to max
    size (2Kbytes), and gives
    that to TCP.

	TCP sends data on the net.

					TCP receives data and
					returns an acknowledgement.

					The RECV program is awakened
					with a 2Kbytes buffer.
	
	TCP receives the ack and
	releases SEND buffer space.
	However, since RECV has not	    RECV consumes the data and
	yet consumed the data, SEND	    releases buffer space.
	has to wait for buffer space
	to be released at RECV's host.

					TCP sends window information,
					telling SEND that he may send
					more data.

	TCP receives the window info
	and wakes SEND up.

    SEND fills next 2Kbytes...

			    etc. etc. etc.
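
As a rough illustration of the see-saw above (not part of the
measurements), here is a minimal Python sketch that compares strict
alternation with overlapped operation. The function names and the
per-stage times are hypothetical, chosen only to show the shape of
the effect; they are not taken from the profile.

```python
def seesaw_throughput(buf_bytes, send_s, recv_s, ack_s):
    """Bytes/second when every buffer is sent, consumed and
    acknowledged strictly in turn, so the stage times add up."""
    return buf_bytes / (send_s + recv_s + ack_s)


def overlapped_throughput(buf_bytes, send_s, recv_s, ack_s):
    """Bytes/second if the stages could run in parallel on
    different buffers, so only the slowest stage limits the rate."""
    return buf_bytes / max(send_s, recv_s, ack_s)


if __name__ == "__main__":
    # Hypothetical stage times in seconds for one 2 Kbyte buffer.
    args = (2048, 0.010, 0.010, 0.005)
    print("see-saw:    %.0f bytes/s" % seesaw_throughput(*args))
    print("overlapped: %.0f bytes/s" % overlapped_throughput(*args))
```

With any positive stage times the overlapped figure is at least as
high as the see-saw figure, which is the point of the scenario: the
idle time comes from waiting, not from protocol code.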

Of course, less overhead in the TCP/IP/Ethernet code would make
this "SEND/RECV see-saw" toggle faster, resulting in higher
throughput. However, most of the system's active time is spent in
the 3COM interface routines (ecget/ecput), reading and writing
memory on the 3COM board, which is known to be slow. Using a DMA
interface instead should give better performance.

Unfortunately, performance will NOT be increased by use of a
dedicated network processor. A typical processor for such a task
would be a Motorola MC68000, which is MUCH slower than a VAX 11/780.
Therefore, unless we can reduce the amount of code that must be run
to implement TCP/IP, a network processor will DECREASE performance.
However, a network processor reduces the host processor's load,
which of course is a good thing, and makes it possible for the host
to consume data in one buffer, concurrently with the network processor
fetching data to the next buffer.

Now finally the question: What can we do?

    1) An increase of "max buffer size" from 2K to 4/8K should
       reduce scheduling overhead etc. This could be implemented
       as an "ioctl" function in TCP, to be used by programs such
       as "rcp" when both nodes are on the same net. I have not
       tried this yet, but I plan to.
    
    2) Design of a new protocol for use on "reliable" local nets.
       The Ethernet interface performs checksumming and drops all
       erroneous packets, which means that data in a delivered packet
       may be "trusted", i.e. no checksumming needed. Of course, a
       LAN protocol does not need any inter-network code, although
       using IP addresses would be an advantage (makes it simple to
       check "same_net(addr1, addr2)" ).
    
    3) Other suggestions?
       Does anyone know which protocol SUN uses for disc transfers
       (file system data and swapping)?
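
Suggestions 1) and 2) can be sketched together, under assumptions.
The Python below uses the socket options SO_SNDBUF and SO_RCVBUF
found in present-day socket APIs rather than the "ioctl" proposed
above, and it asks for larger buffers only when both hosts are on the
same net. The helper names, the netmask argument and the addresses in
the usage note are hypothetical; the kernel is free to round or clamp
the requested size.

```python
import ipaddress
import socket


def same_net(addr1, addr2, netmask):
    """True when both IPv4 addresses share the network part
    selected by netmask."""
    mask = int(ipaddress.IPv4Address(netmask))
    a = int(ipaddress.IPv4Address(addr1))
    b = int(ipaddress.IPv4Address(addr2))
    return (a & mask) == (b & mask)


def open_bulk_socket(local, remote, netmask, bufsize=8192):
    """Create a TCP socket and, for a same-net peer only, request
    larger send and receive buffers (a request, not a guarantee)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if same_net(local, remote, netmask):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    return sock
```

A program such as "rcp" might call
open_bulk_socket("192.0.2.5", "192.0.2.77", "255.255.255.0") before
connecting, and fall back to the default buffer sizes for peers on
other networks.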

I am sorry for the length of this letter, but I have not seen any
discussion on UNIX-TCP/IP/Ethernet's internals before. If my
observations were obvious to everybody else, I apologize.

	Gunnar Lindberg
	Department of Computer Science
	Chalmers University of Technology
	S-412 96 Gothenburg
	SWEDEN
	..!mcvax!enea!chalmers!lindberg

honey@down.FUN (code 101) (03/17/85)

Synopsizing kroot's data, and adding a third column:

buffer (bytes)   kbytes/sec    writes/sec
1k - 2k          50 - 100      25 - 100
10               10 - 20       1k - 2k
1                .06 - .09     60 - 90

The big surprise is not the shabby throughput with one-byte writes, but
the incredible performance with ten-byte writes.  This seems suspect.

	Peter
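
The third column above appears to follow from dividing throughput by
write size, taking 1k as 1000. A small Python sketch of that
arithmetic (the function name is hypothetical, and the input figures
are simply the ones in the table):

```python
def writes_per_sec(kbytes_lo, kbytes_hi, buf_lo, buf_hi):
    """Range of writes/second implied by a throughput range in
    kbytes/sec and a write-size range in bytes (1k = 1000)."""
    lowest = kbytes_lo * 1000 / buf_hi    # slowest rate, largest writes
    highest = kbytes_hi * 1000 / buf_lo   # fastest rate, smallest writes
    return lowest, highest


if __name__ == "__main__":
    for row in [(50, 100, 1000, 2000), (10, 20, 10, 10), (0.06, 0.09, 1, 1)]:
        print(row, "->", writes_per_sec(*row))
```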

tim@cmu-cs-k.ARPA (Tim Maroney) (03/24/85)

Writing your own protocol depends on the application.  I believe you said
you wanted extremely small packets.  If the size is likely to be four bytes
of data or less, then just send duplicates in every packet.  If it is going
to be more, use a simple checksum from IP.  As for hacking the device
drivers for the network, that shouldn't be hard for any kernel hacker.
-=-
Tim Maroney, Carnegie-Mellon University, Networking
ARPA:	Tim.Maroney@CMU-CS-K	uucp:	seismo!cmu-cs-k!tim
CompuServe:	74176,1360	audio:	shout "Hey, Tim!"
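
For reference, the "simple checksum from IP" mentioned above is the
16-bit ones' complement sum used by the Internet protocols. A minimal
Python sketch follows; the function name is hypothetical, and this is
an illustration of the arithmetic rather than the kernel's in_cksum
routine.

```python
def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of data,
    taken as big-endian 16-bit words (odd lengths are zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries out of the low 16 bits back into the sum.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
    print(hex(internet_checksum(payload)))
```

A receiver can verify an even-length packet by running the same
function over the data followed by the two checksum bytes; an
undamaged packet yields zero.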