lindberg@chalmers.UUCP (Gunnar Lindberg) (10/15/84)
Recently, there has been much discussion about Ethernet performance, with several suggestions of where to find the bottleneck(s). However, I have not seen any description of what REALLY happens inside UNIX when TCP/IP/Ethernet is used. Therefore I tried the following:

    Hosts:      2 VAX 11/780, running UNIX 4.2 BSD, single user mode.
    Ethernet:   3COM, with standard 4.2 BSD driver.
                No other activity on the network.

A program in one host sent 2 Mbytes of data via TCP/IP/Ethernet to a program in the other host. Both UNIX systems were analyzed using the kernel profiler (KGMON, GPROF).

From what I've heard before, I expected to find the system spending lots of time inside the TCP/IP/Ethernet code. However, the systems were in IDLE most of the time:

    SEND        % of total time     RECV        % of total time
    "idle"      40 %                "idle"      35 %
    ecput       15 %                ecget       25 %
    in_cksum     4 %                in_cksum     4 %

The problem is not the amount of TCP/IP code that has to be run, but merely a question of end-to-end flow control and lack of parallelism. The scenario is as follows:

    SEND:   write(2Mbytes);
    RECV:   loop until 2Mbytes
                read(2Mbytes);

    SEND's host                         RECV's host

    SEND fills buffers up to max
    size (2Kbytes), and gives that
    to TCP.

    TCP sends data on the net.
                                        TCP receives data and returns
                                        an acknowledgement. The RECV
                                        program is awakened with a
                                        2Kbytes buffer.
    TCP receives the ack and
    releases SEND buffer space.
    However, since RECV has not         RECV consumes the data and
    yet consumed the data, SEND         releases buffer space.
    has to wait for buffer space
    to be released at RECV's host.
                                        TCP sends window information,
                                        telling SEND that he may send
                                        more data.
    TCP receives the window info
    and wakes SEND up.

    SEND fills next 2Kbytes...
    etc. etc. etc.

Of course, less overhead in the TCP/IP/Ethernet code would lead to this "SEND/RECV see-saw" toggling faster, resulting in a higher throughput. However, most of the system's active time is spent in the 3COM interface routines (ecget/ecput), reading and writing memory on the 3COM board, which is known to be slow.
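The see-saw described above can be sketched as a toy timing model. This is only an illustration: the cost parameters (a fixed cost per 2K cycle for wakeups, acks and window updates, plus a per-byte copy cost on each side) are hypothetical and are not taken from the profiler figures. The model simply assumes that the SEND and RECV phases never overlap, so each host sits idle while the other one works.

```python
def seesaw_model(total_bytes, buf_bytes,
                 send_fixed_us, send_us_per_byte,
                 recv_fixed_us, recv_us_per_byte):
    """Toy model of strictly alternating SEND/RECV phases.

    All cost parameters are hypothetical illustration values,
    not measurements from the kernel profile.
    """
    cycles = -(-total_bytes // buf_bytes)  # ceiling division
    send_busy_us = cycles * send_fixed_us + total_bytes * send_us_per_byte
    recv_busy_us = cycles * recv_fixed_us + total_bytes * recv_us_per_byte
    elapsed_us = send_busy_us + recv_busy_us  # the phases never overlap
    return {
        "elapsed_s": elapsed_us / 1e6,
        # each host is idle exactly while the other host is busy
        "send_idle": recv_busy_us / elapsed_us,
        "recv_idle": send_busy_us / elapsed_us,
        "kbytes_per_s": (total_bytes / 1024) / (elapsed_us / 1e6),
    }


if __name__ == "__main__":
    # Assumed costs: 2000 us fixed per cycle, 3 us per byte, on both hosts.
    for buf in (2048, 4096, 8192):
        r = seesaw_model(2 * 1024 * 1024, buf, 2000, 3, 2000, 3)
        print(buf, round(r["kbytes_per_s"], 1), round(r["send_idle"], 2))
```

In this model a larger buffer only helps by amortizing the fixed per-cycle cost; the idle time does not go away unless the two phases are allowed to overlap.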
Using a DMA interface instead should give better performance.

Unfortunately, performance will NOT be increased by use of a dedicated network processor. A typical processor for such a task would be a Motorola MC68000, which is MUCH slower than a VAX 11/780. Therefore, unless we can reduce the amount of code that must be run to implement TCP/IP, a network processor will DECREASE performance. However, a network processor reduces the host processor's load, which of course is a good thing, and makes it possible for the host to consume data in one buffer, concurrently with the network processor fetching data to the next buffer.

Now finally the question: What can we do?

1)  An increase of "max buffer size" from 2K to 4/8K should reduce scheduling overhead etc. This could be implemented as an "ioctl" function in TCP, to be used by programs such as "rcp" when both nodes are on the same net. I have not tried this yet, but I plan to.

2)  Design of a new protocol for usage on "reliable" local nets. The Ethernet interface performs checksumming and drops all erroneous packets, which means that data in a delivered packet may be "trusted", i.e. no checksumming needed. Of course, a LAN protocol does not need any inter-network code, although using IP addresses would be an advantage (makes it simple to check "same_net(addr1, addr2)").

3)  Other suggestions? Does anyone know which protocol SUN uses for disc transfers (file system data and swapping)?

I am sorry for the length of this letter, but I have not seen any discussion on UNIX-TCP/IP/Ethernet's internals before. If my observations were obvious to everybody else, I apologize.

        Gunnar Lindberg
        Department of Computer Science
        Chalmers University of Technology
        S-412 96 Gothenburg
        SWEDEN
        ..!mcvax!enea!chalmers!lindberg
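The "same_net(addr1, addr2)" check mentioned in suggestion 2 might look like the sketch below. The post does not say how the network part of an address would be determined, so the `prefix_len` parameter is an assumption made for illustration (a host of that era would more likely have derived it from the address class).

```python
import ipaddress


def same_net(addr1, addr2, prefix_len=24):
    """Return True if both IPv4 addresses lie in the same network.

    prefix_len is a hypothetical parameter; the original post only
    names the check and does not define how the network is identified.
    """
    # strict=False lets us build the network from a host address
    net = ipaddress.ip_network(f"{addr1}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr2) in net
```

A program such as "rcp" could use a test of this kind to decide whether the larger buffers or a local-net protocol are applicable to a given peer.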
honey@down.FUN (code 101) (03/17/85)
Synopsizing kroot's data, and adding a third column:

    buffer (bytes)   kbytes/sec    writes/sec
    1k - 2k          50 - 100      25 - 100
    10               10 - 20       1k - 2k
    1                .06 - .09     60 - 90

The big surprise is not the shabby throughput with one byte writes, but the incredible performance with ten byte writes. This seems suspect.

        Peter
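The third column appears to be throughput divided by buffer size. A small sketch of that arithmetic follows, assuming 1k = 1000 bytes (the post does not say which convention was used) and taking the lowest rate with the largest buffer, and the highest rate with the smallest buffer, to get each range.

```python
def writes_per_sec(kbytes_per_sec, buffer_bytes, k=1000):
    """Writes per second implied by a throughput and a write size.

    k=1000 bytes per kbyte is an assumption, not stated in the post.
    """
    return kbytes_per_sec * k / buffer_bytes


# (buffer range in bytes, throughput range in kbytes/sec), from the table
rows = [
    ((1000, 2000), (50, 100)),
    ((10, 10), (10, 20)),
    ((1, 1), (0.06, 0.09)),
]

for (buf_lo, buf_hi), (rate_lo, rate_hi) in rows:
    low = writes_per_sec(rate_lo, buf_hi)   # slowest rate, largest writes
    high = writes_per_sec(rate_hi, buf_lo)  # fastest rate, smallest writes
    print(f"{buf_lo}-{buf_hi} bytes: {low:.0f} - {high:.0f} writes/sec")
```

Seen this way, the ten byte row implies on the order of a thousand or more write calls per second, well above the other two rows, which is what makes it look suspect.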
tim@cmu-cs-k.ARPA (Tim Maroney) (03/24/85)
Writing your own protocol depends on the application. I believe you said you wanted extremely small packets. If the size is likely to be four bytes of data or less, then just send duplicates in every packet. If it is going to be more, use a simple checksum from IP.

As for hacking the device drivers for the network, that shouldn't be hard for any kernel hacker.
-=-
Tim Maroney, Carnegie-Mellon University, Networking
ARPA:   Tim.Maroney@CMU-CS-K
uucp:   seismo!cmu-cs-k!tim
CompuServe:     74176,1360
audio:  shout "Hey, Tim!"
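For reference, the "simple checksum from IP" is a 16-bit one's complement sum. The sketch below follows the usual description of that algorithm (as in RFC 1071); the function name is made up for this example, and it is meant as an illustration rather than the 4.2 BSD in_cksum code.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum in the style used by IP.

    Illustrative sketch; the name and interface are hypothetical.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

The receiver can run the same function over the data followed by the two checksum bytes; for even-length data a result of zero means the packet passed the check.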