cheriton@PESCADERO.STANFORD.EDU ("David Cheriton") (03/16/88)
VMTP (RFC 1045) is specifically designed to work well with hardware support on a network adaptor board with its own processing power. In fact, we have designed and are implementing such a board to support VMTP, and this process, concurrent with the protocol design and refinement, has significantly influenced the design of the protocol.

I would agree there are severe limits on what such intelligent interfaces can provide with current protocols, independent of how well the boards are designed. I would also agree that most intelligent interfaces to date are slower than the dumb fast ones when you look at transport-level performance. However, my experience with VMTP and the NAB (Network Adaptor Board) we are building convinces me that this approach is essential for transport-level performance in the same general range as the network when we go to 100 Mb networks and higher. Moreover, offboarding the processing load of protocols seems to have additional advantages on multiprocessor machines, because the interrupt and cache demands of protocol processing fall on the critical shared resources, namely the system bus.

Interested parties can send to my secretary (nevena@pescadero.stanford.edu) for a copy of our draft paper on the NAB. The VMTP spec is of course available as an RFC - only the first 30 pages are really needed to get a feeling for the protocol.

David Cheriton
CERF@A.ISI.EDU (03/17/88)
I'm not going to get into the front-end versus operating-system-resident protocol argument (I argued against the front-end years ago on the same basis you suggest Dave Clark argues, if anyone cares about my historical biases). However, it seems to me that as you approach the gigabit channels, you really want to simplify the host's view of networking.

An analogy might be found in disk/file access and virtual memory. Years ago, an operating system was designed at UCLA called the Sigma EXperimental system (SEX for short - the user's manual was a popular item!). It ran on a Sigma 7 made by Scientific Data Systems (later, Xerox Data Systems, later, R.I.P.). The notion of associating ("coupling" - God, I never thought about how suggestive that term was in connection with the operating system acronym) virtual memory with pages of files was an essential design element. One would associate a particular virtual page space with disk pages occupied by a file. This is not much different than virtual memory linked to pages of a disk, except in this case, actions on the memory content were reflected in changes to the FILE (not just changes to a disk page which happened to represent a page of virtual memory space). So, the user's virtual memory space was mapped onto the file system. I imagine Multics could be considered to have done something like that, only even more elegantly, with its rich addressing structure.

Perhaps what is needed is a way to associate virtual memory with places in the networking space. Writing to virtual memory would be like writing to the network. PDP-11's had the concept of associating certain words of memory with I/O channels. But what I am looking for is a notion that lets very simple actions on memory be interpreted by outboard processors as network-related actions.

Perhaps Dave Clark could expand on his theme, which I view as related to your question if not the rather poorly expressed ideas above.

Vint Cerf
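[The file "coupling" described above can be sketched in later POSIX terms with memory-mapped files: a plain store into a mapped region becomes a change to the FILE itself, exactly the property Cerf wants extended to the network. The function name and file path below are arbitrary choices for the demo, not anything from SEX or Multics.]

```python
import mmap
import os
import tempfile

def coupled_write(path: str, data: bytes) -> bytes:
    """Map a file into memory, write via plain memory stores,
    and read the bytes back through the ordinary file interface."""
    with open(path, "wb") as f:
        f.truncate(mmap.PAGESIZE)          # give the mapping one page to live in
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), mmap.PAGESIZE) as page:
            page[:len(data)] = data        # a memory store ...
            page.flush()                   # ... reflected in the FILE
    with open(path, "rb") as f:
        return f.read(len(data))

if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "coupled.dat")
    print(coupled_write(path, b"hello, file"))   # b'hello, file'
```

[Replace the file with a region an outboard processor watches, and "writing to virtual memory would be like writing to the network."]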
kwe@bu-cs.BU.EDU (kwe@bu-it.bu.edu (Kent W. England)) (03/19/88)
In article <[A.ISI.EDU]17-Mar-88.05:12:56.CERF> CERF@A.ISI.EDU writes:
[discussion of simplifying and speeding up network/host interface]
>Perhaps what is needed is a way to associate virtual memory with places
>in the networking space. Writing to virtual memory would be like writing
>to the network. PDP-11's had the concept of associating certain words of
>memory with I/O channels.

Doesn't Apollo's Domain [proprietary] networked operating system do just that?
KASTEN@MITVMA.MIT.EDU (Frank Kastenholz) (03/19/88)
More important than devising protocols with OPs in mind is moving data directly from the user's space to the processor - it should not go through some central network application. A second (equally important) issue is to trust your local I/O channel. The things that really kill protocol processing are checksums and "administrative" I/O (separate acks, etc.). By trusting the local I/O channel, you do not need to checksum packets going between the OP and the host, ack them, etc., etc.

A very empirical model that I have dreamed up for a TCP file transfer in a non-kernel TCP implementation (e.g. Wollongong, KNET, etc.) is: the cost of moving the data from disk to user is X, from user to network application is X, running the TCP checksum is X, and then moving the data from the network application to the I/O adaptor is X. The total cost is 4X, and X seems to be O(n). This model is not proven, but seems to be borne out by some empirical testing of file transfers through a VAX using TCP and UDP (both had about the same throughput, but TCP took 100% and UDP about 75% of the CPU - the transfers were done by FTP/TCP and NFS/RPC/UDP - the only effective difference was the TCP checksum).

Moral of the story: if you cannot move the data from the user's space to the OP directly (i.e., you need to go through an application-level network process first), you only save about 25%. Remember, this is all empirical! Real testing needs to be done.

Frank Kastenholz
Atex Inc.
All opinions are mine - not my employer's.....
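[Frank's 4X model, written out. Each data movement or checksum pass costs X, all O(n) in the transfer size; the step names below are ours, not his. Dropping the checksum step reproduces his TCP-vs-UDP CPU observation, and dropping the user-to-network-application copy reproduces his "only save about 25%" conclusion.]

```python
def transfer_cost(steps: dict) -> int:
    """Total cost of a transfer as the sum of its per-step costs."""
    return sum(steps.values())

X = 1
tcp_path = {
    "disk_to_user": X,
    "user_to_net_app": X,      # the copy a direct user-to-OP path would eliminate
    "tcp_checksum": X,
    "net_app_to_adaptor": X,
}
# UDP without checksums is the same path minus the checksum pass.
udp_path = {k: v for k, v in tcp_path.items() if k != "tcp_checksum"}

print(transfer_cost(udp_path) / transfer_cost(tcp_path))  # 0.75: UDP's ~75% CPU
print(X / transfer_cost(tcp_path))                        # 0.25: one copy saved
```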
jerry@twg.COM ("Jerry Scott") (03/22/88)
Frank,

That is not the way that data flows inside the Wollongong software at all. The same style used by 4.3BSD is the case here. First, data is sent from the user into the kernel, where it is placed into network buffers called mbufs. Mbufs can be, and are, chained to build packets (IP headers, TCP headers, data, etc.). The mbuf chains are passed between protocols, thus no data is moved at all, just the pointers to the data. Plus, once the data is in the kernel, it never has to take a hop back to an application for any further processing. We are well aware of the overhead of moving data between the kernel level and user level; that is why we have done considerable work to prevent this from happening (e.g., the Telnet server is kernel resident, sharing DEC ethernet controllers using the ALTSTART interface). We have also been eagerly tracking the good work by Van Jacobson and Mike Karels in the TCP area. Our implementation allows us to use their public domain code without modification.

I do agree with your assessment of the on-board TCP solutions. The overhead in the host must be minimal. Data must be moved from the user into a DMA area where the smart controller can access it. You must trust the data integrity between the host and the controller performing the network functions. Now if you can get Van's and Mike's code down onto these controllers...

Jerry
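[A sketch of the mbuf chaining Jerry describes: packets are chains of small buffers, and "passing data between protocols" means passing the head pointer and prepending header mbufs, never copying the payload. Field names loosely follow the BSD style; the rest is ours, not Wollongong's code.]

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mbuf:
    m_data: bytes                     # this buffer's bytes
    m_next: "Optional[Mbuf]" = None   # next mbuf in the packet's chain

def prepend(header: bytes, chain: "Mbuf") -> "Mbuf":
    """Add a protocol header by linking a new mbuf ahead of the chain."""
    return Mbuf(m_data=header, m_next=chain)

def flatten(chain: "Optional[Mbuf]") -> bytes:
    """Walk the chain once, as a driver would when copying to the wire."""
    out = b""
    while chain is not None:
        out += chain.m_data
        chain = chain.m_next
    return out

payload = Mbuf(b"user data")                        # copied into the kernel once
packet = prepend(b"IP|", prepend(b"TCP|", payload)) # each layer adds its header
print(flatten(packet))                              # b'IP|TCP|user data'
```

[Note that `payload` itself is never copied as the headers go on; only pointers move between layers.]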
chris@gyre.umd.EDU (Chris Torek) (03/22/88)
On the other hand, putting the Jacobson/Karels TCP into the board may produce something significantly slower than what you get when you run the protocol on a Sun-3. Even if the interface is right. Even if you have a good DMA path. No matter how low the overhead is. The problem, you see, is that the Sun-3 CPU may be significantly faster than the one on your protocol card. That 68020 runs rings around the 80x8x in some of those external protocol boards. The latest Ethernet chips from Intel and AMD are fast, but they are not CPUs. There may be some protocol boards that use fast hardware ---if they do not exist now they probably will soon---but I have never seen one myself. Chris
jerry@TWG.COM ("Jerry Scott") (03/23/88)
Chris,

Agreed, the CPU on the board will definitely come into play in terms of performance. We see that here at TWG when our host-resident software is compared against on-board software on VAX 86xx or 88xx hardware. The host-resident runs circles around the on-board in these cases.

I think the Jacobson/Karels code has more to offer than blazing performance. The code that I am distributing does not yet have the performance hooks that Van explained in some of his mail messages. But it does have improved congestion control that allows my connections to adapt to line speeds during the life of the connection rather than only at the beginning. Not only does this code save the net the overhead of unnecessary retransmissions, but it prevents timeouts of connections as well. The big improvement I have seen is with Arpanet mail. It used to be the case that I would try to send mail to another host, make the connection, and then lose the connection because of timeouts before the mail could be transferred. Now, even at peak packet times, mail delivery is reliable.

Jerry
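[A toy, per-round-trip sketch of the adaptive behaviour Jerry describes: the window starts small, probes upward, and backs off on loss, so the connection discovers the line speed during its lifetime rather than assuming it at setup. This is a loose paraphrase of the Jacobson/Karels scheme, not their code; the constants and names are ours.]

```python
def on_ack(cwnd: int, ssthresh: int) -> int:
    """Grow the congestion window on a successful round trip."""
    if cwnd < ssthresh:
        return cwnd * 2     # slow start: exponential growth per RTT
    return cwnd + 1         # congestion avoidance: linear growth

def on_timeout(cwnd: int):
    """Back off on loss: restart small, remember half the old window."""
    return 1, max(cwnd // 2, 2)

cwnd, ssthresh = 1, 64
for _ in range(4):
    cwnd = on_ack(cwnd, ssthresh)
print(cwnd)                        # 16 after four round trips of slow start
cwnd, ssthresh = on_timeout(cwnd)
print(cwnd, ssthresh)              # 1 8: probe again, more cautiously
```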
lamaster@ames.arpa (Hugh LaMaster) (03/24/88)
In article <8803160106.AA05728@Pescadero> cheriton@PESCADERO.STANFORD.EDU ("David Cheriton") writes:
>VMTP (RFC 1045) is specifically designed to work well with hardware support

I find it interesting to note that while some people are worrying about the necessity of offloading protocol processing, Van Jacobson and Mike Karels have contributed algorithms that together will push 10 Mbits/sec from a CPU with less than 2 MIPS. In any reasonable model of the rates of computation versus network traffic for any non-gateway host, it isn't clear that there is any benefit at all to offloading protocol processing. In fact, recent history seems to confirm that using a general-purpose CPU is a better way to go: it is easier to install new algorithms and bug fixes. If more processing power is needed, multiple (general-purpose) CPUs seem to be a much more cost-effective way to go. I note that Ardent Computer seems to be applying the same principle to graphics processing - instead of special-purpose graphics engines, build the system with multiple CPUs.
karn@thumper.bellcore.com (Phil R. Karn) (03/24/88)
I would like to draw extra attention to the fact that Van and Mike were able to do what they did WITHOUT cheating, i.e., turning off TCP checksums. Somebody should tell Sun that this puts the final nail in the coffin of their argument that NFS can't tolerate UDP checksums for "performance" reasons... :-) Phil
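[The checksum at issue throughout this thread is the RFC 1071 Internet checksum: a one's-complement sum of 16-bit words, folded and complemented. A minimal sketch, using the worked numbers from RFC 1071 itself:]

```python
def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"                     # pad an odd trailing byte with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back into low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# The example bytes from RFC 1071 section 3 sum to 0xDDF2, so:
print(hex(inet_checksum(bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7]))))
# 0x220d
```

[A receiver verifies by summing data plus checksum and checking for zero; the loop is one pass over every byte, which is exactly the O(n) cost Frank's model charges it.]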