dove@mit-bug.UUCP (Web Dove) (10/07/85)
References:

With SOCK_STREAM communications between 750's we get 60 bytes/sec.  With
SOCK_DGRAM we get 100 kbytes/sec.  The limitation seems to come from two
things:

1) It is only possible to send() 1024-byte packets one at a time.  This
leads to enormous overhead from the sheer number of syscalls.  Curiously,
1536-byte packets cost more, presumably because copying the data into the
system is more expensive when it is not an integral number of "CLUSTERS"
as defined in the kernel (1024 bytes on the vax).  The 100 kbyte/second
speed is dominated by system time, and corresponds to a job using >80% of
the cpu on a 750.  Part of this problem would be alleviated if the
sendmsg() syscall sent multiple packets instead of packing all the
buffers into one packet (which gets us back to the 1024-byte limit per
syscall).  A sketch of the current situation appears at the end of this
message.

2) It is not possible for the user to make use of the dma capability of
the network interface.  I don't know how difficult this would be, but I
could envision queueing a request to the transmitter (much as I would to
a disk) that included the destination address and a vector of packet
descriptors.  Likewise I could envision informing the receiver that
incoming packets for a specified address should be packed into a given
vector of packets.  Incoming packets with other addresses would follow
the normal route, but those with the address I gave would get stuffed
(via dma) into my user buffers.  Another alternative would be a socket
ioctl that mapped the network buffer page holding my packet into my
address space, so that no copy would be needed.  (See the second sketch
below.)

Note that functions built into the kernel (such as remote virtual disks)
don't suffer from this problem, since there is no user copy involved, so
they could potentially run closer to the speed of the net hardware.
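Here is a rough sketch of what I mean by point 1.  The socket setup,
buffer names, and sizes are illustrative only; the point is one trap and
one user-to-kernel copy per 1024-byte packet, and that sendmsg()'s
gather i/o doesn't get around it:

    /* Sketch of the per-packet syscall cost in point 1.
     * Names and sizes are illustrative, not from any real program. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define PKT 1024            /* one kernel cluster on the vax */

    int
    send_bulk(s, buf, len)
        int s;                  /* SOCK_DGRAM socket, already connect()ed */
        char *buf;
        int len;
    {
        int off, n;

        /* One send() -- one syscall plus one copy into the kernel --
         * per 1024-byte packet.  At 100 kbytes/sec that is roughly
         * 100 traps and copies per second of data moved. */
        for (off = 0; off < len; off += PKT) {
            n = len - off > PKT ? PKT : len - off;
            if (send(s, buf + off, n, 0) < 0)
                return (-1);
        }
        return (0);
    }

    /* sendmsg() does not help: the iovec below is gathered into a
     * SINGLE datagram, so we are still limited to about 1024 bytes
     * of payload per syscall. */
    int
    send_gather(s, bufs, nbufs)
        int s;
        struct iovec *bufs;
        int nbufs;
    {
        struct msghdr msg;
        char *p;
        int i;

        for (p = (char *)&msg, i = 0; i < sizeof(msg); i++)
            p[i] = 0;
        msg.msg_iov = bufs;     /* all of these become ONE packet */
        msg.msg_iovlen = nbufs;
        return (sendmsg(s, &msg, 0));
    }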
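And here is the shape of the interface I have in mind for point 2.
Nothing like this exists in 4.2BSD; every struct, field, and ioctl name
below is made up purely to illustrate the idea:

    /* HYPOTHETICAL interface for point 2 -- these declarations do not
     * exist anywhere; they only show what queueing packet vectors to
     * the interface might look like from the user side. */
    #include <sys/types.h>
    #include <sys/socket.h>

    struct pktdesc {
        char    *pd_base;       /* user buffer for one packet */
        int      pd_len;        /* bytes in (tx) or room for (rx) the packet */
    };

    struct netqueue {
        struct sockaddr *nq_addr;   /* destination (tx) or match address (rx) */
        struct pktdesc  *nq_desc;   /* vector of packet descriptors */
        int              nq_ndesc;  /* number of descriptors */
    };

    /*
     * Transmit: queue the whole vector to the interface, the way one
     * queues transfers to a disk, and let the interface dma each packet
     * straight out of the user buffers -- one request, no per-packet
     * copy:
     *
     *      ioctl(s, NIOCQTX, &q);          (hypothetical)
     *
     * Receive: tell the driver that incoming packets addressed to
     * nq_addr should be dma'd into the descriptors in nq_desc;
     * everything else follows the normal route:
     *
     *      ioctl(s, NIOCQRX, &q);          (hypothetical)
     *
     * The other alternative mentioned above -- an ioctl that maps the
     * kernel's network buffer page containing a received packet into
     * the user address space -- would look much the same from the user
     * side: hand the kernel a descriptor, get back a pointer into a
     * shared page instead of a copy.
     */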