torek@elf.ee.lbl.gov (Chris Torek) (03/18/91)
In article <thurlow.669179279@convex.convex.com> thurlow@convex.com
(Robert Thurlow) writes:
>... The only difference is the performance bottleneck due to the network.
>If you crippled your I/O subsystem, you'd see similar things. Until we
>get new networks that are two orders of magnitude faster, this may be
>the case.

(Rob T is at Convex, so he may actually have disks with real bandwidth;
then the picture changes.)

The bandwidth of your standard, boring old Ethernet is 10 Mb/s or 1.2
MB/s.  The bandwidth of your standard, boring old SCSI disk without
synchronous mode is around 1.5 MB/s.  The latency on your Ethernet is
typically much *lower* than that on your standard boring SCSI
controller (which probably contains a 4 MHz 8085 running ill-planned
and poorly-written code, whereas your Ethernet chip has a higher clock
rate and shorter microcode paths).

In other words, they are fairly closely matched.  So why does NFS
performance fall so far below local SCSI performance?

There are many different answers to this question, but one of the most
important is one of the easiest to cure.

A typical NFS implementation uses UDP to ship packets from one machine
to another.  Its UDP interface typically squeezes about 500 KB/s out
of the Ethernet (i.e., around 42% of the available bandwidth).  Since
UDP is an `unreliable protocol' (in the sense that UDP is allowed to
drop and reorder packets), the NFS implementation has to duplicate
most of the TCP mechanism to make things reliable.  A good TCP
implementation, on the other hand, squeezes about 1.1 MB/s out of the
Ethernet even when talking to user code (talking to user code is
inherently at least slightly more expensive than talking to kernel
code, because you must double-check everything so that users cannot
crash the machine).  This is 92% of the available bandwidth.
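[The percentages in the paragraph above are plain arithmetic on the quoted figures; a quick sketch, using only the numbers given in the post (10 Mb/s Ethernet taken as ~1.2 MB/s, i.e. 1200 KB/s):]

```python
# Throughput as a fraction of raw Ethernet bandwidth, using the
# figures quoted in the post (10 Mb/s Ethernet ~= 1.2 MB/s).
ETHERNET_KBS = 1200.0  # the post's rounding of 10 Mb/s, in KB/s

def utilization(throughput_kbs, capacity_kbs=ETHERNET_KBS):
    """Fraction of the available bandwidth actually delivered."""
    return throughput_kbs / capacity_kbs

udp_nfs = utilization(500.0)    # typical UDP path: ~500 KB/s
good_tcp = utilization(1100.0)  # good TCP to user code: ~1.1 MB/s

print(f"UDP NFS path: {udp_nfs:.0%}")   # ~42%
print(f"good TCP:     {good_tcp:.0%}")  # ~92%
```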
Thus, one easy way to improve NFS performance (by a factor of less
than 2, unfortunately: even though you may halve the time spent
talking, there is plenty of other unimproved time in there) is to
replace the poor TCP implementations with good ones, and then simply
call the TCP transport code.  (To talk to existing NFS
implementations, you must also preserve a UDP interface, so you might
as well fix that too.)  The reason this is easy is that much of the
work has already been done for you---it appears in the current BSD
systems.

As a nice side bonus, TCP NFS works over long-haul and low-speed
networks (including 9600 baud serial links).  A typical UDP NFS does
not, because its retransmit algorithms are wired for local Ethernet
speeds.  Indeed, even if you do go from Ethernet to FDDI, you will
find that your NFS performance is largely unchanged unless you fix the
UDP and TCP code.  (When you fix TCP, you will discover that you also
need window scaling, since the amount of data `in flight' over gigabit
networks is much more than an unscaled TCP window can describe.)

Opening up this bottleneck reveals the next one to be NFS's
write-through cache policy, and now I will stop talking.  (You may
infer anything you like from this phrasing :-) .)
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov
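[The window-scaling remark above is a bandwidth-delay-product argument: a sender can have at most one window of data in flight per round trip, so throughput is capped at window/RTT.  A sketch with the unscaled 16-bit TCP window and a hypothetical 30 ms round-trip time (the RTT is an illustrative assumption, not a figure from the post):]

```python
# Why unscaled TCP windows fail on fast long-haul paths: at most one
# window of data can be `in flight' per round trip, so the throughput
# ceiling is window / RTT regardless of link speed.
MAX_UNSCALED_WINDOW = 65535  # bytes: the 16-bit TCP window field

def max_throughput_bytes_per_sec(window_bytes, rtt_sec):
    return window_bytes / rtt_sec

gbit_link = 1e9 / 8  # gigabit link capacity in bytes/sec (125 MB/s)
ceiling = max_throughput_bytes_per_sec(MAX_UNSCALED_WINDOW, 0.030)

# With a hypothetical 30 ms RTT, the ceiling is ~2.2 MB/s -- a tiny
# fraction of the 125 MB/s the gigabit link could carry.
print(f"unscaled ceiling: {ceiling / 1e6:.1f} MB/s "
      f"on a {gbit_link / 1e6:.0f} MB/s link")
```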
david@bacchus.esa.oz.au (David Burren [Athos]) (03/18/91)
In <11030@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>The bandwidth of your standard, boring old Ethernet is 10 Mb/s or 1.2
>MB/s.

Say what?  If you can get over 1 Mb/s out of an Ethernet I'd like to
hear about it.

As a simple test, on a barely-loaded Ethernet (5 Sony workstations,
with two people running vi) I ftp'ed a >400k file from one machine to
another.  Local SCSI disk to local RAM disk.  No NFS involved.  The
transfer rate I got was 94 kbytes/s.  (Strange, considering the 270
kb/s NFS throughput shown below.)  I know this is a poor test, but it
indicates a ballpark figure MUCH less than 1 Mb/s.  On a busy Ethernet
I'd expect IP performance to fall _far_ short of 1 Mb/s, as collisions
take their toll.

>The bandwidth of your standard, boring old SCSI disk without
>synchronous mode is around 1.5 MB/s.

Using the bonnie filesystem benchmark on our local SCSI disks shows
writes ranging from 200 kb/s (for char-by-char) to >600 kb/s (block
I/O) and reads from 150 kb/s (character) to >600 kb/s (block).
This is with Wren-IV's and M9380S's using asynchronous SCSI.  Note
that bonnie measures *through-the-filesystem* performance.

I ran bonnie again, over NFS/Ethernet (onto a workstation with 8 Mb
RAM).  By this stage there were about 5 users on the net, running a
mix of vi, nn, xmahjongg, etc.  Block writes came in at about 50 kb/s
(not surprising really) while reads showed ~150 kb/s (character) and
~270 kb/s (block).  This was for a 40 Mb file, and no, I didn't do the
test more than once.

>The latency on your Ethernet is
>typically much *lower* than that on your standard boring SCSI
>controller (which probably contains a 4 MHz 8085 running ill-planned
>and poorly-written code, whereas your Ethernet chip has a higher clock
>rate and shorter microcode paths.)

Which distinguishes older SCSI/ST506 implementations from the newer
embedded-SCSI disks.  I wonder which is more prevalent in today's
machines?  Also, see my comment above re Ethernet collisions.
>In other words, they are fairly closely matched.

I beg to differ.  Of course, the hardware here may be atypical.
That aside, I agree that NFS performance is probably less than optimal.

>There are many different answers to this question, but one of the most
>important is one of the easiest to cure.

>A good TCP implementation, on the other hand, squeezes about 1.1 MB/s
>out of the Ethernet even when talking to user code (talking to user code
>is inherently at least slightly more expensive than talking to kernel
>code, because you must double-check everything so that users cannot
>crash the machine).  This is 92% of the available bandwidth.

Could you please refer me to such a TCP implementation?
The figures I've quoted above were on Sony NEWS-1750 workstations,
running NEWS-OS 3.3a (basically 4.3BSD-Tahoe, I believe).
_____________________________________________________________________________
David Burren [Athos]                       Email: david@bacchus.esa.oz.au
Software Development Engineer              Phone: +61 3 819 4554
Expert Solutions Australia, Hawthorn, VIC  Fax:   +61 3 819 5580
[Above opinions and comments are mine, not ESA's.]
goudreau@larrybud.rtp.dg.com (Bob Goudreau) (03/19/91)
In article <2028@bacchus.esa.oz.au>, david@bacchus.esa.oz.au (David
Burren [Athos]) writes:
> In <11030@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>
> >The bandwidth of your standard, boring old Ethernet is 10 Mb/s or 1.2
> >MB/s.
>
> Say what?  If you can get over 1 Mb/s out of an Ethernet I'd like to hear
> about it.

Note that Chris said "bandwidth", not "effective throughput".  Also,
pay attention to "b" (bit) vs. "B" (byte).

----------------------------------------------------------------------
Bob Goudreau				+1 919 248 6231
Data General Corporation		goudreau@dg-rtp.dg.com
62 Alexander Drive			...!mcnc!rti!xyzzy!goudreau
Research Triangle Park, NC  27709, USA
torek@elf.ee.lbl.gov (Chris Torek) (03/19/91)
>In <11030@dog.ee.lbl.gov> I wrote:
>>The bandwidth of your standard, boring old Ethernet is 10 Mb/s or 1.2
>>MB/s.

In article <2028@bacchus.esa.oz.au> david@bacchus.esa.oz.au (David
Burren [Athos]) writes:
>Say what?  If you can get over 1 Mb/s out of an Ethernet I'd like to hear
>about it.

You just did. :-)

Van Jacobson regularly gets around 1 MB/s (8 Mb/s) on Sun-3 (68020)
boxes.  4.3BSD-reno (a much less carefully tuned system than Van's)
running on a VAX 8250 with a DEUNA, talking to an Encore Multimax
running UMax 4.3, receives data inside FTP at 130 KB/s, or just a bit
over 1 Mb/s.

(I used `get /vmunix /dev/null' to get this number.  Note that this
depends on the rate at which the remote machine can generate data for
you.)

>As a simple test, on a barely-loaded Ethernet (5 Sony workstations, with two
>people running vi) I ftp'ed a >400k file from one machine to another.
>Local SCSI disk to local RAM disk.  No NFS involved.  The transfer rate I
>got was 94 kbytes/s.

(You may have forgotten to use binary mode.)

>>The bandwidth of your standard, boring old SCSI disk without
>>synchronous mode is around 1.5 MB/s.

>Using the bonnie filesystem benchmark on our local SCSI disks shows writes
>ranging from 200 kb/s (for char-by-char) to >600 kb/s (block I/O) and reads
>from 150 kb/s (character) to >600 kb/s (block).
>This is with Wren-IV's and M9380S's using asynchronous SCSI.  Note that
>bonnie measures *through-the-filesystem* performance.

Yes, these numbers are fairly typical (you lose half the bus
performance in the file system code: something else that needs tuning;
see Larry McVoy's paper from the last Usenix for one approach).

>>A good TCP implementation, on the other hand, squeezes about 1.1 MB/s
>>out of the Ethernet even when talking to user code ...

>Could you please refer me to such a TCP implementation?
>The figures I've quoted above were on Sony NEWS-1750 workstations, running
>NEWS-OS 3.3a (basically 4.3BSD-Tahoe, I believe).
4.3-tahoe lacks the `header prediction' code that appears in 4.3-reno.
4.3-reno lacks Van's latest changes (though said changes are likely to
be in 4.4BSD, if/when 4.4BSD exists).  Only those who work on NEWS-OS
could say for certain which performance fixes are in it.

Also, much depends on the bus design and the code for the Ethernet
driver.  It is important to avoid data copies; many existing
implementations copy a packet just so they can insert headers, even
though it is easy to arrange for space for those headers `in advance'.
It is also important to avoid long code paths for typical cases (e.g.,
the `header prediction' stuff that went into 4.3-reno, and the route
caching stuff; I think the latter has been around longer).
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov
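[The copy-avoidance point above can be shown with a toy buffer: allocate headroom in front of the payload, then lay each header down in place instead of copying the payload to make room.  This is a hypothetical sketch of the idea, not BSD mbuf code; the headroom size and stand-in header contents are invented for illustration.]

```python
# Toy illustration of `arrange for header space in advance': the packet
# buffer is allocated with headroom, and each protocol layer writes its
# header into that headroom, so the payload bytes never move.
HEADROOM = 64  # room for link + IP + transport headers (hypothetical size)

def new_packet(payload: bytes):
    """Allocate a buffer with reserved headroom in front of the payload."""
    buf = bytearray(HEADROOM) + bytearray(payload)
    return buf, HEADROOM  # offset where valid data currently starts

def prepend_header(buf, offset, header: bytes):
    """Write a header into the headroom; the payload is never copied."""
    new_off = offset - len(header)
    assert new_off >= 0, "out of headroom"
    buf[new_off:offset] = header
    return new_off

buf, off = new_packet(b"NFS payload")
off = prepend_header(buf, off, b"U" * 8)   # 8-byte stand-in UDP header
off = prepend_header(buf, off, b"I" * 20)  # 20-byte stand-in IP header
print(bytes(buf[off:]))  # headers followed by payload, zero payload copies
```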
david@bacchus.esa.oz.au (David Burren [Athos]) (03/19/91)
In <2028@bacchus.esa.oz.au> I wrote:
>In <11030@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>>The bandwidth of your standard, boring old Ethernet is 10 Mb/s or 1.2
>>MB/s.

>Say what?  If you can get over 1 Mb/s out of an Ethernet I'd like to hear
>about it.

Bruce Barnett @ GE kindly sent me a copy of a posting to
comp.protocols.tcp-ip by Van Jacobson in October 1988.  In it he
described tests using Sun-3s with two types of Ethernet controller: a
LANCE and an i82586.  The LANCE came out best, with throughputs up to
1000 kbytes/sec, while the Intel part peaked out at 720 kbytes/sec.

I stand corrected in what Ethernet can do :-)  Mind you, I suspect
that this optimised code is unfortunately still absent in many shipped
systems.  I do not know if the Sonys here incorporate the Van Jacobson
TCP.

So, Ethernet being capable (depending on controller and software) of
sustaining throughputs similar to modern asynch SCSI-1 setups, we're
back to the distinct performance difference between local disks and
NFS.  Eg. in my previous posting, filesystem performance (block reads)
on:

	SCSI	600 kb/s
	NFS	270 kb/s

Not that I've added all that much to the discussion :-(
Back to the experts...
 - David B.
milburn@me10.lbl.gov (John Milburn) (03/19/91)
In the referenced article torek@elf.ee.lbl.gov (Chris Torek) writes:
>Van Jacobson regularly gets around 1 MB/s (8 Mb/s) on Sun-3 (68020) boxes.
>4.3BSD-reno (a much less carefully tuned system than Van's) running on a VAX
>8250 with a DEUNA, talking to an Encore Multimax running UMax 4.3, receives
>data inside FTP at 130 KB/s, or just a bit over 1 Mb/s.

>(I used `get /vmunix /dev/null' to get this number.  Note that this depends
>on the rate at which the remote machine can generate data for you.)

There are commercial implementations using Van's algorithms.  Using an
hp9000s400 (HP/UX 7.03) connected to a locally connected sun4 (SunOS
4.1), and using the same method, "get /vmunix /dev/null", I get a
binary transfer rate of 501 Kbyte/sec, or 0.5 MByte/s.  The hp is
using header prediction, dynamic window sizing, and Phil Karn's
clamped retransmission algorithm.

If I go to another sun4 on the other side of a cisco router, a DEC
LanBridge, and two FDDI <-> Ether bridges, the rate drops to 240
Kbyte/s.

-jem
--
John Milburn                    milburn@me10.lbl.gov    (415) 486-6969
"Inconceivable!"  "You use that word a lot.  I don't think it means
what you think it does."
lm@slovax.Eng.Sun.COM (Larry McVoy) (03/19/91)
In article <11074@dog.ee.lbl.gov> JEMilburn@lbl.gov (John Milburn) writes:
>In the referenced article torek@elf.ee.lbl.gov (Chris Torek) writes:
>
>>Van Jacobson regularly gets around 1 MB/s (8 Mb/s) on Sun-3 (68020) boxes.
>>4.3BSD-reno (a much less carefully tuned system than Van's) running on a VAX
>>8250 with a DEUNA, talking to an Encore Multimax running UMax 4.3, receives
>>data inside FTP at 130 KB/s, or just a bit over 1 Mb/s.
>
>>(I used `get /vmunix /dev/null' to get this number.  Note that this depends
>>on the rate at which the remote machine can generate data for you.)
>
>There are commercial implementations using Van's algorithms.  Using an
>hp9000s400 (HP/UX 7.03) connected to a locally connected sun4 (SunOS
>4.1), and using the same method, "get /vmunix /dev/null", I get a
>binary transfer rate of 501 Kbyte/sec or .5 MByte/s.  The hp is using
>header prediction, dynamic window sizing, and Phil Karn's clamped
>retransmission algorithm.

The clustering changes give you a bit better performance.  (Both ends
are Sun 4/60's on a local net; the end with /h/XXX has the clustering
changes.  The reason it doesn't get faster the second time is that
snafu has only 8 MB of memory, so much of the file is reread from
disk.)

The interesting thing to note is that the disk bandwidth (~1.2 MB/sec)
and the Ethernet are closely matched.  What happens when we consider
FDDI and ISDN, the fast and slow futures of networking?

220 snafu FTP server (SunOS 4.1.1) ready.
ftp> bin
200 Type set to I.
ftp> get /h/XXX /dev/null
200 PORT command successful.
150 Binary data connection for /h/XXX (129.144.50.10,1494) (8388608 bytes).
8388608 bytes received in 11 seconds (7.4e+02 Kbytes/s)
ftp> get /h/XXX /dev/null
8388608 bytes received in 11 seconds (7.6e+02 Kbytes/s)
ftp> quit

script done on Mon Mar 18 19:53:19 1991
---
Larry McVoy, Sun Microsystems  (415) 336-7627  ...!sun!lm or lm@sun.com
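[The rate ftp prints in the transcript above is just size divided by elapsed time; a quick check of the first transfer's numbers (8388608 bytes in 11 seconds, taking 1 Kbyte = 1024 bytes, which is assumed here rather than stated in the transcript):]

```python
# Sanity-check the ftp transcript's first rate line.
size_bytes = 8388608   # from the "150 Binary data connection" line
elapsed_sec = 11       # as reported by ftp

rate_kbytes = size_bytes / elapsed_sec / 1024
# Two significant figures reproduces ftp's notation:
print(f"{rate_kbytes:.2g} Kbytes/s")  # 7.4e+02 Kbytes/s
```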
raj@hpindwa.cup.hp.com (Rick Jones) (03/22/91)
In addition to paying close attention to b's and B's, I have also
decided to take what ftp says as merely a best guess for the transfer
rate.  If you transfer a small enough file on a fast enough system,
you can see ftp report transfer rates of 3-4 MB/s (closely watched b's
and B's ;-) on an *ethernet* ;-) ;-) ;-)

Just about any new WS worth its silicon should be able to go memory to
memory at full ethernet speeds using TCP...

rick jones
 ___   _  ___
|__)  /_\  |    Richard Anders Jones   | HP-UX  Networking  Performance
| \_ / \_/ |    Hewlett-Packard  Co.   | "It's so fast, that _______" ;-)
------------------------------------------------------------------------
Being an employee of a Standards Company, all Standard Disclaimers Apply
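[The inflated-rate effect described above follows from measuring a short transfer with a coarse clock: the elapsed time rounds to a tick or two, and size/elapsed explodes.  A hypothetical sketch with an invented 10 ms clock tick and invented file sizes, not the actual ftp source:]

```python
# Why ftp can "report" 3-4 MB/s on an Ethernet: a small transfer may
# measure as taking almost no time on a coarse clock, and bytes/elapsed
# then explodes.  Clock tick and sizes are illustrative assumptions.
def reported_rate_kbs(size_bytes, true_elapsed_sec, clock_tick_sec=0.01):
    # Measured time is the true time rounded to the clock tick, with a
    # floor of one tick so we never divide by zero.
    ticks = max(1, round(true_elapsed_sec / clock_tick_sec))
    measured = ticks * clock_tick_sec
    return size_bytes / measured / 1024

# A 40 KB file that really took 0.4 s measures honestly at ~100 KB/s:
print(reported_rate_kbs(40 * 1024, 0.4))
# The same file delivered from cache in 2 ms rounds up to one 10 ms
# tick, so the "rate" comes out ~4000 KB/s -- wildly inflated:
print(reported_rate_kbs(40 * 1024, 0.002))
```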