magnar@sfd.uit.no (Magnar Antonsen) (12/08/89)
While testing the CDC Wren IV disk, some dramatic differences between HP and Sun turned up. Running the dd command illustrates the point. We read 2000 blocks of 8K size from the disk with the command

    dd if=/dev/<block-special> of=/dev/null bs=8k count=2000

and the time (measured with /bin/time) it takes to do this is:

    Sun 3/80 with SunOS 4.03      real:  18.4   sys: 8.2
    HP 9000/370 with HP-UX 6.5    real: 144.8   sys: 9.1

The same physical disk was used on both computers. The interface on the HP is 2.7 MB/s synchronous SCSI. Other tests (e.g. omitting the dd command, other block sizes) give the same conclusion: the Sun is 7 to 9 times faster than the HP.

Could anyone comment on or explain this difference in performance? A first conclusion from our tests is that the answer may be found in differences between the device drivers in HP-UX 6.5 and SunOS 4.03.
--
Magnar Antonsen                    N-9001 TROMSOE, NORWAY
Computer Science Department        Phone:   +47 83 44043
University of Tromsoe, NORWAY      Telefax: +47 83 55418
                                   Email:   magnar@sfd.uit.no
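[Aside: the dd invocation above boils down to a read loop like the following minimal C sketch; the device name is a placeholder for the actual block-special file.]

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define BLKSIZE (8 * 1024)   /* bs=8k */
    #define COUNT   2000         /* count=2000 */

    int main(void)
    {
        char buf[BLKSIZE];
        int i, fd = open("/dev/xd0a", O_RDONLY);   /* placeholder device */

        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* read COUNT blocks of BLKSIZE bytes, discarding the data,
           just as dd ... of=/dev/null does */
        for (i = 0; i < COUNT; i++)
            if (read(fd, buf, sizeof buf) != BLKSIZE)
                break;   /* short read or end of device */
        close(fd);
        return 0;
    }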
kinsell@hpfcdj.HP.COM (Dave Kinsell) (12/12/89)
>While testing the CDC Wren IV disk, some dramatic differences between HP and Sun
>turned up. Running the dd command illustrates the point. We read 2000 blocks
>of 8K size from the disk with the command
>    dd if=/dev/<block-special> of=/dev/null bs=8k count=2000
>and the time (measured with /bin/time) it takes to do this is:
>    Sun 3/80 with SunOS 4.03      real:  18.4   sys: 8.2
>    HP 9000/370 with HP-UX 6.5    real: 144.8   sys: 9.1

Using dd with the block special device file isn't doing what you think it's doing. For reasons I don't understand, the physical I/O is broken into 2 Kbyte chunks, at least on the HP system. Note that the man page for dd says that the blocksize declaration works only for raw I/O.

The big factor in the performance difference is that the Wren IV has readahead which must be specifically enabled. Since it's not a supported drive on the HP system, it doesn't get turned on; it must be getting turned on with the Sun system. You're sending short, consecutive read requests to the disk, which is exactly where readahead shows the biggest improvement. However, it is not at all representative of file system or swap activity.

Without readahead, it will take slightly more than one revolution to get each 2K of data (one rotational latency, plus skipping the previously read 2K):

    predicted:     2K / 18 ms          = 114 K/sec
    your results:  8K * 2000 / 144.8 s = 113 K/sec

The Wren IV is a ZBR (zone bit recording) drive, which means the data rate changes significantly depending on which cylinder is being read. This complicates using it for file system benchmarking.

-Dave Kinsell
 use kinsell@hpfcmb.hp.com

DISCLAIMER: This is not an official or officious policy statement of
the Hewlett-Packard company.
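[Aside: to make the arithmetic explicit — a small sketch, assuming the Wren IV's roughly 3600 RPM spindle (about 16.7 ms per revolution; the 18 ms figure above adds the time to skip over the 2K already read):]

    #include <stdio.h>

    int main(void)
    {
        double bytes_per_chunk = 2048.0;   /* each physical I/O moves 2K */
        double secs_per_chunk  = 0.018;    /* ~one revolution + 2K skip  */

        /* without readahead, throughput is one 2K chunk per ~18 ms */
        printf("predicted: %.0f K/sec\n",
               bytes_per_chunk / secs_per_chunk / 1000.0);

        /* the measured HP figure: 2000 reads of 8K in 144.8 seconds */
        printf("measured:  %.0f K/sec\n", 8192.0 * 2000.0 / 144.8 / 1000.0);
        return 0;
    }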
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (12/19/89)
[somebody has done this test:]

>    dd if=/dev/<block-special> of=/dev/null bs=8k count=2000
>and the time (measured with /bin/time) it takes to do this is:
>    Sun 3/80 with SunOS 4.03      real:  18.4   sys: 8.2
>    HP 9000/370 with HP-UX 6.5    real: 144.8   sys: 9.1

In article <17330009@hpfcdj.HP.COM> kinsell@hpfcdj.HP.COM (Dave Kinsell) writes:

    Using dd with the block special device file isn't doing what you
    think it's doing.

    [ ...some comments about read ahead buffering... ]

Actually, what's happening here isn't what you think either. Read-ahead buffering in the device does not enter the picture at all.

SunOS 4 does ALL I/O (save raw devices!) via memory mapping of files (including block devices as well, I believe), and does not read data in if it is written to /dev/null immediately thereafter (because of copy-on-write). Try doing 'time cp /vmunix /dev/null' under SunOS 4, SunOS 3, and HP-UX, and you will see; especially if run under the C shell, which gives you a count of I/O operations.

Otherwise we would be seeing a miraculous 1 MByte a second out of SunOS, and I do not believe in miracles. In much the same conditions, but with SunOS 3.5 on a 3/50, I get real: 151 and sys: 17, which is compatible with the slower CPU speed of the 3/50 w.r.t. the 3/80; I do not think that SunOS 4 has drivers and a filesystem that are 10 times as fast as SunOS 3's, because they are largely the same.

In other words, the numbers posted above by somebody are completely bogus, and my general impression is that HP-UX and SunOS are really in the same league of I/O bandwidth. The only machines that seem to have decent or above-average I/O bandwidth are the MIPSco ones, and they took a lot of care over that.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
guy@auspex.auspex.com (Guy Harris) (12/20/89)
>Actually, what's happening here isn't what you think either.

Nor is it what you think....

>SunOS 4 does ALL I/O (save raw devices!) via memory mapping of
>files (including block devices as well, I believe), and does not
>read data in if it is written to /dev/null immediately
>thereafter (because of copy-on-write).

SunOS does, in fact, do UFS and NFS I/O by something that basically amounts to memory mapping. What a "read" of a UFS or NFS file, or, I think, a block special file amounts to is "map the region being read into the kernel's address space, and then copy from that mapped region into the user's buffer". The kernel obviously has no idea that the data in question is going to be written to "/dev/null", so it copies it anyway, which means it has to fault the data in from the file if it's not already in memory. (Yes, I've read the code; that's how it works.)

>Try doing 'time cp /vmunix /dev/null' under SunOS 4, SunOS 3, and
>HP-UX, and you will see; especially if run under the C shell,
>which gives you a count of I/O operations.

That's different from "dd". "cp" doesn't do "read"s from "/vmunix"; it "mmap"s "/vmunix" into *its* address space and then writes that mapped region to "/dev/null". *That* write, obviously, never touches the data, so it doesn't have to be faulted in. "dd", on the other hand, actually does "read"s from its input file (as proven by running "trace" on it), so "dd if=/vmunix of=/dev/null" does actually read all the data in "/vmunix", as opposed to "cp /vmunix /dev/null".
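[Aside: a minimal C sketch of the two code paths Guy describes. The point is only that read() copies data into the caller's buffer, so every page of the source must be faulted in, while write()ing an mmap()ed region to /dev/null never touches the bytes at all.]

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* dd-style: read() copies each block into the user buffer, so the
       kernel must fault every page of the source in from disk. */
    void copy_via_read(int in, int out)
    {
        char buf[8192];
        ssize_t n;

        while ((n = read(in, buf, sizeof buf)) > 0)
            write(out, buf, (size_t)n);
    }

    /* cp-style: mmap() the source, then write() the mapped region.
       A write to /dev/null ignores the data, so no page of the source
       ever needs to be faulted in. */
    void copy_via_mmap(int in, int out)
    {
        struct stat st;
        char *p;

        if (fstat(in, &st) < 0 || st.st_size == 0)
            return;
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, in, 0);
        if (p == (char *)MAP_FAILED)
            return;
        write(out, p, (size_t)st.st_size);
        munmap(p, (size_t)st.st_size);
    }

    int main(void)
    {
        int in  = open("/vmunix", O_RDONLY);
        int out = open("/dev/null", O_WRONLY);

        if (in < 0 || out < 0) {
            perror("open");
            return 1;
        }
        copy_via_read(in, out);    /* faults in every page of /vmunix */
        lseek(in, 0L, SEEK_SET);
        copy_via_mmap(in, out);    /* faults in nothing */
        return 0;
    }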
pcg@aber-cs.UUCP (Piercarlo Grandi) (12/22/89)
In article <2771@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

    >Actually, what's happening here isn't what you think either.

    Nor is it what you think....

Well, apparently you are right, but in a sense this is disappointing; Sun could have been cleverer, but has been clever enough.

    SunOS does, in fact, do UFS and NFS I/O by something that basically
    amounts to memory mapping. What a "read" of a UFS or NFS file, or,
    I think, a block special file amounts to is "map the region being
    read into the kernel's address space, and then copy from that
                                                 ^^^^
    mapped region into the user's buffer".

If they did copy-on-write, and especially if the buffers were page aligned, nothing would need to happen. In particular, the stdio library and even open(2), read(2), ... are really layered onto memory mapping in SunOS 4. The old Unix I/O system is nearly dead: open(2) maps the file, and read(2) accesses it directly. In other words, traditional Unix I/O is done only for some character devices (some use streams instead). SunOS 4 emulates a PDP on a Multics...

    The kernel obviously has no idea that the data in question is
    going to be written to "/dev/null", so it copies it anyway, which
    means it has to fault the data in from the file if it's not
    already in memory.

Not necessarily true. With read(2) implemented as copy-on-write, the data pages would never actually be touched; they could as well remain on disc. This would more easily be true if Sun's dd had page-aligned buffers, or if SunOS implemented unaligned copy-on-write (much more difficult).

    (Yes, I've read the code; that's how it works.)

This settles it.

    "dd", on the other hand, actually does "read"s from its input
    file (as proven by running "trace" on it),

Again, if dd's buffers were page aligned (they should be) and copy-on-write were used, this would not need to happen. Unfortunately SunOS does not do this, so more investigation is needed.

I have some more data points that show something interesting, from some simple tests under both SunOS 3 and 4. The discs in both cases are broadly comparable; CPU speeds are not very important here. To do the reads, and to be sure that pages were faulted, I used a trivial program like this:

    #include <unistd.h>

    int main(void)
    {
        /* 16K buffer = two 8K Sun-3 pages per read() */
        char buf[16*1024];   /* hope the optimizer does not do funny tricks */

        /* dirty one byte in each page, so that if copy-on-write is
           in effect the pages really do get faulted in */
        while (read(0, buf, sizeof buf) > 0)
            buf[1*1024] = buf[9*1024] = 'x';
        return 0;
    }

which reads two Sun pages at a time, and modifies a byte in each to make sure that copy-on-write, if present, is exercised and the pages are faulted in. The following figures are quite approximate, but representative:

    SunOS  MBytes  Type   Seconds  KB/sec  I/Os  Machine
      3      10    block    95      100    5100  Sun 3/50
      3      10    raw      20      500       ?  Sun 3/50
      4      24    block    45      500     530  Sun 3/280

What I read here is less optimistic results than some that have been posted, but impressive nonetheless. Getting over 500 KB/s out of a disc is no mean feat. SunOS manages to do that using raw device access under 3, and either kind of device under 4. The fact that SunOS 4 on the block device gives the same performance as SunOS 3 on the raw device means that mapping disc blocks is effective in avoiding the overhead associated with buffer cache management.

Actually, the interesting column is the "I/Os" column, which tells us how many I/O operations were scheduled.
It is not available for SunOS 3 raw devices, but it tells us (and I have other data that confirms this) that even if we are reading two pages at a time, SunOS 4 actually fetches six at a time; that is, it does heavy clustering of I/O requests (hopefully only when it detects sequential access). This looks like a big win, *for sequential access*. As to the actual faulting and page copying, apparently they do not matter a lot, given the CPU speed and the overlapping of I/O operations.

So the result here is that eliminating the buffer cache means that mapped devices exploit the available bandwidth well, while using the buffer cache, and passing through the strategy function therein, reduces effective bandwidth to 20%.

It would be interesting to see the bandwidth reduction due to the filesystem under both technologies. If anybody wants to have a go... Remember that you must unmount and remount a filesystem before each test, to invalidate any in-core pages. It would also be interesting to see similar numbers for HP-UX (one of whose incarnations used to have an extent-based filesystem, BTW).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
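[Aside: a minimal sketch of the filesystem bandwidth test Piercarlo suggests, assuming the file lives on a filesystem that has just been unmounted and remounted so none of its pages are in core; the path is a placeholder.]

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Time sequential 16K reads of a file and report effective KB/sec.
       Dirty a byte in each 8K page, as in the program above, so that
       copy-on-write (if any) cannot hide the page faults. */
    int main(int argc, char **argv)
    {
        char buf[16 * 1024];
        struct timeval t0, t1;
        double secs, total = 0.0;
        ssize_t n;
        int fd;

        fd = open(argc > 1 ? argv[1] : "/mnt/bigfile", O_RDONLY);
        if (fd < 0) {                   /* "/mnt/bigfile" is a placeholder */
            perror("open");
            return 1;
        }
        gettimeofday(&t0, NULL);
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            buf[1 * 1024] = buf[9 * 1024] = 'x';
            total += (double)n;
        }
        gettimeofday(&t1, NULL);
        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1.0e6;
        printf("%.0f KB in %.1f sec = %.0f KB/sec\n",
               total / 1024.0, secs, total / 1024.0 / secs);
        return 0;
    }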