tbray@watsol.waterloo.edu (Tim Bray) (04/02/88)
In a recent meeting we were analyzing the performance of an application that
is rather I/O bound - in particular, it performs a lot of very random accesses
here and there in large (> 100 Mb) files. Somebody said "Now, we'll assume
that Unix can do a maximum of 30 disk I/O's a second." Somebody else remarked
that that figure had been remarkably constant for quite some time. Somebody
else proposed that it was a fundamental law of Computer Science. (Of course,
we are poor peons restricted to the use of Vaxes and Suns.)

Anyhow - presumably there are other people out there limited by this
particular bottleneck. Are there reasonably-priced Unix systems out there
that do better? Is there a set of benchmarks which reliably characterizes
system performance in this area?

To address this problem, I half-seriously propose a new metric: Application
Disk I/Os per Second - named, obviously, ADIOS.

Adios, amigos.
Tim Bray, New Oxford English Dictionary Project, U of Waterloo, Ontario
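[A rough illustration of what measuring ADIOS might look like - this sketch
is not from the original post; the usage, block size, and read count are
made up. The file must be much larger than the buffer cache for the reads
to actually hit the disk.]

    /* adios.c - time a burst of random single-block reads and
     * report application disk I/Os per second. Hypothetical sketch. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <time.h>

    #define NREADS 1000     /* enough reads that elapsed time is seconds */
    #define BLK    512

    int main(int argc, char **argv)
    {
        char buf[BLK];
        int fd, i;
        off_t size;
        time_t start, stop;

        if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
            fprintf(stderr, "usage: adios bigfile\n");
            return 1;
        }
        size = lseek(fd, 0, SEEK_END);          /* file size in bytes */
        srand((unsigned)time(NULL));

        time(&start);
        for (i = 0; i < NREADS; i++) {
            /* random 512-byte-aligned offset within the file */
            off_t off = ((off_t)rand() % (size / BLK)) * BLK;
            lseek(fd, off, SEEK_SET);
            read(fd, buf, BLK);
        }
        time(&stop);

        printf("ADIOS: about %.1f disk I/Os per second\n",
               NREADS / difftime(stop, start));
        return 0;
    }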
sl@van-bc.UUCP (pri=-10 Stuart Lynne) (04/02/88)
In article <3842@watcgl.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
>that Unix can do a maximum of 30 disk I/O's a second". Somebody else remarked
>that that figure had been remarkably constant for quite some time. Somebody
>else proposed that it was a fundamental law of Computer Science. (Of course,
>we are poor peons restricted to the use of Vaxes and Suns).

Probably related to your average seek time plus rotational delay plus data
transfer time. On most popular, extant Unix systems 20 - 30 ms is a
reasonable figure for average seek. Average rotational latency is 8.5 ms.
Transfer time for one sector is, say, about 1 ms.

Given a fast 20 ms drive, you should approach 30 disk I/O's per second
(20 + 8.5 + 1 = 29.5 ms per I/O, or about 34 per second at best). Given a
slow 30 ms drive, probably closer to 25; at 40 ms, about 20.

Other factors which will help are controllers which will overlap seeks;
multiple disks to localize file accesses (allowing average seek times to
decline); and larger block sizes (actually getting the information in is
only a small part of the battle - getting there is the largest component
for small random reads).
--
{ihnp4!alberta!ubc-vision,uunet}!van-bc!Stuart.Lynne
Vancouver, BC  604-937-7532
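[As a quick check of that arithmetic - an editorial sketch, not from the
post - seek + latency + transfer gives the time per I/O, and its reciprocal
gives the I/O rate. The figures are the ones quoted above.]

    /* iorate.c - expected random I/Os per second from avg seek time
     * plus rotational latency plus one-sector transfer time. */
    #include <stdio.h>

    int main(void)
    {
        double latency_ms  = 8.5;   /* half of a 16.7 ms rotation */
        double transfer_ms = 1.0;   /* roughly one sector */
        double seek_ms[]   = { 20.0, 30.0, 40.0 };
        int i;

        for (i = 0; i < 3; i++) {
            double per_io = seek_ms[i] + latency_ms + transfer_ms;
            printf("%2.0f ms seek -> %4.1f ms per I/O -> %4.1f I/Os/sec\n",
                   seek_ms[i], per_io, 1000.0 / per_io);
        }
        return 0;   /* prints ~34, ~25, ~20 - matching the claims above */
    }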
ron@topaz.rutgers.edu (Ron Natalie) (04/03/88)
Not only is it not a constant, it's not even true. The sad fact is that most
disk controllers for minis/micros are pretty horrendous. Sun's unfortunate
use of the Xylogics 450/451 is a prime example. Anyway, with decent
controllers (or multiple controllers) there is no reason why the figure of
30 can't be exceeded - and it is, on decent Unix systems.

-Ron
aland@infmx.UUCP (Dr. Scump) (04/03/88)
In article <3842@watcgl.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes:
> (misc. comments about UNIX disk i/o performance, etc.)
>
> To address this problem, I half-seriously propose a new metric: Application
> Disk I/Os per Second, named, obviously, ADIOS.
>
> Adios, amigos.
> Tim Bray, New Oxford English Dictionary Project, U of Waterloo, Ontario

Sorry, ADIOS has already been used (and, I think, copyrighted). [Company
name deleted] developed an access method (ISAM) for IBM mainframes running
OS/MVT, OS/MVS, etc. called ADIOS (an acronym for "Another Disk I/O
System"). It was coded in assembler, was accessed from COBOL or assembler
as callable functions, and outperformed the "standard" stuff by a mile.
Plus, the terminal I/O control portion of the in-house ADIOS-based realtime
system was named TACOS ("Terminal And Communications Operating System",
I think).

Please, no anti-IBM, anti-COBOL, mainframe-bashing, etc. flames here.
Mainframes are not necessarily evil (or is that "necessary evil"? :-]).
And no "too late for April Fool's Day" comments - this is a true story.
Only the names were changed to protect the innocent.

--
Alan S. Denney           | {pyramid|uunet}!infmx!aland
Informix Software, Inc.  | CAUTION: This terminal makes wide right turns!
Disclaimer: These opinions are mine alone. If I am caught or killed,
the secretary will disavow any knowledge of my actions.
wcs@ho95e.ATT.COM (Bill.Stewart.<ho95c>) (04/04/88)
In article <1703@van-bc.UUCP> sl@van-bc.UUCP (Stuart Lynne) writes:
:In article <3842@watcgl.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
:>that Unix can do a maximum of 30 disk I/O's a second". Somebody else remarked
:On most popular, extant Unix systems 20 - 30 ms is a reasonable figure
:for average seek. Average rotational latency is 8.5 ms. Transfer.. 1ms

[Note: 3600 rpm = 16.6 ms per rotation * 50% = 8.3 ms.] Optimal scheduling
can of course reduce this a lot; for relatively large transfers (even with
small blocks), you should get a lot of blocks per seek, and latency will be
lower than 50% of a rotation. Unfortunately, stdio BUFSIZ is still typically
512-1024 bytes (i.e. one block), so stdio-based input (and probably output)
tends to break this up. Systems with 4K blocks may do a bit better.
--
# Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
# So we got out our parsers and debuggers and lexical analyzers and various
# implements of destruction and went off to clean up the tty driver...
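[One workaround - my sketch, not Bill's - is to hand the stream a larger
buffer with setvbuf() so stdio issues fewer, bigger requests; setvbuf is
ANSI C, and the 64K size and file name here are just examples.]

    /* bigbuf.c - sequential read through stdio with a 64K buffer
     * instead of the default one-block BUFSIZ buffer. */
    #include <stdio.h>
    #include <stdlib.h>

    #define BIGBUF (64 * 1024)

    int main(void)
    {
        FILE *fp  = fopen("bigfile", "r");   /* hypothetical file */
        char *buf = malloc(BIGBUF);
        long n = 0;
        int c;

        if (fp == NULL || buf == NULL)
            return 1;
        /* Must be called before the first read on the stream. */
        setvbuf(fp, buf, _IOFBF, BIGBUF);

        while ((c = getc(fp)) != EOF)        /* refills 64K at a time */
            n++;
        printf("%ld bytes\n", n);
        fclose(fp);
        free(buf);
        return 0;
    }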
scb@juniper.UUCP (Steve Blair) (04/05/88)
There are a few ways to attack this:

1) Change some kernel-related parameters for the swapping algorithms,
2) Manage window control better,
3) Get faster disks & controllers.

An interesting talk given at the USENIX conference by some folks from
Convex(tm) spoke of the rather large block sizes on their disks (I think it
was 16k/block). This was one of the ways they were dealing with speed
issues. I can't do this since I don't have source for SunOS.

I can only speak for some of the customers I've done consulting for: I
yanked the 451's and installed Interphase controllers and some of these
newer, much faster drives. Their performance rose far more than I could
have envisioned; load times for some Lips transactions went from 25+
minutes to 7-10 minutes.

It's all relative to the speed of DARK......

Steve Blair
$CBlairnix(tm) Software Inc., Cedar Park, Texas
uucp{backbone}!sun!austsun!ascway!blair
clewis@spectrix.UUCP (Chris Lewis) (04/09/88)
In article <3842@watcgl.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
>In a recent meeting we were analyzing the performance of this application that
>is rather I/O bound - in particular, it performs a lot of very random accesses
>here and there in large (> 100 Mb) files. Somebody said "Now, we'll assume
>that Unix can do a maximum of 30 disk I/O's a second". Somebody else remarked
>that that figure had been remarkably constant for quite some time. Somebody
>else proposed that it was a fundamental law of Computer Science. (Of course,
>we are poor peons restricted to the use of Vaxes and Suns).
>
>Anyhow - Presumably there are other people out there limited by this
>particular bottleneck. Are there reasonably-priced unix systems out there
>that do better? Are there a set of benchmarks which reliably characterize
>system performance in this area?

Yes. Depending on the scenario, even a Tower 32/400 can beat 30 I/O's per
second. Yes to the second question too, and I'll post the benchmark when
it's totally cleaned up.

How fast do our disks go? Well, since I'm doing some performance analysis
I thought I'd show some numbers extracted from our database.

Environment: standard NCR Tower 32/400 (16 MHz 68020) without CPU caching
and with some relatively slow memory. Disk: Hitachi 85 Mb with 28 ms average
seek (moderately fast). The standard Tower figures below use the standard
NCR disk controller (ST506). The other numbers are for a new controller
we're working with (same type of disk) that uses a SCSI interface.

Explanation of tests: "Random" is simply a series of
lseek(... 512*random() ...); read(... bsize). "Linear" is simply continuous
read(... bsize), and "Reread" is continuous lseek(... 0 ...);
read(... bsize). (The weird testing is so we can intuit some absolute
maximum bandwidths.)

In the tables below, "bsize" is the request size in bytes, "req/sec cooked"
is the number of requests of bsize per second through the buffer cache, and
"bw cooked" is bytes per second through the buffer cache. Similarly, the
remaining two columns are req/sec and bandwidth for the raw interface.
Obviously, we should be doing this to specific files rather than directly
through the blocked or unblocked special devices. Given the amount of
resources we can commit to this evaluation, and the behaviour of the
caches, we figure that only running the real application on top will give
the true application figures.

A lot of these numbers need to be taken with a fair grain of salt - UNIX
buffer cache hits (and controller cache hits) are occurring, so they don't
necessarily reflect *true* physical disk speed, just UNIX I/O throughput.
For the standard Tower, the raw req/sec and bandwidth figures are true disk
speed. For the second environment it's difficult to say - the controller
caches blocks too.

Remember, the "req/sec" figures are requests of bsize bytes. So the raw
Linear test with a bsize of 1/2 meg on the standard Tower is actually
transferring about 800 512-byte blocks per second. Even buffered it's
approximately 50 blocks per second.
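[For concreteness, a minimal sketch of the "Random" test as described -
Chris's actual script wasn't posted, so the device name, request count, and
offset range here are made up; raw devices on some systems also need
sector-aligned buffers, which this ignores.]

    /* randio.c - lseek to a random 512-byte-aligned offset, read
     * bsize bytes, and report req/sec and bytes/sec. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <time.h>

    int main(void)
    {
        const char *dev  = "/dev/rdsk0";  /* example raw device name */
        size_t     bsize = 4096;          /* request size under test */
        int        nreqs = 500;
        char       *buf  = malloc(bsize);
        int        fd    = open(dev, O_RDONLY);
        time_t     start, stop;
        int        i;

        if (fd < 0 || buf == NULL)
            return 1;

        time(&start);
        for (i = 0; i < nreqs; i++) {
            /* assumes the device holds at least 100000 sectors */
            lseek(fd, (off_t)512 * (rand() % 100000), SEEK_SET);
            read(fd, buf, bsize);
        }
        time(&stop);

        printf("bsize %lu: %.2f req/sec, %.0f bytes/sec\n",
               (unsigned long)bsize,
               nreqs / difftime(stop, start),
               nreqs * (double)bsize / difftime(stop, start));
        return 0;
    }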
Standard Tower:

Random
bsize      req/sec      bw         req/sec      bw
           cooked       cooked     raw          raw
512        35.3103      18078      35.9298      18396
1024       16.5161      16912      35.3103      36157
2048       10.449       21399      32           65536
4096       6.09524      24966      28.4444      116508
8192       3.2          26214      25.6         209715
16384      1.72973      28339      16           262144
32768      0.864865     28339      10.6667      349525
65536      0.435374     28532      5.56522      364722
131072     0.217687     28532      2.90909      381300
262144     0.109589     28728      1.45455      381300
524288     0.0547945    28728      0.727273     381300

Reread
bsize      req/sec      bw         req/sec      bw
           cooked       cooked     raw          raw
512        862.316      441505     59.9049      30671
1024       546.133      559240     60.0147      61455
2048       327.68       671088     59.7956      122461
4096       170.667      699050     60.2353      246723
8192       89.0435      729444     30.1176      246723
16384      44.5217      729444     20.0784      328965
32768      10.6667      349525     10.6667      349525
65536      6.4          419430     6.4          419430
(UNIX buffer cache filled up)
131072     0.214765     28149      2.90909      381300
262144     0.108475     28435      1.45455      381300
524288     0.0547009    28679      0.727273     381300

Linear
bsize      req/sec      bw         req/sec      bw
           cooked       cooked     raw          raw
512        55.3097      28318      55.8036      28571
1024       27.9018      28571      52.521       53781
2048       13.9509      28571      48.0769      98461
4096       7.06787      28950      39.05        159948
8192       3.48661      28562      28.9259      236961
16384      1.74888      28653      16.9565      277815
32768      0.870536     28525      11.4706      375868
65536      0.431111     28253      5.70588      373940
131072     0.216216     28339      2.82353      370085
262144     0.104803     27473      1.41176      370085
524288     0.0547945    28728      0.705882     370085

New Controller:

Random
bsize      req/sec      bw         req/sec      bw
           cooked       cooked     raw          raw
512        170.667      87381      157.538      80659
1024       170.667      174762     170.667      174762
2048       73.1429      149796     170.667      349525
4096       51.2         209715     128          524288
8192       25.6         209715     64           524288
16384      16           262144     64           1048576
32768      6.4          209715     32           1048576
65536      3.55556      233016     16           1048576
131072     1.82857      239674     7.11111      932067
262144     0.914286     239674     3.55556      932067
524288     0.444444     233016     1.77778      932067

Reread
bsize      req/sec      bw         req/sec      bw
           cooked       cooked     raw          raw
512        840.205      430185     158.3        81049
1024       780.19       798915     167.184      171196
2048       481.882      986895     146.286      299593
4096       273.067      1118481    113.778      466033
8192       146.286      1198372    81.92        671088
16384      78.7692      1290555    48.7619      798915
32768      32           1048576    32           1048576
65536      16           1048576    10.6667      699050
131072     8            1048576    8            1048576
(UNIX buffer cache filled up)
262144     0.888889     233016     3.55556      932067
524288     0.450704     236298     1.77778      932067

Linear
bsize      req/sec      bw         req/sec      bw
           cooked       cooked     raw          raw
512        231.481      118518     162.338      83116
1024       231.481      237037     173.611      177777
2048       115.741      237037     148.81       304761
4096       57.8519      236961     120.154      492150
8192       28.9259      236961     78.1         639795
16384      13.9286      228205     48.75        798720
32768      7.22222      236657     27.8571      912822
65536      1.83019      119943     4.04167      264874
131072     0.872727     114390     1.84615      241979
262144     0.413793     108473     0.923077     241979
524288     0.26087      136770     0.666667     349525

Sorry for the format of the tables, but this is something I hacked out of
one of my statistics-gathering awk scripts in a few minutes.

ps: People were making comments about "2Mb/sec" controllers only
transferring 1Mb per second on Multibus. Well, when manufacturers quote
bandwidths they're usually quoting the instantaneous maximum transfer rate
through the disk interface. E.g. "standard SCSI" is actually about
1 Mbyte/sec rated that way. Then you have to consider:
   - disk driver overhead
   - UNIX system overhead
   - missed rotations/interleave
   - actual maximum disk output
A standard 512-byte-per-sector 5.25" disk rotating at 3600 RPM has the
bytes going by the head at only 522K or so bytes/second (disregarding seeks
and any controller overhead). You can't go faster than that no matter what
you do. Besides, Multibus is slow....
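[A back-of-the-envelope check of that 522K figure - an editorial sketch
assuming the usual 17-sectors-per-track ST506-style formatting, which the
post doesn't state:]

    /* headrate.c - maximum rate at which bytes pass under the head. */
    #include <stdio.h>

    int main(void)
    {
        int    sectors_per_track = 17;       /* assumed MFM formatting */
        int    bytes_per_sector  = 512;
        double revs_per_sec      = 3600.0 / 60.0;   /* 3600 rpm */
        double bytes_per_sec     = sectors_per_track * bytes_per_sector
                                   * revs_per_sec;  /* = 522240 */

        printf("max head rate: %.0f bytes/sec\n", bytes_per_sec);
        return 0;
    }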
--
Chris Lewis, Spectrix Microsystems Inc.
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc, yunexus}!spectrix!clewis
Phone: (416)-474-1955