[comp.arch] How fast are your disks?

tbray@watsol.waterloo.edu (Tim Bray) (04/02/88)

In a recent meeting we were analyzing the performance of this application that
is rather I/O bound - in particular, it performs a lot of very random accesses
here and there in large (> 100 Mb) files.  Somebody said "Now, we'll assume
that Unix can do a maximum of 30 disk I/O's a second".  Somebody else remarked
that that figure had been remarkably constant for quite some time.  Somebody
else proposed that it was a fundamental law of Computer Science.  (Of course,
we are poor peons restricted to the use of Vaxes and Suns).

Anyhow - Presumably there are other people out there limited by this
particular bottleneck.  Are there reasonably-priced unix systems out there
that do better?  Are there a set of benchmarks which reliably characterize
system performance in this area?

To address this problem, I half-seriously propose a new metric: Application
Disk I/Os per Second, named, obviously, ADIOS.

Adios, amigos.
Tim Bray, New Oxford English Dictionary Project, U of Waterloo, Ontario

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (04/02/88)

In article <3842@watcgl.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
>that Unix can do a maximum of 30 disk I/O's a second".  Somebody else remarked
>that that figure had been remarkably constant for quite some time.  Somebody
>else proposed that it was a fundamental law of Computer Science.  (Of course,
>we are poor peons restricted to the use of Vaxes and Suns).

Probably related to your average seek time plus rotational delay plus data
transfer. 

On most popular, extant Unix systems 20 - 30 ms is a reasonable figure
for average seek.  Average rotational latency is 8.5 ms.  Transfer time for
one sector is, say, about 1 ms.

Given a fast 20 ms drive, you probably should approach 30 disk I/O's per
second. 

Given a slow 30 ms drive, probably closer to 25; with a 40 ms drive, about 20.
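
To make the arithmetic explicit, here is a minimal C sketch (mine, not from
the post) that just plugs the quoted figures into I/Os per second =
1000 / (seek + latency + transfer), all in milliseconds:

#include <stdio.h>

int main(void)
{
    double latency_ms = 8.5;            /* half a rotation at 3600 RPM */
    double transfer_ms = 1.0;           /* one sector, per the figures above */
    double seek_ms[] = { 20.0, 30.0, 40.0 };
    int i;

    for (i = 0; i < 3; i++) {
        double per_io_ms = seek_ms[i] + latency_ms + transfer_ms;
        printf("%2.0f ms seek -> %4.1f I/Os per second\n",
               seek_ms[i], 1000.0 / per_io_ms);
    }
    return 0;
}

It prints roughly 34, 25, and 20 I/Os per second for the 20/30/40 ms drives,
which is where those estimates come from.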

Other factors which will help are controllers which will overlap seeks;
multiple disks to localize file accesses (allowing average seek times to
decline); and larger block sizes (actually getting the information in is only
a small part of the battle; getting there is the largest component for small
random reads).


-- 
{ihnp4!alberta!ubc-vision,uunet}!van-bc!Stuart.Lynne Vancouver,BC,604-937-7532

ron@topaz.rutgers.edu (Ron Natalie) (04/03/88)

Not only is it not a constant, it's not even true.  The sad fact
is that most disk controllers for minis/micros are pretty horrendous.
Sun's unfortunate use of the Xylogics 450/451 is a prime example.
Anyway, with decent controllers (or multiple controllers) there is
no reason why the figure of 30 can't be exceeded, and it is on decent Unix
systems.

-Ron

aland@infmx.UUCP (Dr. Scump) (04/03/88)

In article <3842@watcgl.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes:
>       (misc. comments about UNIX disk i/o performance, etc.)
> 
> To address this problem, I half-seriously propose a new metric: Application
> Disk I/Os per Second, named, obviously, ADIOS.
> 
> Adios, amigos.
> Tim Bray, New Oxford English Dictionary Project, U of Waterloo, Ontario

Sorry, ADIOS has already been used (and I think copyrighted).  [Company 
Name deleted] developed an access method (ISAM) for IBM mainframes 
running OS/MVT, OS/MVS, etc. called ADIOS (acronym for "Another Disk I/O
System").  It was coded in assembler and accessed from COBOL or assembler
as callable functions, and outperforms the "standard" stuff by a mile.

Plus, the terminal i/o control portion of the in-house ADIOS-based realtime
system was named TACOS ("Terminal and Communications Operating System", I
think).

Please, no anti-IBM, anti-COBOL, mainframe-bashing, etc. flames here.  
Mainframes are not necessarily evil (or, is that "necessary evil"? :-]).

And, no "too late for April Fool's Day" comments -- this is a true story.
Only the names were changed to protect the innocent.

-- 
 Alan S. Denney                | {pyramid|uunet}!infmx!aland
 Informix Software, Inc.       | CAUTION: This terminal makes wide right turns!
 Disclaimer: These opinions are mine alone.  If I am caught or killed,
             the secretary will disavow any knowledge of my actions.

wcs@ho95e.ATT.COM (Bill.Stewart.<ho95c>) (04/04/88)

In article <1703@van-bc.UUCP> sl@van-bc.UUCP (Stuart Lynne) writes:
:In article <3842@watcgl.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
:>that Unix can do a maximum of 30 disk I/O's a second".  Somebody else remarked
:On most popular, extant Unix systems 20 - 30 ms is a reasonable figure
:for average seek. Average rotational latency is 8.5 ms. Transfer..  1ms
			[Note: 3600 RPM = 16.6 ms per revolution * 50% = 8.3 ms]

Optimal scheduling can of course reduce this a lot; for relatively
large transfers (even with small blocks), you should get a lot of
blocks/seek, and latency will be lower than 50% rotation.
Unfortunately, stdio BUFSIZ is still typically 512-1024 (i.e. one block),
so stdio-based input (and probably output) tends to break this up.
Systems with 4K blocks may do a bit better.
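
One cheap workaround, for what it's worth, is to hand the stream a bigger
buffer yourself with setvbuf() before the first read, so each underlying
read() asks the kernel for a much larger chunk.  A hedged sketch (the file
name and buffer size are made up):

#include <stdio.h>

#define BIGBUF (64 * 1024)              /* much bigger than the usual 512-1024 BUFSIZ */

int main(void)
{
    static char buf[BIGBUF];
    char line[256];
    FILE *fp = fopen("bigfile", "r");   /* hypothetical input file */

    if (fp == NULL) {
        perror("bigfile");
        return 1;
    }
    /* must be done before the first read on the stream */
    if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0)
        fprintf(stderr, "setvbuf failed, using the default buffer\n");

    while (fgets(line, sizeof line, fp) != NULL)
        ;                               /* process the line */

    fclose(fp);
    return 0;
}

Whether that actually turns into fewer, larger disk transfers still depends
on the filesystem block size and read-ahead, of course.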
-- 
#				Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
# So we got out our parsers and debuggers and lexical analyzers and various 
# implements of destruction and went off to clean up the tty driver...

scb@juniper.UUCP (Steve Blair) (04/05/88)

A few things to try: 1) change some kernel-related parameters for the
swapping algorithms, 2) manage window control better, 3) get faster disks
and controllers.

An interesting talk given at the USENIX conference by some folks from
Convex(tm) spoke of the rather large block sizes on their disks (I think it
was 16k/block).  This was one of the ways they were dealing with speed
issues.  I can't do this since I don't have source for SunOS.

I can only speak for some of the customers I've done consulting for: I
yanked the 451's and installed the Interphase controllers and some of the
newer, much faster drives.  Their performance has risen much more than I
could have envisioned; load times for some Lisp transactions went from 25+
minutes to 7-10 minutes.  It's all relative to the speed of DARK......

Steve Blair $CBlairnix(tm) Software Inc.
Cedar Park, Texas
uucp{backbone}!sun!austsun!ascway!blair

clewis@spectrix.UUCP (Chris Lewis) (04/09/88)

In article <3842@watcgl.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
>In a recent meeting we were analyzing the performance of this application that
>is rather I/O bound - in particular, it performs a lot of very random accesses
>here and there in large (> 100 Mb) files.  Somebody said "Now, we'll assume
>that Unix can do a maximum of 30 disk I/O's a second".  Somebody else remarked
>that that figure had been remarkably constant for quite some time.  Somebody
>else proposed that it was a fundamental law of Computer Science.  (Of course,
>we are poor peons restricted to the use of Vaxes and Suns).
>
>Anyhow - Presumably there are other people out there limited by this
>particular bottleneck.  Are there reasonably-priced unix systems out there
>that do better?  Are there a set of benchmarks which reliably characterize
>system performance in this area?

Yes.  Depending on scenarios, even a Tower 32/400 can beat 30 I/O's per second.
Yes to the second question, and I'll post it when it's totally cleaned up.

How fast do our disks go?  Well, since I'm doing some performance analysis
I thought I'd show some numbers extracted from our database.

Environment: Standard NCR Tower 32/400 (16 MHz 68020) without CPU caching
and with some relatively slow memory.  Disk: Hitachi 85 Mb with 28 ms average
seek (moderately fast).  The standard Tower figures below are using the
standard NCR disk controllers (ST506).  The other numbers are for a new
controller we're working with (same type of disk) that uses a SCSI interface.

Explanation of tests: "Random" is simply a series of lseek(... 512*random() ...);
read(..., bsize).  "Linear" is simply continuous read(..., bsize), and "Reread"
is continuous "lseek(... 0 ...); read(..., bsize)"  (the weird testing is so
we can intuit some absolute max bandwidths).  In the tables below,
"bsize" is the request size in bytes, "req/sec cooked" is the number of requests
of bsize per second thru the buffer cache, and "bw cooked" is bytes per second
thru the buffer cache.  Similarly, the remaining two columns are req/sec and
bandwidth for the raw interface.
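
As a rough illustration (my reconstruction, not the actual test harness; the
device name, request count, and sector range are made up), the three patterns
amount to loops like this:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define NREQ     1000                   /* requests per run -- illustrative */
#define NSECTORS 100000L                /* 512-byte sectors to seek over -- illustrative */

/* pattern: 0 = Random, 1 = Reread (always offset 0), 2 = Linear (no seeks) */
static void run(int fd, char *buf, long bsize, int pattern)
{
    int i;

    lseek(fd, 0L, SEEK_SET);            /* start from the beginning */
    for (i = 0; i < NREQ; i++) {
        ssize_t n;

        if (pattern == 0)
            lseek(fd, 512L * (random() % NSECTORS), SEEK_SET);
        else if (pattern == 1)
            lseek(fd, 0L, SEEK_SET);

        n = read(fd, buf, bsize);
        if (n < 0) {
            perror("read");
            exit(1);
        }
        if (n == 0)                     /* EOF on a plain file: rewind */
            lseek(fd, 0L, SEEK_SET);
    }
}

int main(void)
{
    long bsize = 8192;                          /* one of the bsize values below */
    char *buf = malloc(bsize);
    int fd = open("/dev/rdsk0", O_RDONLY);      /* hypothetical raw device */

    if (fd < 0 || buf == NULL) {
        perror("setup");
        return 1;
    }
    run(fd, buf, bsize, 0);             /* wrap with gettimeofday() to get req/sec */
    return 0;
}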

Obviously, we should be doing this to specific files rather than directly
thru the blocked or unblocked special devices.  Given the amount of
resources we can commit to this evaluation, and the behaviour of the caches,
we figure that only running the real application on top will give the
true application figures.

A lot of these numbers need to be taken with a fair grain of salt - UNIX
buffer cache hits (and controller cache hits) are occurring, so they don't
necessarily reflect *true* physical disk speed, just UNIX I/O throughput.

For the standard Tower, the req/sec and bandwidth are true disk speed on
raw.  On the second environment, it's difficult to say - the controller
caches blocks too.

Remember, the "req/sec" figures are for blocks of bsize bytes.  So the raw
Linear test with a bsize of 1/2 meg on the standard Tower is actually
transferring over 700 512-byte blocks per second (0.706 req/sec * 524288
bytes is about 370 Kbytes/sec, or roughly 723 blocks).  Even buffered it's
about 55 blocks per second.

Standard Tower:

Random
bsize	req/sec	bw	req/sec	bw
	cooked	cooked	raw	raw
512	35.3103	18078	35.9298	18396
1024	16.5161	16912	35.3103	36157
2048	10.449	21399	32	65536
4096	6.09524	24966	28.4444	116508
8192	3.2	26214	25.6	209715
16384	1.72973	28339	16	262144
32768	0.864865	28339	10.6667	349525
65536	0.435374	28532	5.56522	364722
131072	0.217687	28532	2.90909	381300
262144	0.109589	28728	1.45455	381300
524288	0.0547945	28728	0.727273	381300

Reread
bsize	req/sec	bw	req/sec	bw
	cooked	cooked	raw	raw
512	862.316	441505	59.9049	30671
1024	546.133	559240	60.0147	61455
2048	327.68	671088	59.7956	122461
4096	170.667	699050	60.2353	246723
8192	89.0435	729444	30.1176	246723
16384	44.5217	729444	20.0784	328965
32768	10.6667	349525	10.6667	349525
65536	6.4	419430	6.4	419430
(UNIX buffer cache filled up)
131072	0.214765	28149	2.90909	381300
262144	0.108475	28435	1.45455	381300
524288	0.0547009	28679	0.727273	381300

Linear
bsize	req/sec	bw	req/sec	bw
	cooked	cooked	raw	raw
512	55.3097	28318	55.8036	28571
1024	27.9018	28571	52.521	53781
2048	13.9509	28571	48.0769	98461
4096	7.06787	28950	39.05	159948
8192	3.48661	28562	28.9259	236961
16384	1.74888	28653	16.9565	277815
32768	0.870536	28525	11.4706	375868
65536	0.431111	28253	5.70588	373940
131072	0.216216	28339	2.82353	370085
262144	0.104803	27473	1.41176	370085
524288	0.0547945	28728	0.705882	370085

New Controller: 

Random
bsize	req/sec	bw	req/sec	bw
	cooked	cooked	raw	raw
512	170.667	87381	157.538	80659
1024	170.667	174762	170.667	174762
2048	73.1429	149796	170.667	349525
4096	51.2	209715	128	524288
8192	25.6	209715	64	524288
16384	16	262144	64	1048576
32768	6.4	209715	32	1048576
65536	3.55556	233016	16	1048576
131072	1.82857	239674	7.11111	932067
262144	0.914286	239674	3.55556	932067
524288	0.444444	233016	1.77778	932067

Reread
bsize	req/sec	bw	req/sec	bw
	cooked	cooked	raw	raw
512	840.205	430185	158.3	81049
1024	780.19	798915	167.184	171196
2048	481.882	986895	146.286	299593
4096	273.067	1118481	113.778	466033
8192	146.286	1198372	81.92	671088
16384	78.7692	1290555	48.7619	798915
32768	32	1048576	32	1048576
65536	16	1048576	10.6667	699050
131072	8	1048576	8	1048576
(UNIX buffer cache filled up)
262144	0.888889	233016	3.55556	932067
524288	0.450704	236298	1.77778	932067

Linear
bsize	req/sec	bw	req/sec	bw
	cooked	cooked	raw	raw
512	231.481	118518	162.338	83116
1024	231.481	237037	173.611	177777
2048	115.741	237037	148.81	304761
4096	57.8519	236961	120.154	492150
8192	28.9259	236961	78.1	639795
16384	13.9286	228205	48.75	798720
32768	7.22222	236657	27.8571	912822
65536	1.83019	119943	4.04167	264874
131072	0.872727	114390	1.84615	241979
262144	0.413793	108473	0.923077	241979
524288	0.26087	136770	0.666667	349525

Sorry for the format of the tables, but this is something I hacked out
of one of my statistics gathering awk scripts in a few minutes.

PS: people were making comments about "2 Mb/sec" controllers only transferring
1 Mb per second on Multibus.  Well, when manufacturers quote bandwidths,
they're usually quoting the instantaneous max transfer rate thru the disk
interface.  E.g., "standard SCSI" is actually about 1 Mbyte/sec rated that
way.  Then you have to consider:
	- disk driver overhead
	- UNIX system overhead
	- missed rotations/interleave
	- actual max disk output.  A standard 512-byte-per-sector 5.25" disk
	  that rotates at 3600 RPM has the bytes going by the head at only
	  522K or so bytes/second, disregarding seeks and any controller
	  overhead (the arithmetic is sketched below).  You can't go faster
	  than that no matter what you do.
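
Here's that arithmetic spelled out; the 17-sectors-per-track figure is my
assumption for a typical ST506/MFM layout (the post doesn't give the geometry):

#include <stdio.h>

int main(void)
{
    int sectors_per_track = 17;     /* assumed MFM geometry -- not stated above */
    int bytes_per_sector = 512;
    int rpm = 3600;                 /* 60 revolutions per second */
    long bytes_per_sec = (long)sectors_per_track * bytes_per_sector * (rpm / 60);

    printf("max media transfer rate: %ld bytes/sec\n", bytes_per_sec);   /* 522240 */
    return 0;
}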

Besides, Multibus is slow....
-- 
Chris Lewis, Spectrix Microsystems Inc,
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc, yunexus}!spectrix!clewis
Phone: (416)-474-1955