[comp.arch] Workstation Disk I/O

sritacco@hpdmd48.boi.hp.com (Steve Ritacco) (10/04/90)

Due to some recent experiences, and some reviews I have read, I've started
to wonder about disk performance on workstations.  I thought this might
be an interesting topic for discussion in this group.

Let's consider the SPARCstations, DECstations, Personal Iris, Mips Magnum,
NeXT, HP s300/s400, Sony NEWS, etc.

How does each of these systems handle its disk I/O, what are the cost/
performance advantages of the approaches, what is the realized performance,
what is the potential performance, and what is the bottleneck?

This is prompted by my personal experience with the NeXT, HP, Mips M-120,
Sun 3, and DECstation.  The disk throughput of these machines varies
as greatly as their configurations.  I was also intrigued by the UNIX Review
story on the Mips Magnum which found it to be very fast at disk I/O even
though it used 3.5" drives.

By the way, I don't work for HP's workstation or disk drive division, this
is just personal interest.

jtc@van-bc.wimsey.bc.ca (J.T. Conklin) (10/05/90)

In article <14900016@hpdmd48.boi.hp.com> sritacco@hpdmd48.boi.hp.com (Steve Ritacco) writes:
>Let's consider the SPARCstations, DECstations, Personal Iris, Mips Magnum,
>NeXT, HP s300/s400, Sony NEWS, etc.
>
>How does each of these systems handle its disk I/O, what are the cost/
>performance advantages of the approaches, what is the realized performance,
>what is the potential performance, and what is the bottleneck?

The June 1990 ACM Computer Architecture News contains the article
"IOStone: A Synthetic File System Benchmark" by Park, Becker, and
Lipton.  They report the results of their benchmark on a diskfull and
a diskless Sparcstation, a Sun 3/80, a Decstation 3100, a NeXT, a Sun
2/120, a Vaxstation II, an Apollo DN3000, and a Macintosh II.

The thing I found interesting was that Sun's variable-size disk cache
really skewed the results of the benchmark.  The diskless Sparcstation
outperformed the Vaxstation, the Apollo, and the Mac.

	--jtc

-- 
J.T. Conklin	UniFax Communications Inc.
		...!{uunet,ubc-cs}!van-bc!jtc, jtc@wimsey.bc.ca

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (10/06/90)

In article <2387@van-bc.wimsey.bc.ca>, jtc@van-bc.wimsey.bc.ca (J.T. Conklin) writes:

>The June 1990 ACM Computer Architecture News contains the article
>"IOStone: A Synthetic File System Benchmark" by Park, Becker, and
>Lipton.  They report the results of their benchmark on a diskfull and
>a diskless Sparcstation, a Sun 3/80, a Decstation 3100, a NeXT, a Sun
>2/120, a Vaxstation II, an Apollo DN3000, and a Macintosh II.
>
>The thing I found interesting was that Sun's variable-size disk cache
>really skewed the results of the benchmark.  The diskless Sparcstation
>outperformed the Vaxstation, the Apollo, and the Mac.

Not surprised. I bet some of that rolls back to the CPUs of the latter 3
machines....VAXstation II is not exactly a screamer in today's society.

khb@chiba.Eng.Sun.COM (Keith Bierman - SPD Advanced Languages) (10/09/90)

In article <2387@van-bc.wimsey.bc.ca> jtc@van-bc.wimsey.bc.ca (J.T. Conklin) writes:

...
   The thing I found interesting was that Sun's variable-size disk cache
   really skewed the results of the benchmark.  The diskless Sparcstation
   outperformed the Vaxstation, the Apollo, and the Mac.

If the benchmark truly represents what "really goes on in systems"
this suggests that the disk cache design was very clever and useful;
not that the results are "skewed".

The $64K question is always: does the benchmark reflect real system
loads?
--
----------------------------------------------------------------
Keith H. Bierman    kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

lm@slovax.Sun.COM (Larry McVoy) (10/09/90)

In article <2387@van-bc.wimsey.bc.ca> jtc@van-bc.wimsey.bc.ca (J.T. Conklin) writes:
>In article <14900016@hpdmd48.boi.hp.com> sritacco@hpdmd48.boi.hp.com (Steve Ritacco) writes:
>>Let's consider the SPARCstations, DECstations, Personal Iris, Mips Magnum,
>>NeXT, HP s300/s400, Sony NEWS, etc.
>>
>>How does each of these systems handle its disk I/O, what are the cost/
>>performance advantages of the approaches, what is the realized performance,
>>what is the potential performance, and what is the bottleneck?
>
>The June 1990 ACM Computer Architecture News contains the article
>"IOStone: A Synthetic File System Benchmark" by Park, Becker, and
>Lipton.  They report the results of their benchmark on a diskfull and
>a diskless Sparcstation, a Sun 3/80, a Decstation 3100, a NeXT, a Sun
>2/120, a Vaxstation II, an Apollo DN3000, and a Macintosh II.
>
>The thing I found interesting was that Sun's variable-size disk cache
>really skewed the results of the benchmark.  The diskless Sparcstation
>outperformed the Vaxstation, the Apollo, and the Mac.

I can answer the original question since I've been active in FS performance
at Sun.  Before I do that I want to comment on the referenced "benchmark".

IOstone is misnamed and misleading.  It ought to be called "cachestone" or
"buffer cachestone".  It does *not* measure I/O performance.  It measures
cache performance.  It *assumes* a buffer cache model and gives misleading
information in the face of unified memory systems such as Multics, Mach, 
and SunOS.  This is a poor benchmark.  Understand my position:  I work for
Sun,  I'm interested in performance,  I want Sun to look good,  this benchmark
makes Sun look good,  and I am saying this benchmark is junk.  I say this
because it gives people the wrong impression.  I would be very happy with
a benchmark that showed off Sun's VM system;  I'm unhappy with a benchmark 
that shows off Sun's VM system and claims to be measuring the I/O system.

OK, that said, what's going on at Sun wrt FS performance?  Well, good news
and bad news.  The good news is that if you move big files around on UFS you
will be very happy with the next release of SunOS.  I sort of added extents
without adding extents :-)  Basically, we now do 56KB chunks of I/O where before
we would have done 8KB chunks of I/O.  But, big but, I did not change the
on-disk format at all (well, not quite; the format is the same but the tuning
parameters are different; files are laid out contiguously).

Anyway, there's a paper coming to the winter Usenix, read that for details.

The bad news is that this doesn't help small I/O performance at all.  Still
the same old slow story there, and we haven't got a fix yet.  We're actively
looking at things like Ousterhout's logfs.

In the meantime, tmpfs is much faster in the next release; there was a bug,
and with the bug fix kernel builds go 20% faster.  I work in tmpfs almost
exclusively for just this reason (I wrote a little daemon that migrates my
files to safe storage).
---
Larry McVoy, Sun Microsystems     (415) 336-7627       ...!sun!lm or lm@sun.com

abe@mace.cc.purdue.edu (Vic Abell) (10/09/90)

In article <143502@sun.Eng.Sun.COM>, lm@slovax.Sun.COM (Larry McVoy) writes:
> In article <2387@van-bc.wimsey.bc.ca> jtc@van-bc.wimsey.bc.ca (J.T. Conklin) writes:
> >In article <14900016@hpdmd48.boi.hp.com> sritacco@hpdmd48.boi.hp.com (Steve Ritacco) writes:
> 
> IOstone is misnamed and misleading.  It ought to be called "cachestone" or
> "buffer cachestone".  It does *not* measure I/O performance.  It measures
> cache performance.

Agreed.  I've had a chance to run IOStone on a number of different systems.
I found one machine with an extremely aggressive buffer cache mechanism
and SCSI bus I/O (not a Sun  :-) whose IOStone rating exceeded that of a
Cray Y-MP.

It is very difficult to disable the effect of a good buffer cache when you
really want to measure I/O device or channel speed.  So far I've not seen
any test that does a good job of that, nor have I been able to construct
one myself.

Vic Abell <abe@mace.cc.purdue.edu>

patrick@convex.COM (Patrick F. McGehearty) (10/10/90)

In article <5725@mace.cc.purdue.edu> abe@mace.cc.purdue.edu (Vic Abell) writes:
...deleted stuff
>It is very difficult to disable the effect of a good buffer cache when you
>really want to measure I/O device or channel speed.  So far I've not seen
>any test that does a good job of that, nor have I been able to construct
>one myself.

I agree, direct measurement of HW performance can be tricky in the presence
of buffering, caching, etc.  When we (at Convex) wanted to measure our new
disks' actual transfer-rate potential, I needed to solve this problem.

The first approach I tried was to write many large files, hoping they would
have the effect of flushing the test file out of the cache.  This method was
not particularly reliable, as it depended on the cache replacement algorithm.

The approach I like is to write a large file, dismount and remount the file
system being tested (which clears the buffer cache in our Berkeley derived
but heavily tuned implementation) and then measure the time to read the
file.  Repeat the full process several times to obtain an idea of stability and
measurement error.  Of course, I had a dedicated machine with full root access.
But measuring I/O performance on a non-dedicated machine is not particularly
reliable anyway.

kinsell@hpfcdj.HP.COM (Dave Kinsell) (10/11/90)

>If the benchmark truly represents what "really goes on in systems"
>this suggests that the disk cache design was very clever and useful;
>not that the results are "skewed".

>The $64K question is always, does the benchmark reflect real system
>loads.

Well, if your real system workload consists of one trivially small
program that reads the same data file over and over, then yes, this
sort of benchmark will accurately predict system performance.

If, on the other hand, you have large programs constantly getting knocked
out of main RAM by all the caching of the data files, then these benchmarks
do a somewhat less accurate job of predicting "real" performance.  Would
this qualify as the understatement of the century?

By keeping the buffer cache much smaller than the test file,
you can easily prove to yourself that systems with memory-mapped
files are superior by factors of 20X or so.  I've also seen systems with
buffer caches give substantially better benchmark results than those with
mapped files, if the buffer cache was sized appropriately.  It all depends
on what you're trying to prove.


Dave Kinsell
kinsell@hpfcmb.hp.com

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (10/11/90)

In article <143502@sun.Eng.Sun.COM> lm@sun.UUCP (Larry McVoy) writes:

>I would be very happy with
>a benchmark that showed off Sun's VM system;  I'm unhappy with a benchmark 
>that shows off Sun's VM system and claims to be measuring the I/O system.

Sorry, but, at least here, the VM system does not win.  With dynamic buffer
caching, the same IOstone performance can be obtained.

MIPS and Sony already have such an OS.

						Masataka Ohta

tbray@watsol.waterloo.edu (Tim Bray) (10/15/90)

sritacco@hpdmd48.boi.hp.com (Steve Ritacco) wrote:
  Let's consider the SPARCstations, DECstations, Personal Iris, Mips Magnum,
  NeXT, HP s300/s400, Sony NEWS, etc.
  How does each of these systems handle its disk I/O, what are the cost/
  performance advantages of the approaches, what is the realized performance,
  what is the potential performance, and what is the bottleneck?
 
Discussion followed, among which this from Vic Abell <abe@mace.cc.purdue.edu>: 
  It is very difficult to disable the effect of a good buffer cache when you
  really want to measure I/O device or channel speed.  So far I've not seen
  any test that does a good job of that, nor have I been able to construct
  one myself.

This is nearly right.  What you almost always want to measure is *unix
filesystem performance*, a complex aggregate of bottlenecks that is remarkably
independent in some respects of the performance of the underlying I/O hardware.

I have written (and posted to this group some months back) a benchmark named
'Bonnie' that claims to do at least some of what is wanted.  In particular, it
exercises sequential buffered/char and block I/O to large files, stressing
(separately) the file allocation and cache management code.  Finally, it does a
random I/O test that makes an *aggressive* effort to bust the caching mechanism
(IFF you can dedicate filespace many times the size of physical memory to the
benchmark).  I claim the numbers it produces are a useful metric to compare
certain aspects of filesystem performance from system to system.

The benchmark grows out of experience on the New Oxford English Dictionary
project at the University of Waterloo - a multi-year struggle against I/O and
memory bottlenecks.

I would be happy to provide the source code on request, and to collate results.

Cheers, Tim Bray (tbray@watsol.waterloo.edu)