[comp.sys.mac] Reekes on Disktimer

mjohnson@Apple.COM (Mark Johnson) (11/10/88)

I'm requesting this message to be posted to this net.
I am not a member. -JR

My original comments, which I hear have been republished, once 
again contain some inaccuracies.  My intent at that time, some 
18 months ago, was to stir up some controversy and present the 
'con' viewpoint regarding the results being published in popular 
magazines and on local BBSs.  That may or may not have been a good 
idea, since I'm still trying to explain myself.  At any rate, I 
also recently read S. Brecher's comments dated Nov. 8 on yet 
'another network', rebutting the old feud once again.  Steve's 
corrections to my statements are accurate.  The old discussion 
that Steve and I had gave me more to think about, and his rebuttal 
corrected some of my inaccuracies; I've since learned of the 
errors in that original article.  Taken out of context, the 
original article cannot be appreciated unless the entire 
discussion is presented with it.  I wish to withdraw the original 
article because of the mistakes it contained, and after re-
examining the DiskTimer source code.  This has not, however, 
changed my opinion of DiskTimer.

I do have more than 'no idea' of what I'm talking about, but my 
misunderstandings, republished recently, may lead one to believe 
otherwise.  About 12 months ago, another fellow decided to revise 
DiskTimer and bring forth a new version, and I had a long series 
of discussions about it on yet 'another network'.  The following 
are my comments made at that time, which I still believe to be 
true.  [I have made a few new edits, shown in brackets.]  I still 
stand by the following statements, made during the second 
discussion of the continuing DiskTimer "Pro vs. Con" series.

________________________________________
Date:     Mon Sep 8, 1987  10:45 am  EST 

I wish to express my sentiments regarding the recent proposal to 
revise DiskTimer.  Before I make my comments, I'll add that I 
have great respect for Steve Brecher's work.  My first reaction 
was, "Why do we need another DiskTimer?"  The earlier versions 
caused more misunderstandings and misrepresentations than anything 
else.  Benchmarks are not to be taken lightly.  As an example, 
consider how BYTE magazine has been unable to produce a valid set 
of tests for its IBM vs. Macintosh benchmarks.  There are issues 
that must be addressed if we are going to publish the results of 
any benchmark test.  A benchmark must reflect how a system 
operates in real life.  So, if DiskTimer is to be considered at 
all, then it needs to test a drive under "real world" conditions.  
This means it is going to require standard file I/O (e.g., using 
the Mac's File Manager).  Without the use of test files, the 
results are too far removed from reality.

In other words, the purpose of a drive is to read and write files.  
Any benchmark of a drive's performance must test its ability to do 
this, which is done via the File Manager.  Steve has defended the 
non-use of the File Manager by stating, "I developed DiskTimer II 
precisely to avoid using File Manager calls, i.e., to provide a 
benchmark that could be run without initializing the disk".  This 
is a ridiculous statement, in my opinion.  How many people can use 
a hard disk that hasn't been initialized? [text deleted -JR]

I understand that DiskTimer is actually testing the lowest level 
of the driver interface, so perhaps it should be renamed 
"DriverTimer".  I also understand that DiskTimer is not 
intended to be a test of actual system performance.  But then, of 
what use are the results in that case?  The fact is that DiskTimer 
is going to be scooped up by everyone with a hard disk and the 
results are going to be publicized at every user level.  (Witness 
the 'special' version of DiskTimer that SuperMac Technologies has 
produced.  They even claim that their drives are three times 
faster than everyone else's based on these results.)  Some 
magazine editors have stated that they will no longer publish 
performance tests based on DiskTimer results.  (Refer to Ric 
Ford's recent article on drives for the Mac SE, printed in 
MacWeek.)  
MacWeek has found that the timings of DiskTimer II have no 
correlation with real world performance, and therefore the results 
have little or no value in their comparisons.

The following is a list of requirements that the new DiskTimer 
(DT) must meet, if it is to be rewritten at all.

1.   There needs to be a differentiation in the test between 
multiple- and single-block transfers.  Large multiple-block reads 
will always perform best on a drive formatted with a 1:1 
interleave.  In the real world of Macintosh, most drives are 
interleaved.  This causes the multiple block read/write tests to 
be weighted towards any drive formatted with no interleave and 
will handicap a drive that is interleaved. [as we move to higher 
capacity drives, the 1:1 interleave is more common since these 
drives often do not support interleaving -JR]
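
To make the interleave arithmetic concrete, here is a back-of-the-
envelope model in C (my illustration, not from the original posts; 
the 3600 RPM spindle, 17-sector tracks, and 512-byte sectors are 
assumed values): reading every sector on a track takes roughly 
'interleave' revolutions, provided the subsystem keeps up at that 
interleave.

/* Sustained throughput vs. interleave, assuming the subsystem
   keeps up (misses no sectors) at the given interleave. */
#include <stdio.h>

int main(void)
{
    const double rev_ms = 60000.0 / 3600.0;  /* one revolution: ~16.7 ms */
    const int    sectors_per_track = 17;
    const int    bytes_per_sector  = 512;
    int          interleave;

    for (interleave = 1; interleave <= 4; interleave++) {
        double track_ms = interleave * rev_ms;   /* revolutions per track */
        double kb_per_s = (sectors_per_track * bytes_per_sector / 1024.0)
                          / (track_ms / 1000.0);
        printf("%d:1 interleave: %5.1f ms/track, %6.1f KB/s\n",
               interleave, track_ms, kb_per_s);
    }
    return 0;
}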

2.   In the real world of Macintosh, all file I/O requests are 
made through the File Manager.  If DT is to ignore reading and 
writing files stored on the disk, then its results are invalid [in 
my opinion -JR].  Testing the performance of a drive requires 
testing the drive under typical conditions.  Since the largest 
share of the overhead in file I/O is caused by the File Manager, 
DT III must use it.  DT should create, read, and write files.  
Yes, DT results will then be subject to a drive's fragmentation.  
But then again, the File Manager will try to use contiguous space 
on the drive, which is another reason to use the File Manager.  
This is the typical situation.

3.   Consideration must be given to the drive's interleave, number 
of read/write heads, number of bytes per block, number of blocks 
per track, and number of tracks per cylinder.  Without this 
information, no test will be fair to drives with dissimilar 
geometries.
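
A sketch of what giving that consideration might look like: the 
parameters of item 3 fit naturally in one structure, and the 
derived figures a fair comparison needs fall out of it.  The 
numbers below describe a hypothetical 20M drive, not any particular 
product.

#include <stdio.h>

struct DriveGeometry {
    int heads;              /* read/write heads = tracks per cylinder */
    int bytesPerBlock;
    int blocksPerTrack;
    int cylinders;
    int interleave;         /* n:1 */
};

int main(void)
{
    struct DriveGeometry g = { 4, 512, 17, 615, 1 };  /* hypothetical */
    long bytesPerCyl = (long)g.heads * g.blocksPerTrack * g.bytesPerBlock;

    printf("reachable without a seek: %ld KB per cylinder\n",
           bytesPerCyl / 1024);
    printf("capacity: %ld MB\n",
           (long)g.cylinders * bytesPerCyl / (1024L * 1024L));
    return 0;
}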

4.   DT needs to address drive configurations that can invalidate 
its results, such as a drive with integral caching.  I feel there 
is an inconsistency in the philosophy of DT.  Steve says, "the 
results in such cases will almost certainly be so low (so 'good') 
as to immediately identify them as invalid."  But DT is being 
distributed to the public at large, which does not have the 
technical knowledge to determine when the results are too 'good' 
to be true.  The typical user of DT is Joe Blow trying to beat the 
guy next door.  [this is the reason I take issue with DiskTimer 
-JR]
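
One way a rewritten DT could address this, sketched with File 
Manager calls (the 4x threshold and the use of FSRead are my 
illustrative choices; a real test would use the benchmark's own 
low-level read primitive and time many repetitions, since TickCount 
has only 1/60-second granularity):

/* Flag a possible integral cache: time the same single-block read
   twice.  A real mechanism pays seek plus rotational latency both
   times; a cache answers the second read almost instantly. */
#include <Files.h>
#include <Events.h>

static unsigned long TimeOneBlock(short refNum, long offset)
{
    long          count = 512;
    char          buf[512];
    unsigned long start;

    SetFPos(refNum, fsFromStart, offset);
    start = TickCount();
    FSRead(refNum, &count, buf);
    return TickCount() - start;
}

/* true if the second read is suspiciously fast relative to the first */
int LooksCached(short refNum, long offset)
{
    unsigned long first  = TimeOneBlock(refNum, offset);
    unsigned long second = TimeOneBlock(refNum, offset);

    return (second * 4) < first;    /* crude illustrative threshold */
}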

5.   I recommend that a file larger than 24K be used in the test.  
The majority of files on a Macintosh system are considerably 
larger than this.

6.   During the test, a warning should be presented to the user 
telling him that interrupting it may cause loss of data, and 
possibly of the entire drive.  Although this is true of any and all 
programs that write to a drive, DT is unique.  It quietly sits 
there performing its test.  If a less than knowledgeable user gets 
bored with the test, he may be tempted to abort it by resetting or 
powering off the computer.

7a.   I've saved the criticism of the access time test for last, 
because it has the biggest problems.  With DT's access test, a 
slower-seeking drive may test better than a faster-seeking drive.  
How?  The answer is in the drive's geometry.  As an example: if a 
drive with 15 read/write heads is matched against a drive with 
only 4 heads, then the bigger drive has access to nearly four 
times as much data without performing any head seeks!  This is 
exactly why DT's access testing methods are not a valid test of 
head seeks.  The drive with only 4 heads will need to travel a lot 
more distance seeking in DT's test.  Any access test needs to 
ensure that an actual head seek across a specific number of tracks 
has occurred.  If DT III is going to report access times, then it 
had better make certain that the heads have actually moved a 
certain number of tracks.  I've been able to alter the results of 
the access test by changing the interleave of the drive.  
Obviously, the drive's access time could not actually have been 
changed by this.  Also, consider the SCSI Seek command in the 
access test.
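
To see how far "1 Megabyte into the drive" actually moves the heads 
on dissimilar geometries, a few lines of arithmetic suffice (the 
head counts come from the example above; 17-block tracks and 
512-byte blocks are assumed):

/* How many cylinders does a 1 MB offset span?  It depends entirely
   on the geometry, which is the point of item 7a. */
#include <stdio.h>

static long CylindersPerMB(int heads, int blocksPerTrack)
{
    long blocksPerMB  = (1024L * 1024L) / 512;   /* 2048 blocks */
    long blocksPerCyl = (long)heads * blocksPerTrack;
    return blocksPerMB / blocksPerCyl;
}

int main(void)
{
    printf(" 4 heads: 1 MB spans ~%ld cylinders\n", CylindersPerMB(4, 17));
    printf("15 heads: 1 MB spans ~%ld cylinders\n", CylindersPerMB(15, 17));
    return 0;
}

On these assumptions the 4-head drive seeks across about 30 
cylinders where the 15-head drive seeks across about 8, for the 
very same test.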

7b.   Steve defends his access time results by stating, "Do not 
confuse DiskTimer II's access time test with an attempt to 
simulate hardware vendors' average access time statistic."  Then I 
beg, please, do not call it an "access time" test.  The current 
access time test reads a block and then another block one Megabyte 
into the drive, thereby adding to this result the time it takes 
to read the two blocks it is seeking to.  Steve believes "this 
confoundment (sic) is not significant".  But in fact it could very 
well be, considering the interleave of the drive.  The heads may 
have found the proper track, but the block was not in position 
under the heads.  This will add the time it takes to spin the 
platter(s) to get the block under the heads.  So the problem 
described here is testing seeks across so many Megabytes.  Without 
knowledge of the drive's geometry, this testing method is not 
useful and potentially misleading.
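
To put rough numbers on the rotational-latency point (assuming the 
3600 RPM spindle speed typical of drives of this class): one 
revolution takes 60,000 / 3,600 = 16.7 ms, so just missing the 
target block costs up to 16.7 ms of extra wait, 8.3 ms on average.  
Average seek times for these drives run on the order of 20 to 
80 ms, so the latency term can easily be a significant fraction of 
the measured "access time", and the interleave determines where in 
the rotation the target block sits.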

In conclusion, the reason I'm so adamantly opposed to the proposed 
revision of DT is that it has been exploited by far too many 
people.  It has established its name as 'the' test to use.  It is 
the de facto standard, but it is a loaded weapon, and in the wrong 
hands it is dangerous.  Misinformation is worse than no 
information.  I feel that if DT is to be rewritten, then it needs 
to be re-philosophized.  As Ephraim Vishniac has pointed out, DT 
has been exploited by certain vendors.  Certainly everyone will 
agree that the purpose of a drive is to store files.  With this in 
mind, any performance test of a drive should measure the drive's 
ability to transfer files to and from a Macintosh.  DiskTimer in 
the past has only tested the driver's interface.  This is too far 
removed from the real world of hard disk usage.


My final word regarding this 'benchmark war' is this.  People
interested in the evaluation of SCSI drives on the Macintosh need
a copy of SCSI Evaluator.  Send $20 to Digital Microware, PO Box
3527, Mission Viejo, CA 92690.  In return you will receive a 60+
page manual discussing benchmarks for SCSI drives.  Before 
continuing any further discussion of SCSI test programs, everyone 
concerned needs to read the information in the SCSI Evaluator 
manual.  That is all.  -JR

brecher@well.UUCP (Steve Brecher) (11/11/88)

I persist in the hope that some principles of disk performance testing
may be of interest.  It is not my purpose to defend DiskTimer II as a
superior benchmark.

In article <20306@apple.Apple.COM> posted on his behalf, Jim Reekes writes:

> Steve has defended the  non-use of the File Manager by stating, "I
> developed DiskTimer II precisely to avoid using File Manager calls, i.e.,
> to provide a benchmark that could be run without initializing the disk".
> This is a ridiculous statement, in my opinion.  How many people can use a 
> hard disk that hasn't been initialized?

A good benchmark that uses the File Manager requires initialization of the
disk followed by loading it with a suite of test files -- the same files
loaded in the same order for each disk under test.  This is necessary
to make the File Manager's view of the disk identical among disks under
test.
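
A minimal sketch of what "the same files loaded in the same order" 
could look like with Toolbox calls (names, sizes, and type/creator 
codes are illustrative; error checking trimmed):

/* Load an identical test suite onto every volume under test, in a
   fixed order, so the File Manager's view of each disk matches. */
#include <Files.h>

static void CreateAndFill(ConstStr255Param name, short vRefNum, long size)
{
    static char buf[32 * 1024];
    short       refNum;
    long        count, remaining;

    Create(name, vRefNum, 'DTST', 'TEST');
    FSOpen(name, vRefNum, &refNum);
    for (remaining = size; remaining > 0; remaining -= count) {
        count = (remaining < (long) sizeof buf)
                    ? remaining : (long) sizeof buf;
        FSWrite(refNum, &count, buf);
    }
    FSClose(refNum);
}

void LoadTestSuite(short vRefNum)
{
    CreateAndFill("\pSuite 1", vRefNum,  24L * 1024);
    CreateAndFill("\pSuite 2", vRefNum, 100L * 1024);
    CreateAndFill("\pSuite 3", vRefNum, 512L * 1024);
}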

> I understand that DiskTimer [II] is actually testing the lowest 
> level of the driver interface, so perhaps it should be renamed 
> "DriverTimer".

DiskSubSystemTimer, anyone?  The results are a function of the disk
driver software and the disk hardware, including disk controller
and disk mechanism.  They are also a function of the host Mac
hardware, and any system software (e.g., SCSI Manager) used by
the driver.

> I also understand that DiskTimer is not intended to be a test of
> actual system performance.  But then, of what use are the results
> in that case?

They are of use in many cases in evaluating performance in transferring
24KB blocks of data, and in evaluating access time.  (More on the latter
below.)
I suspect that the real contribution of DiskTimer II has been its use by
disk system vendors in evaluating and in some cases motivating
improvement in their product performance characteristics.  For example,
the DataFrame XP series grew out of a resolve formed by Steve Edelman
(SuperMac) in response to the results of the original version of DiskTimer
(which was then called DiskBench, I think). That DiskTimer II results have
been abused by advertisers should be cause for complaint about or to the
advertisers, not the program. The most recent abuse, since discontinued,
has been by PLI in claiming miraculous access times (due to cacheing).

> MacWeek has found that the timings of DiskTimer II have no 
> correlation with real world performance.

*No* correlation?  I missed the article referred to, but if Ric
Ford said that, he was in error.  Consider, from slow to fast,
an Apple serial-port HD20, a typical 20M Seagate N-series based
SCSI subsystem, and a typical large CDC Wren based system. "Slow
to fast" describes both real world performance and DiskTimer II
results.  DiskTimer II's imperfections as a benchmark notwithstanding,
it does actually use the disk in a way that must be correlated with
*some* aspects of real world performance.

> There needs to be a differentiation in the test between 
> multiple- and single-block transfers.  Large multiple-block reads 
> will always perform best on a drive formatted with a 1:1 
> interleave.  In the real world of Macintosh, most drives are 
> interleaved.  This causes the multiple block read/write tests to 
> be weighted towards any drive formatted with no interleave and 
> will handicap a drive that is interleaved. [as we move to higher 
> capacity drives, the 1:1 interleave is more common since these 
> drives often do not support interleaving -JR]

Multiple block reads perform best on a drive with 1:1 interleave
provided that interleave is suitable for the subsystem, i.e., that
the subsystem keeps up (does not miss sectors).  If the subsystem
is too slow to keep up with 1:1, then best results will be at
whatever interleave is suitable for it, and 1:1 will not be best.
Drives that require interleaving for best performance are of
course "handicapped" relative to drives that don't -- that's the
whole point.  It's not that the big, fast drives don't *support*
interleaving; it's that they don't *require* it for maximum
throughput.  (Whether they support changing the interleave by
the user is not relevant.)
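
Worked numbers make the "keeps up" proviso vivid (assuming, for 
illustration, 3600 RPM and 17 sectors of 512 bytes per track): a 
subsystem that keeps up at 1:1 reads a whole track in one 16.7 ms 
revolution, roughly 520 KB/s.  One that cannot process a sector in 
the ~1 ms before the next sector arrives misses it, waits a full 
revolution, and takes 17 revolutions per track, roughly 30 KB/s.  
Reformatting that slower subsystem at 2:1 (two revolutions per 
track, roughly 260 KB/s) is nearly a nine-fold improvement over a 
mismatched 1:1, which is why "best interleave" means matched to the 
subsystem, not smallest number.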

> Consideration must be given to the drive's interleave, number 
> of read/write heads, number of bytes per block, number of blocks 
> per track, and number of tracks per cylinder.  Without this 
> information, no test will be fair to drives with dissimilar 
> geometries.
> ...
> With DT's access test, a 
> slower-seeking drive may test better than a faster-seeking drive.  
> How?  The answer is in the drive's geometry.  As an example: if a 
> drive with 15 read/write heads is matched against a drive with 
> only 4 heads, then the bigger drive has access to nearly four 
> times as much data without performing any head seeks!  This is 
> exactly why DT's access testing methods are not a valid test of 
> head seeks.  The drive with only 4 heads will need to travel a lot 
> more distance seeking in DT's test.  Any access test needs to 
> ensure that an actual head seek across a specific number of tracks 
> has occurred.

The idea is not to measure only the speed of head movement; the
idea is to measure the speed of access to data.  That speed is
affected by head movement speed, rotation speed, and geometry.
If two users load the same data on two drives, the user whose drive
has more heads, larger tracks, etc. will, other things being equal,
have better performance.  In other words, "the bigger drive [having]
access to [more] data without performing any head seeks" gives
it a performance advantage.

> The current  access time test reads a block and then another block one
> Megabyte into the drive, thereby adding to this result the time it takes 
> to read the two blocks it is seeking to.  Steve believes "this 
> [confounding] is not significant".  But in fact it could very 
> well be, considering the interleave of the drive.  The heads may 
> have found the proper track, but the block was not in position 
> under the heads.  This will add the time it takes to spin the 
> platter(s) to get the block under the heads.  So the problem 
> described here is testing seeks across so many Megabytes.  Without 
> knowledge of the drive's geometry, this testing method is not 
> useful and potentially misleading.

Rotational latency is part of access time.  DiskTimer II avoids
penalizing certain geometries by inserting a random time delay
between the single-block requests, so that the total latency will
approach the drive's average.  Unfortunately, as Ephraim Vishniac has
pointed out, I introduced a design error when I inserted
the random delays, namely synchronization of the requests with
the Mac's 60 Hz clock.  But if that were corrected, the access
time test would be a reasonably good benchmark for subsystems
not providing block cacheing.
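
The arithmetic behind that design error is worth spelling out: at 
3600 RPM a platter turns 60 times per second, so one Mac tick (1/60 
second) is almost exactly one revolution, and a delay quantized to 
whole ticks re-samples nearly the same rotational phase on every 
trial.  A generic C sketch of the distinction (REV_US and the use 
of rand() are my illustrations, not DiskTimer code):

#include <stdlib.h>

#define REV_US 16667L   /* one revolution at 3600 RPM, in microseconds */

/* Good: microsecond-granularity delays sample all rotational phases
   uniformly, so many trials average to the true mean latency. */
long RandomPhaseDelayUS(void)
{
    return rand() % REV_US;
}

/* Bad: whole-tick delays (units of ~16,667 us) add almost exactly a
   whole number of revolutions, so every trial catches the platter
   at nearly the same phase and the average is biased. */
long TickQuantizedDelayUS(void)
{
    return (rand() % 4) * REV_US;
}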

ephraim@think.COM (Ephraim Vishniac) (11/13/88)

In article <7625@well.UUCP> brecher@well.UUCP (Steve Brecher) writes:
>> MacWeek has found that the timings of DiskTimer II have no 
>> correlation with real world performance.
>
>*No* correlation?  I missed the article referred to...

So did I, but:

The current issue of MacWorld (December 1988) has a review of umpteen
20-megabyte drives.  They measured performance in three ways:
DiskTimer II, their own "reality check" (a single 512Kbyte read), and
a series of end-user operations (copying files, launching
applications, opening documents).

From DiskTimer and the reality check, they calculated data transfer
rates.  I was very surprised at the level of agreement.  DiskTimer was
always marginally more optimistic, but the differences were slight.
(BTW, for a good laugh note that the graphs on page 134 are labelled
in "kilobits/second," with the fastest drive rated at about 2.5Kb/S.
Time for a sanity check on those captions...)
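
(Doing that sanity check: if the fastest drive in the roundup 
sustained something like 300 kilobytes/second, a plausible figure 
for the class, that is 300 x 8 = 2,400 kilobits/second, about 2.4 
megabits/second.  So "about 2.5" fits megabits per second, and the 
printed label is presumably off by a factor of a thousand.)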

The important lesson comes in comparing the transfer rate graphs to
the "Real-World Performance" graph.  In data transfer rates, the
drives vary by a factor of 4:1.  But by "real-world performance," the
range is less than 2:1 and the rankings are different.  I've said it
before, but I'll say it again: data transfer rate is over-rated.

Ephraim Vishniac					  ephraim@think.com
Thinking Machines Corporation / 245 First Street / Cambridge, MA 02142-1214

     On two occasions I have been asked, "Pray, Mr. Babbage, if you put
     into the machine wrong figures, will the right answers come out?"