mjohnson@Apple.COM (Mark Johnson) (11/10/88)
I'm requesting that this message be posted to this net; I am not a member. -JR

My original comments, which I've heard are being republished, once again contain some inaccuracies. My intent at that time, some 18 months ago, was to stir up some controversy and present the 'con' viewpoint regarding the results being published in popular magazines and on local BBSs. That may or may not have been a good idea, since I'm still trying to explain myself.

At any rate, I also recently read S. Brecher's comments dated Nov. 8 on yet 'another network' rebutting the old feud once again. Steve's corrections to my statements are accurate. The old discussion that Steve and I had gave me more to think about, and his rebuttal corrected some of my inaccuracies. I've since learned of the errors in that original article. The original article, taken out of context, cannot be appreciated unless the entire discussion is also presented. I wish to withdraw the original article due to the mistakes it contained, and after my re-examination of the Disk Timer source code. This has not, however, changed my opinion of Disk Timer. I do have more than 'no idea' of what I'm talking about, though my misunderstandings, republished recently, may lead one to believe otherwise.

Then, about 12 months ago, another fellow decided to revise Disk Timer and bring forth a new version. I had a long series of discussions on yet 'another network'. The following are my comments made at that time, which I still believe to be true. [I made a few new edits, shown in brackets] I still stand by the statements I made during the second discussion of the continuing Disk Timer "Pro vs. Con" series.

________________________________________

Date: Mon Sep 8, 1987 10:45 am EST

I wish to express my sentiments regarding the recent proposal to revise DiskTimer. Before I make my comments, I'll add that I have great respect for Steve Brecher's work. My first reaction was, "Why do we need another DiskTimer?"
The earlier versions caused more misunderstandings and misrepresentations than anything else. Benchmarks are not to be taken lightly. As an example, consider how BYTE magazine has been unable to produce a valid set of tests for its IBM vs. Macintosh benchmarks. There are issues that must be addressed if we are going to publish the results of any benchmark test. A benchmark must consider how a system operates in real life.

So, if DiskTimer is to be considered at all, it needs to test a drive under "real world" conditions. This means it is going to require standard file I/O (e.g., using the Mac's File Manager). Without the use of test files, the results are too far removed from reality. In other words, the purpose of a drive is to read and write files; any benchmark of a drive's performance must test its ability to do this, which is done via the File Manager.

Steve has defended the non-use of the File Manager by stating, "I developed DiskTimer II precisely to avoid using File Manager calls, i.e., to provide a benchmark that could be run without initializing the disk". This is a ridiculous statement in my opinion. How many people can use a hard disk that hasn't been initialized? [text deleted -JR]

I understand that DiskTimer is actually testing the lowest level of the driver's interface, so maybe it should be renamed "DriverTimer". I also understand that DiskTimer is not intended to be a test of actual system performance. But then, of what use are the results in that case? The fact is that DiskTimer is going to be scooped up by everyone with a hard disk, and the results are going to be publicized at every user level. (Witness the 'special' version of DiskTimer that SuperMac Technologies has produced. They even claim that their drives are three times faster than everyone else's based on these results.) Some magazine editors have stated that they will no longer publish performance tests based on DiskTimer results.
(refer to Ric Ford's recent article on drives for the Mac SE printed in MacWeek) MacWeek has found that the timings of DiskTimer II have no correlation with real world performance, and therefore the results have little or no value in their comparisons.

The following is a list of requirements that the new DiskTimer (DT) must include, if it is to be rewritten at all.

1. There needs to be a differentiation in the test between multiple vs. single block transfers. Large multiple-block reads will always perform best on a drive formatted with a 1:1 interleave. In the real world of Macintosh, most drives are interleaved. This causes the multiple-block read/write tests to be weighted towards any drive formatted with no interleave, and will handicap a drive that is interleaved. [as we move to higher capacity drives, the 1:1 interleave is more common since these drives often do not support interleaving -JR]

2. In the real world of Macintosh, all file I/O requests are given by the File Manager. If DT is to ignore reading and writing files stored on the disk, then its results are invalid [in my opinion -JR]. Testing the performance of a drive requires testing the drive under typical conditions. Since the biggest percentage of the overhead in file I/O is caused by the File Manager, DT III must use it. DT should create, read, and write files. Yes, DT results will then be subject to a drive's fragmentation. But then again, the File Manager will try to use contiguous space on the drive, which is another reason to use it. This is the typical situation.

3. Consideration must be given to the drive's interleave, number of read/write heads, number of bytes per block, number of blocks per track, and number of tracks per cylinder. Without this information, no test will be fair to drives with dissimilar geometries.

4. DT needs to address drive configurations that can invalidate its results, such as a drive with integral caching.
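To make the interleave point in requirement 1 concrete, here is a rough numerical sketch. The drive parameters and the model (reading one full track takes one revolution per unit of interleave, seeks ignored) are my own illustrative assumptions, not anything DiskTimer actually computes:

```python
def sequential_read_revs(num_blocks, blocks_per_track, interleave):
    """Approximate disk revolutions needed to read num_blocks contiguous
    logical blocks, assuming the host keeps up with the chosen interleave:
    consecutive logical blocks sit 'interleave' sectors apart, so reading
    a full track takes 'interleave' revolutions."""
    tracks = num_blocks / blocks_per_track
    return tracks * interleave

def read_time_ms(num_blocks, blocks_per_track, interleave, rpm=3600):
    """Estimated read time in milliseconds at the given spindle speed."""
    rotation_ms = 60_000.0 / rpm  # one revolution, in ms
    return sequential_read_revs(num_blocks, blocks_per_track, interleave) * rotation_ms

# A 48-block (24KB) read on a hypothetical 17-sector-per-track drive:
t1 = read_time_ms(48, 17, 1)  # 1:1 interleave
t3 = read_time_ms(48, 17, 3)  # 3:1 interleave
```

Under this toy model the 3:1 drive takes exactly three times as long as the 1:1 drive, so a raw multiple-block test rewards 1:1 formatting regardless of the quality of the mechanism itself.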
I feel there is an inconsistency in the philosophy of DT. Steve says, "the results in such cases will almost certainly be so low (so 'good') as to immediately identify them as invalid." But DT is being distributed to the public at large, which does not have the technical knowledge to determine when the results are too 'good' to be true. The typical user of DT is Joe Blow trying to beat the guy next door. [this is the reason I take issue with Disk Timer -JR]

5. I recommend that a file larger than 24K be used in the test. The majority of files on a Macintosh system are considerably larger than this.

6. During the test, a warning should be presented to the user telling him that interrupting it may cause loss of data, and possibly of the entire drive. Although this is true of any and all programs that write to a drive, DT is unique: it quietly sits there performing its test. If a less than knowledgeable user gets bored with the test, he may be tempted to abort it by resetting or powering off the computer.

7a. I've saved the criticism of the access time test for last, because it has the biggest problems. With DT's access test, a slower-seeking drive may test better than a faster-seeking drive. How? The answer is in the drive's geometry. As an example, if a drive with 15 read/write heads is matched against a drive with only 4 heads, the bigger drive has access to nearly four times as much data without performing any head seeks! This is exactly why DT's access testing methods are not a valid test of head seeks: the drive with only 4 heads will need to travel a lot more distance seeking in DT's test. Any access test needs to ensure that an actual head seek across a specific number of tracks has occurred. If DT III is going to report access times, then it had better make certain that the heads have actually moved a certain number of tracks. I've been able to alter the results of the access test by changing the interleave of the drive.
Obviously, the drive's actual access time could not have been changed by this. Also, consider the SCSI Seek command in the access test.

7b. Steve defends his access time results by stating, "Do not confuse DiskTimer II's access time test with an attempt to simulate hardware vendors' average access time statistic." Then I beg, please, do not call it an "access time" test. The current access time test reads a block and then another block one Megabyte into the drive, thereby adding to the result the time it takes to read the two blocks it is seeking to. Steve believes "this confoundment (sic) is not significant". But in fact it could very well be, considering the interleave of the drive. The heads may have found the proper track, but the block was not in position under the heads. This adds the time it takes to spin the platter(s) to get the block under the heads. So the problem described here is testing seeks across so many Megabytes. Without knowledge of the drive's geometry, this testing method is not useful and is potentially misleading.

In conclusion, the reason I'm so adamantly opposed to the proposed revision of DT is that it has been exploited by far too many people. It has established its name as 'the' test to use. It is the de facto standard, but it is a loaded weapon, and in the wrong hands it is dangerous. Misinformation is worse than no information. I feel that if DT is to be rewritten, then it needs to be re-philosophized. As Ephraim Vishniac has pointed out, DT has been exploited by certain vendors.

Certainly everyone will agree that the purpose of a drive is to store files. With this in mind, any performance results for a drive should test the drive's ability to transfer files to and from a Macintosh. DiskTimer in the past has only tested the driver's interface. This is too far removed from the real world of hard disk usage.

My final word regarding this 'benchmark war' is this.
People interested in the evaluation of SCSI drives on the Macintosh need a copy of SCSI Evaluator. Send $20 to Digital Microware, PO Box 3527, Mission Viejo, CA 92690. In return you will receive a 60+ page manual discussing benchmarks for SCSI drives. Before continuing any further discussion of SCSI test programs, everyone concerned needs to read the information in the SCSI Evaluator manual.

That is all. -JR
brecher@well.UUCP (Steve Brecher) (11/11/88)
I persist in the hope that some principles of disk performance testing may be of interest. It is not my purpose to defend DiskTimer II as a superior benchmark.

In article <20306@apple.Apple.COM> posted on his behalf, Jim Reekes writes:

> Steve has defended the non-use of the File Manager by stating, "I
> developed DiskTimer II precisely to avoid using File Manager calls, i.e.,
> to provide a benchmark that could be run without initializing the disk".
> This is ridiculous statement in my opinion. How many people can use a
> hard disk that hasn't been initialized?

A good benchmark that uses the File Manager requires initialization of the disk followed by loading it with a suite of test files -- the same files loaded in the same order for each disk under test. This is necessary to make the File Manager's view of the disk identical among disks under test.

> I understand that DiskTimer [II] is actually testing the driver's
> lowest level of the interface, so then maybe it should be renamed
> to "DriverTimer".

DiskSubSystemTimer, anyone? The results are a function of the disk driver software and the disk hardware, including disk controller and disk mechanism. They are also a function of the host Mac hardware, and any system software (e.g., SCSI Manager) used by the driver.

> I also understand that DiskTimer is not intended to be a test of
> actual system performance. But then, of what use are the results
> in that case?

They are of use in many cases in evaluating performance in transferring 24KB blocks of data, and of access time. (More on the latter below.) I suspect that the real contribution of DiskTimer II has been its use by disk system vendors in evaluating and in some cases motivating improvement in their product performance characteristics. For example, the DataFrame XP series grew out of a resolve formed by Steve Edelman (SuperMac) in response to the results of the original version of DiskTimer (which was then called DiskBench, I think).
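For readers comparing such figures: a timed 24KB transfer converts to a throughput number with trivial arithmetic. A sketch, assuming (hypothetically) that the elapsed time is measured in the Mac's 1/60-second ticks:

```python
def throughput_kb_per_s(kbytes, ticks, tick_hz=60):
    """Convert a timed transfer into KB/s. 'ticks' is elapsed time in
    units of 1/tick_hz seconds (the Mac's Ticks counter runs at 60 Hz)."""
    seconds = ticks / tick_hz
    return kbytes / seconds

# e.g. a 24KB read completing in 6 ticks (0.1 s) is roughly 240 KB/s:
rate = throughput_kb_per_s(24, 6)
```

Note that this yields kilobytes per second; multiply by 8 if a publication insists on kilobits.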
That DiskTimer II results have been abused by advertisers should be cause for complaint about or to the advertisers, not the program. The most recent abuse, since discontinued, has been by PLI in claiming miraculous access times (due to cacheing).

> MacWeek has found that the timings of DiskTimer II have no
> correlation with real world performance.

*No* correlation? I missed the article referred to, but if Ric Ford said that he was in error. Consider, from slow to fast, an Apple serial-port HD20, a typical 20M Seagate N-series based SCSI subsystem, and a typical large CDC Wren based system. "Slow to fast" describes both real world performance and DiskTimer II results. DiskTimer II's imperfections as a benchmark notwithstanding, it does actually use the disk in a way that must be correlated with *some* aspects of real world performance.

> There needs to be a differentiation in the test between
> multiple vs. single block transfers. A large multiple block reads
> will always perform best on a drive formatted with a 1:1
> interleave. In the real world of Macintosh, most drives are
> interleaved. This causes the multiple block read/write tests to
> be weighted towards any drive formatted with no interleave and
> will handicap a drive that is interleaved. [as we move to higher
> capacity drives, the 1:1 interleave is more common since these
> drives often do not support interleaving -JR]

Multiple block reads perform best on a drive with 1:1 interleave provided that interleave is suitable for the subsystem, i.e., that the subsystem keeps up (does not miss sectors). If the subsystem is too slow to keep up with 1:1, then best results will be at whatever interleave is suitable for it, and 1:1 will not be best. Drives that require interleaving for best performance are of course "handicapped" relative to drives that don't -- that's the whole point. It's not that the big, fast drives don't *support* interleaving; it's that they don't *require* it for maximum throughput.
(Whether they support changing the interleave by the user is not relevant.)

> Consideration must be given to the drive's interleave, number
> of read/write heads, number of bytes per block, number of blocks
> per track, and number of tracks per cylinders. Without this
> information, no test will be fair to drives with dissimilar
> geometries.
> ...
> With DT's access test, a
> slower seeking drive may test better than a faster seeking drive.
> How? The answer is in the drive's geometry. As an example; if a
> drive with 15 read/write heads is matched against a drive with
> only 4 heads, then the bigger drive has access to nearly four
> times as much data without performing any head seeks! This is
> exactly why DT's access testing methods are not a valid test of
> head seeks. The drive with only 4 heads will need to travel a lot
> more distance seeking in DT's test. Any access test needs to
> insure that an actual head seek across a specific number of tracks
> has incurred.

The idea is not to measure only the speed of head movement; the idea is to measure the speed of access to data. That speed is affected by head movement speed, rotation speed, and geometry. If two users load the same data on two drives, the user whose drive has more heads, larger tracks, etc. will, other things being equal, have better performance. In other words, "the bigger drive [having] access to [more] data without performing any head seeks" gives it a performance advantage.

> The current access time test reads a block and then another block one
> Megabyte into the drive, therefore adding to this result the time it takes
> to read the two blocks it is seeking to. Steve believes "this
> [confounding] is not significant". But in fact it could very
> well be, considering the interleave of the drive. The heads may
> have found the proper track, but the block was not in position
> under the heads. This will add the time it takes to spin the
> platter(s) to get the block under the heads.
> So the problem
> described here is testing seeks across so many Megabytes. Without
> knowledge of the drive's geometry, this testing method is not
> useful and potentially misleading.

Rotational latency is part of access time. DiskTimer II avoids penalizing certain geometries by inserting a random time delay between the single-block requests, so that the total latency will approach the drive's average. Unfortunately, as Ephraim Vishniac has pointed out, I introduced a design error when I inserted the random delays, namely synchronization of the requests with the Mac's 60 Hz clock. But if that were corrected, the access time test would be a reasonably good benchmark for subsystems not providing block cacheing.
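The effect of the random delays, and of the synchronization error, can be simulated. The following is my own toy model (arbitrary rotation and seek figures, the target block fixed at rotational phase zero), not DiskTimer's actual code. With delays drawn uniformly over one rotation, the measured mean approaches seek time plus half a revolution; with delays locked to a fixed interval, every sample lands at the same rotational phase and the result is biased:

```python
import random

def mean_access_ms(n, rotation_ms, seek_ms, delay_fn):
    """Average measured access time over n single-block requests.
    The target block passes under the head at rotational phase 0;
    each request waits delay_fn() milliseconds before being issued."""
    t = 0.0       # running clock, ms
    total = 0.0
    for _ in range(n):
        t += delay_fn()                        # idle time between requests
        phase = (t + seek_ms) % rotation_ms    # rotational phase at end of seek
        latency = (rotation_ms - phase) % rotation_ms
        total += seek_ms + latency
        t += seek_ms + latency                 # block arrives; request completes
    return total / n

random.seed(1)
R, S = 16.67, 20.0  # ~3600 rpm rotation period; hypothetical 20 ms seek

randomized = mean_access_ms(2000, R, S, lambda: random.uniform(0.0, R))
clocked    = mean_access_ms(2000, R, S, lambda: R)  # delay locked to rotation
```

In this sketch `randomized` comes out near S + R/2 (the true average), while `clocked` repeats the identical rotational phase every pass and reports a constant that depends on the accidental alignment of delay, seek, and spin, which is the flaw attributed to the 60 Hz synchronization.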
ephraim@think.COM (Ephraim Vishniac) (11/13/88)
In article <7625@well.UUCP> brecher@well.UUCP (Steve Brecher) writes:

>> MacWeek has found that the timings of DiskTimer II have no
>> correlation with real world performance.
>
> *No* correlation? I missed the article referred to...

So did I, but: the current issue of MacWorld (December 1988) has a review of umpteen 20-megabyte drives. They measured performance in three ways: DiskTimer II, their own "reality check" (a single 512-Kbyte read), and a series of end-user operations (copying files, launching applications, opening documents). From DiskTimer and the reality check, they calculated data transfer rates. I was very surprised at the level of agreement. DiskTimer was always marginally more optimistic, but the differences were slight. (BTW, for a good laugh note that the graphs on page 134 are labelled in "kilobits/second," with the fastest drive rated at about 2.5Kb/S. Time for a sanity check on those captions...)

The important lesson comes in comparing the transfer rate graphs to the "Real-World Performance" graph. In data transfer rates, the drives vary by a factor of 4:1. But by "real-world performance," the range is less than 2:1 and the rankings are different. I've said it before, but I'll say it again: data transfer rate is over-rated.

Ephraim Vishniac
ephraim@think.com
Thinking Machines Corporation / 245 First Street / Cambridge, MA 02142-1214

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?"