gerry@zds-ux.UUCP (Gerry Gleason) (02/23/90)
I have just been going through a bunch of marketing hype for Neal
Nelson.  He claims that his "Business Benchmark" measures how well
machines perform on "tasks like word processing, spread sheets,
database management, accounting, programming and CAD," but I have
never seen anything that backs this up with analysis or real data.

Also in the package are quite a few reprints that prominently feature
these benchmarks, including several saying RISC is not much of a win
based on his benchmarks (Federal Computer Week, "Tests Challenge Old
RISC, CISC Notions"; EE Times, "CISC beats RISC in test";
Computerworld, "Unearthing RISC worms").  The EE Times article has the
results for his Test 5 (short integer math) showing the Sun-3 to be
~10% faster than a Sun-4, which leads me to believe that the benchmark
is bogus.  I thought EE Times was a pretty good publication, but the
article does not even ask the question of what the benchmark is really
measuring.

I was hoping that someone has already done some analysis of these
benchmarks and can confirm my suspicion that these tests not only are
bogus, but don't even measure what they claim to.  Unfortunately, at
least some important fraction of the market uses these benchmarks to
evaluate products, so many of us must apply them to our products even
though we suspect them of being misleading.  If they really are bogus,
what can be done to publicly discredit them, so further harm is not
done?

Gerry Gleason
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (02/23/90)
In article <196@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:

| I have just been going through a bunch of marketing hype for Neal
| Nelson.  He claims that his "Business Benchmark" measures how
| well machines perform on "tasks like word processing, spread sheets,
| database management, accounting, programming and CAD," but I have
| never seen anything that backs this up with analysis or real data.

I've been doing benchmarks for years (about 25), and I will say that,
used carefully, I am pretty happy with the NN suite.  I have run
extensive tests and live loads on machines he has tested, and my
results are close to his.

The secret to any benchmark is using it to predict the future, and
that depends both on how well the benchmarks represent your load and
on how well you *think* the benchmarks represent your load.  I usually
suggest that NN be used to select a few final machines for additional
testing, which is about all I claim for my own suite.
-- 
bill davidsen  (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me
steves@conan.SanDiego.NCR.COM (Steve Schlesinger) (02/24/90)
In article <196@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
>I have just been going through a bunch of marketing hype for Neal
>Nelson.  He claims that his "Business Benchmark" measures how
>well machines perform on "tasks like word processing, spread sheets,
>database management, accounting, programming and CAD," but I have
>never seen anything that backs this up with analysis or real data.
>
> [ paragraph deleted ]
>
>I was hoping that someone has already done some analysis of these
>benchmarks, and can confirm my suspicion that these tests not only
>are bogus, but don't even measure what they claim to.  Unfortunately,
>at least some important fraction of the market uses these benchmarks
>to evaluate products, so many of us must apply them to our products
>even though we suspect them of being misleading.  If they really are
>bogus, what can be done to publicly discredit them, so further harm
>is not done?
>
>Gerry Gleason

***************************************************************
                The following is only my opinion
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

My company is a licensee of the Neal Nelson Benchmarks.  I have been
involved in benchmarking and performance evaluation for many years.
I do not have a very high opinion of the NN Benchmarks.

NN is covered by a very strict licensing agreement.  The results of
running the benchmarks cannot be published; i.e., a licensee cannot
publicly reveal the individual or composite results of the benchmarks.
You cannot say "my system ran 70 gazillion Dhrystones, 80 gathousand
Linpacks, and 22 on the NN suite."  (Heck, another division of my
company was also a licensee, and they couldn't even tell us their raw
results!  Oh yes, the license agreement only permits the source to be
on a single machine at a single site.)  All you can say is that your
system was Y times the performance of Fasta Computer Model A on the
suite.  How do you know this, if Fasta Computer didn't publish their
numbers?
NN tells you this as part of your license agreement.  You report your
numbers back to them, and they give you the relative numbers of a
specified number of other systems.  If you want more data, you pay NN
more $$.

On the technical side, the benchmarks are **VERY** simple.  I cannot
reveal the details under the license agreement.  The only good thing
is that you can run multiple copies of the suite in parallel fairly
easily.

The computational benchmarks are of two types: arithmetic for
different data types, and memory moves.  The arithmetic ones
over-emphasize the frequency of multiply and divide relative to plus
and minus in real programs.  This explains some of the articles
mentioned, where a CISC machine (Sun-3 with 68020 and 68881 or other
FP silicon) "beat" a RISC machine (Sun-4 with SPARC, which doesn't
have multiply/divide instructions for integer arithmetic).  The memory
move tests will show how the cache/memory perform FOR ONE SPECIFIC
TYPE OF MOVES.  The disk I/O tests are not as idiosyncratic as the
processor/memory tests, but they are **VERY** simple.

What really bothered me about the tests was the "C" coding style.
Yes, I know this doesn't necessarily mean the benchmarks are not
meaningful.  The code looked like it had originally been written in
Cobol, then translated line for line into "C".  The program structure
(what there was of it) didn't look anything like any "C" program
anywhere.  It said to me that the author had little experience
programming in "C".

One effect of the coding style was that it made the code difficult to
optimize.  This can be seen two ways.  One is that it makes the suite
more accurate, since it removes the variable of compiler optimization
from system comparisons (a "notorious" problem with Dhrystone,
especially 1.1).  The other is that it makes it less accurate, since
the compiler's ability to optimize real code is an important attribute
of a system.  I tend to the second view.
But since NN promotes his suite as a measure of system performance, it
seems to me the code should test the optimizer.

My recommendation is not to pay much attention to anything you see
published about results on the NN suite.  I place it slightly below
Dhrystone in overall usability.  You can get similar results by
spending several hours enhancing the public domain Byte benchmarks.

I look forward to a future suite from SPEC that will include system
I/O tests.  If those benchmarks are anything like SPEC Release 1.0,
the NN suite will be technically obsolete.

I admire NN's ability as a business person.  He saw a need and filled
it.  Most technically naive system buyers don't understand the
performance data that floats around.  Observe the constant
misunderstandings on comp.arch about what a "mip" is.  We techies
seldom agree on much.  NN created a simple set of tests that could be
explained to Joe Naiveuser.  Joe N. could understand the marketing
hype of the test, and it gave him confidence in his computer purchase.

Steve Schlesinger

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
        The preceding is only my opinion
        It does not reflect the opinion of my employer or
        any other person or organization.
***************************************************************
wsd@cs.brown.edu (Wm. Scott `Spot' Draves) (02/24/90)
In article <196@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
Path: brunix!uunet!zds-ux!gerry
From: gerry@zds-ux.UUCP (Gerry Gleason)
Newsgroups: comp.arch
Summary: Do they measure anything?
Date: 22 Feb 90 17:19:19 GMT
Reply-To: gerry@zds-ux.UUCP (Gerry Gleason)
Organization: Zenith Data Systems
Lines: 29
I have just been going through a bunch of marketing hype for Neal
...
The EE Times
article has the results for his Test 5 (Short integer math) showing
the Sun-3 to be ~10% faster than a Sun-4, which leads me to believe
that the benchmark is bogus.
...
Gerry Gleason
This may very well be accurate due to the SPARC's lack of integer
divide.
I would, however, seriously question a benchmark that claims to
measure performance of a certain class of applications
(business/personal productivity in this case), and one of the tests is
a very low-level, MIPS sort of rating.
Scott Draves Space... The Final Frontier
wsd@cs.brown.edu
uunet!brunix!wsd
Box 2555 Brown U Prov RI 02912
amir@smsc.sony.com (Amir ) (02/24/90)
In article <196@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
>I have just been going through a bunch of marketing hype for Neal
>Nelson.  He claims that his "Business Benchmark" measures how
>well machines perform on "tasks like word processing, spread sheets,
>database management, accounting, programming and CAD," but I have
>never seen anything that backs this up with analysis or real data.

Quite true.  I have seen a lot of bad benchmarks in my time, but this
is one of the worst.

>Also in the package are quite a few reprints that prominently
>feature these benchmarks, including several saying RISC is not
>much of a win based on his benchmarks.  (Federal Computer Week,
>"Tests Challenge Old RISC, CISC Notions"; EE Times, "CISC beats RISC
>in test"; Computerworld, "Unearthing RISC worms")  The EE Times
>article has the results for his Test 5 (Short integer math) showing
>the Sun-3 to be ~10% faster than a Sun-4, which leads me to believe
>that the benchmark is bogus.  I thought EE Times was a pretty good
>publication, but the article does not even ask the question of what
>the benchmark is really measuring.

I was quite surprised too, but it sort of made sense.  There were a
lot of people who were trying hard to find a reason to discredit RISC.

>I was hoping that someone has already done some analysis of these
>benchmarks, and can confirm my suspicion that these tests not only
>are bogus, but don't even measure what they claim to.  Unfortunately,
>at least some important fraction of the market uses these benchmarks
>to evaluate products, so many of us must apply them to our products
>even though we suspect them of being misleading.  If they really are
>bogus, what can be done to publicly discredit them, so further harm
>is not done?

It really is bogus!  I had the opportunity to meet Mr. Nelson before
he went public with his benchmark.
The story he gave goes as follows: he had written/ported an accounting
package to an old Unix box (I don't remember which now; this was back
in 82-84).  Then his client bought what he thought was a faster
machine, and to his surprise, Mr. Nelson's package actually ran
slower.  So he started to analyze the problem, and this led to his
infamous benchmark.

As for the contents of the package, I had pretty strong disagreements
with him.  He takes a simple benchmark that measures something very
small (e.g. the speed of an add operation) and runs multiple copies to
simulate "multi-user" response.  Then he does the same thing for
another simple operation (e.g. multiply), and so on.  So almost all of
his arithmetic tests show the same linear slowdown (unless you run out
of memory).  The test is showing the context switch overhead, not
add/multiply times...

Then there is the "sync" test.  Apparently, his package used to do a
lot of unneeded syncs, so he tests how fast you can do syncs.  First
one copy, then 2, then...  Well, you get the idea.  Even he agreed
that this was stupid (as sync on most systems returns immediately,
before the data is written to disk anyway).  But last time I looked,
it was still in there.

Also, once you look at the source of the benchmark, you'll realize
that he is not much of a programmer either.  It must have more "goto"s
and labels than any other C program that I have ever seen.  It looks
more like a decompilation of an assembler program than anything else.
There are numerous other flaws in there that I won't go into now.

>Gerry Gleason

To be fair, the benchmark, like any other, does generate a set of
"data points" that can be useful.  What angers me is that the results
are interpolated to mean performance of the system for "business"
applications.  Since there are no other programs that claim to do
this, the benchmark has become fairly popular....
-- 
Amir H. Majidimehr
Operating Systems Group
Sony Microsystems
amir@smsc.sony.com | ...!{uunet,mips}!sonyusa!amir
steves@ivory.SanDiego.NCR.COM (Steve Schlesinger x2150) (02/28/90)
I posted some comments on this subject last week.

In article <196@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
>I have just been going through a bunch of marketing hype for Neal
>Nelson.  He claims that his "Business Benchmark" measures how
>well machines perform on "tasks like word processing, spread sheets,
>database management, accounting, programming and CAD," but I have

*****************************************************************
        The following is my opinion
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

This is true in the sense that the named applications do add,
subtract, multiply, divide, memory move, compare, and disk I/O, and
the NN benchmarks also do these operations.

Steve Schlesinger

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
        The preceding is my opinion and does not reflect the
        opinion of my employer or any other organization.
*****************************************************************

>never seen anything that backs this up with analysis or real data.
>
>Also in the package are quite a few reprints that prominently
>feature these benchmarks, including several saying RISC is not
>much of a win based on his benchmarks.  (Federal Computer Week,
>"Tests Challenge Old RISC, CISC Notions"; EE Times, "CISC beats RISC
>in test"; Computerworld, "Unearthing RISC worms")  The EE Times
>article has the results for his Test 5 (Short integer math) showing
>the Sun-3 to be ~10% faster than a Sun-4, which leads me to believe
>that the benchmark is bogus.  I thought EE Times was a pretty good
>publication, but the article does not even ask the question of what
>the benchmark is really measuring.
>
>I was hoping that someone has already done some analysis of these
>benchmarks, and can confirm my suspicion that these tests not only
>are bogus, but don't even measure what they claim to.
>Unfortunately,
>at least some important fraction of the market uses these benchmarks
>to evaluate products, so many of us must apply them to our products
>even though we suspect them of being misleading.  If they really are
>bogus, what can be done to publicly discredit them, so further harm
>is not done?
>
>Gerry Gleason

*****************************************************************
        No disclaimer on this part
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

A while ago, NCR posted the System Characterization Benchmark on
net.sources.  It measures similar things to the NN benchmark, but it
is better and it is free.  As an added bonus, you can read the source
and decide for yourself what the results really mean.  It is not
perfect (useful comments will be forwarded to the author, flames to
/dev/null).

My advice to anyone looking for benchmarks is first to use the
applications you currently run, then use SPEC data for computational
results (with your own weightings) and the SCB.

Steve
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
steve schlesinger        steve.schlesinger@sandiego.ncr.com
619-485-2150   NCR - 4010, 16550 W Bernardo Dr, San Diego, CA 92127
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (03/01/90)
In article <231@iss-rb.SanDiego.NCR.COM> steves@ivory.SanDiego.NCR.COM
(Steve Schlesinger x2150) writes:
>A while ago, NCR posted the System Characterization Benchmark
>on "net.sources."

I wonder if you could post the location of the benchmark on well-known
archive sites such as uunet or wherever?

  Hugh LaMaster, m/s 233-9,   UUCP ames!lamaster
  NASA Ames Research Center   ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     Phone:  (415)604-6117
ps@fps.com (Patricia Shanahan) (03/01/90)
In article <231@iss-rb.SanDiego.NCR.COM> steves@ivory.SanDiego.NCR.COM
(Steve Schlesinger x2150) writes:
...
>
>My advice to anyone looking for benchmarks is first to use the
>applications you currently run, then use SPEC data for computational
>results (with your own weightings) and the SCB.
>
>Steve

I would reverse the order here.  Use "standard" benchmarks to decide
which systems are likely to be useful enough to you to be worth really
measuring, then use multiple jobs of the types you are really going to
run to measure the value of each of those systems to you.  The main
value I see in standard benchmarks is that you can get numbers for
them a lot quicker and cheaper than a full-scale benchmarking exercise
using actual jobs.

I do think the SPEC approach is likely to be more robust in the face
of architecture changes than the more abstract benchmark approaches.
The real problem with abstract benchmarks is that future architecture
changes can make whatever was changed in doing the abstraction
critically important.  For example, the Whetstone benchmark was
designed to measure, among other things, the performance of array
references that were observed to be a significant component of
scientific computing.  At the time it was not obvious that vector
length mattered, so Whetstone only tests arrays of length 4.

By definition, an abstract benchmark differs in some ways from the
real jobs that it models.  Those differences, for a well-designed
benchmark, will not be significant for CURRENT architecture and
compiler technology.
On the other hand, if you select real jobs, they have a better chance
of looking like real jobs in aspects that are unimportant on current
systems but that may be critical to performance on future systems.
-- 
        Patricia Shanahan
        ps@fps.com
        uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps
        phone: (619) 271-9940
kaul@icarus.eng.ohio-state.edu (Rich Kaul) (03/01/90)
In article <43902@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh
LaMaster) writes:
>In article <231@iss-rb.SanDiego.NCR.COM> steves@ivory.SanDiego.NCR.COM
>(Steve Schlesinger x2150) writes:
>>A while ago, NCR posted the System Characterization Benchmark
>>on "net.sources."
>
>I wonder if you could post the location of the benchmark on well
>known archive sites such as uunet or wherever?

You can find it on cheops.cis.ohio-state.edu [128.146.8.62] in
pub/net.sources/ncrscb.[1-4].Z.

-rich
-=-
Rich Kaul                       | "Horse sense is what keeps horses from
kaul@icarus.eng.ohio-state.edu  |  betting on what people will do."
or ...!osu-cis!kaul             |                        -Damon Runyon
pb@idca.tds.PHILIPS.nl (P. Brouwer) (03/01/90)
In article <2557@ncr-sd.SanDiego.NCR.COM> steves@conan.SanDiego.NCR.COM
(Steve Schlesinger) writes:
>In article <196@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
>>I have just been going through a bunch of marketing hype for Neal
>>Nelson.  He claims that his "Business Benchmark" measures how
>> [ paragraph deleted ]
>>
>>Gerry Gleason
>
>***************************************************************
>                The following is only my opinion
>* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

This is valid for me too!!!!!!!!

>I do not have a very high opinion of the NN Benchmarks.

I think so too, and to add another argument to the ones mentioned in
the previous postings: all times measured in the benchmark are
measured with a 1-second resolution.  This means that fast tests that
take only a few seconds will have poor accuracy.  So when you see a
comparison between machines, take this into account.  For instance,
suppose test x takes 5 seconds on machine A and 7 on machine B.  The
difference is

        (7 - 5) / 7 = 28.6%

but it might really be

        (7.99 - 5) / 7.99 = 37.4%   or   (7 - 5.99) / 7 = 14.4%

Draw your own conclusions.

>* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>        The preceding is only my opinion
>        It does not reflect the opinion of my employer or
>        any other person or organization.
>***************************************************************

Again, this is valid for me as well.
-- 
Peter Brouwer,                # Philips Telecommunications and Data Systems,
NET : pb@idca.tds.philips.nl  # Department SSP-P9000 Building V2,
UUCP : ....!mcvax!philapd!pb  # P.O.Box 245, 7300AE Apeldoorn, The Netherlands.
PHONE: ext [+31] [0]55 432523 #