eugene@eos.arc.nasa.gov (Eugene Miya) (11/13/90)
In article <9135@ncar.ucar.edu> pack@acd.UCAR.EDU (Dan Packman) writes:
>Excellent point. The best benchmark is clearly one's own application
>or set of applications including multi-process loads.

It might seem like this is true. But it is not. It depends on what you want benchmarks to do. Buying a machine to print checks is one thing; for that, the above is fine. Buying a machine to solve changing fluid dynamics problems is another (a moving target). Testing (diagnosing performance) is a third.

We know the answer. It's 42. At least those of us who read THAT book know. It's how you ask the question. Not knowing how to ask the question results in supercomputers being one generation behind the problems we would really like to solve.

So the problem is not quite as clear as one would like. I agree there is some truth to using specific existing applications, but ignoring their flaws invites trouble.

This is why SPEC's decision to normalize performance on the DEC VAX-11/780, or say the IBM PC, simply because they are or were common, is bad. This is akin to taking the wind-up clock I used to wake up with as a youngster and designating it as a time standard. If you think this analogy is bogus, consider taking a common ruler, a 12-inch rule or a meter stick, designating that wooden thing as THE Meter or THE Foot, and then trying to make sub-micron lines for your memories or CPUs. You won't produce consistent chips for long. This is called "gold-plating" a metric. Ref: "Foundations of Metrology."

We must start with a few more basic building blocks and proceed in a progression. Trying to rush the process will only confuse the issue.

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
{uunet,mailrus,most gateways}!ames!eugene
AMERICA: CHANGE IT OR LOSE IT.
rnovak@mips.COM (Robert E. Novak) (11/14/90)
To clarify my previous comments on SPEC membership:

SPEC membership costs:
    Initiation     $10,000
    Annual Dues    $ 5,000

SPEC Associate:
    Initiation     $2,500
    Annual Dues    $1,000

To qualify as a SPEC associate, you must be an accredited educational institution or a non-profit organization. An associate has no voting privileges. An associate will receive the newsletter and the benchmark tapes as they are available. In addition, an associate will have early access to benchmarks under development so that an associate may act in an advisory capacity to SPEC.

The SPEC tape still costs $699, which includes the cost of a 1-year subscription to the SPEC newsletter. The tape by itself costs $300. There are no discounts for the SPEC tape/newsletter.

SPEC
c/o Waterside Associates
39510 Paseo Padre Parkway
Suite 350
Fremont, CA 94538
415/792-3334

After January 1, 1991:

SPEC
c/o Waterside Associates
39138 Fremont Blvd.
Fremont, CA 94538
Same phone number.

-- 
Robert E. Novak                      Mail Stop 5-10, MIPS Computer Systems, Inc.
{ames,decwrl,pyramid}!mips!rnovak    950 DeGuigne Drive, Sunnyvale, CA 94086
rnovak@mips.COM (rnovak%mips.COM@ames.arc.nasa.gov) +1 408 524-7183
eugene@eos.arc.nasa.gov (Eugene Miya) (11/14/90)
Bob-- Did I just see you a while ago? 8^) Must spend all your time posting...

I'll add the SPEC address, PERFECT, NISTLIB, and a few other things to the FAQ.

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
{uunet,mailrus,most gateways}!ames!eugene
AMERICA: CHANGE IT OR LOSE IT.
lewine@cheshirecat.webo.dg.com (Donald Lewine) (11/15/90)
In article <7581@eos.arc.nasa.gov>, eugene@eos.arc.nasa.gov (Eugene Miya) writes:
|>
|> This is why SPEC's decision to normalize performance on the DEC VAX-11/780
|> or say the IBM PC, simply because they are or were common is bad.
|> This is akin to taking the wind up clock I used to wake up as a youngster
|> and designating it as a time standard. If you think this analogy is
|> bogus, consider taking a common ruler, 12 inches or a meter stick,
|> designating that as THE Meter or THE Foot, those wooden things,
|> then try to make sub-micron lines for your memories or CPUs.

But SPEC took a particular VAX-11/780. The 11/780 time for gcc is 1482 seconds. It is not what you get on your particular VAX. This is more like taking a gold bar in Paris and saying that is the standard meter. As long as there is only one gold bar, that is not a problem.

I think that 29 SPECmarks is more understandable than saying that the Geometric Mean of the benchmark times is 133.3 seconds.

--------------------------------------------------------------------
Donald A. Lewine                 (508) 870-9008 Voice
Data General Corporation         (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580  U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com
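Lewine's two figures are the same quantity seen from two sides. Assuming the SPECmark is the geometric mean of the ten per-benchmark ratios R_i / T_i (fixed reference time over measured time), the geometric mean of the quotients equals the quotient of the geometric means. The ten reference times Lewine posts later in this thread have a geometric mean of roughly 3856 s, so a worked check (not SPEC's own derivation) reads:

```latex
\text{SPECmark}
  = \left( \prod_{i=1}^{10} \frac{R_i}{T_i} \right)^{1/10}
  = \frac{\left( \prod_{i=1}^{10} R_i \right)^{1/10}}
         {\left( \prod_{i=1}^{10} T_i \right)^{1/10}}
  \approx \frac{3856\ \text{s}}{133.3\ \text{s}}
  \approx 28.9
```

This rounds to the 29 SPECmarks quoted above, so the single number and the geometric-mean run time carry exactly the same information once the reference times are fixed.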
eugene@eos.arc.nasa.gov (Eugene Miya) (11/15/90)
In article <1146@dg.dg.com> uunet!dg!lewine writes:
> But SPEC took a particular VAX-11/780. The 11/780 time for
> gcc is 1482 seconds. It is not what you get on your particular
> VAX. This is more like taking a gold bar in Paris and saying
> that is the standard meter. As long as there is only one gold
> bar, that is not a problem.

Particular: that's right. Two things to add:

1) DEC knew that the performance of 780 models varied by as much as 10%. Is 10% acceptable? In some cases yes, in others no. Using your bar analogy (I recall it's really platinum-iridium), that's why I gave the Metrology paper as a reference. The former NBS director used the term "gold plating." John Mash[ey]@mips.com said "VAX under glass" at one time (cute, I like it!). Will you use a ruler which may be as much as 10% off? I think our society is beyond that. That is why the US NIST (formerly NBS) maintains an atomic clock: a highly instrumented, multi-million-dollar piece of hardware.

2) Taken to the extreme: why a VAX (or PC)? Well, why not an ENIAC? You don't want an ENIAC for the same reason you won't want a VAX in the future. You are cutting your own long-term throat. That's what a platinum bar is. That's why the NIST uses the frequency of Kp atoms to specify not only time but also length (distance). We must go beyond that. Your H/W engineers use the best oscilloscopes, right? Yet we software types are in the dark ages.

> I think that 29 SPECmarks is more understandable than saying
> that the Geometric Mean of the benchmark times is 133.3
> seconds.

John Hennessy came today and blasted the geometric mean (in favor of the weighted arithmetic mean). I will commit to no statistic before its time. My opinion is that we must understand the sample before applying any statistic. I hate to say it: give me raw numbers and then I will think about sending them to S (or BMDP or whatever).

About 29 (or 42), I don't think it's the number of benchmarks. I had a talk at one time entitled "The Next 700 Benchmarks."
[If you didn't know, there has been a string of papers beginning with "The Next 700 Programming Languages."] And in fact Carl Ponder (LLNL) gave a talk about how adding benchmark information can just cloud the issue. It's not just the number of measurements or observations you take.

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
{uunet,mailrus,most gateways}!ames!eugene
AMERICA: CHANGE IT OR LOSE IT.

I am copying Hennessy's five viewgraphs; I do not think John will mind, since he brought up the Net discussion. I generally support most of what he had to say. Only 1-3 were presented; 4 and 5 were left over and covered orally:

#1 Some Comments to SPEC
 + Means for summarizing performance
 + Choosing benchmarks
 + Guidelines for running benchmarks
   [Comment: "You (SPEC members) have a responsibility for what your marketing people say." I agree.]
 + The dangers of SPECthroughput
   [Hennessy expressed worry about ideas cast in concrete. I really fear this as well, and it may be too late.]

#2 Why not geometric mean?
 + Example

           Absolute time        Relative performance
           M1    M2    M3        M1    M2    M3
   B1       5    10    10         1    0.5   0.5
   B2      10     5    10         1    2     1
   GM                             1    1     0.7

   B means benchmark, M means machine, GM is geometric mean.

 + To reproduce the summary indicated by the geometric mean:
   M1 and M2: run each 50% of total workload
   M1 and M3: run B1 57% and B2 43% of total workload!
   M2 and M3: run B1 43% and B2 57% of total workload!
----------------------------------------------------
#3 Why weighted arithmetic mean
 + A single weighting yields results proportional to execution time!
 + Suggested weighting: equal time on base machine.
   Results: weights for earlier example are 2/3, 1/3.

   Weighted execution times
            M1     M2     M3
   B1      10/3   20/3   20/3
   B2      10/3    5/3   10/3
   AM      20/3   25/3   30/3
   Perf.   1.0    0.8    0.67    <- inverse of execution time

 + Can also use a weighted harmonic mean [I know this ref to be Worlton].

Not shown but discussed:

#4 Choosing benchmarks
 + Some evaluation procedures need to be established to choose benchmarks.
 + These need to focus on questions like:
   - is this a real program?
   - how many lines constitute the 90% or 95% point?
   - is the input appropriate?
 + How will you know the potential defects before choosing the benchmark?

[I have thought about some of these questions, and I wish to discuss them and some ideas; I will try to present them in the coming days and weeks.]

#5 Guidelines for running programs
 + Serious problems can arise because guidelines for running benchmarks (typo) are not precise.
   [No kidding. This was a point in one SPEC discussion; I am not a SPEC member but was invited. Maybe I should post a few notes or impressions. Basically SPEC is kind of a good thing; only I wish it had been ANSI instead [some minuses].]
 + Some examples
   - what routines can be replaced by libraries?
   - what are the requirements for runtime checks such as bounds checking and FP exception checks?

I should note that I am not innocent: one of the SPEC benchmarks came from me (and we have serious constraints on running that program; it was renamed).
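Hennessy's viewgraphs #2 and #3 can be reproduced with a few lines of arithmetic. The sketch below is a minimal Python illustration, assuming the slide's times are in seconds and M1 is the base machine; the function names are made up for this example. It computes the geometric-mean row of #2, the equal-time weights and weighted times of #3, and, assuming "percent of total workload" means the fraction of runs that are B1, the mix at which a plain weighted-time comparison agrees with the geometric mean.

```python
import math

# Execution times in seconds for benchmarks (B1, B2) on machines M1..M3,
# taken from Hennessy's viewgraph #2.
TIMES = {
    "M1": (5.0, 10.0),
    "M2": (10.0, 5.0),
    "M3": (10.0, 10.0),
}
BASE = "M1"  # machine the slides normalize against


def geometric_mean(values):
    """Geometric mean computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_perf(machine):
    """Per-benchmark performance relative to BASE: base time / machine time."""
    return [base / t for base, t in zip(TIMES[BASE], TIMES[machine])]


def equal_time_weights():
    """Weights that give every benchmark equal time on BASE (slide #3)."""
    inverse = [1.0 / t for t in TIMES[BASE]]
    total = sum(inverse)
    return [x / total for x in inverse]


def weighted_time(machine, weights):
    """Weighted execution time of `machine` (the AM row on slide #3)."""
    return sum(w * t for w, t in zip(weights, TIMES[machine]))


def break_even_mix(a, b):
    """Fraction of runs that must be B1 for a run-count mix to give the same
    speed ratio between machines a and b as the geometric mean does.
    Only valid for this two-benchmark example."""
    g = geometric_mean(relative_perf(b)) / geometric_mean(relative_perf(a))
    a1, a2 = TIMES[a]
    b1, b2 = TIMES[b]
    # Solve w*a1 + (1 - w)*a2 == g * (w*b1 + (1 - w)*b2) for w.
    return (g * b2 - a2) / ((a1 - a2) - g * (b1 - b2))


if __name__ == "__main__":
    weights = equal_time_weights()
    base_time = weighted_time(BASE, weights)
    for m in TIMES:
        gm = geometric_mean(relative_perf(m))
        wt = weighted_time(m, weights)
        print(f"{m}: GM={gm:.2f}  weighted time={wt:.2f}  perf={base_time / wt:.2f}")
    for a, b in [("M1", "M2"), ("M1", "M3"), ("M2", "M3")]:
        print(f"{a} vs {b}: B1 share {break_even_mix(a, b):.0%}")
```

Under that run-count reading the three mixes come out at 50%, about 59% and about 41% B1. That is close to, but not exactly, the 57%/43% on the slide, so the slide may define workload share differently; the GM row and the #3 table match the slide as given.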
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (11/15/90)
>>>> On 15 Nov 90 06:54:35 GMT, eugene@eos.arc.nasa.gov (Eugene Miya) said:

Eugene> In article <1146@dg.dg.com> uunet!dg!lewine writes:
> But SPEC took a particular VAX-11/780. The 11/780 time for
> gcc is 1482 seconds. It is not what you get on your particular
> VAX. This is more like taking a gold bar in Paris and saying
> that is the standard meter. As long as there is only one gold
> bar, that is not a problem.

Eugene> Particular: that's right. Two things to add: 1) DEC knew that
Eugene> the performance of 780 models varied by as much as 10%. Is
Eugene> 10% acceptable? In some cases yes, others no.

I am surprised that such a sensible person as Eugene would imply that *any* benchmark number had a precision of better than 10%. I don't believe that it is possible to take any combination of "general-purpose" benchmarks and use that data to predict your application (or workload) performance to within 10%.

In fact, it is all too easy to have 10% changes in the performance of your application itself if (as is inevitable) it is run under conditions that differ from the formal benchmark test. Minor changes like operating system or compiler upgrades, changes in the system background load, or even disk fragmentation can produce 10% changes in wall-clock time quite easily....

So how precise do I think the numbers are? Well, with 6 or so years of experience in performance evaluation of supercomputers and high-performance workstations, I can generally [i.e., not always] estimate the performance of my codes to within 20-25% based on a broad suite of benchmark results (LINPACK 100x100, LINPACK 1000x1000, Livermore Loops, hardware description with cycle counts, and maybe a bit more).

(I deliberately ignore PERFECT, since the one code that I know in some detail [the ocean model from GFDL/Princeton] is a mess, and I would not blame a compiler at all for having trouble vectorizing or optimizing it [or even understanding what it is supposed to be doing!].)
-- 
John D. McCalpin                       mccalpin@perelandra.cms.udel.edu
Assistant Professor                    mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.     J.MCCALPIN/OMNET
grover@brahmand.Eng.Sun.COM (Vinod Grover) (11/16/90)
In article <7589@eos.arc.nasa.gov> eugene@eos.UUCP (Eugene Miya) writes: >[If you didn't know there have been a string of papers beginning with >"The Next 700 Programming Languages."] I am aware of Peter Landin and F.L. Morris' papers, are there others? Would you mind posting them? Thanks Vinod Grover Sun Microsystems
rafael@ucbarpa.Berkeley.EDU (Rafael H. Saavedra-Barrera) (11/16/90)
In article <7581@eos.arc.nasa.gov> ucbvax!agate!shelby!eos!eugene writes:
> Particular: that's right. Two things to add: 1) DEC knew that the
> performance of 780 models varied by as much as 10%. Is 10% acceptable?
> In some cases yes, others no. ...

Gene, you are completely missing the point. The 10% variation on the VAX 780 has nothing to do with defining a unit of performance and using it to measure. If you look at the SPEC reports, you'll notice that they use the SAME execution times for the reference machine in all reports and for all machines. There is no 10% variation. What is needed is something we can use to measure and that everyone uses, period.

From the Encyclopaedia Britannica:

   Measuring a quantity means ascertaining its ratio to some other fixed
   quantity of the same kind, known as the unit of that kind of quantity.
   A unit is an *abstract conception*, defined either by reference to some
   arbitrary material or to natural phenomena.

The key words are: ratio, fixed, and arbitrary. The SPEC people made a reasonable, but arbitrary, definition of what represents their fixed quantity. Once you have done that, everything else follows, and there is nothing else to discuss. All you require from a unit of measurement is:

1) that it is fixed;
2) that it has validity for some significant group of people;
3) that it can be verified.

The SPECratio certainly satisfies the 3 conditions. As long as the SPEC people keep the *same* VAX 780, with the same software, and run the programs under the same conditions, there is nothing to object to. I don't know if they are doing this, but they should, if they want to avoid problems in the future.

> Taken to the extreme 2) why a VAX (or PC), well, why not an ENIAC?
> You don't want an ENIAC for the same reason you won't want a VAX in the
> future. You are cutting your own long-term throat.

Wrong!
The SPEC people didn't choose an ENIAC because there are no ENIACs that can be used to run benchmarks, and very few people alive ever used an ENIAC. Why not a CRAY? Because it is very expensive to keep a CRAY under glass (same software, etc.) just to run benchmarks on it. But in principle an ENIAC, a PC, or a CRAY is as good as a VAX 780 for the purposes of what the SPEC people are doing. One of the nice properties of the SPECmark is that it is INVARIANT to the machine you use as your reference point. The relative performance between a MIPS M/2000 and a SPARCstation 1, or between a DEC 3100 and an IBM RS/6000 530, is the same whether you use a VAX 780, an ENIAC, or any other machine. So the issue of using a 780 disappears.

Your arguments sound very similar to the ones that one of your fictitious ancestors, the Marquis Eugene De La Milla, made with respect to the use of an Earth's quadrant to define the meter in 1790. He asked: why use a quadrant of the Earth as the reference when there are bigger planets that may be more relevant to future generations of human beings? Why 1/10,000,000th of the quadrant instead of 1/3,141,592th, which looks more like pi? Where is the center of Paris? The center of the Île de la Cité, or Notre Dame? [I bet you didn't know you had a French ancestor.] Do you know how the French measured the particular quadrant of the Earth that runs from the North Pole to the Equator through Paris, one hundred years before the first man reached the North Pole? Did it make a difference?

There are a lot more interesting questions to ask about the SPEC benchmarks. For example: What does each program measure? Are the programs really exercising different aspects of the machine? How representative is matrix300 of typical linear algebra codes? Can a clever compiler writer make minimal changes to the compiler that significantly improve the SPECmark for a particular machine, but have only a marginal benefit for most users' workloads?
How do I estimate the performance of my workload by looking at the SPEC results? Why is the SPECratio of spice2g6 low on most machines when other double-precision codes have better performance? Is the geometric mean a good statistic? These are a few questions I would like to know the answers to (some I know).

> About 29 (or 42), I don't think it's the number of benchmarks.
> I had a talk at one time entitled "The Next 700 Benchmarks."
> [If you didn't know there have been a string of papers beginning with
> "The Next 700 Programming Languages."] And in fact Carl Ponder (LLNL)
> gave a talk about adding benchmark information can just cloud the issue.
> It's not just the number of measurements or observations you take.

I don't agree. 29 benchmarks are better than 10 benchmarks, *if* the 29 benchmarks are well chosen. Every benchmark represents an empirical observation of the performance of the machine. More observations are better than fewer, especially after seeing the results for the Stardent 3010. Here all benchmarks have SPECratios between 14.7 and 62.9, except matrix300, which has a ratio of 108.5! Is this an isolated point, or are there many more programs that give similar results? However, you are right in saying that every time we add a new benchmark we have to know what it measures and why we are including it. What new information does it provide?

I like the SPEC methodology for measuring SPECmarks, but I agree with J. Hennessy about the SPECthruput; the SPEC guys erred here. I don't agree with J. Hennessy that the weighted arithmetic mean (WAM) is better than the geometric mean for the SPEC benchmarks, but I agree with him when he says that the WAM is the correct statistic to use in the example he presented. Am I contradicting myself? No, the two problems are different, and therefore different statistics should be used. More on this later.

rafael
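Rafael's invariance claim can be checked numerically. The sketch below is a minimal Python illustration that reuses the illustrative times from Hennessy's viewgraph #2 (not real SPEC data) and normalizes against each machine in turn, assuming SPECmark-style summarization: a per-benchmark ratio of reference time to machine time, then a mean. The arithmetic-mean comparison is included only for contrast; the function names are hypothetical.

```python
import math

# Illustrative times in seconds for (B1, B2), taken from Hennessy's viewgraph #2.
TIMES = {
    "M1": (5.0, 10.0),
    "M2": (10.0, 5.0),
    "M3": (10.0, 10.0),
}


def ratios(machine, ref):
    """SPECratio-style figures: reference time / machine time, per benchmark."""
    return [r / t for r, t in zip(TIMES[ref], TIMES[machine])]


def geo_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def arith_mean(xs):
    return sum(xs) / len(xs)


def speedup(a, b, ref, mean):
    """How many times faster b looks than a when both are normalized to ref."""
    return mean(ratios(b, ref)) / mean(ratios(a, ref))


if __name__ == "__main__":
    for a, b in [("M1", "M2"), ("M1", "M3")]:
        for ref in TIMES:
            gm = speedup(a, b, ref, geo_mean)
            am = speedup(a, b, ref, arith_mean)
            print(f"{b} vs {a}, reference {ref}: GM {gm:.3f}  AM {am:.3f}")
```

With M1 as the reference, the arithmetic mean of ratios says M2 is 1.25 times as fast as M1; with M2 as the reference, it says M2 is only 0.8 times as fast. The geometric-mean comparison stays at 1.000 for M2 vs M1 (and about 0.707 for M3 vs M1) whichever machine is the reference. That is the property Rafael describes; it says nothing about whether the geometric mean matches any real workload, which is Hennessy's objection.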
lewine@cheshirecat.webo.dg.com (Donald Lewine) (11/16/90)
In article <7589@eos.arc.nasa.gov>, eugene@eos.arc.nasa.gov (Eugene Miya) writes:
|> In article <1146@dg.dg.com> uunet!dg!lewine writes:
|> > But SPEC took a particular VAX-11/780. The 11/780 time for
|> > gcc is 1482 seconds. It is not what you get on your particular
|> > VAX. This is more like taking a gold bar in Paris and saying
|> > that is the standard meter. As long as there is only one gold
|> > bar, that is not a problem.
|>
|> Particular: that's right. Two things to add: 1) DEC knew that the
|> performance of 780 models varied by as much as 10%. Is 10% acceptable?
|> In some cases yes, others no.

You are still missing the point. Here are the reference execution times for the SPECmarks:

    gcc           1482 seconds
    espresso      2266 seconds
    spice2g6     23951 seconds
    doduc         1863 seconds
    nasa7        20093 seconds
    li            6026 seconds
    eqntott       1101 seconds
    matrix300     4525 seconds
    fpppp         3038 seconds
    tomcatv       2649 seconds

These numbers will not vary by 10%. They will not vary by .001%. They are fixed. Ignore the fact that they were measured by someone on a VAX someplace. They are now the definition of 1.000 SPECmarks of performance. Maybe the 11/780 was not a good choice for a base, but at this point it does not matter. The reference is *NOT* the 780 but the numbers listed above. THERE IS NO VARIATION!

-- 
--------------------------------------------------------------------
Donald A. Lewine                 (508) 870-9008 Voice
Data General Corporation         (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580  U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com
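Lewine's point is that the reference is a fixed table of numbers, not a machine. The sketch below is a minimal Python illustration of scoring against that table, assuming (as described earlier in the thread) that the SPECmark is the geometric mean of the ten per-benchmark SPECratios, each being reference time divided by measured time. The function names are hypothetical and not part of any SPEC tool; the benchmark is spelled `espresso`, its usual name.

```python
import math

# SPEC reference times in seconds, as listed by Lewine (measured on a VAX-11/780).
REFERENCE_SECONDS = {
    "gcc": 1482,
    "espresso": 2266,
    "spice2g6": 23951,
    "doduc": 1863,
    "nasa7": 20093,
    "li": 6026,
    "eqntott": 1101,
    "matrix300": 4525,
    "fpppp": 3038,
    "tomcatv": 2649,
}


def geometric_mean(values):
    """Geometric mean, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def spec_ratios(measured_seconds):
    """Per-benchmark SPECratio: fixed reference time / measured time."""
    return {name: REFERENCE_SECONDS[name] / t for name, t in measured_seconds.items()}


def specmark(measured_seconds):
    """Geometric mean of the SPECratios; every reference benchmark is required."""
    if set(measured_seconds) != set(REFERENCE_SECONDS):
        raise ValueError("need a measured time for every reference benchmark")
    return geometric_mean(spec_ratios(measured_seconds).values())


if __name__ == "__main__":
    print(f"GM of reference times: {geometric_mean(REFERENCE_SECONDS.values()):.0f} s")
    # Hypothetical machine that runs every benchmark exactly twice as fast as
    # the reference times; it scores 2.0 by construction.
    twice_as_fast = {name: t / 2 for name, t in REFERENCE_SECONDS.items()}
    print(f"SPECmark: {specmark(twice_as_fast):.3f}")
```

Because the reference times never change, the geometric mean of the table (about 3856 s) is a constant, and dividing it by a machine's geometric-mean run time gives its SPECmark directly; a geometric-mean time of 133.3 s works out to about 28.9.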
rosenkra@convex.com (William Rosencranz) (11/20/90)
---
i dunno, maybe i am just daft, so ignore this if you beg to differ. it is not meant to offend, so if you read something into it, pls reread. it is also my opinion, not that of my employer...

i have been reading this newsgroup for a week or so, and SPECmark is the current hot topic. i am a bit confused over some of the issues raised, so maybe i'll raise some of my own.

first off: what are SPEC ratings (or any standard bm ratings for that matter) meant to do? answer this question in your mind first before proceeding...

i really see no point whatsoever in relating an execution time on one machine to that of another "standard" machine, no matter how standard (except possibly the old "that's the way we've ALWAYS done it before", e.g. "MIPS"), just to come up with some single "standard" unit of performance.

if I were buying (instead of selling :-), i'd want to see wallclock and cpu times, because i, as a human being, can relate to time far more easily than to "SPECs" or whatever. if something runs in 10 seconds, compared to 100 seconds, i know i can sit and wait, call it "interactive". if something runs in 10 min vs 1 hour, i know i can go out to lunch in the latter case. a SPEC of 1.345 vs a SPEC of 4.345 means nothing until i translate to time anyway. time is easier to "heft", as it were.

further, i'd want to see how the "standard" bm results scale with problem size, especially on cache-based memory systems, because a buy decision based on a single number could come back to haunt me.

i'd also want to know what sort of performance enhancements i could expect if i wanted to put 1 hour, 1 day, and 1 week's effort into the optimization of any particular code, if possible.

i'd also want to compare a vendor's peak performance with how well it did on standard bm's or on my own.

finally, i'd want to see what sort of support i can expect from the vendor.
granted, pre-sales and post-sales activities can vary greatly, but i think i can shake out a vendor during the sales cycle, as most savvy buyers can.

why the need for complication, other than perhaps marketing fog? and believe me, if i see 2 or 3 systems with uni-number ratings within say 5% of each other, i sure as heck would not say "these machines are identical, so let's buy the cheaper one because it has better price/SPECperformance". i'd want to look at the raw data anyway, and probably run my sort of workload on them to really get an idea of what i can expect. similarly, if i see two machines that differ by a lot in some particular individual tests, i'd want to know why.

in fact, unless i expect to buy a machine to do just one job (or one job at a time), i would more than likely ignore these uni-job ratings altogether, since, from my experience, in "real life", multi-job thruput is where productivity gains are made, and is where strengths and weaknesses in architectures (e.g. cache vs widely interleaved memory) are really determined anyway (in many, if not most, cases). probably without exception, the SPEC'd machines are general-purpose systems, especially workstations, which would get lots of different tasks, from text processing to dbms to finite element analysis to ...

the basic problem i see with these uni-number ratings is that people can make up their minds, even subconsciously, based on a first impression. this is human nature. you always have that in the back of your mind. and it is easy to just say "2 > 1.5" rather than "based on some real workload, and on problem size, and on vendor support, and on application availability, and on whatever, 2 is not necessarily > 1.5".

distilling machine performance down to one number tends to make it easy to abuse it, to misrepresent it. if in fact these sorts of performance quotients are (good faith?) attempts to enlighten, then why not enlighten thru education rather than simplification?
surely we can give more credit to the intellect of people making buy decisions than that? why not a "SPECparagraph" that sheds more light? consider this my entry in the standard bm sweepstakes :-).

please don't argue the merits of standards. i am well aware of the risks and benefits therein. i also know that shopping for supercomputers is different than shopping for workstations, though in my mind buying 100 w/s at $20k a pop is still spending $2M, and it might be better to buy 100 w/s at $10k and a central system at $1M with my $2M. the SPEC numbers in no way help me here, i think.

having spent the last 15 years dealing with supercomputers, and only 5 or 6 with workstations and pc's, i am somewhat biased, i suppose, though i like to at least think i have an open mind about these sorts of issues.

personally, i think i'll wait for the SPECthroughput bm...

-bill rosenkranz
rosenkra@convex.com
-- 
Bill Rosenkranz            |UUCP: {uunet,texsun}!convex!c1yankee!rosenkra
Convex Computer Corp.      |ARPA: rosenkra%c1yankee@convex.com
eugene@eos.arc.nasa.gov (Eugene Miya) (11/21/90)
Please excuse the delay between my postings. I am not at my office (trying to benchmark from a remote site). I saw Rafael's post and the others about the reference. I'd like to address that, but this news system only keeps articles for 3 days.

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
{uunet,mailrus,most gateways}!ames!eugene
AMERICA: CHANGE IT OR LOSE IT.
de5@ornl.gov (Dave Sill) (11/21/90)
In article <108988@convex.convex.com>, rosenkra@convex.com (William Rosencranz) writes:
>
>first off: what are SPEC ratings (or any standard bm ratings for that
>matter) meant to do? answer this question in your mind first before
>proceeding...

They're meant to make it possible to get some idea of the performance one can expect a system to provide, without requiring that one observe the performance directly.

>i really see no point whatsoever in relating an execution time on one
>machine to that of another "standard" machine, no matter how standard,
>(except possibly the old "that's the way we've ALWAYS done it before",
>e.g. "MIPS"), just to come up with some single "standard" unit of
>performance.

The reason for relating performance on an unknown system to that of a known one is to give the numbers some relevance. If you tell me your system does 15 gigafloogles/second, that tells me nothing unless I know what a floogle is. But if you tell me your system scored 11.2 SPECfloogles, I can get a handle on whether 15 GF/s is fast or not, at least if I have any VAX experience, or experience with another machine whose SPECfloogle score I know.

>if I were buying (instead of selling :-), i'd want to see wallclock and
>cpu times, because i, as a human being, can relate to time far more easily
>than to "SPECs" or whatever.

Sure, but the absolute wall clock isn't going to tell you anything. It's when you compare the values for different systems that you gain information from the results. So what if the floogle benchmark runs in 1:26? That means nothing. Give me a list of floogle times, and I'll probably normalize them on some machine I'm familiar with (or maybe the slowest machine in the list). It's the relative performance that's important.

>if something runs in 10 seconds, compared to
>100 seconds, i know i can sit and wait, call it "interactive". if
>something runs in 10 min vs 1 hour, i know i can go out to lunch in the
>latter case.
>a SPEC of 1.345 vs a SPEC of 4.345 means nothing, until
>i translate to time anyway. time is easier to "heft", as it were.

Only if you are intimately familiar with what's being done. What's the difference between 10 minutes versus 100 minutes and 10 SPECfloogles versus 100 SPECfloogles? Both indicate the same relative performance, and both measure the same absolute performance. The difference is that with the former you need two numbers to compare, but with the latter you have the built-in VAX value: 10 SPECfloogles is 10 times faster than SPEC's VAX 11/780. Not perfect, but better than nothing.

>further, i'd want to see how the "standard" bm results scale with
>problem size, especially on cache-based memory systems. because a
>buy decision based on a single number could come back to haunt me.

This is a valid point, but it has nothing to do with whether wall-clock or relative-to-known values are reported.

>i'd also want to know what sort of performance enhancements i could
>expect if i wanted to put 1 hour, 1 day, and 1 week's effort into
>the optimization of any particular code, if possible.

Lotsa luck. I don't know of any benchmarks that attempt to anticipate what gains could be made by optimization, by you or anyone else.

>i'd also want to compare a vendor's peak performance with how well
>it did on standard bm's or on my own.

Just ask the vendors; they'll be glad to give you peak performance figures. :-)

>finally, i'd want to see what sort of support i can expect from the
>vendor. granted, pre-sales and post-sales activities can vary
>greatly, but i think i can shake out a vendor during the sales
>cycle, as most savvy buyers can.

This isn't a benchmarking issue at all. Benchmarking can't and shouldn't attempt to prevent the foolish buyer from buying foolishly. Raw performance is just one criterion that should be part of a procurement effort.
>and
>believe me, if i see 2 or 3 systems with uni-number ratings within
>say 5% of each other, i sure as heck would not say "these machines
>are identical, so let's buy the cheaper one because it has better
>price/SPECperformance".

I couldn't agree more.

>i'd want to look at the raw data anyway, and
>probably run my sort of workload on them to really get an idea of
>what i can expect.

SPEC provides the real data. The SPECmark is just a handy single figure of merit. Better that than dhrystone-mips. As for testing them yourself: have at it. Sometimes that's not feasible, and that's what benchmarks are for.

>similarly, if i see two machines that differ by
>a lot in some particular individual tests, i'd want to know why.

Again, I agree. But identifying the reason is not a benchmarking issue. Identifying the difference *is*.

>the basic problem i see with these uni-number ratings is that
>people can make up their minds, even subconsciously, based on a
>first impression.

So what do you propose? Outlawing single figures of merit? Better to have one that is subject to much scrutiny and well understood than to have something ad hoc, unreliable, informal, etc.

>this is human nature. you always have that in
>the back of your mind. and it is easy to just say "2 > 1.5"
>rather than "based on some real workload, and on problem size,
>and on vendor support, and on application availability, and on
>whatever, 2 is not necessarily > 1.5".

No, 2 *is* greater than 1.5. Always. The problem is that there may be more important issues that aren't so easily quantifiable.

>surely we can give more credit to the intellect of people making
>buy decisions than that?

You're the one who seems to think people are going to base their decisions solely on a SFM. The detailed data is available to those who want it.

I don't mean to come across as some kind of SPEC apologist; I just think what they're doing is better than what was done before they existed.
-- Dave Sill (de5@ornl.gov) Martin Marietta Energy Systems Workstation Support