gordoni@chook.adelaide.edu.au (Gordon Irlam) (04/05/91)
When running the SPEC 1.2 benchmarks the times reported are supposed to be "the best times that can be consistently reproduced". How should the word "consistently" be interpreted? As meaning "nearly always", e.g. "We ran the benchmark and consistently got a time of less than 90.6 seconds"? Or as meaning "more than once", e.g. "One in every 7 runs gives a time of 90.6 seconds or better, and thus we are able to consistently reproduce the reported time"?

Is it permitted to run a benchmark several times in succession, rather than cycling through the benchmarks in order, if this gives better results? (For instance, the operating system might dynamically tune its memory management algorithms in response to the decreased demand for new pages.)

Running benchrun results in 3 sh's, make, and tee occupying memory as well as the benchmark. On a system with little memory this could result in some extra paging when the benchmark first starts to run. In addition, a tee is likely to cause several undesirable context switches. Is it permissible to run the benchmark from make?

What about typing the time command directly to the shell? (This would avoid a cp, which might make it easier for some operating systems to keep the executable in memory. It would also avoid the cost of scheduling the write back of the files that would have been cp'ed; update is likely to perform a sync while the benchmark is running. For gcc we are talking about writing back roughly 1M.)

What about typing in the cp, and then waiting 30 seconds before starting the benchmark? If you are very clever you could time when make starts running so that a sync occurs just before the cp has finished, but before the benchmark starts running. Is this allowed?

Are the results always required to be written out to a file, or, once the results have been validated, can /dev/null be used?

Should the SPECmark figure be computed using infinite precision arithmetic, or should the rounded SPECratios of each benchmark be used? (See the worked example in the P.S. below.)

Which of the above possibilities, while strictly permitted, would be considered to depart from the honest, cooperative spirit in which SPEC was established? Which of the above possibilities do most vendors employ?

Thanks for any help,

Gordon Irlam (gordoni@cs.adelaide.edu.au)
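P.S. To make the rounding question concrete, here is a small C program (my own sketch, not anything distributed by SPEC; the SPECratios in it are made-up values for a hypothetical 10-benchmark suite) that computes the SPECmark, i.e. the geometric mean of the SPECratios, both from full-precision ratios and from ratios rounded to one decimal place:

    /* Illustrative sketch only, not SPEC's official tool.
     * The ratios below are made-up values, not published results.
     * Compile with: cc specmark.c -lm
     */
    #include <stdio.h>
    #include <math.h>

    #define NBENCH 10

    int main(void)
    {
        /* hypothetical SPECratios for a 10-benchmark suite */
        double ratio[NBENCH] = { 17.83, 21.46, 19.07, 24.91, 18.52,
                                 22.38, 20.12, 23.74, 19.66, 21.09 };
        double logsum_full = 0.0, logsum_rounded = 0.0;
        int i;

        for (i = 0; i < NBENCH; i++) {
            /* round the ratio to one decimal place */
            double rounded = floor(ratio[i] * 10.0 + 0.5) / 10.0;
            logsum_full += log(ratio[i]);
            logsum_rounded += log(rounded);
        }

        /* geometric mean = exp of the mean of the logs */
        printf("SPECmark from full-precision ratios: %.2f\n",
               exp(logsum_full / NBENCH));
        printf("SPECmark from rounded ratios:        %.2f\n",
               exp(logsum_rounded / NBENCH));
        return 0;
    }

On ratios of this size the two figures seem to differ only around the second decimal place, but it shows that the two conventions need not agree.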
lewine@cheshirecat.webo.dg.com (Donald Lewine) (04/06/91)
In article <2767@sirius.ucs.adelaide.edu.au>, gordoni@chook.adelaide.edu.au (Gordon Irlam) writes:
|> When running the SPEC 1.2 benchmarks the times reported are supposed to
|> be "the best times that can be consistently reproduced".

[[Many ways to cheat deleted]]

|> Which of the above possibilities do most vendors employ?

It is fair game for a customer to say, "I want you to reproduce your claimed SPECmark numbers on *my* configuration as part of the acceptance test." Vendors must be able to say, "No problem," or point out that the SPECmark numbers cannot be reproduced because the customer is buying a smaller configuration than the published one.

In any event, the vendor must be able to tell the customer what SPECmark (or any other) benchmark result to expect. The customer should insist on getting that result prior to paying. Enough customers do insist that vendors tend to live by the rules and don't send output to /dev/null or do other tricks they would be unwilling to repeat in front of a customer.

On the other hand, Data General uses two different C compilers. The best compiler for gcc is not the best compiler for espresso. We document which compiler is used for which benchmark. One would need to buy both compilers to reproduce our exact results.

--------------------------------------------------------------------
Donald A. Lewine                        (508) 870-9008 Voice
Data General Corporation                (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580 U.S.A.
uucp: uunet!dg!lewine    Internet: lewine@cheshirecat.webo.dg.com
jvm@hpfcso.FC.HP.COM (Jack McClurg) (04/07/91)
Without answering the questions asked here (I think that you have an excellent series of questions; I hesitate to answer, since my answer may be thought to be "official" and it is not; some reported results probably have used some of the techniques described; BTW, which do you think are legitimate?), I merely point out that SPEC distributes the source code to the SPEC benchmarks so that the results printed in the SPEC newsletter may be reproduced. Also, the results summary page should have complete configuration information and notes sufficient to ensure reproducibility.

If you have difficulty duplicating results, call the vendor for assistance, and then tell us what transpired. This should include instances where your results were better than what the vendor claimed.

Jack McClurg    jack_mcclurg@fc.hp.com    303-229-2126
Chair, SPEC Steering Committee