eugene@eos.arc.nasa.gov (Eugene Miya) (11/16/90)
In article <MCCALPIN.90Nov15090025@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:

I noted 10% performance variation in VAXen.

>Eugene> 10% acceptable? In some cases yes, others no.
John>
John>I am surprised that such a sensible person as Eugene would imply that
John>*any* benchmark number had a precision of <10%.  I don't believe that
John>it is possible to take any combination of "general-purpose" benchmarks

I'm sorry, let me make a clarification.  If you check the original
post, I mentioned nothing about benchmarks.  The performance variation
measured was strictly cycle time.  If you want the start of the
references, let me know.

Me sensible?  I'm honored.  I think we both know that some aspects of
benchmarking are counter-intuitive.  We have to challenge a few
fundamental ideas.

P.S. What's "general purpose?"  And you hit a key word which I forgot
to mention when posting last evening: "combination."  That's
significant.  It only occurred to me after I got home, and you remind
me of it here.

John>and use that data to predict your application (or workload)
John>performance to within 10%.  In fact, it is all too easy to have 10%
John>changes in the performance of your application itself if (as is
John>inevitable) it is run under conditions that differ from the formal
John>benchmark test.  Minor changes like operating system or compiler
John>upgrades, changes in the system background load, or even disk
John>fragmentation can produce 10% changes in wall-clock time quite
John>easily....

Good points.  We should explore these in discussion.  I hope I can
post some numbers (a toy sketch of what I mean is appended below).
Some codes merely need a compiler switch.

John>So how precise do I think the numbers are?  Well, with 6 or so years
John>of experience in performance evaluation of supercomputers and
John>high-performance workstations, I can generally [i.e., not always]
John>estimate the performance of my codes to within 20-25% based on a broad
John>suite of benchmark results (LINPACK 100x100, LINPACK 1000x1000,
John>Livermore Loops, hardware description with cycle counts, and maybe a
John>bit more).  (I deliberately ignore PERFECT since the one code that I
John>know in some detail [the ocean model from GFDL/Princeton] is a mess,
John>and I would not blame a compiler at all for having trouble vectorizing
John>or optimizing it [or even understanding what it is supposed to be
John>doing!]).

I don't think the solution will lie SOLELY in benchmark programs.  I
don't think Hennessy made that clear enough, and I want to try to show
some of that, but I must also respect certain copyright and
non-disclosure restrictions.  We have to instrument OSes more, as well
as the hardware and the compilers (what am I forgetting?).  Sadly,
this won't immediately help the guy who just wants to buy a PC, but it
will help architects and compiler writers, and I hope it "trickles
down."

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  {uunet,mailrus,most gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.
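------------------------------ cut here ------------------------------
P.P.S.  By "post some numbers" I mean something in the spirit of the
toy harness below -- purely illustrative, nothing proprietary and no
vendor's code; the kernel and repeat count are arbitrary.  It just
shows the run-to-run spread you get from plain wall-clock timing of a
fixed workload, plus the little that getrusage() already reports about
what the OS saw (user/system time, page faults).

/*
 * timevar.c -- toy harness: run-to-run wall-clock spread of a fixed
 * kernel, plus a peek at OS-level accounting via getrusage().
 * Purely illustrative; build with:  cc timevar.c -o timevar
 */
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

#define RUNS 10
#define N    2000000L

static double kernel(void)
{
    /* fixed floating-point workload, identical every run */
    double s = 0.0;
    long i;
    for (i = 1; i <= N; i++)
        s += 1.0 / (double) i;
    return s;
}

static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, (struct timezone *) 0);
    return (double) tv.tv_sec + (double) tv.tv_usec * 1.0e-6;
}

int main(void)
{
    double t[RUNS], lo, hi, sum;
    struct rusage ru;
    int r;

    for (r = 0; r < RUNS; r++) {
        double t0 = wall_seconds();
        double s = kernel();
        t[r] = wall_seconds() - t0;
        printf("run %2d: %8.4f s  (sum=%g)\n", r + 1, t[r], s);
    }

    lo = hi = t[0];
    sum = 0.0;
    for (r = 0; r < RUNS; r++) {
        if (t[r] < lo) lo = t[r];
        if (t[r] > hi) hi = t[r];
        sum += t[r];
    }
    printf("mean %.4f s, min %.4f, max %.4f, spread %.1f%%\n",
           sum / RUNS, lo, hi, 100.0 * (hi - lo) / lo);

    /* what the OS can already tell you about this process */
    getrusage(RUSAGE_SELF, &ru);
    printf("user %ld.%06ld s, sys %ld.%06ld s, %ld minor / %ld major faults\n",
           (long) ru.ru_utime.tv_sec, (long) ru.ru_utime.tv_usec,
           (long) ru.ru_stime.tv_sec, (long) ru.ru_stime.tv_usec,
           ru.ru_minflt, ru.ru_majflt);
    return 0;
}

Run it a few times with and without background load and watch the
"spread" line move; the user/system split and fault counts are the
sort of OS-level instrumentation I was alluding to above.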