dgh@validgh.com (David G. Hough on validgh) (06/10/91)
There's been an ongoing discussion in comp.arch of performance vs. robustness. How much is it worth to improve performance on the "common" case at the cost of producing wrong results on "uncommon" cases? "Uncommon" is not necessarily rare, of course.

IEEE 754 floating-point arithmetic was intended to increase the domain of problems for which good results could be obtained, and to increase the likelihood of getting error indications when good results could not be obtained, all without significantly degrading performance on "common" cases and without significantly increasing total system cost relative to sloppy arithmetic. Since none of these goals is very quantitative, people will argue about how well they've been achieved.

Part of the problem is that the benchmark programs in use measure only common cases. The various versions of the Linpack benchmark all have in common that the data is taken from a uniform random distribution, producing problems of very good condition. So even the worst possible linear equation solver algorithm running on the dirtiest possible floating-point hardware should be able to produce a reasonably small residual, even for input problems of very large dimension. (A small numerical sketch of this point appears at the end of this post.) Such problems may actually correspond to some realistic applications, but many other realistic applications tend at times to get data that is much closer to the boundaries of reliable performance. This means that the same algorithms and input data will fail on some computer systems and not on others. In consequence, benchmark suites developed by consortia of manufacturers proceeding largely by consensus, such as SPEC, will tend to exclude applications, however realistic, for which results of comparable quality can't be obtained on all the members' systems.

The user's own application is always the best benchmark, but because of practical difficulties it makes more sense for consortia of users in various industries to specify in a general way the problems to be solved (input data, sample algorithms, and a measure of correctness of the computed results) and then allow fairly radical recoding of the algorithms to solve those problems on particular architectures. This is more like the way the PERFECT club works than like SPEC. The most important thing is that the published performance and correctness results be accompanied by the source code (and Makefiles etc.) that achieves them, so that prospective purchasers can determine for themselves the relevance of the results to their particular requirements.

Most users place a great premium on NOT rewriting any source code, but others are willing to do whatever is necessary to get their jobs done. The rules for the 1000x1000 Linpack benchmark, for instance, are that you have to use the provided program for generating the input data and testing the results, but you get to recode the linear equation solution itself, the part that gets timed, in any way that makes sense for a particular architecture.

-- 
David Hough

dgh@validgh.com   uunet!validgh!dgh   na.hough@na-net.ornl.gov
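To make the conditioning point concrete, here is a small sketch in Python/numpy rather than the benchmark's own Fortran. The uniform-random generator and the normalized-residual check below are only a paraphrase of what the Linpack driver does, and the Hilbert matrix is just a convenient stand-in for data near the boundaries of reliable performance; the sizes, names, and tolerances are my own assumptions, not the benchmark's.

    import numpy as np

    n = 1000                      # same size as the 1000x1000 benchmark matrix

    # "Common" case: entries from a uniform random distribution, as the
    # Linpack data generator uses; such matrices are almost always well
    # conditioned.
    rng = np.random.default_rng(0)
    A = rng.uniform(-1.0, 1.0, (n, n))
    b = A @ np.ones(n)
    x = np.linalg.solve(A, b)

    # Normalized residual in the spirit of the Linpack check:
    # ||b - A x|| / (n * ||A|| * ||x|| * eps).  Small values (a few tens
    # at most) mean the answer is about as good as the arithmetic allows.
    eps = np.finfo(float).eps
    resid = np.linalg.norm(b - A @ x, np.inf)
    residn = resid / (n * np.linalg.norm(A, np.inf) * np.linalg.norm(x, np.inf) * eps)
    print("random matrix condition estimate:", np.linalg.cond(A, np.inf))
    print("random matrix normalized residual:", residn)

    # "Uncommon" case: even a modest Hilbert matrix is so ill conditioned
    # that no solver/hardware combination delivers many correct digits,
    # yet its residual can still look small.
    m = 12
    H = np.array([[1.0 / (i + j + 1) for j in range(m)] for i in range(m)])
    bh = H @ np.ones(m)
    xh = np.linalg.solve(H, bh)
    residn_h = (np.linalg.norm(bh - H @ xh, np.inf)
                / (m * np.linalg.norm(H, np.inf) * np.linalg.norm(xh, np.inf) * eps))
    print("Hilbert condition estimate:", np.linalg.cond(H, np.inf))
    print("Hilbert normalized residual:", residn_h)
    print("Hilbert forward error:", np.linalg.norm(xh - np.ones(m), np.inf))

The random matrix's condition number stays modest, so even a careless solver yields a small normalized residual; the Hilbert system's residual typically looks just as small while most of the digits of the computed solution are wrong. A residual check alone does not distinguish the two cases, which is why well-conditioned random data tells you so little about behavior near the boundaries.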