dgh@validgh.com (David G. Hough on validgh) (05/30/91)
There's been some confusion recently on comp.arch about the difference between errors and error bounds. If you have computed something and you know the error, you can get the correct answer just by adding the error to the computed value; then you have no error. A more realistic situation is that you know a bound on the error, which confines the correct answer to some region around the computed value.

"Accuracy benchmark" programs which attempt to measure the quality of computer arithmetic by comparing computed results to known correct answers are common but not very interesting, because they don't correspond to the much more typical situation in which the correct answer is not known. Not many people routinely spend most of their time solving problems whose answers they already know. A much better kind of accuracy benchmark is one which compares the size of the error bounds: better arithmetic produces smaller error bounds for comparable computations using the same word size. Unfortunately error bounds also reflect the skill of the error analyst who computed them, analytically or numerically, as well as aspects intrinsic to the problem, inherent in particular algorithms, or due to peculiarities of the hardware and software of a particular computer system. So separating all these aspects is not so easy, particularly for amateurs.

Here's an example of simple error bounds: z = x * y in 32-bit floating point. In IEEE single precision or VAX F format, the relative rounding error will be bounded by 2**-24. (IEEE and VAX may produce different answers and hence different errors, but the error bounds are identical.) In IBM 370 format, the relative rounding error will be bounded by 16**-5 = 2**-20. Although it won't happen in this simplest example, in general an IBM 370 may sometimes produce a more accurate answer - with a smaller error - just due to lucky roundoff. But that doesn't help with realistic problems.
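The 2**-24 single-precision bound can be checked mechanically. A minimal sketch (mine, not from the original post), using Python's struct module to round a double-precision value to the nearest IEEE single:

```python
# Illustration: round a double-precision product to IEEE single precision
# and check the 2**-24 relative rounding error bound from the text.
import struct

def round_to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack('f', struct.pack('f', x))[0]

x, y = 1.1, 3.7          # arbitrary example operands
exact = x * y            # double-precision product; effectively exact here
z = round_to_single(exact)
rel_err = abs(z - exact) / abs(exact)
assert rel_err <= 2.0**-24   # the single-precision bound
```

The same quantization trick, with 6 hexadecimal significand digits instead of 24 bits, would exhibit the weaker 16**-5 = 2**-20 bound of the IBM 370 format.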
The error bound on the IBM 370 will always be much larger than for IEEE or VAX. Once this was figured out, nobody bothered producing any more base-16 instruction set architectures: if you wanted IBM 370 compatibility, you did it the IBM 370 way; otherwise you did something different. As has been observed previously, this difference in the size of error bounds is largely responsible for the transition from 32 to 64 bits as the default for scientific computation; the other major factor was the appearance of the CDC 6600 at about the same time, which offered primarily 60-bit floating point, perhaps because it was faster to do that quick and dirty than to do something shorter and cleaner. Many, but not all, computations may be immunized against dirty arithmetic by using enough precision.

Here's another example: s = s + x * y, where s, x, and y are all IEEE double precision. In a conventional arithmetic system there will be two rounding errors to double precision, one following the multiply and one following the add. In an IBM RS/6000-style multiply-add architecture there will be one rounding error to double precision. (The PA-RISC multiply-add does not work this way.) In a 68040 or 80486 extended-precision architecture there will be three rounding errors: two in extended precision, following the multiply and the add, and one in double precision when the result is stored.

Which is best? The RS/6000 will produce the smallest error bound, followed very closely by the extended-precision architectures; the conventional architecture will produce a somewhat larger error bound. The superiority of extended-precision architectures is manifested in slightly more complicated examples like h = x*x + y*y, or in complex multiplication; in the latter case extended precision also offers superior immunity to gratuitous intermediate overflow and underflow.
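The difference between one rounding and two can be made concrete. A sketch of my own (not from the post), scaled down so that IEEE single is the working precision and a double can hold the exact product and sum:

```python
# Illustration: compare s + x*y with two roundings (conventional multiply
# then add) versus one rounding (RS/6000-style multiply-add), using IEEE
# single as the working precision so that doubles are effectively exact:
# a 24-bit by 24-bit product fits in a 53-bit double significand.
import struct

def f32(x):
    """Round a double to the nearest IEEE single."""
    return struct.unpack('f', struct.pack('f', x))[0]

s, x, y = -1.0, f32(0.1), 10.0
exact = s + x * y                   # exact in double: x*y = 1 + 2**-26
conventional = f32(f32(x * y) + s)  # two roundings; the product's residual is lost
fused = f32(exact)                  # one rounding, multiply-add style
assert conventional == 0.0          # all information cancelled away
assert fused == 2.0**-26            # the true residual survives
```

Here f32(x) is exactly 13421773/2**27, so x*y is exactly 1 + 2**-26; the conventional path rounds that to 1.0 before the add and returns zero, while the single-rounding path keeps the residual.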
However, both extended precision and RS/6000-style multiply-add are problematic in cases like a = p*p; b = a - p*p; where a mental model based on conventional arithmetic, which expects b to be zero, is inappropriate but widespread. Much otherwise commendable software founders on extended precision or multiply-add or both. (Extended-precision problems are further aggravated by allocating storage-precision variables to extended-precision registers.) Thus most versions of Kahan's paranoia program detect extended-precision expression accumulation correctly and decline to analyze it further, but inadvertently detect IBM RS/6000-style multiply-add as a design flaw.

In light of the foregoing one might well ask whether it's reasonable to expect IEEE 754 implementations to provide identical numerical results. I consider that a reasonable expectation, provided it's tempered with the understanding that whatever uniform paradigm is prescribed for expression evaluation, base conversion, and elementary transcendental function evaluation, it will cost something on all systems and a lot on many, so there will always be a need for alternative non-uniform modes of operation favoring performance, and these modes will vary among systems.

-----------------------------------------------------------------------------

The best published book on floating-point arithmetic is by Sterbenz and long out of print. The Computing Surveys article by David Goldberg is a good place to start, however, and is much more up to date. Even after 25 years there is very little that needs updating in Forsythe and Moler, which explains one particular mathematical software area about as simply as it will ever be explained.

I also maintain a mailing list that discusses such issues in the context of computer system hardware and software design and implementation. Interested persons may send requests to numeric-interest-request@validgh.com.
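Returning to the a = p*p; b = a - p*p pitfall above: it can be imitated entirely in software. A sketch of mine (not the author's), with IEEE single playing the role of storage precision and double playing the role of the extended registers:

```python
# Illustration of the extended-precision pitfall: a = p*p is rounded to
# storage (single) precision, but p*p recomputed inside "b = a - p*p" is
# kept in the wider (double) precision, so b need not be zero.
import struct

def f32(x):
    """Round a double to the nearest IEEE single."""
    return struct.unpack('f', struct.pack('f', x))[0]

p = f32(0.1)
a = f32(p * p)      # product rounded to storage precision
b = a - p * p       # p*p recomputed in the wider precision
assert b != 0.0     # the conventional mental model expects exactly 0.0
```

The exact square of a 24-bit significand needs up to 48 bits, so the storage-precision a and the wide-precision p*p genuinely differ.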
--
David Hough
dgh@validgh.com   uunet!validgh!dgh   na.hough@na-net.ornl.gov
dik@cwi.nl (Dik T. Winter) (05/31/91)
A minor quibble: In article <391@validgh.com> dgh@validgh.com (David G. Hough on validgh) writes:
...
 > "Accuracy benchmark" programs which attempt to measure the quality of computer
 > arithmetic by comparing computed results to the known correct answers are
 > common but not very interesting because they don't correspond to the much
 > more typical situation in which the correct answer is not known.
 > Not many people routinely spend most of their time solving problems whose
 > answers they already know.

Although you mention further down a shortcoming of Kahan's paranoia (it fails to recognize and analyze extended-precision intermediate results), I would like to mention that the program still has many virtues. The same holds for the Floating Point Validation suite from NAG. As long as architects do not understand the basics of floating-point arithmetic we will have such anomalies as:
 1.  If a equals b, it is not necessarily true that b equals a.
 2.  For some value v close to 1.0, 1.0/v = 0.0 (luckily the instruction
     where that comes from is not used on that machine).
paranoia and FPV are able to detect such basic design flaws. Of course you should be aware that such programs can give you false alarms. And also, of course, those programs depend on both hardware and compiler; I have some experience there.

Compiler writers are also in general completely unaware of the properties of floating-point arithmetic. In Ada each floating-point type has attributes associated with it that tell you the properties of the arithmetic (e.g. what the machine radix is, whether overflow traps, and so on). I have written a program that verifies those attributes (it will be published in a forthcoming issue of Ada Letters). Some time ago I observed that:
 1.  For every attribute there was a compiler that got it wrong.
 2.  For every compiler there was a wrong attribute.
True at that time; luckily things are improving. You can draw your own conclusion.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
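The machine-radix attribute Winter mentions can in fact be probed from within the arithmetic itself, in the style such verification programs use. A minimal Python sketch of Malcolm's classic radix-detection algorithm (my illustration, not Winter's Ada program; it assumes round-to-nearest IEEE doubles underneath):

```python
# Malcolm's algorithm: detect the floating-point radix using only
# arithmetic operations, as an attribute-verification program might.
def machine_radix():
    # Grow a until a + 1.0 is no longer exact; a is then so large that
    # consecutive representable numbers differ by more than 1.
    a = 1.0
    while ((a + 1.0) - a) - 1.0 == 0.0:
        a *= 2.0
    # The smallest b that makes a difference reveals the spacing,
    # which at this magnitude equals the radix.
    b = 1.0
    while ((a + b) - a) == 0.0:
        b += 1.0
    return int((a + b) - a)

assert machine_radix() == 2   # Python floats are IEEE binary doubles
```

On a hexadecimal machine like the IBM 370 the same procedure would return 16, which is exactly the kind of property an Ada attribute such as the machine radix is supposed to report correctly.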