[comp.arch] errors and error bounds in floating-point arithmetic

dgh@validgh.com (David G. Hough on validgh) (05/30/91)

There's been some confusion recently on comp.arch about the difference between
errors and error bounds.

If you have computed something and you know the error, you can get the correct
answer just by adding the error to the computed value, and then you have
no error.  A more realistic situation is that you know a bound on the error,
which confines the correct answer to some region around the computed value.

"Accuracy benchmark" programs which attempt to measure the quality of computer
arithmetic by comparing computed results to the known correct answers are
common but not very interesting because they don't correspond to the much
more typical situation in which the correct answer is not known.
Not many people routinely spend most of their time solving problems whose
answers they already know.

A much
better kind of accuracy benchmark is one which compares the size of the 
error bounds.  Better arithmetic produces smaller error bounds for comparable
computations using the same word size.

Unfortunately error bounds also reflect the skill of the error analyst who
computed them, analytically or numerically, as well as aspects intrinsic to
the problem, inherent in particular algorithms, or due to peculiarities of
the hardware and software of a particular computer system.  So separating
all these aspects is not so easy, particularly for amateurs.

Here's an example of simple error bounds:

	z = x * y

in 32-bit floating point.  In IEEE single precision or VAX F format,
the relative rounding error will be bounded by 2**-24.  (IEEE and VAX
may produce different answers and hence different errors but the error
bounds are identical).  In IBM 370
format, the relative rounding error will be bounded by 16**-5 = 2**-20.
Although it won't happen in this simplest example, in general an IBM 370 may
sometimes produce a more accurate answer - with a smaller error - just due to
lucky roundoff.  But that doesn't help with realistic problems: the error
bound on the IBM 370 will always be much larger than for IEEE or VAX.  Once
this was figured out, nobody bothered producing any more base-16 instruction
set architectures - if you wanted IBM 370 compatibility, you did it the IBM
370 way; otherwise you did something different.
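
To make the 2**-24 bound concrete, here is a rough C sketch, assuming the
host provides IEEE single and double arithmetic, that compares rounded
single-precision products against their exact values (a 24-bit by 24-bit
product fits exactly in a double):

	#include <math.h>
	#include <stdio.h>
	#include <stdlib.h>

	/* Sketch: the relative rounding error of a single-precision product,
	   measured against the exact product carried in double, should never
	   exceed 2**-24 on an IEEE (or VAX F) style system. */
	int main(void)
	{
	    double bound = ldexp(1.0, -24);   /* 2**-24 */
	    double worst = 0.0;
	    int i;
	    srand(1);
	    for (i = 0; i < 1000000; i++) {
	        float x = 1.0f + (float)rand() / RAND_MAX;
	        float y = 1.0f + (float)rand() / RAND_MAX;
	        float z = x * y;                       /* rounded to single */
	        double exact = (double)x * (double)y;  /* exact: at most 48 bits */
	        double relerr = fabs((double)z - exact) / exact;
	        if (relerr > worst) worst = relerr;
	    }
	    printf("worst observed relative error %g, bound %g\n", worst, bound);
	    return 0;
	}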

As has been observed previously, this difference in the size of error bounds
is largely responsible for the transition from 32 to 64 bits as the default
for scientific computation.  The other major factor was the appearance of the
CDC 6600 at about the same time, which offered primarily 60-bit floating
point, perhaps because it was faster to do that quick and dirty than to do
something shorter and cleaner.  Many, but not all, computations can be
immunized against dirty arithmetic by using enough precision.

Here's another example:

	s = s + x * y

s, x, and y are all IEEE double precision.

In a conventional arithmetic system there will be two rounding errors to
double precision, one following the multiply and one following the add.

In an IBM RS/6000-style multiply-add architecture there will be one rounding
error to double precision.  (The PA-RISC multiply-add does not work this
way).

In a 68040- or 80486-style extended-precision architecture there will be three
rounding errors: two in extended precision, following the multiply and the
add, and one in double precision when the result is stored to s.
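
For concreteness, here is a rough C sketch of the difference, using the C99
fma() function as a stand-in for an RS/6000-style fused multiply-add; the
operands are contrived so that the one-rounding and two-rounding evaluations
visibly differ:

	#include <math.h>
	#include <stdio.h>

	/* Sketch: s + x*y with two roundings (conventional) versus one
	   rounding (fused, via C99 fma()).  Assumes strict double evaluation
	   with no contraction; an extended-precision expression evaluator, or
	   a compiler that contracts the first expression into a multiply-add,
	   will deliver the fused answer there too, which is the point of the
	   surrounding discussion. */
	int main(void)
	{
	    double s = -1.0;
	    double x = 1.0 + ldexp(1.0, -27);    /* 1 + 2**-27 */
	    double y = x;
	    double conventional = s + x * y;     /* round after *, round after + */
	    double fused        = fma(x, y, s);  /* one rounding of s + x*y */
	    printf("conventional: %.17g\n", conventional);
	    printf("fused:        %.17g\n", fused);
	    /* Here x*y = 1 + 2**-26 + 2**-54 exactly; the conventional product
	       rounds away the 2**-54, while the fused evaluation keeps it. */
	    return 0;
	}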

Which is best?  The RS/6000 will produce the 
smallest error bound, followed very closely by the extended-precision
architectures; the conventional architecture will produce a somewhat larger
error bound.  The superiority of extended-precision architectures is 
manifested in slightly more complicated examples like

	h = x*x + y*y

or in complex multiplication, and in the latter case
extended precision offers superior immunity
to gratuitous intermediate overflow and underflow.  
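
The overflow point is easy to demonstrate; here is a rough C sketch using
double as a stand-in for an extended format when the data are single
precision (the values are chosen only to force the intermediate overflow):

	#include <math.h>
	#include <stdio.h>

	/* Sketch: x*x + y*y for single-precision data.  The squares overflow
	   single precision even though the final result sqrt(x*x + y*y) is
	   well within range; carrying the intermediates in a wider format
	   avoids the spurious infinity.  Assumes strict single-precision
	   evaluation for the first form - an extended-precision expression
	   evaluator would itself avoid the overflow, which is the point. */
	int main(void)
	{
	    float x = 3.0e19f, y = 4.0e19f;   /* x*x and y*y exceed FLT_MAX ~ 3.4e38 */
	    float narrow = sqrtf(x * x + y * y);                     /* inf */
	    float wide = (float)sqrt((double)x * x + (double)y * y); /* about 5e19 */
	    printf("single-precision intermediates: %g\n", narrow);
	    printf("wider intermediates:            %g\n", wide);
	    return 0;
	}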

However both extended precision and RS/6000-style multiply-add are problematic
in cases like

        a = p*p;
        b = a - p*p;

where a mental model based on conventional arithmetic, which expects b to
be zero, is inappropriate but widespread.  Much otherwise commendable
software founders either on extended precision or on multiply-add or on both.
(Extended-precision problems are further aggravated by compilers that allocate
storage-precision variables to extended-precision registers.)
Thus most versions of Kahan's paranoia program correctly detect extended-
precision expression accumulation and decline to analyze it further,
but inadvertently flag IBM RS/6000-style multiply-add as a design flaw.
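
Here is a rough C sketch of the effect, again with C99 fma() standing in for
a contracted multiply-add; any p whose square does not round exactly will do:

	#include <math.h>
	#include <stdio.h>

	/* Sketch: conventionally b is exactly zero, because a - p*p subtracts
	   the same rounded product from itself.  If the expression is
	   contracted into a multiply-add, a - p*p becomes fma(-p, p, a), which
	   subtracts the exact product from the rounded one and so delivers the
	   rounding error of p*p - tiny, but not zero.  Assumes strict double
	   evaluation and no contraction for the first form; extended-precision
	   evaluation or a contracting compiler makes it nonzero as well. */
	int main(void)
	{
	    double p = 1.0 + ldexp(1.0, -30);   /* square doesn't round exactly */
	    double a = p * p;
	    double b_conventional = a - p * p;  /* 0.0 */
	    double b_fused = fma(-p, p, a);     /* -2**-60, the rounding error */
	    printf("conventional: %g\n", b_conventional);
	    printf("fused:        %g\n", b_fused);
	    return 0;
	}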

In light of the foregoing one might well ask if it's reasonable to expect
that IEEE 754 implementations provide identical numerical results.  I consider
that to be a reasonable expectation,  provided that it's tempered with the
understanding that whatever uniform paradigm is prescribed for

	expression evaluation,
	base conversion,
	elementary transcendental function evaluation,

it will cost something on all systems and a lot on many, 
so that there will always
be a need for alternative non-uniform modes of operation favoring performance,
and these modes will vary among systems.

-----------------------------------------------------------------------------

The best published book on floating-point arithmetic is by Sterbenz and is
long out of print.  The Computing Surveys article by David Goldberg is a good
place to start, however, and is much more up to date.  Even after 25 years,
though, there is very little that needs updating in Forsythe and Moler, which
explains one particular mathematical software area about as simply as it will
ever be explained.

I also maintain a mailing list that discusses such issues in the context
of computer system hardware and software
design and implementation.  Interested persons
may send requests to numeric-interest-request@validgh.com.
-- 

David Hough

dgh@validgh.com		uunet!validgh!dgh	na.hough@na-net.ornl.gov

dik@cwi.nl (Dik T. Winter) (05/31/91)

A minor quibble:

In article <391@validgh.com> dgh@validgh.com (David G. Hough on validgh) writes:
...
 > "Accuracy benchmark" programs which attempt to measure the quality of computer
 > arithmetic by comparing computed results to the known correct answers are
 > common but not very interesting because they don't correspond to the much
 > more typical situation in which the correct answer is not known.
 > Not many people routinely spend most of their time solving problems whose
 > answers they already know.
 > 
Although you mention further down the limitations of versions of Kahan's
paranoia (in that it does not fully analyze extended-precision intermediate
results), I would like to mention that the program still has many virtues.
The same holds for the Floating Point Validation suite from NAG.  As long as
architects do not understand the basics of floating point arithmetic we will
have such anomalies as:
1.	If a equals b, it is not necessarily true that b equals a.
2.	For some value v close to 1.0, 1.0/v = 0.0 (luckily the instruction
	responsible is not used on that machine).
paranoia and FPV are able to detect such basic design flaws.  Of course you
should be aware that such programs can give false alarms.  And, of course,
those programs depend on both the hardware and the compiler.
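
For what it's worth, the heart of such probing is quite small; here is a
rough C sketch of the classic radix test (essentially Malcolm's algorithm,
as used in paranoia and MACHAR), with volatile used to fend off the
extended-precision register allocation mentioned in the parent article:

	#include <stdio.h>

	/* Sketch: probe the machine radix by growing a until adding 1.0 no
	   longer changes it (1.0 has fallen off the low end of the
	   significand), then finding the smallest power of two whose
	   addition to a is visible; (a + b) - a then recovers the radix. */
	int main(void)
	{
	    volatile double a = 1.0, b = 1.0, radix;
	    while ((a + 1.0) - a == 1.0)
	        a += a;
	    while ((a + b) - a == 0.0)
	        b += b;
	    radix = (a + b) - a;
	    printf("machine radix appears to be %g\n", radix);
	    return 0;
	}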

I have some experience here.  Compiler writers are also in general completely
unaware of the properties of floating point arithmetic.  In Ada each
floating point type has some attributes associated with it that tell you
the properties of the arithmetic (e.g. what the machine radix is, whether
overflow traps, and so on).  I have written a program that verifies those
attributes (it will be published in a forthcoming issue of Ada Letters).
Some time ago I observed that:
1.	For every attribute there was a compiler that got it wrong.
2.	For every compiler there was a wrong attribute.
True at that time.  Luckily things are improving.

You can draw your own conclusion.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl