[comp.unix.wizards] Floating Point Expectations

amull@Morgan.COM (Andrew P. Mullhaupt) (05/25/90)

In article <411@yonder.UUCP>, michael@yonder.UUCP (Michael E. Haws) writes:
> Is it reasonable to expect that the following code should
> work the same on all platforms, and produce "good" results?
> 
>         if ((45.0 / 100.0) == .45)
> 		printf("good\n");
> 	else
> 		printf("bad\n");

No. Thou shalt not compare floating point results for equality.
They might depend on compiler options for optimization, etc. even
on the same machine.

> I remember seeing somewhere (possibly a different group) an article which
> made reference to some software which analyzed the accuracy of floating
> point arithmetic on whatever host it was running.  Anyone know where
> this code can be found?

You may be referring to paranoia, which I recommend for investigating
your floating point reliability. However, it does not work by doing
computations and then simply comparing results to fixed values. It
usually computes various relationships which should be true among its
results and reports on discrepancies.

You might wonder if there are clever ways around this kind of problem,
like subtracting and comparing to zero, but if you are expecting to
get the code to agree bit for bit across "all platforms", you're going
to have to resort to something like emulating the floating point
arithmetic in integer C code. Consider that IBM 360 style floating point
is not bit-normalized, and that Cray double precision is 64 bit, etc.


Later,
Andrew Mullhaupt

john@newave.UUCP (John A. Weeks III) (05/25/90)

In article <411@yonder.UUCP> michael@yonder.UUCP (Michael E. Haws) writes:
>Is it reasonable to expect that the following code should
>work the same on all platforms, and produce "good" results?
>
>        if ((45.0 / 100.0) == .45)
>		printf("good\n");
>	else
>		printf("bad\n");

E-gads.  Don't ever try "exact" comparisons of floating point numbers.
After all, floating point numbers in a computer are a binary representation
of an "approximation" to the real number value that you imagined when you
typed in the base 10 symbols.

You can compare integer representations, within the limits of the machine
representation, but floating point numbers should be range checked.  For
example, try the following:

	double ratio;

	ratio = 45.0 / 100.0;

	if ( ratio > 0.44999 && ratio < 0.45001 )
	  printf( "good\n" );
	else
	  printf( "bad\n" );

Some people like to use a subtract and a "fabs" of some sort (macro
or a procedure -- fabs() comes from <math.h>):

	if ( fabs( (45.0 / 100.0) - 0.45 ) < 0.001 )
	  printf( "good\n" );
	else
	  printf( "bad\n" );

In this manner, you can substitute a define or a variable for the
0.001 in case you decide to globally change your tolerance level.
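
For instance, something along these lines (the macro name and the value
are only placeholders):

	#include <stdio.h>
	#include <math.h>

	/* Pick the tolerance once, in one place, so it can be retuned globally. */
	#define TOLERANCE	0.001

	#define NEARLY_EQUAL(a, b)	(fabs((a) - (b)) < TOLERANCE)

	int main(void)
	{
		if (NEARLY_EQUAL(45.0 / 100.0, 0.45))
			printf("good\n");
		else
			printf("bad\n");
		return 0;
	}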

There are a few legit places where you can compare real numbers, but
this is best left to experts who know _when_ and _why_ it is legit.

-john-

-- 
===============================================================================
John A. Weeks III               (612) 942-6969               john@newave.mn.org
NeWave Communications                ...uunet!rosevax!bungia!wd0gol!newave!john
===============================================================================

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/25/90)

In article <1990May24.132423.3080@eddie.mit.edu> aryeh@eddie.MIT.EDU (Aryeh M. Weiss) writes:
>The moral of the story is, just don't do == between two FP numbers.
>Exactitude is simply unrealistic in floating point.

Although this is correct as far as it goes, you can still get into trouble.
To avoid trouble, you must first truly believe that floating-point
representations are APPROXIMATIONS to real numbers.  This mind set
will enable you to be suspicious of any algorithmic assumptions
that rely on mathematical properties of the real-number system
being true of floating-point representations.  Some such properties
ARE true for some implementations of floating point, but not ALL
such properties and not for ALL implementations.  Textbooks on
numerical methods in computation should be consulted for more details.
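
For instance, associativity of addition is one property of the reals that
need not survive; a small illustration (the particular values assume an
ordinary binary double format, and the exact output is machine dependent):

	#include <stdio.h>

	int main(void)
	{
		/* For real numbers these two sums are identical; in floating
		   point the small term can be absorbed or lost depending on
		   the order of the additions. */
		double a = 1.0e16, b = -1.0e16, c = 1.0;

		printf("(a + b) + c = %g\n", (a + b) + c);	/* 1 */
		printf("a + (b + c) = %g\n", a + (b + c));	/* typically 0: c is lost in b + c */
		return 0;
	}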

amull@Morgan.COM (Andrew P. Mullhaupt) (05/25/90)

In article <1990May24.132423.3080@eddie.mit.edu>, aryeh@eddie.mit.edu (Aryeh M. Weiss) writes:
> In article <411@yonder.UUCP> michael@yonder.UUCP (Michael E. Haws) writes:
> >Is it reasonable to expect that the following code should
> >work the same on all platforms, and produce "good" results?
> >        if ((45.0 / 100.0) == .45)
> >		printf("good\n");
> >
> 
> Only numbers that can be expressed exactly in base 2 will produce
> "good" results.  0.1 and 0.01 are prime examples of numbers that
> cannot.  (A common textbook example is accumulating 0.01 100 times and
> not getting 1.0.) 

Not so fast there! There are still computers around with BCD (binary
coded decimal) floating point, both in hardware and software. There
are even machines which do not have an ordinary radix, such as the
Wang 'logarithmic' floating point representation. What you really
intend to say here is that floating point numbers which are
rational fractions with denominators given by a power of the radix
may escape rounding. 
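
On a binary machine the distinction is easy to see just by printing extra
digits; a BCD machine would presumably show both exactly (illustrative only):

	#include <stdio.h>

	int main(void)
	{
		/* 0.5 = 1/2 has a power-of-two denominator, so a binary machine
		   stores it exactly; 0.45 = 9/20 does not, so it is rounded. */
		printf("0.50 is stored as %.20f\n", 0.50);
		printf("0.45 is stored as %.20f\n", 0.45);
		return 0;
	}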

Portable and careful use of floating point is not based on assumptions
about when rounding may occur or in what way numbers will be rounded.
A good example of this is "Moler's expression", found in various
places, (e.g. LINPACK). This expression attempts to determine the
machine epsilon on the assumption that the radix of the representation
is not a power of three, and even though there are very few (if any)
machines for which this assumption is not valid, there is an explicit
comment indicating this in the sources. 


(a FORTRAN version of Moler's expression might be:

	((4.0/3.0) - 1.0) * 3.0 - 1.0

in single precision.)
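
A C rendering of the same idea, in double precision this time; the comparison
against ANSI C's DBL_EPSILON is only for reference, and a constant-folding
compiler may of course evaluate the whole thing at compile time:

	#include <stdio.h>
	#include <math.h>
	#include <float.h>

	int main(void)
	{
		/* 4.0/3.0 cannot be represented exactly in binary; the rounding
		   error it picks up survives the rest of the arithmetic, so the
		   result is on the order of the machine precision instead of the
		   zero that exact arithmetic would give.  On a radix-3 machine
		   the expression would vanish, hence the caveat in the sources. */
		double e = fabs(((4.0 / 3.0) - 1.0) * 3.0 - 1.0);

		printf("Moler's expression: %g\n", e);
		printf("DBL_EPSILON:        %g\n", DBL_EPSILON);
		return 0;
	}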

Now it is sometimes quite useful to assume that the radix of the
floating representation is known, but this assumption cannot be
regarded as portable. Two such cases are balancing a matrix before
applying the QR algorithm to compute eigenvalues, where the diagonal
is scaled by a vector of powers of the base of the floating point
representation:

	"The diagonal matrix D is chosen to have the form
	 diag(beta^i1, beta^i2, ..., beta^in) where beta is the
	 floating point base. Note that D^-1 A D can be computed
	 without roundoff."

From: Golub and van Loan, _Matrix Computations_, 2d edition, p. 381
Johns Hopkins, Baltimore, (1989).

Then with respect to computing the matrix exponential by scaling
and squaring, the same authors (in the same book on p. 557) point
out that division by a power of two is particularly convenient
because the resulting power can be formed by repeated squaring. They _do
not_ make any remark about the absence of roundoff. This is because
the algorithm is particularly suited to squaring, and the division
is by a power of two for this reason, not because it may be the 
floating radix. If they had in mind only machines with radix 2, they
would likely have remarked on the roundoff advantage as well.

It becomes quite clear that this authoritative and very recent
reference is written with the point of view that you should _not_
assume the radix of the floating point representation is 2, even
when it might be mildly beneficial to take advantage of this. In
any case where "all platforms" are under consideration, or even 
"all UNIX platforms", one should not _expect_ that the dyadic
rationals are exempt from rounding. The authors are in fact silent on the
subject of non-radix floating representations, since on p. 61 they
give a radix model for floating point representations. 

Later,
Andrew Mullhaupt

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/25/90)

In article <995@s8.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>It becomes quite clear that this authoritative and very recent
>reference is written with the point of view that you should _not_
>assume the radix of the floating point representation is 2, ...

It is even more obvious when you consider the wide variation, even among
binary-radix floating-point processors, with respect to guard bits,
rounding modes, and so on, that it is insane to think that any simple
model of floating-point is going to apply accurately everywhere.
Rather than attempting to model exact behavior of floating-point systems,
I recommend devising algorithms that are robust in the face of
considerable cruft from the floating-point unit.  This is virtually
always possible, if you undertake the task with sufficient suspicion.

dik@cwi.nl (Dik T. Winter) (05/26/90)

In article <995@s8.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
 > Not so fast there!
Not so fast here!
 >                    There are still computers around with BCD (binary
 > coded decimal) floating point, both in hardware and software. There
 > are even machines which do not have an ordinary radix, such as the
 > Wang 'logarithmic' floating point representation. What you really
 > intend to say here is that floating point numbers which are
 > rational fractions with denominators given by a power of the radix
 > may escape rounding. 
   ***

Very true.  The keyword here is may.  There is that infamous binary
machine where division or multiplication by two need not be exact.
Also there were (still are?) machines without a representation for 0.0;
they had only plus or minus very small.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

brnstnd@stealth.acf.nyu.edu (05/28/90)

In article <13000@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
> Rather than attempting to model exact behavior of floating-point systems,
> I recommend devising algorithms that are robust in the face of
> considerable cruft from the floating-point unit.

This doesn't remove from FPU makers the responsibility to make chips
satisfying reasonably simple rules. You really can get a lot better
performance out of numerical algorithms when, e.g., you're working under
Knuth's model of floating-point computation. ANSI floating-point is an
excellent start, and it's worth the effort for strict compliance.

Anyway, this discussion belongs in sci.math or alt.chips.poor-engrg.
Followups to the former.

---Dan

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/28/90)

In article <1603:May2720:32:5190@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
-In article <13000@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
-> Rather than attempting to model exact behavior of floating-point systems,
-> I recommend devising algorithms that are robust in the face of
-> considerable cruft from the floating-point unit.
-This doesn't remove from FPU makers the responsibility to make chips
-satisfying reasonably simple rules. You really can get a lot better
-performance out of numerical algorithms when, e.g., you're working under
-Knuth's model of floating-point computation. ANSI floating-point is an
-excellent start, and it's worth the effort for strict compliance.

Unfortunately, in the real world the programmer cannot specify the
properties of the floating-point hardware.  There are numerous kinds
of FP architectures and implementations, each with its own set of
inaccuracies.  I stand by what I said -- instead of relying on the
FP model, make the algorithms robust.

aryeh@eddie.mit.edu (Aryeh M. Weiss) (05/29/90)

In article <995@s8.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>In article <1990May24.132423.3080@eddie.mit.edu>, aryeh@eddie.mit.edu (Aryeh M. Weiss) writes:
>> 
>> Only numbers that can be expressed exactly in base 2 will produce
>> "good" results.  0.1 and 0.01 are prime examples of numbers that
>> cannot.  (A common textbook example is accumulating 0.01 100 times and
>> not getting 1.0.) 
>
>Not so fast there! There are still computers around with BCD (binary
>coded decimal) floating point, both in hardware and software. There

... Well, it IS a textbook example.  I've seen it.

>are even machines which do not have an ordinary radix, such as the
>Wang 'logarithmic' floating point representation. What you really
>intend to say here is that floating point numbers which are
>rational fractions with denominators given by a power of the radix
>may escape rounding. 

I have seen BCD FP representations in software and I have also seen
BCD fixed point representations in hardware.  I was not aware of the
Wang format.  I should have qualified my remarks to IEEE-like (base 2)
FP formats.  Certainly there have been as many FP formats as manufacturers.
It is always good practice to absolutely avoid assumptions about the
underlying hardware.  The other point is that computations that are
algebraically identical cannot be assumed to be numerically identical.
Algebraic expressions sometimes must be rearranged and performed in careful
order to preserve numerical precision.  A typical problem is taking the
difference of two large but numerically close numbers, resulting in loss
of precision in the small difference.  This is what often screws up
*unsophisticated* linear algebra algorithms.
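
A tiny illustration of that cancellation (the value of x is arbitrary; the
point is how few of its digits survive the subtraction on a typical binary
double format):

	#include <stdio.h>

	int main(void)
	{
		/* 1.0 + x must be rounded, and after the nearly equal 1.0 is
		   subtracted back off, that rounding error dominates what is
		   left of x. */
		double x = 1.0e-15;
		double diff = (1.0 + x) - 1.0;

		printf("x    = %.17g\n", x);
		printf("diff = %.17g\n", diff);	/* typically agrees with x to only a digit or so */
		return 0;
	}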

Anyway this is certainly getting out of bandwidth.

-- 

rcd@ico.isc.com (Dick Dunn) (05/31/90)

michael@yonder.UUCP (Michael E. Haws) writes:
> Is it reasonable to expect that the following code should
> work the same on all platforms, and produce "good" results?
> 
>         if ((45.0 / 100.0) == .45)
> 		printf("good\n");
> 	else
> 		printf("bad\n");

Leaving aside for a moment all the nasty things that can go wrong with
floating-point equality, you might note a couple of things here and ask
some simpler questions:

Note that the given expression contains a division of a couple of integral
values in float notation.  On almost all architectures, these should be
represented exactly.  The question could be interpreted as whether the
machine is able to do an accurate division for these particular operands.

Another way to look at it is to think about the process of forming the
float constant ".45".  This is often done by taking the string of digits
(45) and scaling by an appropriate power of ten, which is remarkably close
in semantics to 45.0/100.0.

Yet a third viewpoint is that the entire expression is constant; a good
compiler ought to evaluate it carefully.  The likelihood that it will come
up true, while admittedly not a certainty, ought to be at least as high as
the likelihood that it comes up true if calculated at run time.

For the particular example given, it's more surprising to get the answer
"false" than for just some random example of a floating-point equality
test.
_ _ _ _ _

Also, in passing I'll note that the business of equality comparison with an
error tolerance can be just as risky as exact equality.  Suppose we define
this near equality as a function (or macro or whatever) "almostequal(a,b)".
The nasty part is that this function is not transitive--you can have three
values such that "almostequal(a,b)" and "almostequal(b,c)" are true, but
"almostequal(a,c)" is false.  This can trap an unwary programmer.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Simpler is better.

amull@Morgan.COM (Andrew P. Mullhaupt) (06/01/90)

In article <1990May31.001618.24274@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) writes:
> michael@yonder.UUCP (Michael E. Haws) writes:
> > Is it reasonable to expect that the following code should
> > work the same on all platforms, and produce "good" results?
> > 
> >         if ((45.0 / 100.0) == .45)
> > 		printf("good\n");
> > 	else
> > 		printf("bad\n");
> 
> Leaving aside for a moment all the nasty things that can go wrong with
> floating-point equality, you might note a couple of things here and ask
> some simpler questions:
> 
> Note that the given expression contains a division of a couple of integral
> values in float notation.  On almost all architectures, these should be
> represented exactly.  The question could be interpreted as whether the
> machine is able to do an accurate division for these particular operands.
> 
> Another way to look at it is to think about the process of forming the
> float constant ".45".  This is often done by taking the string of digits
> (45) and scaling by an appropriate power of ten, which is remarkably close
> in semantics to 45.0/100.0.
> 
> Yet a third viewpoint is that the entire expression is constant; a good
> compiler ought to evaluate it carefully.  The likelihood that it will come
> up true, while admittedly not a certainty, ought to be higher than or equal
> to the likelihood that it comes up true if calculated at run time.
> 

On the other hand, consider the possibility that the .45 is converted
to double, as are the exactly representable 45.0 and 100.0. Now if
the division is performed with guard digits, or perhaps in IEEE
extended, then we have to worry about whether or not the comparison
is done strictly as double, or in the wider form. This can depend on
the hardware (do you have a coprocessor or not), on the compiler
(does it use extended or not), or on the options given to the compiler.
The comparison, if done in the wider format, will come out unequal if
the extended and double representations of 45.0/100.0 are not the
same. 

You're probably right that a lot of machines will produce equality
for the given example, but the complexity of ensuring this makes it
an assumption I'd rather skip than justify.

One of the most interesting classes of bugs is based on the failure
of this assumption; a program is developed, but when optimization is
enabled, it breaks because some result which is stored (and therefore
tacitly coerced back to double) is no longer coerced because the
compiler optimized the store/recall sequence away. If you recompile
the program without optimization (as is often necessary) in order to
run the program under a debugger, the bug will go away! 
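
A hypothetical sketch of the shape such a bug takes; whether the two branches
ever disagree depends entirely on the hardware (extended registers or not),
the compiler, and the optimization level, which is exactly the problem:

	#include <stdio.h>

	int main(void)
	{
		double x = 45.0, y = 100.0;

		/* The stored value has been rounded to double.  The quotient on
		   the right of the comparison may be recomputed and held in a
		   wider register; with optimization the store itself may be
		   elided, and the outcome can change between builds. */
		double stored = x / y;

		if (stored == x / y)
			printf("compared equal\n");
		else
			printf("compared unequal (wider intermediate format)\n");
		return 0;
	}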

The hours of enjoyment you might spend chasing this kind of bug
down are reason enough to steer well clear of code which can be
very sensitive to the floating representation. There are enough
such possible hours that studying numerical analysis and learning
ways to safely exploit fixed and floating point computations (and
a lot of other useful stuff) could be thought of as a time saver.

Later,
Andrew Mullhaupt

moss@cs.umass.edu (Eliot Moss) (06/02/90)

Just a minor followup to the (45.0 / 100.0) == .45 question:

While the integers 45.0 and 100.0 are undoubtedly represented exactly, .45 is
a repeating form in binary (just as, say, 1/7 is in decimal notation). One
*would* expect (50.0 / 100.0) == .50 since .50 *does* have an exact
representation in binary. In general, when writing floating point algorithms
one must demand equality only within a tolerance; choosing that tolerance is a
complex issue, and one can have relative and absolute tolerances.
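
A minimal sketch of one common way to combine the two; the function name and
the constants are placeholders, and choosing the thresholds well is exactly
the hard part:

	#include <stdio.h>
	#include <math.h>

	/* Accept a and b as "equal enough" if they agree within an absolute
	   tolerance (useful near zero) or within a relative one (useful at
	   any magnitude). */
	static int near_equal(double a, double b, double rel_tol, double abs_tol)
	{
		double diff = fabs(a - b);
		double scale = fabs(a) > fabs(b) ? fabs(a) : fabs(b);

		return diff <= abs_tol || diff <= rel_tol * scale;
	}

	int main(void)
	{
		printf("%d\n", near_equal(45.0 / 100.0, 0.45, 1e-9, 1e-12));	/* 1 on most machines */
		printf("%d\n", near_equal(45.0 / 100.0, 0.46, 1e-9, 1e-12));	/* 0 */
		return 0;
	}
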
Hope this adds a little to the original poster's understanding ....		Eliot
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206; Moss@cs.umass.edu

dik@cwi.nl (Dik T. Winter) (06/06/90)

Just a minor followup:
In article <MOSS.90Jun1104004@ibis.cs.umass.edu> moss@cs.umass.edu writes:
 > Just a minor followup to the (45.0 / 100.0) == .45 question:
...
 >                                                                        One
 > *would* expect (50.0 / 100.0) == .50 since .50 *does* have an exact
 > representation in binary.

I would not bet my money on it.  If the evaluation of 50.0 / 100.0 is done
using the hardware operation(s) to do division, the result may very well
not be exact.  Try on a Cray:  3.0/3.0, 5.0/5.0, 7.0/7.0 etc.  Some give
exactly 1, others do not.  (The Cray does not have division, but inversion
followed by multiplication, and those inverses are not always exact.)  Still
worse, on a CDC 205 division by 2.0 in double precision need not be exact.
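
A small experiment in that spirit; what it prints is entirely machine
dependent, which is the point (on reciprocal-multiply hardware some of these
can fail, while on other machines the count may well be zero):

	#include <stdio.h>

	int main(void)
	{
		/* Count how often n/n, computed as written, fails to come out
		   as exactly 1.0. */
		int n, failures = 0;

		for (n = 1; n <= 100; n++) {
			double num = (double) n;

			if (num / num != 1.0)
				failures++;
		}
		printf("%d of 100 quotients n/n were not exactly 1.0\n", failures);
		return 0;
	}
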
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

hwt@.bnr.ca (Henry Troup) (06/06/90)

It occurs to me that even if the compiler does pre-evaluation of floating
point expressions, you could be compiling on a different machine...
More variables...
--
Henry Troup - BNR owns but does not share my opinions
..uunet!bnrgate!hwt%bwdlh490 or  HWT@BNR.CA