[comp.lang.c] Bug?

coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/29/89)

brent@capmkt.COM (Brent Chapman) writes:
>earleh@eleazar.dartmouth.edu (Earle R. Horton) writes:
># It is in general bad practice to apply an equality test to the result
># of a computation.  Instead of
># 	( a == b )
># use
># 	( ( a < b + MINDOUBLE ) && ( a > b - MINDOUBLE ) )
># 
># MINDOUBLE is in <values.h>.
>Yeah, _right_.  Nice, clear, concise code there.  Why shouldn't I reasonably
>expect the compiler to do this for me?

Because it can produce very non-intuitive results. For instance, if the
compiler were to do it automatically it's entirely possible that
	if( ( a == b ) && ( b == c ) && (a != c) ) 
		printf( "What happened?\n" );
would print "What happened?" a lot. The reason is that a could well
equal b within the fuzz factor, and b could equal c, but a could be
2*MINDOUBLE away from c's value and hence not be equal.

That's why compilers don't automatically generate code to 'fix' problems
like this: because it might cause far bigger problems for people
doing lots of floating point ops. In general floating point is a
big problem --- equality tests on floating point numbers are a bad
idea in most languages. Also, things like a*(b+c) often don't equal
a*b+a*c, a*(b*c) is often unequal to (a*b)*c, and so on.
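
To see the non-transitivity concretely, here is a self-contained sketch
(the tolerance value and the helper name are purely illustrative; this is
not MINDOUBLE from <values.h>):

    #include <stdio.h>

    #define FUZZ 1.0e-9     /* illustrative tolerance, not MINDOUBLE */

    /* fuzzy equality: true when x and y differ by less than FUZZ */
    int fuzzy_eq(double x, double y)
    {
        return (x < y + FUZZ) && (x > y - FUZZ);
    }

    int main(void)
    {
        double a = 1.0;
        double b = a + 0.6e-9;  /* within FUZZ of a */
        double c = b + 0.6e-9;  /* within FUZZ of b, but about 1.2e-9 from a */

        if (fuzzy_eq(a, b) && fuzzy_eq(b, c) && !fuzzy_eq(a, c))
            printf("What happened?\n");
        return 0;
    }

This program does print "What happened?": each neighboring pair is within
the tolerance, but the two ends are not.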

Followups to comp.lang.c, because this is more a C language issue.
It's certainly not specific to A/UX anymore.

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.

tneff@bfmny0.UU.NET (Tom Neff) (09/29/89)

Specifically, if the C floating point equality operator automatically
corrected for the fuzz factor, how would you write a C program to
test the fuzz factor?
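
One concrete reading of the question: the classic probe for the machine's
own fuzz (its epsilon) only measures anything because != is an exact
comparison.  A minimal sketch of that probe (mine, not Neff's):

    #include <stdio.h>

    int main(void)
    {
        double eps = 1.0;

        /* halve eps until adding half of it to 1.0 no longer changes 1.0;
           this relies on != being an exact comparison                     */
        while (1.0 + eps / 2.0 != 1.0)
            eps /= 2.0;

        printf("machine epsilon is about %g\n", eps);
        return 0;
    }

If the comparison silently absorbed a fuzz larger than the machine's real
epsilon, the loop would stop at (roughly) that fuzz instead, and the
quantity you set out to measure would be hidden behind it.  (On a machine
that evaluates the expression in wider registers, the probe reports the
epsilon of the wider format instead.)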
-- 
"Take off your engineering hat   | "The filter has      | Tom Neff
and put on your management hat." | discreting sources." | tneff@bfmny0.UU.NET

rob@raksha.eng.ohio-state.edu (Rob Carriere) (09/30/89)

In article <1989Sep29.144716.20921@brutus.cs.uiuc.edu> coolidge@cs.uiuc.edu
writes: 
>brent@capmkt.COM (Brent Chapman) writes:
>>earleh@eleazar.dartmouth.edu (Earle R. Horton) writes:
>># [...]  Instead of
>># 	( a == b )
>># use
>># 	( ( a < b + MINDOUBLE ) && ( a > b - MINDOUBLE ) )
>># 
>># MINDOUBLE is in <values.h>.
>>[...]  Why shouldn't I reasonably expect the compiler to do this for me?
>Because it can produce very non-intuitive results. For instance, if the
>compiler were to do it automatically it's entirely possible that
>	if( ( a == b ) && ( b == c ) && (a != c) ) 
>		printf( "What happened?\n" );
>would print "What happened?" a lot. The reason is that a could well
>equal b within the fuzz factor, and b could equal c, but a could be
>2*MINDOUBLE away from c's value and hence not be equal.

Worse yet, in some situations you may want a value significantly larger than
MINDOUBLE for your fuzzing.  This may be because of the numerical conditioning
of your algorithm, or inaccuracies in your input data, or simply because
you don't need a high-precision answer.  The compiler cannot possibly foresee
all these circumstances.  If you object to the clarity of the code, write a
macro or function of the form

int is_equal( double x, double y, double tol );  /* [1] */

that does the evaluation for you.  But whatever you do, *never* assume a
simple equality test on floats or doubles will work.  And *always* be able to
justify the value of the tolerance you used.  If you don't do those things,
you don't know what your program's output means.
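
A minimal sketch of such a function, with the body (a plain absolute
tolerance) filled in by me; a relative tolerance is often the better
choice, but the point is only that the caller must name the tolerance
explicitly:

    #include <math.h>

    /* nonzero when x and y agree to within tol;
       the caller chooses, and must be able to justify, tol */
    int is_equal(double x, double y, double tol)
    {
        return fabs(x - y) < tol;
    }

A call then reads like is_equal(a, b, 1.0e-6), which at least drags the
tolerance out into the open where it can be reviewed.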

SR

amull@Morgan.COM (Andrew P. Mullhaupt) (09/30/89)

When comparing floating point numbers for equality, you are in grave
peril of several incredibly painful bugs, to wit:

1. Forget portability across machines. 

2. The 'storage history' fun: If your computer has a coprocessor or
FPA, you may find that extended-precision IEEE arithmetic (80 bits wide)
is done there, but your number comes back to the CPU rounded to a 64-bit
double. This means that

    z = functionreturningdouble();
    if ( y == z ) printf("Spew");

and

    if ( y == functionreturningdouble() ) printf("Spew");

need not have the same output. The classic example of this torture is
a bug where the expression that needs to be debugged is computed in the
coprocessor (80-bit) registers. If the compiler has arrived at this
situation through optimization, and the program is not correct with
the extra bits of precision, then turning off optimization (which
almost always happens when you compile for debugging) will make the bug
go away! If you observe this behavior and decide to resort to
inserting diagnostic print statements (because you realize that
you can't debug the program with the debugger), then the number that
gets printed out (i.e. coerced to double on its way across to the CPU
to be output) will appear to be correct, and the print statement may
actually fix the bug!

It is possible that this kind of bug occurs as well in C
implementations with peculiar automatic single/double conversions.
(A sketch of the two comparisons above appears after point 3 below.)

3. If I have a teeny tiny double called epsilon, can I make this all
go away? Mostly yes, but strictly speaking, NO!!!!! If either you or
your compiler (in a misguided imitation of APL's comparison tolerance)
inserts efficiency-sapping +epsilon and -epsilon all over your code,
it displaces the problem to the comparisons

     y - epsilon < z

and
 
     y + epsilon > z

either of which can suffer from the storage-history disease if the
quantities involved are held at two different precisions. In fact, you
may be inviting trouble by giving the compiler more to rearrange during
optimization.
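
To make the comparison in point 2 concrete, here is a small sketch; the
parameter is my addition (so the division actually happens at run time),
and whether the two tests really disagree depends entirely on your
compiler and floating-point hardware:

    #include <stdio.h>

    /* divides at run time, so on an 80-bit coprocessor the result
       may be produced, and returned, in extended precision         */
    double functionreturningdouble(double x)
    {
        return x / 3.0;
    }

    int main(void)
    {
        double y = 1.0 / 3.0;             /* rounded to a 64-bit double */
        double z;

        z = functionreturningdouble(1.0); /* assignment should force a
                                             rounded store to memory    */
        if (y == z)
            printf("Spew (stored comparison)\n");

        /* the return value may still be sitting in an extended register
           here, so this test can disagree with the one above            */
        if (y == functionreturningdouble(1.0))
            printf("Spew (register comparison)\n");

        return 0;
    }

On a strictly double-precision machine both lines print; on a machine that
keeps the return value in an 80-bit register, only the first may.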


The moral of the story is: Unless you are dead certain that nothing
else will do, comparing floating point numbers for equality is a
bad business, and your compiler ought not pretend to solve this problem
with some invisible epsilon, since it is not really solvable short of
interval arithmetic, and that's another story.

Later,
Andrew Mullhaupt

ark@alice.UUCP (Andrew Koenig) (10/01/89)

In article <14758@bfmny0.UU.NET>, tneff@bfmny0.UU.NET (Tom Neff) writes:

> Specifically, if the C floating point equality operator automatically
> corrected for the fuzz factor, how would you write a C program to
> test the fuzz factor?

Even if the == operator includes a fuzz factor, it's hard to imagine
a sensible implementation in which (a-b)==0 is true unless a and b
are truly identical.
-- 
				--Andrew Koenig
				  ark@europa.att.com

scott@bbxsda.UUCP (Scott Amspoker) (10/02/89)

In article <3146@quanta.eng.ohio-state.edu> rob@raksha.eng.ohio-state.edu (Rob Carriere) writes:
>[good discussion of floating-point compares with fuzz]

I remember seeing a language several years ago that had *two* equality
operators.  One of them was "equal" and the other "approx. equal".
The fuzz factor was settable under program control.  It was a long
time ago and I don't remember which language it was.  Does anybody
remember any such languages?

-- 
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232

amull@Morgan.COM (Andrew P. Mullhaupt) (10/02/89)

> I remember seeing a language several years ago that had *two* equality
> operators.  One of them was "equal" and the other "approx. equal".
> The fuzz factor was settable under program control.  It was a long
> time ago and I don't remember which language it was.  Does anybody
> remember any such languages?
>
APL has the 'fuzz' factor, and it's called Comparison Tolerance. (quad CT for
you APL types...)

It's not a good idea for floating point arithmetic, as my previous posting
made clear, but it is a necessary evil in a language where you don't have
different integer and real types. Since you can't tell from syntax if a 
variable is real or integer, you often implicitly compare reals for equality
in indexing arrays. The programmer is able to set the level of the fuzz factor
as an environment variable, and this leads to difficulties when code runs
in more than one environment where that variable has been set differently.
(This happens a lot in APL; each workspace may have a totally different
environment, yet code is often 'Re-Used' by simply copying it from one
workspace to another.)

APL offers the sensible option of setting the comparison tolerance to 0, which
(in good interpreters) eliminates the extra arithmetic entirely and allows
the programmer to develop code which avoids the pitfall of comparing floating
point representations for equality.
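
For the C readers: APL's tolerant comparison is (roughly) a relative test,
treating two numbers as equal when their difference is small compared to
the larger of the two magnitudes.  A rough C rendering of the idea, my
paraphrase rather than anything out of an APL manual:

    #include <math.h>

    /* tolerant equality in the spirit of APL's quad-CT: equal when the
       difference is small relative to the larger of the two magnitudes */
    int ct_equal(double x, double y, double ct)
    {
        double scale = fabs(x) > fabs(y) ? fabs(x) : fabs(y);
        return fabs(x - y) <= ct * scale;
    }

Setting ct to 0.0 reduces this to an exact comparison, which is the
sensible option mentioned above.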

Later,
Andrew Mullhaupt 

diamond@csl.sony.co.jp (Norman Diamond) (10/04/89)

In article <9986@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:

>Even if the == operator includes a fuzz factor, it's hard to imagine
>a sensible implementation in which (a-b)==0 is true unless a and b
>are truly identical.

Sorry Mr. Koenig, it's easy to imagine.  Every hardware floating-point
system and most software ones have situations where (a-b)==0 but a != b.

For example (which obviously generalizes), imagine a base-10 system
where exponents can range from -99 to +99.  Let:
  a == 0.100001e-99
  b == 0.100002e-99
Then obviously a != b.  But what happens when (a-b) is computed?
(a-b) --> (-0.000001e-99) --> (-0.100000e-104) --> (0.000000e-99).

-- 
Norman Diamond, Sony Corp. (diamond%ws.sony.junet@uunet.uu.net seems to work)
  The above opinions are inherited by your machine's init process (pid 1),
  after being disowned and orphaned.  However, if you see this at Waterloo or
  Anterior, then their administrators must have approved of these opinions.

dik@cwi.nl (Dik T. Winter) (10/13/89)

In article <10895@riks.csl.sony.co.jp> diamond@riks. (Norman Diamond) writes:
 > In article <9986@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
 > 
 > >Even if the == operator includes a fuzz factor, it's hard to imagine
(I may be obtuse, but what is the meaning of Even here?)
 > >a sensible implementation in which (a-b)==0 is true unless a and b
 > >are truly identical.
 > 
 > Sorry Mr. Koenig, it's easy to imagine.  Every hardware floating-point
 > system and most software ones have situations where (a-b)==0 but a != b.
 > 
First off, there are many systems where (a-b)==0 while a and b are not equal
(there are fewer when you invoke the optimizer!).  On the other hand, there
are many systems where (a-b)==0 implies that a and b are identical (IEEE
standard conforming machines come to mind).

The point is: how is the expression (a-b)==0 calculated?
1.   Optimized to a comparison of a and b.
1.1. The hardware does a direct comparison: (a-b)==0 implies a==b.
1.2. The hardware does a comparison by subtraction and does not
     know about partial underflow: (a-b)==0 does not imply a==b.
1.3. As 1.2, but partial underflow is supported: (a-b)==0 implies a==b.
2.   Not optimized as under 1.
2.1. Hardware does not know about partial underflow: like 1.2.
2.2. Hardware does know about it: like 1.3.
And that does not yet cover everything (e.g. machines that recognize
partial underflow only in some circumstances; you can have a*2.0==0.0
but a+a!=0.0).

I agree with Mr. Koenig:
 > >Even if the == operator includes a fuzz factor, it's hard to imagine
 > >a sensible implementation in which (a-b)==0 is true unless a and b
 > >are truly identical.
but non-sensible systems do exist.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

ok@cs.mu.oz.au (Richard O'Keefe) (10/13/89)

In article <10895@riks.csl.sony.co.jp>, diamond@csl.sony.co.jp (Norman Diamond) writes:
> Sorry Mr. Koenig, it's easy to imagine.  Every hardware floating-point
> system and most software ones have situations where (a-b)==0 but a != b.

My understanding is that IEEE denormalised numbers were introduced
precisely so that subtracting one finite number from another would
yield zero ONLY when the two numbers were equal.

Of course most other floating-point systems haven't got denormalised
numbers, and some systems which use IEEE formats don't obey IEEE rules.
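
On a machine that really implements gradual underflow the claim is easy to
probe; this little test is mine (DBL_MIN, from <float.h>, is the smallest
normalised double):

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        double a = 1.5 * DBL_MIN;   /* a normalised double near the bottom */
        double b = DBL_MIN;

        /* a - b is half of DBL_MIN, a denormalised number.  With gradual
           underflow it is nonzero, so (a-b)==0 agrees with a==b; a
           flush-to-zero machine would produce 0 here instead.            */
        if ((a - b) == 0.0 && a != b)
            printf("no gradual underflow on this machine\n");
        else
            printf("a - b = %g, nonzero as IEEE promises\n", a - b);

        return 0;
    }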

henry@utzoo.uucp (Henry Spencer) (10/13/89)

In article <10895@riks.csl.sony.co.jp> diamond@riks. (Norman Diamond) writes:
>... it's easy to imagine.  Every hardware floating-point
>system and most software ones have situations where (a-b)==0 but a != b.

My understanding is that this is impossible in IEEE floating point (the
only floating-point system any sensible designer would use, and increasingly
the only one system designers *do* use).
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

lmb@ghoti.uucp (Larry Breed) (10/14/89)

In article <10895@riks.csl.sony.co.jp> diamond@riks. (Norman Diamond) writes:
>In article <9986@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
>
>>Even if the == operator includes a fuzz factor, it's hard to imagine
>>a sensible implementation in which (a-b)==0 is true unless a and b
>>are truly identical.
>
>Sorry Mr. Koenig, it's easy to imagine.  Every hardware floating-point
					  ^^^^^
>system and most software ones have situations where (a-b)==0 but a != b.
>
> ...
>(a-b) --> (-0.000001e-99) --> (-0.100000e-104) --> (0.000000e-99).

Wrong.  IEEE 754/854 floating-point hardware has 'gradual underflow',
which retains significand values in a denormalized form if normalizing
would cause exponent underflow.  In your example, -0.000001e-99 would be
retained as is.
What you (Norm) say is true of most older fp implementations, but
there are an awful lot of good IEEE 754 machines out there these days.
What you (Andy) say is true of IEEE 754 machines, but there are an awful
lot of older fp implementations out there still.
Disclaimer: Don't blame my employer, blame:
Larry Breed			(415) 855-4460
uucp: uunet!ibmsupt!lmb		inet: ibmsupt!lmb@uunet.uu.net

dg@lakart.UUCP (David Goodenough) (10/17/89)

diamond@csl.sony.co.jp (Norman Diamond) sez:
> In article <9986@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
> 
>>Even if the == operator includes a fuzz factor, it's hard to imagine
>>a sensible implementation in which (a-b)==0 is true unless a and b
>>are truly identical.
> 
> Sorry Mr. Koenig, it's easy to imagine.  Every hardware floating-point
> system and most software ones have situations where (a-b)==0 but a != b.

Agreed. Also would any experts on ones complement systems like to throw
in a couple of cents? I am no expert on ones complement, but it might
be possible there. Then again, it might not be possible ..... H*ll
I don't know.
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com			  +---+

Tim_CDC_Roberts@cup.portal.com (10/20/89)

> Regarding (a-b) == 0 && (a != b), would ones complement users like
> to comment?

On the CDC Cyber 70/170s, which are ones complement machines, +0 is
represented by 60 zero bits, and -0 is represented by 60 one bits.  
The machine hardware compensates in most cases, and it is impossible to
generate a -0 result from any arithmetic operation, with the single
exception of (-0) - (-0), which produces -0.

So:  a = +0, b = -0:  a - b = +0  and a != b.

The register compare instructions, however, recognize both +0 and -0 as
zero.  (That is, "jump if zero" jumps in both cases).

The only place that one tends to get bit with this kind of anomaly is
in performing Boolean operations.  In that case, it is possible to
generate a result in which a register contains non-zero bits and still
compares as zero.  Thus, in assembly programming where this is apt to
occur, one tends to test for NG (negative) as well as ZR (zero).

Tim_CDC_Roberts@cup.portal.com                | Control Data...
...!sun!portal!cup.portal.com!tim_cdc_roberts |   ...or it will control you.