rick@.tetrauk.UUCP (Rick Jones) (07/30/90)
(This is nothing to do with unsigned integers :-)

I don't know if this has been discussed in the dim past, but I haven't seen
anything since I have been reading News.

The problem is well-known - not every rational decimal number has an exact
representation in binary floating-point form.  This leads to anomalies such
as:

        a = .291 / 2.0 ;
        b = .1455 ;

        if (a == b)     /* false! */

Both expressions, evaluated in decimal, yield exactly .1455, but the computed
results may not in fact be equal.  This fails on my system when the 3 values
are generated from strings using atof().  What does and does not fail will
presumably depend on the FP representation used, and possibly on the actual
hardware and compiler.

The reason is that .1455 does not have an exact binary representation.  In
this case the nearest two values are .14549999999999999 and
.14550000000000002.  The two routes to the result get different nearest
approximations, so the two results are not actually equal.

Is there a general solution to this problem without continuously rounding
everything?  I believe it's possible by applying a "maximum reliable
precision" concept to FP numbers.  For IEEE 64-bit numbers this seems to be
1 in 10^16, or 15 significant digits.  Using this, a function "fpcmp" can be
written which takes 2 arguments:

        fpcmp (double a, double b)

        returns 0 if a equals b within the reliable precision,
        else returns 1 if a > b, or -1 if a < b

Real problem:  how do you write fpcmp() ?

I've got one or two possible solutions, but they're not very nice, and
computationally inefficient.  Has anyone got a really good answer to this?

Associated problems:

Can you ensure that "fpcmp (a, b)" and "fpcmp (a-b, 0.0)" yield the same
result when a and b are very close?

What about a rounding function which will consistently round on "5"?  E.g.
using (s)printf to convert .1455 to 3 decimal places will give .145 and .146
for the two different close values above.

While it's a good discussion subject, I suggest that anything extensive, or
with large chunks of code, you email to me.  I'll summarise such
contributions later.
-- 
Rick Jones                              You gotta stand for something
Tetra Ltd.  Maidenhead, Berks           Or you'll fall for anything
rick@tetrauk.uucp (...!ukc!tetrauk.uucp!rick)   - John Cougar Mellencamp
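[For anyone who wants to poke at this, a minimal test harness in the spirit
of the post; the behaviour depends on your hardware, compiler, and atof()
implementation, and on a modern IEEE 754 system with correctly rounded
decimal conversions this particular case may well compare equal, since
dividing by 2.0 is exact:]

        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            double a = atof(".291") / atof("2.0");
            double b = atof(".1455");

            /* print enough digits to distinguish adjacent doubles */
            printf("a = %.17g\n", a);
            printf("b = %.17g\n", b);
            printf("a == b is %s\n", a == b ? "true" : "false");
            return 0;
        }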
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (07/31/90)
In article <622@.tetrauk.UUCP>, rick@.tetrauk.UUCP (Rick Jones) writes:
>        fpcmp (double a, double b)
>
>        returns 0 if a equals b within the reliable precision,
>        else returns 1 if a > b, or -1 if a < b
>
> Real problem:  how do you write fpcmp() ?

See Knuth, "The Art of Computer Programming", Vol. 2, "Seminumerical
Algorithms".  He defines "approximate" versions of <, >, and *TWO*
approximate versions of ==.  Note that the appropriate "tolerance" to use
does NOT depend only on the properties of your floating-point arithmetic,
but on details of your algorithm.  The ultimate answer to your question is
to take a course on numerical analysis.

I sometimes wonder whether there ought to be a generally understood
qualification for computer programmers:  LFP (licensed to use
floating-point) (:-).

> Can you ensure that "fpcmp (a, b)" and "fpcmp (a-b, 0.0)" yield the same
> result when a and b are very close?

No.  In many computer arithmetic systems, a-b may collapse to a precise 0.
See the axioms Knuth gave.
-- 
Science is all about asking the right questions.  | ok@goanna.cs.rmit.oz.au
I'm afraid you just asked one of the wrong ones.  | (quote from Playfair)
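[For reference, a common simplification of Knuth's two approximate
equalities (TAOCP Vol. 2, Sec. 4.2.2) in C, where eps is the
algorithm-dependent relative tolerance O'Keefe mentions; Knuth's actual
definitions are stated in terms of floating-point exponents rather than
the max/min used here:]

        #include <math.h>

        /* "approximately equal": within eps relative to the LARGER operand */
        int approx_eq(double u, double v, double eps)
        {
            double mag = fabs(u) > fabs(v) ? fabs(u) : fabs(v);
            return fabs(u - v) <= eps * mag;
        }

        /* "essentially equal" (stricter): relative to the SMALLER operand */
        int essen_eq(double u, double v, double eps)
        {
            double mag = fabs(u) < fabs(v) ? fabs(u) : fabs(v);
            return fabs(u - v) <= eps * mag;
        }

Two u, v that are approximately equal need not be essentially equal, which
is exactly why having both relations is useful.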
carlos@urz.unibas.ch (07/31/90)
Oooops, sorry for message nr. 2653, it was our first try ever....
Hope this time works......

In article <622@.tetrauk.UUCP>, rick@.tetrauk.UUCP (Rick Jones) writes:
>        fpcmp (double a, double b)
>        returns 0 if a equals b within the reliable precision,
>        else returns 1 if a > b, or -1 if a < b
>
> Real problem:  how do you write fpcmp() ?

Since there is only a finite quantity of fp numbers on the computer, an
Increase(x) resp. Decrease(x) procedure (to get the next resp. previous fp
number after x) could do it.  The tolerance that you build into your
function fpcmp() depends on the fp system you use.

You can implement the above functions very easily for IEEE fp formats:
just regard the bit sequence representing your fp number as an integer, and
increase resp. decrease it by 1.  The new integer (interpreted as a double)
is the next resp. previous possible fp number.  This works automatically
across exponent boundaries and between normalized and denormalized numbers.
You must only watch out for a change in sign, and note that decreasing
negative numbers means increasing the mantissa (and exponent).

> Can you ensure that "fpcmp (a, b)" and "fpcmp (a-b, 0.0)" yield the same
> result when a and b are very close?

Yes, if your fp system conforms to IEEE Std. 754, due to the existence of
denormalized numbers.

Good luck,
        Stefan & Carlos
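[A sketch of that bit-twiddling idea, assuming IEEE 754 doubles and C99
fixed-width integer types (neither of which could be assumed in 1990);
NaNs, infinities, and overflow at the top of the range are ignored for
brevity:]

        #include <stdint.h>
        #include <string.h>

        /* Return the next representable double above x. */
        double fp_increase(double x)
        {
            uint64_t bits;
            memcpy(&bits, &x, sizeof bits);   /* view the double as raw bits */

            if (bits == 0x8000000000000000ULL)
                bits = 1;        /* -0.0 steps up to the smallest denormal */
            else if (bits >> 63)
                bits -= 1;       /* negative: stepping up shrinks the magnitude */
            else
                bits += 1;       /* zero or positive: one unit in the last place up */

            memcpy(&x, &bits, sizeof x);
            return x;
        }

Decrease(x) is the mirror image; the same operation was later standardized
in the C library as nextafter().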
rob@baloo.eng.ohio-state.edu (Rob Carriere) (08/01/90)
In article <622@.tetrauk.UUCP> rick@tetrauk.UUCP (Rick Jones) writes:
>
>(This is nothing to do with unsigned integers :-)

Are you sure?  It's a floating subject after all...

>The problem is well-known - not every rational decimal number has an exact
>representation in binary floating-point form.  This leads to anomalies such
>as:
   [example: computation != mathematical result]
>
>Is there a general solution to this problem without continuously rounding
>everything?  I believe it's possible by applying a "maximum reliable
>precision" concept to FP numbers.  For IEEE 64-bit numbers this seems to be
>1 in 10^16, or 15 significant digits.  Using this, a function "fpcmp" can be
>written which takes 2 arguments:
   [and finds whether they are within float epsilon of each other]

Hm..  Are you sure this is what you want?  In your normal everyday floating
point problem, the inaccuracies resulting from real-world problems tend to
greatly exceed those from the computer's FP system.  If I measure the
`resistance' of your average resistor, I had better not peek much further
than 3 to 4 decimal digits of precision, or I will see that the resistance
of this thing isn't constant after all --> the model (Ohm's Law in this
case) doesn't match reality.  Further, I doubt my multimeter is going to
give me more than 2 digits anyway (and I wouldn't stake anything on the
exactness of the 2nd digit unless the thing's been calibrated recently)
--> measurement error.

OK, so this resistor is 3.1 kOhm.  I tell my nifty-dandy program this and
it comes back with the result that the eggs will take 182.14142314159 s to
finish.  How many digits of that answer do you trust?  And we haven't even
touched the numerical conditioning of that program yet.  :-)

In other words, usually what you want is something like

        int fcmp( double a, double b, double eps )
        {
           if ( fabs( a-b ) < eps ) return 0;
           if ( a > b ) return 1;
           return -1;
        }

where eps will be more than large enough that you don't have to worry about
the FP system messing up the comparison.  (Optimization of the above code
is left as an exercise for the reader.  After all, I already did the easy
part :-)

SR
---
ark@alice.UUCP (Andrew Koenig) (08/01/90)
In article <5467@quanta.eng.ohio-state.edu>, rob@baloo.eng.ohio-state.edu (Rob Carriere) writes:

> In other words, usually what you want is something like
>
>        int fcmp( double a, double b, double eps )
>        {
>           if ( fabs( a-b ) < eps ) return 0;
>           if ( a > b ) return 1;
>           return -1;
>        }

Probably not.

The trouble is that if a and b are big enough, then a-b will either be
exactly 0 or will be larger than eps.  (Here, `big enough' is approximately
eps * 2^n, where n is the number of significant bits in a floating-point
fraction.)  There is also the possibility that a-b might overflow, in which
case the comparison would surely have been valid if done directly.

Floating point arithmetic is *hard*.
-- 
                                --Andrew Koenig
                                  ark@europa.att.com
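[To see the sizes involved, a small demonstration assuming IEEE 64-bit
doubles and a library that provides nextafter():]

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            double a = 1.0e17;
            double b = nextafter(a, 2.0e17);  /* the adjacent representable double */

            /* The gap between neighbouring doubles near 1e17 is 16, so
               with an absolute eps like 1e-9, fabs(a-b) < eps can never
               hold unless a-b is exactly 0. */
            printf("gap at 1e17 = %g\n", b - a);
            return 0;
        }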
johnb@srchtec.UUCP (John Baldwin) (08/02/90)
In article <1990Jul31.101803.852@urz.unibas.ch> carlos@urz.unibas.ch writes:
>
>> Can you ensure that "fpcmp (a, b)" and "fpcmp (a-b, 0.0)" yield the same
           ^^^^^^ [emphasis mine]
>> result when a and b are very close?
>
>Yes, if your fp system conforms to IEEE Std. 754, due to the existence of
>denormalized numbers.

Actually, it depends on what the previous writer means by "ensure."
If the code has to be portable, one can't rely on IEEE-754 representation
of floating point values, since that's NOT part of the C standard.

Portably, you should be able to "get away with" using the values built in
to <float.h> (a required part of the standard library) for the epsilon of
your floating point representation.

[Just in case this slipped past anyone, epsilon is the smallest quantity
"e" such that 1.0 + e > 1.0]
-- 
John T. Baldwin                      | johnb@srchtec.uucp
Search Technology, Inc.              | johnb%srchtec.uucp@mathcs.emory.edu
standard disclaimer:                 | ...uunet!samsung!emory!stiatl!srchtec..
opinions and mistakes purely my own. | ...mailrus!gatech!stiatl!srchtec...
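[A sketch of that portable approach, using DBL_EPSILON from <float.h> as a
relative tolerance; the factor of 4 is an arbitrary safety margin, not
anything blessed by the standard:]

        #include <float.h>
        #include <math.h>

        int fpcmp(double a, double b)
        {
            double mag = fabs(a) > fabs(b) ? fabs(a) : fabs(b);
            double tol = 4.0 * DBL_EPSILON * mag;  /* scales with the operands */

            if (fabs(a - b) <= tol) return 0;
            return (a > b) ? 1 : -1;
        }

Note that this still doesn't make fpcmp(a-b, 0.0) agree with fpcmp(a, b):
when the first argument is a-b, the tolerance scales with |a-b| itself and
collapses toward zero.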
r91400@memqa.uucp (Michael C. Grant) (08/06/90)
In article <11117@alice.UUCP>, ark@alice.UUCP (Andrew Koenig) writes:
> In article <5467@quanta.eng.ohio-state.edu>, rob@baloo.eng.ohio-state.edu (Rob Carriere) writes:
>> In other words, usually what you want is something like
>>
>>        int fcmp( double a, double b, double eps )
>>        {
>>           if ( fabs( a-b ) < eps ) return 0;
>>           if ( a > b ) return 1;
>>           return -1;
>>        }
>
> Probably not.
>
> The trouble is that if a and b are big enough, then a-b will either be
> exactly 0 or will be larger than eps.  (Here, `big enough' is approximately
> eps * 2^n, where n is the number of significant bits in a floating-point
> fraction.)  There is also the possibility that a-b might overflow,
> in which case the comparison would surely have been valid if done directly.

Perhaps a solution to this problem would be an implementation-dependent
library function fcmp, in which 'eps' is not some absolute number, but a
fraction of the two nearly-equal numbers themselves:

        if (fabs(a-b) < eps * fabs(a)) return 0;

(assuming a is almost equal to b)  The reason it would need to be a library
function is that implementing it efficiently would involve extracting the
mantissa and exponent and examining them separately.  For example, assuming
a standard which requires normalization, this algorithm might work (note
this is just a near-equality test, not a complete comparison):

        if (exponent_a != exponent_b) return false
        if (abs(mantissa_a - mantissa_b) < error_limit) return true
        else return false

Of course, if you don't care about portable code, just write this function
yourself!

Michael C. Grant
no .sig, no .identity
ark@alice.UUCP (Andrew Koenig) (08/06/90)
In article <4958@memqa.uucp>, r91400@memqa.uucp (Michael C. Grant) writes:

> For example, assuming a standard which
> requires normalization, this algorithm might work (note this is
> just a near-equality test, not a complete comparison):
>
>        if (exponent_a != exponent_b) return false
>        if (abs(mantissa_a - mantissa_b) < error_limit) return true
>        else return false

You can use frexp() to split numbers into mantissa and exponent (though I
forget whether frexp is actually part of the ANSI standard or not).

However, the algorithm shown here doesn't work, because the two numbers
being compared could straddle a power of 2, which would give them different
exponents even though they're arbitrarily close together.
-- 
                                --Andrew Koenig
                                  ark@europa.att.com
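[A concrete illustration of the straddle problem, assuming IEEE doubles
and a library that provides nextafter():]

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            double a = nextafter(2.0, 0.0);   /* the largest double below 2.0 */
            double b = 2.0;
            int ea, eb;

            frexp(a, &ea);                    /* a = m * 2^ea with 0.5 <= m < 1 */
            frexp(b, &eb);

            /* Prints exponents 1 and 2: the exponent-equality test above
               declares these two adjacent doubles unequal. */
            printf("exponent of a = %d, of b = %d\n", ea, eb);
            return 0;
        }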
gdtltr@freezer.it.udel.edu (Gary Duzan) (08/06/90)
All practical applications aside, isn't it philosophically unnatural to
apply an equivalence comparison to two floating point (presumably real)
numbers? I know there are finite representations in the machine, etc., but
what mathematics I have had would make me think twice about this anyway.
Gary Duzan
Time Lord
Third Regeneration
--
gdtltr@freezer.it.udel.edu
_o_ -------------------------- _o_
[|o o|] If you can square, round, or cube a number, why not sphere it? [|o o|]
|_O_| "Don't listen to me; I never do." -- Doctor Who |_O_|
ark@alice.UUCP (Andrew Koenig) (08/08/90)
In article <26691@nigel.ee.udel.edu>, gdtltr@freezer.it.udel.edu (Gary Duzan) writes:

> All practical applications aside, isn't it philosophically unnatural to
> apply an equivalence comparison to two floating point (presumably real)
> numbers?

Not if you're doing appropriate operations on a machine with IEEE floating
point or some other good floating point system.

For example, IEEE guarantees that if I read the decimal representation of a
number that can be represented exactly in floating point, that's what I'll
get, period.  It also guarantees exact results for primitive operations
including addition, subtraction, multiplication, division, and square root.

Suppose, then, that I want to do calculations on integers that are too big
to fit in `int' values but small enough to be represented exactly in
floating point.  I see no reason not to do any comparison on floating point
numbers that I would do on integers.  As a simple example,

        double d;
        for (d = 0; d < 1e10; d++) {
                /* stuff */
        }

under IEEE floating point can be counted on to execute precisely 1e10 (ten
billion) times.

Yes, such techniques are not portable to machines with lousy floating
point, but it might be that your program wouldn't run there anyway for some
reason.  As good floating point systems become more widespread, this
becomes less of an issue.
-- 
                                --Andrew Koenig
                                  ark@europa.att.com
doug@ozdaltx.UUCP (Doug Matlock) (08/13/90)
In article <11160@alice.UUCP>, ark@alice.UUCP (Andrew Koenig) writes:

> For example, IEEE guarantees that if I read the decimal representation
> of a number that can be represented exactly in floating point,
> that's what I'll get, period.  It also guarantees exact results
> for primitive operations including addition, subtraction, multiplication,
> division, and square root.

What if these primitive operators generate a result that cannot be
represented exactly in IEEE floating point notation?  Many simple rational
(or irrational, since you've included square root) numbers are not exactly
expressed in decimal notation.

> As a simple example,
>
>        double d;
>        for (d = 0; d < 1e10; d++) {
>                /* stuff */
>        }
>
> under IEEE floating point can be counted on to execute precisely 1e10
> (ten billion) times.

The obvious pitfall here is exceeding the precision of floating point.
For 8-byte doubles, I think the turning point is somewhere around 1e15.
If I take a floating point number greater than 1e15 (or so) and add 1 to
it, I will get the same number back.  (This is generally lots of fun when
running computer simulations.  ;-) )
-- 
Doug.

"If you want Peace, work for Justice."
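[The turning point for IEEE 64-bit doubles is actually 2^53, about
9.007e15, above which the gap between adjacent doubles exceeds 1; a quick
check, assuming IEEE doubles:]

        #include <stdio.h>

        int main(void)
        {
            double big = 9007199254740992.0;          /* 2^53 */

            printf("%s\n", big + 1.0 == big ? "absorbed" : "distinct");
            /* prints "absorbed": 2^53 + 1 is not representable, and the
               sum rounds back to 2^53 */

            printf("%s\n", (big - 1.0) + 1.0 == big ? "exact" : "inexact");
            /* prints "exact": below 2^53 every integer is representable
               and the arithmetic is exact */
            return 0;
        }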
bright@Data-IO.COM (Walter Bright) (08/16/90)
In article <6855@ozdaltx.UUCP> doug@ozdaltx.UUCP (Doug Matlock) writes:
<In article <11160@alice.UUCP>, ark@alice.UUCP (Andrew Koenig) writes:
<< For example, IEEE guarantees that if I read the decimal representation
<< of a number that can be represented exactly in floating point,
<< that's what I'll get, period.  It also guarantees exact results
<< for primitive operations including addition, subtraction, multiplication,
<< division, and square root.
<What if these primitive operators generate a result that cannot be
<represented exactly in IEEE floating point notation?

IEEE enables you to specify that an exception is generated if the result
is not exact.
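[C had no portable way to get at this in 1990; C99's <fenv.h> later exposed
the IEEE "inexact" condition, strictly as a sticky flag you can test rather
than a trap.  A sketch assuming a C99 hosted implementation; the volatiles
keep the compiler from folding the division at compile time:]

        #include <fenv.h>
        #include <stdio.h>

        #pragma STDC FENV_ACCESS ON

        int main(void)
        {
            volatile double x = 1.0, y = 3.0;
            volatile double q;

            feclearexcept(FE_INEXACT);
            q = x / y;                        /* 1/3 must be rounded */

            if (fetestexcept(FE_INEXACT))
                printf("division result was not exact (q = %.17g)\n", q);
            return 0;
        }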
ark@alice.UUCP (Andrew Koenig) (08/17/90)
In article <2646@dataio.Data-IO.COM>, bright@Data-IO.COM (Walter Bright) writes:
> In article <6855@ozdaltx.UUCP> doug@ozdaltx.UUCP (Doug Matlock) writes:
> <In article <11160@alice.UUCP>, ark@alice.UUCP (Andrew Koenig) writes:
> << It also guarantees exact results
> << for primitive operations including addition, subtraction, multiplication,
> << division, and square root.
> <What if these primitive operators generate a result that cannot be
> <represented exactly in IEEE floating point notation?
> IEEE enables you to specify that an exception is generated if the result
> is not exact.

Yes.  Also, and this is the point I was trying to make before, for the
primitive operations I mentioned, IEEE requires that the result of each
operation be exactly what would be obtained by doing the operation in
infinite precision and then rounding it.  In other words, for those
operations, all IEEE implementations must give bit-for-bit identical
results.
-- 
                                --Andrew Koenig
                                  ark@europa.att.com