[comp.lang.c] Inherent imprecision of floating point variables

garzione@convex.com (Michael Garzione) (06/26/90)

rond@pro-grouch.cts.com (Ron Dippold) writes:

>In-Reply-To: message from pariyana@hawk.ulowell.edu

>> I have a question regarding floating point variables.  I have a
>> program that requires exact precision in all computations.
>> Floating point variables have given me trouble.  For example, when
>> I enter 32.14 into a floating point variable, then print it out, it
>> prints 32.139998.  This problem is magnified in all computations.
> 
>Check out the types that your compiler supports.  Many support a "double
>float" or something similar with much more precision.  Some also support IEEE
>floating-point numbers, which use 80 bits.


>UUCP: crash!pro-grouch!rond
>ARPA: crash!pro-grouch!rond@nosc.mil
>INET: rond@pro-grouch.cts.com

You will never get "exact" precision from any floating point computation
if any of the numbers involved cannot be represented exactly in base 2
(assuming the exponent in the floating point number is represented as a
power of 2 and the mantissa is also in base 2 and of finite length).

The error you see above has three sources.
(1) 32.14 (base 10) has no finite representation in base 2, so it does
    not have an exact representation as a floating point number.
(2) If you say something like "float f = 32.14" you get another potential
    error (in this case it's just a consequence of #1) as the input routines
    convert a base 10 _printed_ representation of a number to its binary
    equivalent (base 2 mantissa and exponent).
(3) Now that you've got an approximation of 32.14, printing it out adds
    still another potential error in converting the binary representation
    back to base 10.

(From Knuth, Vol II, p. 200 : "Note:  Since floating point arithmetic is
inherently approximate...")

If you _really_ need *exact* precision you may want to consider using
a representation where non-integral numbers are represented as the ratio
of two integral numbers and use multiple precision arithmetic on them.
You'll be guaranteed of exact precision in all calculations, but it's
probably not worth the hassle.

Hope this helps.

Mike
Convex Computer Corporation                            
{uunet,sun}!convex!garzione
garzione@convex.com

dik@cwi.nl (Dik T. Winter) (06/27/90)

In article <44436@ism780c.isc.com> marv@ism780.UUCP (Marvin Rubenstein) writes:
(Replying to articles dealing with exact arithmetic using floating point)
 > The problem has nothing to do with C or the precision of the machine's
 > floating point unit.  It has to do with the precision of the ASCII to float
 > and float to ASCII conversion routines.
The first sentence is correct; the second is bogus.  As has been noted
before, it has to do with the impossibility of representing all (decimal)
numbers exactly on machines that use a non-decimal representation for fp
numbers.

 >                                          This problem was addressed at the
 > SIGPLAN '90 conference on Program Language Design and Implementation.
I did not read the papers from this conference, but if they address
anything, it is not this problem.  What they might have addressed is the
problem of how to write and read numbers such that a number, once written
and read back, will have exactly the same internal representation (note the
order: first write, then read back in).  I doubt even that, however, because
the problem was solved some 20 years ago (Algol 68 required it).  There might
have been improved algorithms.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

dylan@ibmpcug.co.uk (Matthew Farwell) (06/27/90)

In article <44436@ism780c.isc.com> marv@ism780.UUCP (Marvin Rubenstein) writes:
>The problem has nothing to do with C or the precision of the machine's
>floating point unit.  It has to do with the precision of the ASCII to float
>and float to ASCII conversion routines.  This problem was addressed at the
>SIGPLAN '90 conference on Program Language Design and Implementation.  Two
>papers appear in the proceedings:

Consider the code segment:-

#include <stdio.h>

int main(void)
{
	float f;

	f = 0.0;
	while (1) {
		if (f == 10.0) break;
		printf("%f\n", f);
		f += 0.1;
	}
	printf("Stopped\n");
	return 0;
}

If it's all to do with conversion routines, why doesn't this stop when f
reaches 10?

Dylan.
-- 
Matthew J Farwell                 | Email: dylan@ibmpcug.co.uk
The IBM PC User Group, PO Box 360,|        dylan%ibmpcug.CO.UK@ukc
Harrow HA1 4LQ England            |        ...!uunet!ukc!ibmpcug.co.uk!dylan
Phone: +44 81-863-1191            | Sun? Don't they make coffee machines?

kaleb@mars.jpl.nasa.gov (Kaleb Keithley) (06/28/90)

In article <b3f.2688bfce@ibmpcug.co.uk> dylan@ibmpcug.CO.UK (Matthew Farwell) writes:
-In article <44436@ism780c.isc.com> marv@ism780.UUCP (Marvin Rubenstein) writes:
-
-Consider the code segment:-
-
-main()
-{
-	float f;
-
-	f = 0.0;
-	while (1) {
-		if (f == 10.0) break;
-		printf("%f\n", f);
-		f += 0.1;
-	}
-	printf("Stopped\n");
-}
-
-If its all to do with conversion routines, why doesn't this stop when f
-reaches 10?
-

Because (10.0 - 0.1) + 0.1 will never be exactly equal to 10.0.


kaleb@thyme.jpl.nasa.gov            Jet Propeller Labs
Kaleb Keithley

"So that's what an invisible barrier looks like"

writer@me.utoronto.ca (Tim Writer) (06/28/90)

In article <b3f.2688bfce@ibmpcug.co.uk> dylan@ibmpcug.co.uk (Matthew Farwell) writes:

>main()
>{
>	float f;

>	f = 0.0;
>	while (1) {
>		if (f == 10.0) break;
>		printf("%f\n", f);
>		f += 0.1;
>	}
>	printf("Stopped\n");
>}

>If its all to do with conversion routines ...

As has been mentioned in several earlier postings, it is *not* all to do with
conversion routines.  It *is* to do with the fact that no floating point
system can represent every real number exactly.  Your documentation will tell
you that a float can represent real numbers within a specific interval.
However, since there are infinitely many reals in any given interval, your
floating point system can represent only relatively few of them exactly.
Thus, when you write `0.1' and `10.0' in your program, your computer may not
store exactly `0.1' or `10.0'.  Then, when you repeatedly add `0.1', you are
not actually adding `0.1' but some value close to it.  The result is that you
never get `10.0', because you have accumulated some error in `f'.

Try this program.

#include <stdio.h>

int main(void)
{
	float f, stop;

	f = 0.0;
	stop = 10.0;

	/* Warning: this loop never terminates -- that is the point. */
	while (1) {
		if (f == stop)
			break;
		f += 0.1;
		(void) printf("f=%.20f\tstop=%.20f\n", f, stop);
	}
	return 0;
}

Now try this program.

#include <stdio.h>

int main(void)
{
	float eps, tst;

	eps = 1.0;

	do {
		eps = 0.5 * eps;
		tst = eps + 1.0;
	} while (tst > 1.0);

	(void) printf("%.10e\n", eps);
	return 0;
}

This gives an approximation of machine epsilon which differs from its real
value by at most a factor of 2.  Machine epsilon is the smallest number
which, when added to 1, produces a result greater than 1.  It is a measure
of the relative error in the floating point representation of real numbers.
On my system, this program gives a value of roughly 5.9e-8.  In other words,
a float here carries only about 7 significant decimal digits; any digits
beyond that in the printed mantissa are garbage.

Tim

ark@alice.UUCP (Andrew Koenig) (06/28/90)

In article <4186@jato.Jpl.Nasa.Gov>, kaleb@mars.jpl.nasa.gov (Kaleb Keithley) writes:

> Because (10.0 - 0.1) + 0.1 will never be exactly equal to 10.0.

Beg pardon?  I agree that it won't always be exactly 10.0
but `never' is a bit strong.
-- 
				--Andrew Koenig
				  ark@europa.att.com

dylan@ibmpcug.co.uk (Matthew Farwell) (06/28/90)

In article <4186@jato.Jpl.Nasa.Gov> kaleb@mars.UUCP (Kaleb Keithley) writes:
>In article <b3f.2688bfce@ibmpcug.co.uk> dylan@ibmpcug.CO.UK (Matthew Farwell) writes:
>-[code segment deleted]
>-
>-If its all to do with conversion routines, why doesn't this stop when f
>-reaches 10?
>Because (10.0 - 0.1) + 0.1 will never be exactly equal to 10.0.

This is exactly the point I was making.  The person I followed up to was
saying it was to do with ASCII->float conversion routines.  I was making
the point that it has nothing to do with them.

Dylan.
-- 
Matthew J Farwell                 | Email: dylan@ibmpcug.co.uk
The IBM PC User Group, PO Box 360,|        dylan%ibmpcug.CO.UK@ukc
Harrow HA1 4LQ England            |        ...!uunet!ukc!ibmpcug.co.uk!dylan
Phone: +44 81-863-1191            | Sun? Don't they make coffee machines?

karl@haddock.ima.isc.com (Karl Heuer) (06/29/90)

In article <4169@jato.Jpl.Nasa.Gov> kaleb@mars.UUCP (Kaleb Keithley) writes:
>I once wrote a rational number package...  All numbers are held in
>numerator/denominator form in a struct.

Certain operations, addition in particular, will quickly cause the denominator
to exceed the range of an integral type.  To avoid this, you have to combine
this with an expandable-int package.  And you still have a problem if you
apply a math function: sqrt(2) can only be approximated, not represented
exactly, by this mechanism.

Several people have posted to say that the exact-representation problem is
"unsolvable" because of this.  This is not necessarily true.  You could use
continued fraction representation, for instance.  sqrt(2) still has no finite
representation, but by using lazy evaluation you don't *need* to have the
whole thing in hand at once: it suffices to have a generator object that will
emit the next term on request.

There's a good discussion of this in HAKMEM, item #101.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint

tony@oha.UUCP (Tony Olekshy) (06/29/90)

In <b3f.2688bfce@ibmpcug.co.uk>, dylan@ibmpcug.co.uk (Matthew Farwell) writes:
>
> > [question about]	while (f != 10.0) f += 0.1;
>
> If its all to do with conversion routines, why doesn't this stop when f
> reaches 10?
	[which was a rhetorical question, to the poster he had quoted]

I add...

It does not stop at (f == 10.0) simply because no-one said f = 10.0;
The converse (why did it stop, if it had) is more complex.
Next question.

--
Yours etc., Tony Olekshy.               Internet: tony%oha@CS.UAlberta.CA
					  BITNET: tony%oha@UALTAMTS.BITNET
					    uucp: alberta!oha!tony

jlg@lambda.UUCP (Jim Giles) (06/30/90)

From article <b3f.2688bfce@ibmpcug.co.uk>, by dylan@ibmpcug.co.uk (Matthew Farwell):
> [...compressed example ...]
> main() {
> 	float f; f = 0.0;
> 	while (1) {
> 		if (f == 10.0) break;
> 		printf("%f\n", f);
> 		f += 0.1;}
> 	printf("Stopped\n");}
> 
> If its all to do with conversion routines, why doesn't this stop when f
> reaches 10?

It doesn't (or shouldn't) have anything to do with the conversion routines.
The job of the conversion routines is to convert the decimal into the internal
representations and vice-versa.  They _should_ do this job as accurately as
possible - if the number is exactly representable in both bases you have a
right to expect exact conversion.

The above routine fails (on some machines) to terminate because 0.1(decimal)
is not exactly representable in base 2 (or many other bases for that matter).
The value of the increment is therefore approximated.  Depending upon the
precision and rounding mode of the machine in question, the above program
may or may not terminate.  This fault exists even if the conversion was
carried out to the nearest possible internal representation.

The value of 0.1(decimal) is 0.000110011001100110011...(binary).  This is
a repeating fraction, in which the group 0011 repeats forever.  No
matter what precision your machine carries, it can't represent the whole
thing (not as a floating-point number anyway).  A hundred repeated adds
with the approximate value _can't_ really equal 10.0 - though it might
coincidentally match if the precision and rounding modes exactly cancel
each other out.

Solutions to this problem are to introduce new data types (like fixed-point
or rational) or to use a base 10 machine.  Both these solutions carry high
price-tags (in terms of speed or hardware or both).

J. Giles

barmar@think.com (Barry Margolin) (07/06/90)

In article <4186@jato.Jpl.Nasa.Gov> kaleb@mars.UUCP (Kaleb Keithley) writes:
>In article <b3f.2688bfce@ibmpcug.co.uk> dylan@ibmpcug.CO.UK (Matthew Farwell) writes:
[A for-loop iterating by 0.1]
>-If its all to do with conversion routines, why doesn't this stop when f
>-reaches 10?

>Because (10.0 - 0.1) + 0.1 will never be exactly equal to 10.0.

Well, it does on my IEEE-compliant Symbolics Lisp Machine.  But other
floating point formats may not have this property.  And there are other
floating point computations that don't obey normal arithmetic axioms; for
instance, 

	10.0 * .1 == 1.0
	.1+.1+.1+.1+.1+.1+.1+.1+.1+.1 == 1.0000001

This latter inequality is the reason the loop fails to terminate.  The
problem is that 1/10 is a repeating fraction in binary, so the internal
representation of .1 can't be exactly right.  The multiplication happens to
round in such a way that the error is cancelled out, but the repeated
additions accumulate the errors because there's a rounding step after each
addition step.  In fact, here's a smaller case that demonstrates this:

	.1 + .1 == .2
	.5 + .1 == .6
	.5 + (.1 + .1) == .7
	(.5 + .1) + .1 == .70000005
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

steve@groucho.ucar.edu (Steve Emmerson) (07/08/90)

In <14429@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:

>...
>The job of the conversion routines is to convert the decimal into the internal
>representations and vice-versa.  They _should_ do this job as accurately as
>possible - if the number is exacly representable in both bases you have a
>right to expect exact conversion.

A right?

Granted by whom?

Steve Emmerson        steve@unidata.ucar.edu        ...!ncar!unidata!steve

ark@alice.UUCP (Andrew Koenig) (07/08/90)

In article <7906@ncar.ucar.edu>, steve@groucho.ucar.edu (Steve Emmerson) writes:
> In <14429@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> 
> >...
> >The job of the conversion routines is to convert the decimal into the internal
> >representations and vice-versa.  They _should_ do this job as accurately as
> >possible - if the number is exacly representable in both bases you have a
> >right to expect exact conversion.

> A right?

> Granted by whom?

Granted by the IEEE floating-point standard, for one thing.
If I am using a system whose vendor claims that it supports
IEEE floating point, then I can expect that

	input conversion on a floating point number with an
	exact representation will be exact;

	input conversion at compile time will give precisely
	the same result as input conversion at run time;

	if I write out any number at all with enough significant
	digits, I can read back the result and get exactly
	the same result;

and a whole bunch of other useful properties.

Interestingly, this does not require that conversion be as accurate
as possible, and one can reasonably argue that conversions should
generally *not* be as accurate as possible, because it's quite expensive
to get it exactly right compared with what it costs to get it
*almost* right.  In particular, I do not know how to write an
exact input conversion routine without doing unbounded precision
arithmetic.

For that reason, the IEEE standard does *not* require conversion
to be as accurate as possible -- it allows an error of up to 0.47
times the least significant bit of the number being converted,
in addition to the inevitable error implied by rounding.  The
number 0.47 is carefully chosen to guarantee all the properties
I mentioned above; the IEEE standard refers the reader to Jerome
Coonen's PhD thesis for further detail.
-- 
				--Andrew Koenig
				  ark@europa.att.com

steve@groucho.ucar.edu (Steve Emmerson) (07/09/90)

In <11035@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:

>Granted by the IEEE floating-point standard, for one thing.
>If I am using a system whose vendor claims that it supports
>IEEE floating point, then I can expect that

>	input conversion on a floating point number with an
>	exact representation will be exact;

An earlier discussion of the IEEE standard indicated that it allows an
exactly-representable value to be off by one bit.

Is this, then, incorrect?

Steve Emmerson        steve@unidata.ucar.edu        ...!ncar!unidata!steve

oz@yunexus.yorku.ca (Ozan Yigit) (07/11/90)

In article <11035@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:

>Interestingly, this does not require that conversion be as accurate
>as possible, and one can reasonably argue that conversions should
>generally *not* be as accurate as possible, because it's quite expensive
>to get it exactly right compared with what it costs to get it
>*almost* right.  In particular, I do not know how to write an
>exact input conversion routine without doing unbounded precision
>arithmetic.

Two excellent papers on this topic that may be of interest can be found in
"Proceedings of the ACM '90 Conference on Programming Language Design and
Implementation" [SIGPLAN NOTICES, v25, #6, June 1990].

Will Clinger: "How to Read Floating Point Numbers Accurately" 

Guy L. Steele Jr. and Jon L. White: "How to Print Floating-Point Numbers
Accurately"

I suspect the essence of their work will be reflected in various Common Lisp
and Scheme implementations, and their respective standards.

oz
---
Sometimes when you fill a vacuum, it still sucks. | oz@nexus.yorku.ca
				  Dennis Ritchie  | or (416) 736 5257

loren@tristan.llnl.gov (Loren Petrich) (07/14/90)

In article <b3f.2688bfce@ibmpcug.co.uk> dylan@ibmpcug.CO.UK (Matthew Farwell) writes:
>In article <44436@ism780c.isc.com> marv@ism780.UUCP (Marvin Rubenstein) writes:
>Consider the code segment:-
>
>main()
>{
>	float f;
>
>	f = 0.0;
>	while (1) {
>		if (f == 10.0) break;
>		printf("%f\n", f);
>		f += 0.1;
>	}
>	printf("Stopped\n");
>}
>
>If its all to do with conversion routines, why doesn't this stop when f
>reaches 10?

	That is because 0.1 does not have a finite-length
representation in binary. In binary form it is:

	0.000110011001100110011...

	with the group 0011 repeating forever.

	Check it out by multiplying it by 10 (binary: 1010).

	Since a computer will only use finite-length numbers, it will
cut off this infinite-length representation, and get a number that is
not quite 0.1. But no integer multiple of that can equal 10.0, so the
program gets stuck in an infinite loop. The moral: avoid this sort of
test. You won't get exact agreement.

						        ^    
Loren Petrich, the Master Blaster		     \  ^  /
	loren@sunlight.llnl.gov			      \ ^ /
One may need to route through any of:		       \^/
						<<<<<<<<+>>>>>>>>
	lll-lcc.llnl.gov			       /v\
	lll-crg.llnl.gov			      / v \
	star.stanford.edu			     /  v  \
						        v    
For example, use:
loren%sunlight.llnl.gov@star.stanford.edu

My sister is a Communist for Reagan

ark@alice.UUCP (Andrew Koenig) (07/15/90)

In article <7913@ncar.ucar.edu>, steve@groucho.ucar.edu (Steve Emmerson) writes:
> In <11035@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:

> >Granted by the IEEE floating-point standard, for one thing.
> >If I am using a system whose vendor claims that it supports
> >IEEE floating point, then I can expect that

> >	input conversion on a floating point number with an
> >	exact representation will be exact;

> An earlier discussion of the IEEE standard indicated that it allows an
> exactly-representable value to be off by by one bit.

> Is this, then, incorrect?

It depends on what you mean by `off by one bit.'

Every decimal floating-point literal (such as 0.3 or 2.7e-28) is the exact
representation of some rational number.  When trying to fit that number
into an IEEE floating-point representation, some accuracy may have to be
lost by rounding.  The IEEE standard allows that rounding error, of course,
and 0.47 times the value of the low-order bit in additional error.
Among other things, the number 0.47 implies that

	some numbers will have the last bit wrong when converted, but

	if the input exactly represents an IEEE floating-point value,
	then that is the value you will get.  Moreover,

	if you write an IEEE value out with enough significant digits
	and read it back in again, the accumulated error will never be
	enough to change the value.
-- 
				--Andrew Koenig
				  ark@europa.att.com