steve@cohort.UUCP (Steve Bird) (02/28/90)
Forwarded message:
>From steve Fri Nov 24 21:46:23 1989
Subject: C code
To: evan@telly (Evan Leibovitch)
Date: Fri, 24 Nov 89 21:46:23 EST
From: Steve Bird <steve@cohort.UUCP>
X-Mailer: ELM [version 2.2 PL7]

Here's another interesting piece.

/* floaterr.c--demonstrates round-off error */
# include <stdio.h>
main()
{
	float a,b;
	b = 2.0e20 + 1.0;
	a = b - 2.0e20;
	printf("%f \n",a);
}

When compiled the program returns the number 4008175468544.000000.
Now when the program is modified to read:

/* floaterr.c--demonstrates round-off error */
# include <stdio.h>
main()
{
	float a,b;
	b = 2.0e20 + 1.0;
	a = 2.0e20;
	printf("%f \n",b - a);
}

The program returns 0.000000.  Why?
hugh (D. Hugh Redelmeier) (03/01/90)
| From: Steve Bird <steve@cohort.UUCP>
|
| Here's another interesting piece.
|
| /* floaterr.c--demonstrates round-off error */
| # include <stdio.h>
| main()
| {
| 	float a,b;
| 	b = 2.0e20 + 1.0;
| 	a = b - 2.0e20;
| 	printf("%f \n",a);
| }
|
| When compiled the program returns the number 4008175468544.000000 .
| Now when the program is modified to read :
|
| /* floaterr.c--demonstrates round-off error */
| # include <stdio.h>
| main()
| {
| 	float a,b;
| 	b = 2.0e20 + 1.0;
| 	a = 2.0e20;
| 	printf("%f \n",b - a);
| }
|
| The program returns 0.000000 . Why ?

In one case, you are evaluating
	((double) (float) 2.0e20) - 2.0e20
In the other case, you are evaluating
	((double) (float) 2.0e20) - ((double) (float) 2.0e20)
The former represents the roundoff in representing 2.0e20 in single
precision.  The latter ought to be zero.

Here is a simpler version of your programs.  Note that the "+ 1.0"
has no effect.

/* floaterr.c--demonstrates round-off error */
# include <stdio.h>
int main()
{
	printf("%f\n", ((double) (float) 2.0e20) - 2.0e20);
		/* 4008175468544.000000 */
	printf("%f\n", ((double) (float) 2.0e20) - ((double) (float) 2.0e20));
		/* 0.000000 */
	return 0;
}

For questions like this, it is useful to know the compiler and machine.
It appears to use IEEE f.p., so I was able to reproduce it using RedC on
my Sun 3.

Hugh Redelmeier
{utcsri, yunexus, uunet!attcan, utzoo, hcr}!redvax!hugh
When all else fails: hugh@csri.toronto.edu  +1 416 482-8253
clewis@eci386.uucp (Chris Lewis) (03/02/90)
In article <9002280838.AA01415@cohort.uucp> Steve Bird <steve@cohort.UUCP> writes:

| 	float a,b;
| 	b = 2.0e20 + 1.0;
| 	a = b - 2.0e20;
| 	printf("%f \n",a);
|
| When compiled the program returns the number 4008175468544.000000 .
| Now when the program is modified to read :
|
| 	float a,b;
| 	b = 2.0e20 + 1.0;
| 	a = 2.0e20;
| 	printf("%f \n",b - a);
|
| The program returns 0.000000 . Why ?

Actually, it's truncation error ;-)

The numbers printed in the above cases frequently depend upon the precise
C compiler and processor you're running on.  The explanation I give is
"traditional" C, according to K&R (ANSI C provides for different behaviour
under certain circumstances):

When a float is passed to a function or used in an expression, the operand
is first coerced to a double.  E.g. the subtraction in the first fragment
has both arguments coerced to double, and then the result is forced into
a float.  Since floats are usually half the size of doubles, you lose
digits off the least significant end.  In the second fragment, the
subtraction is also done with both operands as doubles, but since the
result is used as a function argument, it is not truncated into a float:
it's passed as a double to printf's %f handler.

There are other factors coming into play - depending on your machine,
2e20 + 1 may actually *equal* 2e20, depending upon how many digits of
precision the variable that the result is stored in has.  (Which is what
I suspect your second example is trying to tell you.)

Frankly, given that "traditional" C does all floating point operations
and argument passing in doubles, I almost never use a float to store the
result of an FP operation, and only use floats in large arrays.  If you
use floats for the results of FP operations, you should understand the
algorithm well enough to know the magnitudes of the operands involved.
This sort of thing is still possible with doubles, but you can get away
with more.

If you make everything double, chances are it'll be faster (less coercing
required), and the extra space only matters in large arrays.
-- 
Chris Lewis, Elegant Communications Inc, {uunet!attcan,utzoo}!lsuc!eci386!clewis
Ferret mailing list: eci386!ferret-list, psroff mailing list: eci386!psroff-list
henry@utzoo.uucp (Henry Spencer) (03/02/90)
In article <9002280838.AA01415@cohort.uucp> Steve Bird <steve@cohort.UUCP> writes:

> 	float a,b;
> 	b = 2.0e20 + 1.0;
> 	a = b - 2.0e20;
> 	printf("%f \n",a);
>
> When compiled the program returns the number 4008175468544.000000 .
>
> 	a = 2.0e20;
> 	printf("%f \n",b - a);
>
> The program returns 0.000000 . Why ?

Single-precision floating-point, aka "float", typically has about 24 bits
of precision.  Numbers circa 2e20, represented in circa 24 bits, have
roundoff error circa 1e13, so 4e12 and 0 probably differ by 1 in the
least-significant bit.

The arithmetic in these two programs is not quite the same, because all
C floating-point arithmetic is done in "double", and you've got an extra
double->float->double conversion between subtraction and printing in the
first case.  Somehow, the code your compiler is generating is ending up
introducing different roundoff behavior.

Asking for a detailed explanation of a least-significant-bit difference
is pointless unless you're willing to examine compiler, generated code,
and hardware very closely; too many things can cause such differences.
"Floating point is an analog box anyway." -Hugh Redelmeier.  (Translation,
for the non-hardware types in the crowd: "if there's only one or two bits
wrong at the least-significant end, you got what you paid for".)
-- 
MSDOS, abbrev: Maybe SomeDay |     Henry Spencer at U of Toronto Zoology
an Operating System.         | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
lamy@cs.utoronto.ca (Jean-Francois Lamy) (03/02/90)
It is interesting to note that one gets the same results from those two
pieces of code when running them on Sun 4s (SunOS 4.0.3), on an Iris 4D
(Irix 3.2 with the old MIPS compiler), and under gcc on the Sun 4s.
Could compliance with IEEE rules cause all 3 versions of the floating
point code to behave essentially the same, given that there is little
room for optimizations?

Possibly fishing in the wrong pond,

Jean-Francois Lamy               lamy@cs.utoronto.ca, uunet!cs.utoronto.ca!lamy
Department of Computer Science, University of Toronto, Canada M5S 1A4