riddle@emory.uucp (Larry Riddle) (08/07/88)
The following is a very simple C program compiled and run on a Sun-4
with no special command line options.
main()
{
    float x,y;
    x = 1.0/10.0;
    y = 1677721.0/16777216.0;
    printf("x: %x",x);
    printf("%20.17f\n",x);
    printf("y: %x",y);
    printf("%20.17f\n",y);
}
Here is the output:
x: 3fb99999 0.10000000149011612
y: 3fb99999 0.09999996423721313
Notice that x and y, which have been declared as floats, and thus have
a 32 bit representation (according to the manual this obeys IEEE
floating point arithmetic standards), both are printed the same in hex,
but give different values when printed as floats. I believe that the
hex is a straight translation of the internal bit representation. The
division in computing x and y is done in double precision (64 bits) and
then converted to floats.
Can anyone enlighten me on why this output comes out this way?
**********
According to what I have read about the IEEE standard, floats should
have 1 sign bit, a biased exponent of 8 bits, and a 23 bit normalized
mantissa. However, my experiments seem to imply that floats have an 11
bit biased exponent (offset by 1023) and only a 20 bit normalized
mantissa, exactly the same as doubles, except double has a 52 bit
mantissa. For example, the bit pattern 3fb99999 given above for 1/10
corresponds to
sign exponent    mantissa
0    01111111011 10011001100110011001
The 11 bits of this exponent give 1019 - 1023 = -4, which, coupled with
the mantissa, gives the binary number
.0001100110011001100110011001...
which is the (non-terminating) binary representation for 1/10. Notice
also that this 32 bit representation has been chopped, rather than
rounded.
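Something like the following can be used to pull these fields apart (a
quick sketch only: it assumes longs are 32 bits and that the high-order
word of a double comes first, as on the Sun-4):

    #include <stdio.h>

    main()
    {
        union { double d; unsigned long w[2]; } u;
        unsigned long hi;

        u.d = 1.0/10.0;
        hi = u.w[0];            /* high-order word: 3fb99999 */
        printf("sign:     %lx\n", (hi >> 31) & 0x1);
        printf("exponent: %lx\n", (hi >> 20) & 0x7ff);  /* 3fb = 1019 */
        printf("mantissa: %lx\n", hi & 0xfffff);        /* top 20 bits: 99999 */
    }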
I don't understand this discrepancy either. Any suggestions?
Thanks.
***********
--
Larry Riddle | gatech!emory!riddle USENET
Emory University | riddle@emory CSNET,BITNET
Dept of Math and CS | riddle.emory@csnet-relay ARPANET
Atlanta, Ga 30322 | (404) 727-7922 AT@T
rob@kaa.eng.ohio-state.edu (Rob Carriere) (08/08/88)
In article <3117@emory.uucp> riddle@emory.uucp (Larry Riddle) writes:
> [ C program on Sun-4 gives strange single precision results, to wit,
> the results look like mutilated doubles ]

I don't *know*, but you might want to look into what your floating
point hardware does.  Most floating point hardware I know of only
implements IEEE to whatever extent was convenient.  Specifically, many
of these devices leave results hanging around in the (extended
precision) floating registers whenever possible.  This means that you
may get more precision than you asked for, i.e. your results probably
really are doubles, despite your asking for singles.  This would also
explain why you see truncated rather than rounded doubles: you are
doing the truncation manually when you ask for the hex output.

Note that this behaviour, while not IEEE, is acceptable ANSI C.

Rob Carriere
jgk@speech2.cs.cmu.edu (Joe Keane) (08/08/88)
In article <3117@emory.uucp> riddle@emory.uucp (Larry Riddle) writes:
>However, my experiments seem to imply that floats have an 11
>bit biased exponent (offset by 1023) and only a 20 bit normalized
>mantissa, exactly the same as doubles, except double has a 52 bit
>mantissa.

(This belongs on comp.lang.c!)

The problem is, a float parameter gets converted to double.  So you're
looking at the first 32 bits of the double representation.  To look at
the float representation, use this:

    printf("x: %lx", * (long *) &x);

--Joe
braner@batcomputer.tn.cornell.edu (braner) (08/08/88)
In article <3117@emory.uucp> riddle@emory.uucp (Larry Riddle) writes:
>
>main()
>{
>    float x,y;
>    x = 1.0/10.0;
>    y = 1677721.0/16777216.0;
>    printf("x: %x",x);
>    printf("%20.17f\n",x);
>    printf("y: %x",y);
>    printf("%20.17f\n",y);
>}
>
>Here is the output:
>
>x: 3fb99999 0.10000000149011612
>y: 3fb99999 0.09999996423721313

Seems that the calculation of y is just different enough from 1/10 to
cause a difference in the last bit or so of the floats.  In the
printf() calls the floats are converted to doubles before being passed
to the function, and since doubles have a larger exponent field, the
difference in the mantissa is shifted out of sight, past the (long)
int picked up by the "%x".  I suggest you try:

    float x;
    long *p;
    ...
    p = (long *) &x;
    printf ("representation of x: %lx\n", *p);
    ...and so on...

- Moshe Braner
ok@quintus.uucp (Richard A. O'Keefe) (08/08/88)
In article <3117@emory.uucp> riddle@emory.uucp (Larry Riddle) writes:
>Notice that x and y, which have been declared as floats, and thus have
>a 32 bit representation (according to the manual this obeys IEEE
>floating point arithmetic standards), both are printed the same in hex,

This is >>C<< remember?  Floats are 32-bits IN MEMORY, but when you
operate on them or pass them to functions they are automatically
converted to double.

Since you specifically mention the Sun-4, I suggest that you read your
copy of the Sun Floating-Point Programmer's Guide.  In particular, if
you really want to pass 32-bit floating-point numbers in float format,
you will need the "-fsingle2" compiler option.  (I haven't tried this
on a Sun-4, but it works fine on Sun-3s.)  The two flags are

    -fsingle    If the operands of a floating-point operation are both
                'float', the operation will be done in single precision
                instead of the normal double precision.

    -fsingle2   Float arguments are passed to functions as 32 bits, and
                float results are returned as 32 bits.  (Useful for
                calling Fortran functions.)

TRAP: floating-point constants such as 1.0 are DOUBLE precision, so if
the compiler sees

    float x;
    ... x+1.0 ...

it will do the addition in double precision.  In such a case, I do

    float one = 1.0;
    ... x+one ...
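For instance, here is a toy example of the trap (my own sketch, not
from the Guide; compile it with and without -fsingle to see the
effect):

    #include <stdio.h>

    main()
    {
        float x = 1.0/10.0;
        float one = 1.0;

        printf("x + 1.0: %20.17f\n", x + 1.0); /* always done in double:
                                                  1.0 is a double constant */
        printf("x + one: %20.17f\n", x + one); /* done in single precision
                                                  when compiled -fsingle */
    }

With -fsingle the second line shows the extra roundoff from doing the
addition in 24-bit precision; without it, the two lines agree.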
carl@csli.STANFORD.EDU (Carl Schaefer) (08/08/88)
In article <3117@emory.uucp> riddle@emory.uucp (Larry Riddle) writes:
> The following is a very simple C program compiled and run on a Sun-4
> with no special command line options.
>
> main()
> {
>     float x,y;
>     x = 1.0/10.0;
>     y = 1677721.0/16777216.0;
>     printf("x: %x",x);
>     printf("%20.17f\n",x);
>     printf("y: %x",y);
>     printf("%20.17f\n",y);
> }
>
> Here is the output:
>
> x: 3fb99999 0.10000000149011612
> y: 3fb99999 0.09999996423721313
>
> Notice that x and y, which have been declared as floats, and thus have
> a 32 bit representation (according to the manual this obeys IEEE
> floating point arithmetic standards), both are printed the same in hex,
> but give different values when printed as floats. I believe that the
> hex is a straight translation of the internal bit representation. The
> division in computing x and y is done in double precision (64 bits) and
> then converted to floats.
>
> Can anyone enlighten me on why this output comes out this way?

Floats are converted to doubles when passed as function arguments.  The
following inherently nonportable code prints the bit patterns of your
floats as floats and doubles:

#include <stdio.h>

void main()
{
    union {
        double d;
        float f;
        int i[2];
    } X;
    float x, y;

    x = 1.0/10.0;
    y = 1677721.0/16777216.0;
    X.f = x;
    (void) printf("x (float): %8x, %20.17f\n", X.i[0], X.f);
    X.f = y;
    (void) printf("y (float): %8x, %20.17f\n", X.i[0], X.f);
    X.d = x;
    (void) printf("x (double): %8x/%8x, %20.17f\n", X.i[0], X.i[1], X.d);
    X.d = y;
    (void) printf("y (double): %8x/%8x, %20.17f\n", X.i[0], X.i[1], X.d);
    exit(0);
}

produces:

x (float): 3dcccccd, 0.10000000149011612
y (float): 3dccccc8, 0.09999996423721313
x (double): 3fb99999/a0000000, 0.10000000149011612
y (double): 3fb99999/       0, 0.09999996423721313

when run on a sun4, where sizeof(int) == sizeof(float) == 4 and
sizeof(double) == 8.
leo@philmds.UUCP (Leo de Wit) (08/08/88)
In article <3117@emory.uucp> riddle@emory.uucp (Larry Riddle) writes:
|The following is a very simple C program compiled and run on a Sun-4
|with no special command line options.
|
|main()
|{
|    float x,y;
|    x = 1.0/10.0;
|    y = 1677721.0/16777216.0;
|    printf("x: %x",x);
|    printf("%20.17f\n",x);
|    printf("y: %x",y);
|    printf("%20.17f\n",y);
|}
|
|Here is the output:
|
|x: 3fb99999 0.10000000149011612
|y: 3fb99999 0.09999996423721313
|
|Notice that x and y, which have been declared as floats, and thus have
|a 32 bit representation (according to the manual this obeys IEEE
|floating point arithmetic standards), both are printed the same in hex,
|but give different values when printed as floats. I believe that the
|hex is a straight translation of the internal bit representation. The
|division in computing x and y is done in double precision (64 bits) and
|then converted to floats.
|
|Can anyone enlighten me on why this output comes out this way?

I think I can.  On a SUN-3 the same results were achieved.  The problem
is that you do not get the internal representation of x and y, but of x
and y converted to double (because that's what is passed to printf).
By coincidence the part of the double that is printed is the same for x
and y (I tried it also on Ultrix, where it was _different_).

By using a union that contains a float and an int, the following values
for x and y were obtained (writing into the float part and reading from
the int part):

    x: 3dcccccd
    y: 3dccccc8

which shows both that x and y _have_ different binary representations,
and that these values are quite different from the original one
(3fb99999).  All this might also answer the second question in the
original article, about the IEEE float representation: the wrong bit
patterns were simply being examined.

Note that I do not recommend this union kludge; it is however probably
the only way to get at the bit patterns.  Alternatively, cast &x and &y
to long pointers so that the values will not be converted to double
(this is of course not portable either, since it requires a long to be
the same size as a float).

Any more Riddles 8-) ?

    Leo.
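P.S.  For the record, the union kludge looks something like this (a
sketch only; it assumes sizeof(long) == sizeof(float) == 4, which holds
on the SUNs but is by no means guaranteed):

    #include <stdio.h>

    main()
    {
        union {
            float f;
            long  l;    /* overlays the float's bits */
        } u;

        u.f = 1.0/10.0;
        printf("x: %08lx\n", u.l);  /* 3dcccccd here */
        u.f = 1677721.0/16777216.0;
        printf("y: %08lx\n", u.l);  /* 3dccccc8 here */
    }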
markhall@pyramid.pyramid.com (Mark Hall) (08/08/88)
In article <3117@emory.uucp> riddle@emory.uucp (Larry Riddle) writes:
>The following is a very simple C program compiled and run on a Sun-4
>with no special command line options.
>
>main()
>{
>    float x,y;
>    x = 1.0/10.0;
>    y = 1677721.0/16777216.0;
>    printf("x: %x",x);
>    printf("%20.17f\n",x);
>    printf("y: %x",y);
>    printf("%20.17f\n",y);
>}
>
>Here is the output:
>
>x: 3fb99999 0.10000000149011612
>y: 3fb99999 0.09999996423721313
>
>Notice that x and y, which have been declared as floats, and thus have
>a 32 bit representation.

A float used as an expression (in this case, as an argument to printf)
is always converted to a double.  For example, witness the assembly
generated on the pyramid:

    movw    $0x3dcccccd,lr0
    movw    $0x3dccccc8,lr1     # note the diff here in the last 3 bits
    cvtfd   lr0,tr1             # here's the conversion
    movw    $"x: %x",tr0
    call    _printf

Since exponents in doubles take up more bits than in floats, you may
lose any difference in the mantissas with the call to printf.  Try
putting printf("x: %x %x",x) and comparing the bit patterns for your
machine then.  On the pyramid, they turn out to be:

x: 3fb99999 a0000000 0.10000000149011611   # close to the sun output, eh?
y: 3fb99999 00000000 0.09999996423721313

hence the difference when you print out the float.  I leave it as an
exercise (mostly cuz of the flames for non-portable code) as to how you
can print out the bits without converting to floating point.

-Mark Hall
(smart mailer): markhall@pyramid.pyramid.com
(uucp paths  ): {amdahl|decwrl|sun|seismo|lll-lcc}!pyramid!markhall
turk@Apple.COM (Ken "Turk" Turkowski) (08/11/88)
In article <3117@emory.uucp> riddle@emory.uucp (Larry Riddle) writes:
> float x,y;
> x = 1.0/10.0;
> y = 1677721.0/16777216.0;
> printf("x: %x",x);
> printf("%20.17f\n",x);
> printf("y: %x",y);
> printf("%20.17f\n",y);
>
>Here is the output:
>
>x: 3fb99999 0.10000000149011612
>y: 3fb99999 0.09999996423721313
>
>Notice that x and y, which have been declared as floats, and thus have
>a 32 bit representation (according to the manual this obeys IEEE
>floating point arithmetic standards), both are printed the same in hex,
>but give different values when printed as floats. I believe that the
>hex is a straight translation of the internal bit representation. The
>division in computing x and y is done in double precision (64 bits) and
>then converted to floats.

The problem is that floats are converted to doubles (or extendeds on
the macintosh).  A double has an 11-bit exponent, whereas a float only
has 8.  If you print out the next word, you'll see that the two hex
representations differ somewhere in the next 3 bits.

Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
UUCP: {mtxinu,sun,decwrl,nsc,voder}!apple!turk
CSNET: turk@Apple.CSNET     ARPA: turk%Apple@csnet-relay.ARPA
Domain-based address: turk@apple.com
jfh@rpp386.UUCP (John F. Haugh II) (08/11/88)
In article <15370@apple.Apple.COM> turk@apple.com.UUCP (Ken "Turk" Turkowski) writes:
>The problem is that floats are converted to doubles (or extendeds on
>the macintosh).  A double has an 11-bit exponent, whereas a float only
>has 8.  If you print out the next word, you'll see that the two hex
>representations differ somewhere in the next 3 bits.

correct.  obviously there is a difference.  here is how i faked out the
conversion rules using a union:

Script is typescript, started Thu Aug 11 10:27:39 1988
Subscript out of range.
1 - rpp386-> cat puzzle.c
main ()
{
    float x,y;
    union {
        long ul;
        float uf;
    } ux, uy;

    ux.uf = x = 1.0/10.0;
    uy.uf = y = 1677721.0/16777216.0;

    printf("x: %08x:%08x, ux.ul: %08x,",ux.uf,ux.ul);
    printf("%20.17f\n",x);
    printf("y: %08x:%08x, uy.ul: %08x,",uy.uf,uy.ul);
    printf("%20.17f\n",y);
}
2 - rpp386-> cc puzzle.c
puzzle.c
3 - rpp386-> a.out
x: a0000000:3fb99999, ux.ul: 3dcccccd, 0.10000000149011612
y: 00000000:3fb99999, uy.ul: 3dccccc8, 0.09999996423721313
4 - rpp386-> logout
Not a terminal: Not a character device
John's Words of Wisdumb -
A great many people think they are thinking when they are merely
rearranging their prejudices.  -- William James
Script done Thu Aug 11 10:27:59 1988

the output shows that the floats, passed unconverted as longs for the
ux.ul and uy.ul outputs, really do differ within their 32 bits (the
promoted doubles eat the first two %08x's in each printf).
--
John F. Haugh II            +--------- Cute Chocolate Quote ---------
HASA, "S" Division          | "USENET should not be confused with
UUCP: killer!rpp386!jfh     | something that matters, like CHOCOLATE"
DOMAIN: jfh@rpp386.uucp     | -- apologizes to Dennis O'Connor