geoff@desint.UUCP (Geoff Kuenning) (07/23/86)
In article <125@sol.UUCP> chris@sol.UUCP (Christopher Caldwell) writes:

> Here are the routines that I talked about in net.lang.c.
>...
> echo "Extracting format.c (13553 characters)."
>...

Sigh.  My soapbox is getting awfully rickety, and with every passing year
it becomes more difficult for my feeble limbs to make the climb onto it.
Nevertheless, I feel morally obligated to warn people:

	DON'T USE THIS SOFTWARE TO CONVERT FLOATING POINT IF YOU WANT
	ACCURATE RESULTS!

In fact, I wouldn't trust it for anything much beyond 5 significant digits
with floats, 10 with doubles.

I've said it before and I guess it needs saying again: numerical accuracy
is a *very* complicated subject.  There are people at Livermore who spend
100% of their time pursuing the problem.  One of the worst problems is
input/output conversion.  Here are a few of the things the posted routine
gets wrong:

(1) [By far the worst] It makes no attempt to round.  Thus, the sequence

	x = 0.99999999999999;
	format ("{f.1}", x);

will produce "0.9" as the output, fully 10% off from the correctly-rounded
answer of "1.0".

(2) By using division to recalculate "pnum" in the primary loop, errors are
introduced with large exponents.  For example, 10**38 (roughly the top of
the IEEE floating-point range) has, by definition, 38 significant digits.
In binary, those translate to well over 100 bits, way past the width of
even a double on all but a Cray (which has a much larger floating range).
So the loops

	for (i = 0, x = 1.0; i < 38; i++, x *= 10.0)
		;			/* null body */
	while (--i >= 0)
		x /= 10.0;

will not produce 1.0 in 'x'.

(3) Possibly worse than (2) is the fact that, after each developed digit
has been placed in the buffer, it is removed from the original number by
subtraction, after having the non-identity transformation of (2) applied
to it.  This may be acceptable (I'm not a numerical expert), since the
subtraction takes place in the most-significant bits of the number being
converted.
I learned a long time ago that addition and subtraction are the sources of
the worst errors in floating point, and they should be avoided whenever
possible.  It makes me very nervous to see subtraction in this loop.

So how does one write an accurate output-conversion routine?  Well, you
should start by asking somebody else; it's not my field.  I think the best
routines might actually separate the mantissa and exponent.  One thing I'd
certainly do is get the powers of 10 out of a table rather than calculating
them.  Not only is it much faster (and formatted output can take a *lot* of
a program's execution time), but it isn't susceptible to accumulated
errors.  (Oh, and by the way, you should work out any table values by hand
and initialize them in octal or hex.  It's worth a big time investment to
get accurate values there, and you can't trust a compiler writer to have
accurate input-conversion routines; compiler people aren't usually
numerical mathematicians either.)

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

For those who care, the relevant code from the posted routine follows.

> X	    case 'f':	if( pf_double < 0 )
> X			then
> X			{
> X			    buf[bufcnt++] = '-';
> X			    pf_double = -pf_double;
> X			}
> X			ind = 0;
> X			for(pnum=1.0; pnum<=pf_double; pnum*=pf_base)
> X			    ind--;
> X			pnum /= pf_base;
> X			do {
> X			    if( ind++ == 0 ) then buf[bufcnt++]='.';
> X			    c = (int)(pf_double/pnum);
> X			    buf[bufcnt++] = bintodig( c );
> X			    pf_double -= (pnum*c);
> X			    pnum /= pf_base;
> X			} while( ind < pf_dec );
> X			cp = buf;
> X			break;
-- 
	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff
rgenter@BBN-LABS-B.ARPA (Rick Genter) (07/25/86)
With respect to 10**38 being inaccurate: Geoff is correct.  Since 10 is not
a power of two, to represent 10**38 exactly *does* require many many bits
of mantissa (89 to be exact).  *However*, on the VAX, thanks to hardware
magic like guard digits, a loop of 38 "* 10.0"s *does* yield the exact
representation of 10**38 within the constraints of a (double).  A
subsequent loop of 38 "/ 10.0"s even gives you 1.0.
--------
Rick Genter				BBN Laboratories Inc.
(617) 497-3848				10 Moulton St.  6/512
rgenter@labs-b.bbn.COM  (Internet new)	Cambridge, MA   02238
rgenter@bbn-labs-b.ARPA (Internet old)	linus!rgenter%BBN-LABS-B.ARPA (UUCP)
rbj@icst-cmr.ARPA (Root Boy Jim) (07/25/86)
> With respect to 10**38 being inaccurate: Geoff is correct.  Since 10 is
> not a power of two, to represent 10**38 exactly *does* require many many
> bits of mantissa (89 to be exact).  *However*, on the VAX, thanks to
> hardware magic like guard digits, a loop of 38 "* 10.0"s *does* yield
> the exact representation of 10**38 within the constraints of a (double).
> A subsequent loop of 38 "/ 10.0"s even gives you 1.0.
>
> Rick Genter				BBN Laboratories Inc.
> (617) 497-3848			10 Moulton St.  6/512
> rgenter@labs-b.bbn.COM		Cambridge, MA   02238
> rgenter@bbn-labs-b.ARPA		linus!rgenter%BBN-LABS-B.ARPA (UUCP)

Perhaps I should have made myself clear.  IEEE format does not require that
every integer in its range be represented exactly.  I do agree that 10**38
requires something like 89 bits, but 10**38 - 1 requires more, something on
the order of 126 bits.  Even with Cray 128-bit doubles, some of the bits
have to go to the exponent, so a Cray cannot represent all the integers in
its range either.

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	A can of ASPARAGUS, 73 pigeons, some LIVE ammo, and a FROZEN DAQUIRI!!