[net.lang.c] format

geoff@desint.UUCP (Geoff Kuenning) (07/23/86)

In article <125@sol.UUCP> chris@sol.UUCP (Christopher Caldwell) writes:

> Here are the routines that I talked about in net.lang.c.
>...
> echo "Extracting format.c (13553 characters)."
>...

Sigh.  My soapbox is getting awfully rickety, and with every passing year
it becomes more difficult for my feeble limbs to make the climb onto it.
Nevertheless, I feel morally obligated to warn people:

DON'T USE THIS SOFTWARE TO CONVERT FLOATING POINT IF YOU WANT ACCURATE RESULTS!
In fact, I wouldn't trust it for anything much beyond 5 significant digits
with floats, 10 with doubles.

I've said it before and I guess it needs saying again:  numerical accuracy
is a *very* complicated subject.  There are people at Livermore who spend
100% of their time pursuing the problem.  One of the worst problems is
input/output conversion.  Here are a few of the things the posted routine
gets wrong:

    (1) [by far the worst] It makes no attempt to round.  Thus, the sequence

	    x = 0.99999999999999; format ("{f.1}", x);

	will produce "0.9" as the output, fully 10% off from the
	correctly-rounded answer of "1.0".
    (2) By using division to recalculate "pnum" in the primary loop, errors
	are introduced with large exponents.  For example, 10**38 (the
	maximum range of IEEE floating point numbers) has, by definition,
	38 significant digits.  In binary, those translate to well over
	100 bits, way past the width of even a double on all but a Cray
	(which has a much larger floating range).  So the loops

	    for (i = 0, x = 1.0;  i < 38;  i++, x *= 10.0) ; /* null body */
	    while (--i >= 0) x /= 10.0;

	will not produce 1.0 in 'x'.
    (3) Possibly worse than (2) is the fact that, after the digit developed has
	been placed in the buffer, it is removed from the original number
	with subtraction, after having the non-identity transformation
	of (2) applied to it.  This may be acceptable (I'm not a numerical
	expert), since the subtraction takes place in the most-significant
	bits of the number being converted.  I just learned a long time
	ago that addition and subtraction are the source of the worst
	errors in floating point, and they should be avoided whenever
	possible.  It makes me very nervous to see subtraction in this loop.

So how does one write an accurate output-conversion routine?  Well, you
should start by asking somebody else;  it's not my field.  I think the
best routines might actually separate the mantissa and exponent.  One thing
I'd certainly do is get the powers of 10 out of a table, rather than
calculating them.  Not only is it much faster (and formatted output can
take a *lot* of a program's execution time) but it isn't susceptible
to accumulative errors.  (Oh, and by the way, you should work out any
table values by hand and initialize them in octal or hex.  It's worth a
big time investment to get accurate values there.  And you can't trust
a compiler writer to have accurate input-conversion routines;  compiler
people aren't usually numerical mathematicians either.)

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

For those who care, the relevant code from the posted routine follows.

> X		case 'f':	if( pf_double < 0 )
> X				  then
> X				    {
> X				    buf[bufcnt++] = '-';
> X				    pf_double = -pf_double;
> X				    }
> X				ind = 0;
> X				for(pnum=1.0; pnum<=pf_double; pnum*=pf_base)
> X				    ind--;
> X				pnum /= pf_base;
> X				do  {
> X				    if( ind++ == 0 ) then buf[bufcnt++]='.';
> X				    c = (int)(pf_double/pnum);
> X				    buf[bufcnt++] = bintodig( c );
> X				    pf_double -= (pnum*c);
> X				    pnum /= pf_base;
> X				    } while( ind < pf_dec );
> X				cp = buf;
> X				break;
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

rgenter@BBN-LABS-B.ARPA (Rick Genter) (07/25/86)

     With respect to 10**38 being inaccurate:

     Geoff is correct.  Since 10 is not a power of two, to represent 10**38
exactly *does* require many many bits of mantissa (89 to be exact).
*However*, on the VAX, thanks to hardware magic like guard digits, a loop of
38 "* 10.0"s *does* yield the exact representation of 10**38 within the
constraints of a (double).  A subsequent loop of 38 "/ 10.0"s even gives
you 1.0.
--------
Rick Genter 				BBN Laboratories Inc.
(617) 497-3848				10 Moulton St.  6/512
rgenter@labs-b.bbn.COM  (Internet new)	Cambridge, MA   02238
rgenter@bbn-labs-b.ARPA (Internet old)	linus!rgenter%BBN-LABS-B.ARPA (UUCP)

rbj@icst-cmr.ARPA (Root Boy Jim) (07/25/86)

	     With respect to 10**38 being inaccurate:
	
        Geoff is correct.  Since 10 is not a power of two, to represent 10**38
	exactly *does* require many many bits of mantissa (89 to be exact).
	*However*, on the VAX, thanks to hardware magic like guard digits,
	a loop of 38 "* 10.0"s *does* yield the exact representation of
	10**38 within the constraints of a (double). 
	A subsequent loop of 38 "/ 10.0"s even gives you 1.0.

Perhaps I should have made myself clear. IEEE format does not require
that every integer in it's range be represented exactly. I do agree that
10**38 requires something like 89 bits, but 10**38 - 1 requires more,
something on the order of 126 bits. Even with Cray 128 bit doubles,
some of the bits have to go to the exponent, so it can't represent
all the integers in its range either.

	Rick Genter 			BBN Laboratories Inc.
	(617) 497-3848			10 Moulton St.  6/512
	rgenter@labs-b.bbn.COM  	Cambridge, MA   02238
	rgenter@bbn-labs-b.ARPA 	linus!rgenter%BBN-LABS-B.ARPA (UUCP)

	(Root Boy) Jim Cottrell		<rbj@icst-cmr.arpa>
A can of ASPARAGUS, 73 pigeons, some LIVE ammo, and a FROZEN DAQUIRI!! ...

... cannot represent it either.