[net.sources.d] format

geoff@desint.UUCP (Geoff Kuenning) (07/23/86)

In article <125@sol.UUCP> chris@sol.UUCP (Christopher Caldwell) writes:

> Here are the routines that I talked about in net.lang.c.
>...
> echo "Extracting format.c (13553 characters)."
>...

Sigh.  My soapbox is getting awfully rickety, and with every passing year
it becomes more difficult for my feeble limbs to make the climb onto it.
Nevertheless, I feel morally obligated to warn people:

DON'T USE THIS SOFTWARE TO CONVERT FLOATING POINT IF YOU WANT ACCURATE RESULTS!
In fact, I wouldn't trust it for anything much beyond 5 significant digits
with floats, 10 with doubles.

I've said it before and I guess it needs saying again:  numerical accuracy
is a *very* complicated subject.  There are people at Livermore who spend
100% of their time pursuing the problem.  One of the worst problems is
input/output conversion.  Here are a few of the things the posted routine
gets wrong:

    (1) [by far the worst] It makes no attempt to round.  Thus, the sequence

	    x = 0.99999999999999; format ("{f.1}", x);

	will produce "0.9" as the output, fully 10% off from the
	correctly-rounded answer of "1.0".  (See the test program
	after this list.)
    (2) By using division to recalculate "pnum" in the primary loop, errors
	are introduced with large exponents.  For example, numbers up
	near 10**38 (roughly the top of the IEEE single-precision
	range) carry 38 significant digits when written out in full.
	In binary, those translate to well over 100 bits, way past
	the width of even a double on all but a Cray (which has a
	much larger floating range).  So the loops

	    for (i = 0, x = 1.0;  i < 38;  i++, x *= 10.0) ; /* null body */
	    while (--i >= 0) x /= 10.0;

	will not produce 1.0 in 'x'.  (Again, the test program below
	demonstrates this.)
    (3) Possibly worse than (2) is the fact that, after each developed
	digit has been placed in the buffer, it is removed from the
	original number by subtraction, after the non-identity
	transformation of (2) has been applied to it.  This may be
	acceptable (I'm not a numerical expert), since the subtraction
	takes place in the most-significant bits of the number being
	converted.  But I learned a long time ago that addition and
	subtraction are the source of the worst errors in floating
	point, and that they should be avoided whenever possible.  It
	makes me very nervous to see subtraction in this loop.
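
Don't take my word for it.  The little test program below is mine,
not part of the posted package; it assumes an IEEE-style 64-bit
double.  The first half shows the truncation of point (1); the second
half runs the multiply-up/divide-down loops of point (2) and prints
how far from 1.0 the round trip lands.

	#include <stdio.h>

	int
	main ()
	{
	    double x;
	    int i;

	    /* Point (1): truncation, as the posted routine does it,
	     * versus rounding to nearest, as printf does it. */
	    x = 0.99999999999999;
	    printf ("truncated: 0.%d\n", (int) (x * 10.0)); /* "0.9" */
	    printf ("rounded:   %.1f\n", x);                /* "1.0" */

	    /* Point (2): multiply up to 10**38 one decade at a time,
	     * then divide back down.  Each step can be off by half a
	     * unit in the last place, and the errors need not cancel. */
	    for (i = 0, x = 1.0;  i < 38;  i++, x *= 10.0)
		;	/* null body */
	    while (--i >= 0)
		x /= 10.0;
	    printf ("round trip: %.17g (error %g)\n", x, x - 1.0);
	    return 0;
	}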

So how does one write an accurate output-conversion routine?  Well, you
should start by asking somebody else;  it's not my field.  I think the
best routines might actually separate the mantissa and exponent.  One thing
I'd certainly do is get the powers of 10 out of a table, rather than
calculating them.  Not only is it much faster (and formatted output can
take a *lot* of a program's execution time) but it isn't susceptible
to cumulative error.  (Oh, and by the way, you should work out any
table values by hand and initialize them in octal or hex.  It's worth a
big time investment to get accurate values there.  And you can't trust
a compiler writer to have accurate input-conversion routines;  compiler
people aren't usually numerical mathematicians either.)
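
To make the table idea concrete, here is a sketch (the name
"pow10tab" is my own).  Every power of ten up through 10**22 happens
to be exactly representable in an IEEE double -- 5**22 still fits in
the 53-bit mantissa -- so within that range a lookup is not just
faster than a multiply loop, it is error-free.  (I'm writing decimal
constants here for readability; as I just said, for real work you'd
hand-check the bit patterns and initialize in octal or hex.)

	/* Powers of ten that are exact in an IEEE double.  Entries
	 * past 10**22 would pick up rounding error and would have to
	 * be hand-verified against the true values. */
	static const double pow10tab[] = {
	    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
	    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
	    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
	};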

For those who care, the relevant code from the posted routine follows.

>         case 'f':   if( pf_double < 0 )
>                       then
>                         {
>                         buf[bufcnt++] = '-';
>                         pf_double = -pf_double;
>                         }
>                     ind = 0;
>                     for(pnum=1.0; pnum<=pf_double; pnum*=pf_base)
>                         ind--;
>                     pnum /= pf_base;
>                     do  {
>                         if( ind++ == 0 ) then buf[bufcnt++]='.';
>                         c = (int)(pf_double/pnum);
>                         buf[bufcnt++] = bintodig( c );
>                         pf_double -= (pnum*c);
>                         pnum /= pf_base;
>                         } while( ind < pf_dec );
>                     cp = buf;
>                     break;
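
For contrast, here is one way the worst flaw -- the missing rounding
of point (1) -- could be patched in a loop of that shape.  This is a
sketch under a name of my own choosing (fmt_fixed), not a drop-in
repair: it assumes the pow10tab[] table above is in scope, it fixes
flaw (1) only, and it still recomputes "pnum" by multiplication and
division, so flaw (2) remains.

	/* Sketch: develop "val" into "buf" with "dec" digits after
	 * the point (1 <= dec <= 22), rounded rather than truncated.
	 * The caller's buffer must be big enough. */
	static void
	fmt_fixed (double val, int dec, char *buf)
	{
	    double pnum;
	    int c, ind = 0;

	    if (val < 0) {
		*buf++ = '-';
		val = -val;
	    }
	    val += 0.5 / pow10tab[dec];	/* round: add half of the last digit */

	    for (pnum = 1.0;  pnum <= val;  pnum *= 10.0)
		ind--;
	    pnum /= 10.0;
	    if (ind == 0)
		*buf++ = '0';		/* leading zero for values below 1 */
	    do {
		if (ind++ == 0)
		    *buf++ = '.';
		c = (int) (val / pnum);
		*buf++ = '0' + c;
		val -= pnum * c;	/* the subtraction of (3) is still here */
		pnum /= 10.0;
	    } while (ind < dec);
	    *buf = '\0';
	}

With that one change, fmt_fixed (0.99999999999999, 1, buf) develops
"1.0" rather than the truncated "0.9" of point (1).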
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff