[comp.lang.c] Floating point exactness & alternatives

rick@tetrauk.UUCP (Rick Jones) (08/06/90)

Thanks to all those who responded to my query about exactness (or lack of it)
in FP numbers.  It has confirmed my suspicion that there isn't a clean solution
to this problem.

The responses can be grouped into 3 broad categories:

a.	"The effective precision is a function of the algorithm and initial data,
	not just the representation"

	While this is true, the representation puts a limit on the precision
	even when the algorithm may suggest something higher.  But more
	fundamentally, this is an issue of _precision_ as opposed to
	_exactness_, which are not quite the same.  If the context is dealing
	with real-world values, they can all be assumed to be irrational, and
	known only to within some understood precision.  When dealing with more
	artificial numbers (see below), they are presumed to be exact, and
	unexpected problems arise because there is not a 1-1 correspondence
	between exact decimal numbers and exact binary FP numbers.

	As my example showed, you don't _expect_ to get into precision issues
	when checking if .291/2 = .1455

b.	"Analyse the bit-pattern of the IEEE format"

	This came from Stefan & Carlos at U. of Basel, and although possible,
	it feels a bit non-portable & hardware specific. We write portable
	packages which run on all sorts of hardware - just how universal IS the
	IEEE format?  (that's not an entirely rhetorical question)

c.	The "constructive reals" solution

	This is interesting, and I will follow up the various references.  It
	seems quite well-known, but sounds a bit computationally expensive,
	though I don't know enough yet to make a qualified comment.

d.	The "continued fractions" solution

	The reference to this came from Colin Plumb, and is also something I
	shall follow up.  It appears to have an advantage that all rational
	numbers have a finite representation.


For interest (if anyone's still with me :-), I'll explain the problem domain.
I deliberately omitted this from my initial query to get as wide a response as
possible.  This is nothing more taxing (!) than business systems, which are not
known for requiring complex mathematics.  They don't, but they do require
exactness.  Auditors have this annoying view that accounts must balance to the
penny, not 1 part in 10 to-the-something.  There is a good case for using some
form of BCD representation, but there are many programming advantages in using
the embedded numerical types of the language (yes, I do know about OOP and
building a BCD class with overloaded operators, that's one of my options).

In fact, the only disadvantage of FP is the exactness issue.  15 significant
digits is enough, the 8-byte repesentation is compact and fixed, the arithmetic
operators are embedded and the computation done in an FPP in all except toy
computers.  Unfortunately, exactness cannot be controlled by applying a
system-wide precision in terms of decimal places.  To illustrate:

	My system may be handling currency values in units of Lira, Yen, or
	some other inconsiderate currency with a very small basic denomination,
	and need to go up to quantities of 10^9 or more.  I may also have some
	stock kept in bulk, and need to account for quantities down to 4
	decimal places.  This total range is pushing the limits, but no one
	figure needs that total range of precision - I never account for less
	than 1 Yen, and I'm not going to have a billion tons of bulk stock.

Andrew Koenig appreciated this problem; as he said:

	Floating point arithmetic is *hard*.

Precision is dealt with in the appropriate places by rounding, but even this
causes a problem.  I notice that no one tried to provide a solution to the
second, and in many ways more subtle, problem in my original posting.  Given
that the supposed exact but unrepresentable value of .1455 has been arrived at
by two different means where the actual values are incrementally above and
below, and we simply want to round the result to 3 decimal places:  simplistic
rounding will actually _increase_ the discrepancy by yielding .145 and .146
respectively.  Comparison of the values before rounding would indicate them
equal within 15 significant digits.  After rounding, they show a difference in
the 3rd digit.  Accountants find this sort of thing difficult to grasp!
I actually have a general solution for this, but it involves sprintf() and some
sneaky string manipulation, and is hardly elegant.


Thanks for the ideas, they have provided food for thought.

I shall be on holiday for the next 3 weeks;  if you have any Earth-shattering
solutions which haven't come up so far, could you e-mail me please?

-- 
Rick Jones					You gotta stand for something
Tetra Ltd.  Maidenhead, Berks			Or you'll fall for anything
rick@tetrauk.uucp (...!ukc!tetrauk.uucp!rick)	     - John Cougar Mellencamp

henry@zoo.toronto.edu (Henry Spencer) (08/08/90)

In article <713@tetrauk.UUCP> rick@tetrauk.UUCP (Rick Jones) writes:
>... This is nothing more taxing (!) than business systems, which are not
>known for requiring complex mathematics.  They don't, but they do require
>exactness.  Auditors have this annoying view that accounts must balance to the
>penny, not 1 part in 10 to-the-something.  There is a good case for using some
>form of BCD representation, but there are many programming advantages in using
>the embedded numerical types of the language...

Our own experience, in building accounting systems for our own use, is that
the problem can often be eliminated by ignoring the decimal point in dollar
currencies and thinking of money as measured in pennies.  Then integer
arithmetic suffices, if amounts are not huge.  If amounts *are* huge, note
that most floating-point boxes will do precise integer arithmetic if you
are careful to avoid operations that necessarily produce fractions.

Yes, you get fractional amounts if you (say) buy 3.5 tons of stuff that
is priced at $1.17 per ton... but there really *is* a fractional amount
there, and the proper response is to explicitly round it in whatever
way is appropriate to the problem (in this case, the standard business
approach of rounding up).

This isn't necessarily an adequate solution to tricky cases involving things
like multiple currencies, but for simple stuff, it's amazing how much
simpler the problem gets if you don't insist on representing money as a
fraction just because it's written that way.
-- 
The 486 is to a modern CPU as a Jules  | Henry Spencer at U of Toronto Zoology
Verne reprint is to a modern SF novel. |  henry@zoo.toronto.edu   utzoo!henry

johnb@srchtec.UUCP (John Baldwin) (08/09/90)

In article <1990Aug7.173030.2823@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <713@tetrauk.UUCP> rick@tetrauk.UUCP (Rick Jones) writes:
>>  Auditors have this annoying view that accounts must balance to the
>>penny, not 1 part in 10 to-the-something.  There is a good case for using some
>>form of BCD representation, but there are many programming advantages in using
>>the embedded numerical types of the language...
>
>Our own experience, in building accounting systems for our own use, is that
>the problem can often be eliminated by ignoring the decimal point in dollar
>currencies and thinking of money as measured in pennies.

Doesn't this push him back into the "continued sums" implementation?
Remember, conversion rates between different currencies "slide" up and down
at a fairly high frequency.  (Does anybody know just HOW frequently the
exchange rates change?)

This would require an INTERNAL currency which would probably NOT be the same as
a recognized world currency;  one in which any unit of any other world currency
can be expressed as an integer.  This is probably tricky enough to find WITHOUT
having the conversions changing on you all the time!

Rick: how many currencies do you have to deal with and how often do they
change?  Maybe this *would* be a feasible approach: if the numbers are too
big for a (long), store them in a struct.  Have a multidimensional array
of structs containing numerators and denominators for conversions.
-- 
John T. Baldwin                      |  johnb@srchtec.uucp
Search Technology, Inc.              |  johnb%srchtec.uucp@mathcs.emory.edu
standard disclaimer:                 |  ...uunet!samsung!emory!stiatl!srchtec..
opinions and mistakes purely my own. |  ...mailrus!gatech!stiatl!srchtec...

mayne@VSSERV.SCRI.FSU.EDU (William (Bill) Mayne) (08/09/90)

In article <168@srchtec.UUCP> johnb@srchtec.UUCP (John Baldwin) writes:
>In article <1990Aug7.173030.2823@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>>In article <713@tetrauk.UUCP> rick@tetrauk.UUCP (Rick Jones) writes:
>>>  Auditors have this annoying view that accounts must balance to the
>>>penny, not 1 part in 10 to-the-something.  There is a good case for using some
>>>form of BCD representation, but there are many programming advantages in using
>>>the embedded numerical types of the language...
>>
I strongly agree both about BCD and using the embedded types (and operators)
of the language.

SUMMARY of my response:

Seeing the original poster's description of his problem it seems to me 
that what is really needed is not floating point numbers but fixed point 
numbers with a moderately large number of digits and scaling for decimal
point placement. The traditional approach would be to use BCD.
Although the kind of arithmetic support needed is possible
in C, and I describe the approach I'd take below, the point is well taken 
that it is more desirable to use the builtin operators, rather than
have to use explicit function calls for every operation. Thus the best
alternative is a language which provides either BCD or some other
mechanism to support arithmetic in the scale and precision needed. 
If using a language which already has the needed arithmetic features
is ruled out by considerations of hardware, cost, programmer skills,
or what ever the best solution may be to use C++. With operator
overloading once a class for BCD is provided programmers can use
C syntax for their applications. The rest of this article just expands
a little on this summary.

You could write your own routines in C to support BCD or something 
functionally equivalent. One relatively simple alternative would be using 
long integers (which I think would give enough significant digits) plus a 
signed short integer for decimal scaling. Then you would have some 
interesting problems with division and handling overflows and such, but no 
more than floating point implementers face. On a machine without hardware
support for either BCD or floating point I think this scheme of using binary
for the significant part but decimal exponents would be the most satifactory.

A number would be represented by a structure like this:
/* Please no flames if the syntax isn't exactly right,      */
/* I don't use this often and don't have a reference handy. */

typedef BCD
  struct
    {
    long digits;
    short exponent;
    }

This would be normalized so that for all nonzero values
digits%10!=0 and the value represented is digits * 10**exponent. 
Most of the arithmetic would be done in binary, but using a decimal
exponent part avoids the problems of representing values like
0.2, which is a repeating fraction in binary and gives accountants
so much heartburn.

Although solutions in C are possible and even interesting I would
suggest that C may not be the best language for this type of problem.
Instead, consider:

  (1) COBOL
  (2) PL/1
  (3) Turbo Pascal with the BCD option
  (4) C++
  (5) Rexx

COBOL and PL/1 both provide the kind of arithmetic support needed for
this problem, plus business oriented output formatting. Of the two PL/1
is a much better language. However, you may not have these available or
have programmers proficient in using them, especially if you are in a
microcomputer environment.

If you are using PCs you might look into Turbo Pascal. Support for BCD
is (or was the last time I looked) an extra cost option, but quite
reasonable. I haven't used it so I don't know how well it works.

With C++ you might still have to write you own routines to support BCD
(though I wouldn't be surprised if you could find a package somewhere).
A big advantage over doing the same thing in C would be that you could 
then overload the regular arithmetic operators to use the new format.
This would make coding and maintaining your appication much easier, and
is probably the most portable of the solutions suggested here (short of
using C, which as I said is really not well suited to this type of 
problem.) If performance is a problem and your hardware supports BCD
(which I think math coprocessors do) you should try to use it, even though
a small assembly language interface might to required to gain access
to it and of course portability would suffer.

Rexx is a fine interpreted language with support for arbitrary precision
arithmetic, using fixed point when possible and only going to floating
point when the exponents get too small or too large. You can control the
digits setting so this won't happen. It is available on PCs (Personal Rexx
from Mansfield (CT) software), IBM mainframes, amigas, and many unix
platforms (from /wrk/group). Performance and space might be a problem,
depending upon the demands of the application. Rexx stores all values as 
strings. Hence it doesn't take much advantage of hardware support for
arithmetic. Its main use is actually as macro language.

Good luck, and keep the newsgroup posted on your solutions. Thanks to the
original poster for the interesting summary.

henry@zoo.toronto.edu (Henry Spencer) (08/09/90)

In article <168@srchtec.UUCP> johnb@srchtec.UUCP (John Baldwin) writes:
>>Our own experience, in building accounting systems for our own use, is that
>>the problem can often be eliminated by ignoring the decimal point in dollar
>>currencies and thinking of money as measured in pennies.
>
>Doesn't this push him back into the "continued sums" implementation?
>Remember, conversion rates between different currencies "slide" up and down
>at a fairly high frequency...

If you are doing tricky things with exchange rates, all bets are off. :-)
In the simple case, though, if you go to Thomas Cook and buy (say) US$,
they will *not* give you a fraction of a cent even if the mathematical
conversion factor would indicate it.  They will round the amount to an
integer in some way (typically in their favor :-)).  So if you're trying
to keep track of that transaction by program, you don't need to keep the
mathematically-exact amount, which is wrong anyway; it is both sufficient
and correct to use the appropriate rounding function to give an integer,
and store that.
-- 
It is not possible to both understand  | Henry Spencer at U of Toronto Zoology
and appreciate Intel CPUs. -D.Wolfskill|  henry@zoo.toronto.edu   utzoo!henry

amull@Morgan.COM (Andrew P. Mullhaupt) (08/11/90)

In article <1990Aug9.154302.29976@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
> In article <168@srchtec.UUCP> johnb@srchtec.UUCP (John Baldwin) writes:
> 
> If you are doing tricky things with exchange rates, all bets are off. :-)
> ... So if you're trying
> to keep track of that transaction by program, you don't need to keep the
> mathematically-exact amount, which is wrong anyway; it is both sufficient
> and correct to use the appropriate rounding function to give an integer,
> and store that.

If you are marking the resultant of the transaction to market, you
_do_ need to have the fraction. 

One of the most interesting difficulties in an "all-integer" or
single precision financial package is split adjustment of stocks.
I know directly of a case where due to a one-bit error in single
precision rounding, a portfolio was found to be unhedged by a
single share. The trading system in question was capable of ordering
a trade to rebalance, and it did so. It turns out to be extremely
unusual for such a trade to be generated for any really useful
purpose, and so the traders operating the system noticed this
trade, and became concerned that a potentially serious error was
going undetected. After some effort, the source of the error was
found, and (somewhat sheepishly) a second trade of a single share
had to be ordered to reverse the effect of the first. Now it's
kind of funny that a single share trade happens, and maybe the
financial loss is at most 25 cents capital and some unmentionable
transaction cost, but the hassle for the traders is not ignorable.

There is another case I know of where tens of thousands of shares 
went round-trip but this was a combination of a database error and
floating point mishap. 

The moral of the story? Do not make casual assumptions about the
kind of arithmetic required in business applications. Expect that
"getting it right to the penny", or any other fixed accuracy will
not be enough in every situation. There will always be one more
example than you can think of when you're trying to get away with
a low ceiling in a financial application. 

Later,
Andrew Mullhaupt

news@usc.edu (08/19/90)

In article <168@srchtec.UUCP> johnb@srchtec.UUCP (John Baldwin) writes:
   In article <1990Aug7.173030.2823@zoo.toronto.edu>
   henry@zoo.toronto.edu (Henry Spencer) writes: 
   >In article <713@tetrauk.UUCP> rick@tetrauk.UUCP (Rick Jones)
   >writes: 
   >>  Auditors have this annoying view that accounts must balance to
   >>  the penny, not 1 part in 10 to-the-something.  There is a good
   >>  case for using some form of BCD representation, but there are
   >>  many programming advantages in using the embedded numerical
   >>  types of the language... 
   >
   >Our own experience, in building accounting systems for our own
   >use, is that the problem can often be eliminated by ignoring the
   >decimal point in dollar currencies and thinking of money as
   >measured in pennies. 

This is very true.  Units of money should never be stored as
fractions.  BCD is merely one technique of ensuring this.

   Doesn't this push him back into the "continued sums"
   implementation?  Remember, conversion rates between different
   currencies "slide" up and down at a fairly high frequency.  (Does
   anybody know just HOW frequently the exchange rates change?)

   This would require an INTERNAL currency which would probably NOT be
   the same as a recognized world currency; one in which any unit of
   any other world currency can be expressed as an integer.  This is
   probably tricky enough to find WITHOUT having the conversions
   changing on you all the time!

The only solution I know of to the multiple currency problem is to
keep records in the same form as the real world entities they
represent.  If you have two apples and three oranges, and the exchange
rate is currently 1 apple = 1 orange, that does not mean you have 5
apples nor does it mean you have 5 oranges.  The exchange rate only
exists when there is an exchange, and is only set for that exchange.
(Not to be confused with a transaction, which might consiste of many
exchanges.)

wulkwa