[comp.graphics] int math vs. float math

twegner@mwunix.mitre.org (Timothy Wegner) (01/08/91)

jdb@reef.cis.ufl.edu (Brian K. W. Hook) writes:
>I have heard a lot about using integer math instead of floating point
>since it is much faster (e.g. Fractint supposedly uses 32-bit int math
>on a 386 instead of floats and Autodesk theoretically only uses 10% float
>math)....however, I have not been able to find a good reference on this
>subject?  Any help?

I don't know about a reference, but I would like to point out that Fractint
does either integer or floating point math, so that you can easily benchmark
the benefits. The downside of integer math is that fixed point has a limited
dynamic range. But if the nature of the computation is that the numbers stay
within known limits (as in a Mandelbrot calculation), fixed point math using
32 bit registers is much faster. Incredibly faster if your machine does not
have a math coprocessor.

For the technically minded, the Fractint source code is available so that
you can see how it is done.
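
To give a feel for what this looks like in practice, here is a minimal
sketch of a Mandelbrot inner loop in 16.16 fixed point.  This is NOT
Fractint's actual code; mandel(), mul() and FIX() are made-up names, and
mul() stands for any 16.16 fixed-point multiply (one possible
implementation appears later in this thread).

	#define FIX(x)	((long)((x) * 65536.0))	/* double -> 16.16 fixed */

	long	mul();		/* some 16.16 fixed-point multiply routine */

	int
	mandel(cr, ci, maxiter)
		long	cr, ci ;	/* the point c, in 16.16 fixed point */
		int	maxiter ;
	{
		long	zr = 0, zi = 0, zr2, zi2 ;
		int	n ;

		for( n = 0 ; n < maxiter ; n++ ) {
			zr2 = mul(zr, zr) ;	/* zr^2, still 16.16 */
			zi2 = mul(zi, zi) ;	/* zi^2, still 16.16 */
			if( zr2 + zi2 > FIX(4.0) )	/* |z| > 2: escaped */
				break ;
			zi = 2 * mul(zr, zi) + ci ;	/* new imaginary part */
			zr = zr2 - zi2 + cr ;		/* new real part */
		}
		return n ;		/* iteration count for coloring */
	}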

falk@peregrine.Sun.COM (Ed Falk) (01/08/91)

In article <26189@uflorida.cis.ufl.EDU> jdb@reef.cis.ufl.edu (Brian K. W. Hook) writes:
>Net,
>
>I have heard a lot about using integer math instead of floating point
>since it is much faster (e.g. Fractint supposedly uses 32-bit int math
>on a 386 instead of floats and Autodesk theoretically only uses 10% float
>math)....however, I have not been able to find a good reference on this
>subject?  Any help?

Well, the theory is pretty simple.  Assign an imaginary binary point
to your numbers, some fixed number of bits from the right.  You could
use the same number of bits for all numbers; for instance, if you're
working with 32-bit numbers, you might place the binary point 16 bits
from the right.  This would give you numbers with a 1-bit sign, 15-bit
integer part and 16-bit fraction.  (Or a 16-bit integer and 16-bit
fraction if you're working with unsigned numbers.)
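
For example, converting between ordinary floating point and such a 16.16
representation is just a scale by 2^16.  A sketch (the macro names here
are made up for illustration, not from any particular library):

	#define FLOAT_TO_FIX(f)	((long)((f) * 65536.0))	/* double -> 16.16 */
	#define FIX_TO_FLOAT(x)	((double)(x) / 65536.0)	/* 16.16 -> double */

	/* e.g. FLOAT_TO_FIX(1.5) == 0x00018000, i.e. 0001.8000 in hex */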

There's no rule that says that the binary point has to be the *same*
distance from the right for every number, as long as you keep track of
where it is.  I once wrote an application that had to do matrix
multiplication for a special task all with 16-bit numbers.  Each number
had the binary point in a different place, as I was juggling things to
maximize my precision while avoiding overflows.  It was tricky, and
every line of code had a comment saying what the precision was.
In general, you're better off sticking to a single precision.
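
As a small illustration of that kind of bookkeeping (hypothetical 8.8 and
12.4 formats, not the actual ones from that application): adding two
numbers whose binary points sit in different places means shifting one of
them into the other's format first.

	short	a ;	/* 8.8 format:  binary point 8 bits from the right */
	short	b ;	/* 12.4 format: binary point 4 bits from the right */
	short	sum ;	/* result kept in 12.4 format */

	sum = (a >> 4) + b ;	/* align a's binary point with b's, then add */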

The best way to work these out is to picture the operations on the
number one:	0001.0000 (hex)

Anyway, assuming you have 1-bit sign, 15-bit integer and 16-bit
fraction, the basic operations are like this:

  Addition is just integer addition.  Add two fracts together and
  get a fract result.  Ex: 1+1 = 2
	 0001.0000 + 0001.0000 = 0002.0000

  Subtraction is just integer subtraction.
	 0001.0000 - 0001.0000 = 0000.0000

  Multiplying two 32-bit numbers with 16-bit fractions gives you a
  64-bit number with a 32-bit fraction.  Shift the result 16 bits to
  the right to get a 48-bit number with 16-bit fraction.  Discard the
  high-order 16 bits and you have your result.  If the high-order 16
  bits were significant (i.e. not all zeros or all ones), an overflow
  has occurred.  Ex: 1*1 = 1
	 0001.0000 x 0001.0000 = 00000001.00000000 
				     ^^^^^^^^^  (preserve these bits)

  Dividing two 32-bit fracts would leave you with an integer result (i.e.
  total loss of fraction part).  It's better to convert the dividend to
  a 64-bit number with 32-bit fraction part by shifting left 16:
	 00000001.00000000 / 0001.0000 = 0001.0000
  Dividing a 64-bit number with 32-bit fraction part by a 32-bit number
  with 16-bit fraction part gives you a 32-bit number with 16-bit fraction.


As you see, multiplying and dividing require that you be able to work
with 64-bit numbers, which would probably mean coding in machine
language *if* your computer supports 64-bit arithmetic at all.  If not,
things get tricky.

  To multiply, you need to break your multiplication into four steps.
  First, operate on the high-order 16 bits of the two numbers, then
  the high-order 16 bits of one and the low-order 16 bits of the other
  and so on:

	   AB
	  xCD
	  ---
	   D*B
	  D*A
	  C*B
	+C*A
	  ----


In C, this would all look like:

	#include <errno.h>		/* for errno and ERANGE */

	typedef	long	FRACT ;		/* assume "long" is 32 bits on my cpu */

	FRACT
	addfract(i,j)
		FRACT	i,j ;
	{
		return i+j ;
	}

	FRACT
	subfract(i,j)
		FRACT	i,j ;
	{
		return i-j ;
	}

	FRACT
	mulfract(i,j)
		FRACT	i,j ;
	{
		unsigned long	ih,il, jh,jl, hh,hl,lh,ll ;
		int	sign = 1 ;

		if( i < 0 )
		  sign = -1, i=-i ;
		if( j < 0 )
		  sign = -sign, j=-j ;

		ih = i>>16 ;  il = i & 0xffff ;	/* split into 16-bit halves */
		jh = j>>16 ;  jl = j & 0xffff ;

		hh = ih*jh ;	/* lands in bits 32..63 of the 64-bit product */
		hl = ih*jl ;	/* lands in bits 16..47 */
		lh = il*jh ;	/* lands in bits 16..47 */
		ll = il*jl ;	/* lands in bits 0..31; only its high half
				   survives the final >>16 */

		if( hh & 0xffff8000 ) {		/* overflow? */
		  errno = ERANGE ;
		  return 0 ;
		}

		return sign * (FRACT)((hh<<16) + hl + lh + (ll>>16)) ;
	}

I've left division as an exercise for the student -- i.e. I don't feel
like working it out right now.
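
For what it's worth, a rough sketch along the lines described above,
assuming your compiler provides a 64-bit integer type (spelled "long
long" here, which many compilers of this era did not have):

	FRACT
	divfract(i,j)
		FRACT	i,j ;
	{
		/* Shift the dividend left 16 so the quotient comes out
		   with a 16-bit fraction.  No overflow check on the
		   result -- this is only a sketch. */
		if( j == 0 ) {
			errno = ERANGE ;
			return 0 ;
		}
		return (FRACT)(((long long)i << 16) / j) ;
	}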

		-ed falk, sun microsystems
		 sun!falk, falk@sun.com
To be loyal to rags, to shout for rags, to worship rags, to die for rags 
-- that is a loyalty of unreason, it is pure animal (Mark Twain).