[comp.sys.acorn] Dividing

ecwu61@castle.ed.ac.uk (R Renwick) (03/03/91)

	Can anyone help me with this problem:
In my super-duper, wire-frame model animating 'C' program, I represent 
floating point numbers as 4-byte integers. This I do by shifting the number
being represented by 15 bits (multiply by 32768) in order that I can keep a
track of the fractional part. The reason for doing this is that I want
to make sure my program runs as fast as possible by not using floating
point arithmetic...

	What I want to do is do a divide a number by another number
and not lose alot of the precision.
	The way I do this at the moment is by
		 result=((a/b)<<15)
But if a and b are close together then (a/b) loses alot of precison
(since the result is an int).

Does anyone know how I could do an integer division without losing
precision. I cannot use result=(((a<<8)/b)<<7) or any other tricks
because a may be close to 32768<<15 already...
Perhaps if I can find the reciprocal of b in the above representation
and multiply it by a then I'll not lose the as much precision, but I
don't know how to do this (find the reciprocal) :-(

Anyway, if anyone knows how to solve this one, I'd be very grateful to
hear from them :-).

Rik

dseal@armltd.uucp (David Seal) (03/06/91)

In article <8828@castle.ed.ac.uk> ecwu61@castle.ed.ac.uk (R Renwick) writes:

>	Can anyone help me with this problem:
>In my super-duper, wire-frame model animating 'C' program, I represent 
>floating point numbers as 4-byte integers. This I do by shifting the number
>being represented by 15 bits (multiply by 32768) in order that I can keep a
>track of the fractional part. The reason for doing this is that I want
>to make sure my program runs as fast as possible by not using floating
>point arithmetic...
>
>	What I want to do is do a divide a number by another number
>and not lose alot of the precision.
>	The way I do this at the moment is by
>		 result=((a/b)<<15)
>But if a and b are close together then (a/b) loses alot of precison
>(since the result is an int).
>
>Does anyone know how I could do an integer division without losing
>precision. I cannot use result=(((a<<8)/b)<<7) or any other tricks
>because a may be close to 32768<<15 already...
>Perhaps if I can find the reciprocal of b in the above representation
>and multiply it by a then I'll not lose the as much precision, but I
>don't know how to do this (find the reciprocal) :-(

Two quick suggestions:

(1) The basic problem is that integer division stops when it determines the
    bit in the units place, whereas you want it to go on and calculate
    another 15 fractional binary places. So rather than using C's integer
    divide operator, write a long division routine using subtractions and
    shifts and force it to go on 15 iterations longer than normal. (This
    won't be much slower than C's integer divide, which essentially uses a
    long division routine anyway).

    If speed is very important, you'll probably want to either (a) write a
    well-optimised assembler routine; or (b) write it as a C macro, so that
    the compiler includes it inline. In the latter case, you may want to
    look carefully at the assembler produced by the compiler for an instance
    of your macro and experiment a bit - careful choice of the details of
    the long division algorithm can make a significant difference to the
    speed of the division.

    Incidentally, you may want to watch out for significant bits being lost
    during the last 15 iterations: this is perfectly possible and means that
    your result has overflowed.

(2) Use a Newton-Raphson iteration to find the reciprocal of your divisor
    and then multiply - this presupposes that you've got an efficient way to
    do multiplications, both for the Newton-Raphson iteration and for the
    final multiplication. The iteration to find the reciprocal of a number A
    is (in pseudo-code - you'll have to translate to C):

       newx = initial approximation to reciprocal;
       repeat
         x = newx;
         newx = x * (2 - A*x);
       until x and newx sufficiently close together;

    The main thing to beware of in this is that your initial approximation
    must be reasonably close to the reciprocal - it must be greater than 0
    and less than twice the reciprocal. Such an approximation can be found
    by finding the most significant 1 in A and choosing a value based on
    this - e.g. if A lies in the range 1.0 <= A < 2.0, its most significant
    1 is at bit position 16, and a suitable starting point for the iteration
    would be 0.75. This gives us:

    Most sign. bit position      Range for A        Suitable starting point
    -----------------------------------------------------------------------
              :
              :
              9              2^(-7) <= A < 2^(-6)       1.5*2^6
             10              2^(-6) <= A < 2^(-5)       1.5*2^5
             11              2^(-5) <= A < 2^(-4)       1.5*2^4
             12              2^(-4) <= A < 2^(-3)       1.5*2^3
             13              2^(-3) <= A < 2^(-2)       1.5*2^2
             14              2^(-2) <= A < 2^(-1)       1.5*2^1
             15              2^(-1) <= A < 2^0          1.5*2^0
             16                 2^0 <= A < 2^1          1.5*2^(-1)
             17                 2^1 <= A < 2^2          1.5*2^(-2)
             18                 2^2 <= A < 2^3          1.5*2^(-3)
             19                 2^3 <= A < 2^4          1.5*2^(-4)
             20                 2^4 <= A < 2^5          1.5*2^(-5)
             21                 2^5 <= A < 2^6          1.5*2^(-6)
             22                 2^6 <= A < 2^7          1.5*2^(-7)
             :
             :

    Watch out for very big values - e.g. 2^16 is representable in this
    number system, but 2^(-16) isn't!

Hope these suggestions help.

David Seal
dseal@acorn.co.uk or dseal@armltd.uucp