ecwu61@castle.ed.ac.uk (R Renwick) (03/03/91)
Can anyone help me with this problem: In my super-duper, wire-frame model animating 'C' program, I represent floating point numbers as 4-byte integers. This I do by shifting the number being represented by 15 bits (multiply by 32768) in order that I can keep a track of the fractional part. The reason for doing this is that I want to make sure my program runs as fast as possible by not using floating point arithmetic... What I want to do is do a divide a number by another number and not lose alot of the precision. The way I do this at the moment is by result=((a/b)<<15) But if a and b are close together then (a/b) loses alot of precison (since the result is an int). Does anyone know how I could do an integer division without losing precision. I cannot use result=(((a<<8)/b)<<7) or any other tricks because a may be close to 32768<<15 already... Perhaps if I can find the reciprocal of b in the above representation and multiply it by a then I'll not lose the as much precision, but I don't know how to do this (find the reciprocal) :-( Anyway, if anyone knows how to solve this one, I'd be very grateful to hear from them :-). Rik
dseal@armltd.uucp (David Seal) (03/06/91)
In article <8828@castle.ed.ac.uk> ecwu61@castle.ed.ac.uk (R Renwick) writes: > Can anyone help me with this problem: >In my super-duper, wire-frame model animating 'C' program, I represent >floating point numbers as 4-byte integers. This I do by shifting the number >being represented by 15 bits (multiply by 32768) in order that I can keep a >track of the fractional part. The reason for doing this is that I want >to make sure my program runs as fast as possible by not using floating >point arithmetic... > > What I want to do is do a divide a number by another number >and not lose alot of the precision. > The way I do this at the moment is by > result=((a/b)<<15) >But if a and b are close together then (a/b) loses alot of precison >(since the result is an int). > >Does anyone know how I could do an integer division without losing >precision. I cannot use result=(((a<<8)/b)<<7) or any other tricks >because a may be close to 32768<<15 already... >Perhaps if I can find the reciprocal of b in the above representation >and multiply it by a then I'll not lose the as much precision, but I >don't know how to do this (find the reciprocal) :-( Two quick suggestions: (1) The basic problem is that integer division stops when it determines the bit in the units place, whereas you want it to go on and calculate another 15 fractional binary places. So rather than using C's integer divide operator, write a long division routine using subtractions and shifts and force it to go on 15 iterations longer than normal. (This won't be much slower than C's integer divide, which essentially uses a long division routine anyway). If speed is very important, you'll probably want to either (a) write a well-optimised assembler routine; or (b) write it as a C macro, so that the compiler includes it inline. In the latter case, you may want to look carefully at the assembler produced by the compiler for an instance of your macro and experiment a bit - careful choice of the details of the long division algorithm can make a significant difference to the speed of the division. Incidentally, you may want to watch out for significant bits being lost during the last 15 iterations: this is perfectly possible and means that your result has overflowed. (2) Use a Newton-Raphson iteration to find the reciprocal of your divisor and then multiply - this presupposes that you've got an efficient way to do multiplications, both for the Newton-Raphson iteration and for the final multiplication. The iteration to find the reciprocal of a number A is (in pseudo-code - you'll have to translate to C): newx = initial approximation to reciprocal; repeat x = newx; newx = x * (2 - A*x); until x and newx sufficiently close together; The main thing to beware of in this is that your initial approximation must be reasonably close to the reciprocal - it must be greater than 0 and less than twice the reciprocal. Such an approximation can be found by finding the most significant 1 in A and choosing a value based on this - e.g. if A lies in the range 1.0 <= A < 2.0, its most significant 1 is at bit position 16, and a suitable starting point for the iteration would be 0.75. This gives us: Most sign. bit position Range for A Suitable starting point ----------------------------------------------------------------------- : : 9 2^(-7) <= A < 2^(-6) 1.5*2^6 10 2^(-6) <= A < 2^(-5) 1.5*2^5 11 2^(-5) <= A < 2^(-4) 1.5*2^4 12 2^(-4) <= A < 2^(-3) 1.5*2^3 13 2^(-3) <= A < 2^(-2) 1.5*2^2 14 2^(-2) <= A < 2^(-1) 1.5*2^1 15 2^(-1) <= A < 2^0 1.5*2^0 16 2^0 <= A < 2^1 1.5*2^(-1) 17 2^1 <= A < 2^2 1.5*2^(-2) 18 2^2 <= A < 2^3 1.5*2^(-3) 19 2^3 <= A < 2^4 1.5*2^(-4) 20 2^4 <= A < 2^5 1.5*2^(-5) 21 2^5 <= A < 2^6 1.5*2^(-6) 22 2^6 <= A < 2^7 1.5*2^(-7) : : Watch out for very big values - e.g. 2^16 is representable in this number system, but 2^(-16) isn't! Hope these suggestions help. David Seal dseal@acorn.co.uk or dseal@armltd.uucp