[comp.graphics] APPROXIMATING SQUARE_ROOT

bohn@ptolemy2.rdrc.rpi.edu (10/26/89)

I think I recall an article posted here this summer on a "home-brewed"
quick method for approximating the square-root function.  Does anyone
out there remember what the method was, or have a copy of the article?

Thanx.

Jan Bohn
bohn@rdrc.rpi.edu

zs04+@andrew.cmu.edu (Zachary T. Smith) (11/19/89)

I realize this is an old discussion, but nobody seems to have
considered this solution, so...

Assuming you're using integer math (and though I'm only a novice
regarding computer graphics, I do at least know that the simple
stuff like 3d rotation can be done with integer math), I would
strongly suggest you use a lookup table. If your numbers are 16
bits wide, that's easy enough to do, since 64k by 16 isn't much
memory. If your number is 32 bits, use the same 64k by 16 table
but do it in two stages: do a lookup on the upper 16 bits, divide
that root out of the number twice, and do a lookup on the remaining
value.
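
A minimal sketch of the table approach for 16-bit inputs; the names
(root_table, isqrt16) are illustrative, not from the thread, and in
fact the root of a 16-bit value fits in 8 bits, so one byte per
entry is enough:

#include <stdio.h>

static unsigned char root_table[65536];   /* 64k entries, one byte each */

void build_table(void)
{
    unsigned long i;
    unsigned int r = 0;
    for (i = 0; i <= 65535UL; i++) {
        /* r tracks floor(sqrt(i)); bump it when (r+1)^2 fits */
        while ((unsigned long)(r + 1) * (r + 1) <= i)
            r++;
        root_table[i] = (unsigned char)r;
    }
}

unsigned int isqrt16(unsigned int x)      /* x assumed < 65536 */
{
    return root_table[x];
}

int main(void)
{
    build_table();
    printf("isqrt16(1000) = %u\n", isqrt16(1000));   /* prints 31 */
    return 0;
}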

-Zach T. Smith (zs04+@andrew.cmu.edu)

doug@xdos.UUCP (Doug Merritt) (11/28/89)

In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>Even with floating point numbers, a divide by 2 can be extremely fast
>if coded such that the exponent is simply decremented by one.  I would hope
>that many compilers do this optimization automatically, but I don't know.

This breaks when the number is not an exact power of two, because you're
not doing anything to the mantissa. So there'd better not be any compilers
that do this.

>On the other hand, if you want to take the square root of an integer, you
>could simply keep subtracting successive odd numbers until the next odd
>number is larger than your accumulator.  The number of odd numbers thus
>subtracted is the greatest integer function of the square root.  i.e.:

This is O(sqrt(N)), whereas e.g. binary search is O(log(N)). I would think
that this might be a deoptimization on machines with an efficient
multiply. Binary search requires an inner loop multiplication, but yours
requires lots more worst-case iterations.

For 16-bit integers, your algorithm's worst case is 255 iterations,
whereas binary search is 16. So the break-even point is a machine where
16-bit multiplication is 15 to 16 times slower than addition. (I've
misplaced my 68000 manual, but from memory I think the ratio of times for
addition versus multiplication makes it approximately break-even for
that processor.)

For 32-bit integers, your worst case is 65535 iterations, which couldn't
possibly beat binary search's 32 iterations on any machine, even with
mildly inefficient software-synthesized multiplication.

But if your inputs are limited to, say, 2^10 or less, I guess it could
be a win.
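
For concreteness, a sketch of both methods under discussion; the
function names are mine:

/* Odd-number subtraction: 1 + 3 + 5 + ... + (2k-1) = k^2, so the
   count of odd numbers that fit in n is floor(sqrt(n)). */
unsigned int isqrt_odd(unsigned long n)
{
    unsigned long odd = 1;
    unsigned int count = 0;
    while (n >= odd) {
        n -= odd;
        odd += 2;
        count++;
    }
    return count;
}

/* Binary search on the root: one multiply per iteration. */
unsigned int isqrt_bsearch(unsigned long n)
{
    unsigned long lo = 0, hi = 65535, mid;  /* root of a 32-bit value */
    while (lo < hi) {
        mid = (lo + hi + 1) / 2;
        if (mid * mid <= n)       /* invariant: lo*lo <= n */
            lo = mid;
        else
            hi = mid - 1;
    }
    return (unsigned int)lo;
}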

Clever algorithm for the right circumstances, but not a general case
optimization.
	Doug
-- 
Doug Merritt		{pyramid,apple}!xdos!doug
Member, Crusaders for a Better Tomorrow		Professional Wildeyed Visionary

CMH117@PSUVM.BITNET (Charles Hannum) (11/28/89)

In article <542@xdos.UUCP>, doug@xdos.UUCP (Doug Merritt) says:
>
>In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>>Even with floating point numbers, a divide by 2 can be extremely fast
>>if coded such that the exponent is simply decremented by one.  I would hope
>>that many compilers do this optimization automatically, but I don't know.
>
>This breaks when the number is not an exact power of two, because you're
>not doing anything to the mantissa. So there'd better not be any compilers
>that do this.

We don't need to do anything to the mantissa.  The exponent represents a
power of two.  That power, multiplied by the mantissa (with the implicit
leading 1 restored if necessary), yields the number.  Simply incrementing
or decrementing the exponent will multiply or divide the number by a
power of 2, exactly.
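
A sketch of the trick on an IEEE-754 single-precision float, assuming
32-bit floats and a 32-bit unsigned int; the function name is mine, and
zero, denormals, and infinities would need separate handling:

#include <stdio.h>
#include <string.h>

float half(float x)                   /* assumes normalized, nonzero x */
{
    unsigned int bits;
    memcpy(&bits, &x, sizeof x);      /* reinterpret the float's bits */
    bits -= 0x00800000U;              /* exponent field minus one */
    memcpy(&x, &bits, sizeof x);      /* mantissa is left untouched */
    return x;
}

int main(void)
{
    printf("%f\n", half(6.0f));       /* prints 3.000000 */
    return 0;
}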

>>On the other hand, if you want to take the square root of an integer, you
>>could simply keep subtracting successive odd numbers until the next odd
>>number is larger than your accumulator.  The number of odd numbers thus
>>subtracted is the greatest integer function of the square root.  i.e.:
>
>This is O(sqrt(N)), whereas e.g. binary search is O(log(N)). I would think
>that this might be a deoptimization on machines with an efficient
>multiply. Binary search requires an inner loop multiplication, but yours
>requires lots more worst-case iterations.
>
>For 16-bit integers, your algorithm's worst case is 255 iterations,
>whereas binary search is 16. So the break-even point is a machine where
>16-bit multiplication is 15 to 16 times slower than addition. (I've
>misplaced my 68000 manual, but from memory I think the ratio of times for
>addition versus multiplication makes it approximately break-even for
>that processor.)
>
>For 32-bit integers, your worst case is 65535 iterations, which couldn't
>possibly beat binary search's 32 iterations on any machine, even with
>mildly inefficient software-synthesized multiplication.
>
>But if your inputs are limited to, say, 2^10 or less, I guess it could
>be a win.
>
>Clever algorithm for the right circumstances, but not a general case
>optimization.

It wasn't meant to work in the general case.  Hell, it would be ridiculous
to use it for even 16-bit integers if your processor has a fast divide.
It's really only good for 8-bit numbers ... and then you might as well use
a lookup table.  The iterative method is a much better solution.
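
If "the iterative method" here means Newton's iteration (my reading;
the post doesn't say), it is indeed the one that exploits a fast
divide:

/* Newton's iteration for floor(sqrt(n)): refine x by averaging it
   with n/x.  Each step costs one divide. */
unsigned int isqrt_newton(unsigned long n)
{
    unsigned long x, next;

    if (n < 2)
        return (unsigned int)n;

    x = n / 2;                    /* crude initial guess */
    for (;;) {
        next = (x + n / x) / 2;   /* Newton step for f(x) = x^2 - n */
        if (next >= x)            /* guesses stop decreasing: done */
            break;
        x = next;
    }
    return (unsigned int)x;
}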

--
- Charles Martin Hannum II       "Klein bottle for sale ... inquire within."
    (That's Charles to you!)     "To life immortal!"
  cmh117@psuvm.{bitnet,psu.edu}  "No noozzzz izzz netzzzsnoozzzzz..."
  c9h@psuecl.{bitnet,psu.edu}    "Mem'ry, all alone in the moonlight ..."

tuna@athena.mit.edu (Kirk 'UhOh' Johnson) (11/29/89)

In article <542@xdos.UUCP> doug@xdos.UUCP (Doug Merritt) writes:
>In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>>Even with floating point numbers, a divide by 2 can be extremely fast
>>if coded such that the exponent is simply decremented by one.  I would hope
>>that many compilers do this optimization automatically, but I don't know.
>
>This breaks when the number is not an exact power of two, because you're
>not doing anything to the mantissa. So there'd better not be any compilers
>that do this.
>

untrue. consider a number N of the form:

         N = M * 2^E

clearly, N/2 _can_ be obtained by simply decrementing E, for:

         N/2 = M * 2^(E-1)

kirk

-------------------------------------------------------------------------------
kirk johnson

jep@oink.UUCP (James E. Prior) (11/29/89)

In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>Even with floating point numbers, a divide by 2 can be extremely fast
>if coded such that the exponent is simply decremented by one.  I would hope
>that many compilers do this optimization automatically, but I don't know.

FYI, some machines can do floating-point shifts (virtual *2 and /2) in
their instruction set.  As I recall (I might be wrong, please no flames),
the IBM 370s had floating-point shifts.  A bit of an oddity: their
floating-point format was base 16, not base 2.  That complicates the
shifts a little, but the idea's the same.
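
A sketch of what halving looks like in that base-16 format, assuming
the S/360-style layout (sign bit, 7-bit excess-64 exponent, 24-bit
fraction, value = fraction * 16^(exp-64)); edge cases like exponent
underflow are ignored, and the name is mine:

#include <stdint.h>

uint32_t hexfloat_half(uint32_t w)
{
    uint32_t sign = w & 0x80000000UL;
    uint32_t exp  = (w >> 24) & 0x7F;
    uint32_t frac = w & 0x00FFFFFFUL;

    if (frac == 0)
        return w;                     /* zero stays zero */

    frac >>= 1;                       /* halve the fraction */
    if ((frac & 0x00F00000UL) == 0) { /* top hex digit went to zero */
        frac <<= 4;                   /* renormalize one hex digit... */
        exp--;                        /* ...and drop one power of 16 */
    }
    return sign | (exp << 24) | frac;
}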
-- 
Jim Prior    jep@oink    osu-cis!n8emr!oink!jep    N8KSM

turk@Apple.COM (Ken Turkowski) (12/16/89)

You can calculate the square root of a fixed-point number to any
precision with an algorithm similar to that of division, generating one
bit of precision each time through the loop.  Look at any text on
computer arithmetic.
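
That textbook method is the binary digit-by-digit (restoring) square
root; a sketch for 32-bit integers (my naming), which extends to fixed
point by shifting the radicand left two bits per extra fraction bit
wanted:

unsigned int isqrt32(unsigned long n)
{
    unsigned long root = 0;
    unsigned long bit = 1UL << 30;    /* highest power of 4 <= 2^31 */

    while (bit > n)                   /* skip leading zero digit pairs */
        bit >>= 2;

    while (bit != 0) {
        if (n >= root + bit) {        /* this result bit fits */
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;                    /* one result bit per pass */
    }
    return (unsigned int)root;        /* floor(sqrt(n)) */
}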
-- 
Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
Internet: turk@apple.com
Applelink: TURKOWSKI1
UUCP: sun!apple!turk