bohn@ptolemy2.rdrc.rpi.edu (10/26/89)
I think I recall an article posted here this summer on a "home-brewed" quick method for approximating the square-root function. Does anyone out there remember what the method was, or have a copy of the article? Thanx. Jan Bohn bohn@rdrc.rpi.edu
zs04+@andrew.cmu.edu (Zachary T. Smith) (11/19/89)
I realize this is an old discussion, but nobody seems to have considered this solution, so... Assuming you're using integer math (and though I'm only a novice regarding computer graphics, I do at least know that simple things like 3-D rotation can be done with integer math), I would strongly suggest you use a lookup table. If your numbers are 16 bits wide, that's easy enough to do, since 64K by 16 isn't much memory. If your number is 32 bits, use the same 64K by 16 table but do it in two stages: do a lookup for the upper 16 bits, then divide its root out twice and do a lookup for the remaining value. -Zach T. Smith (zs04+@andrew.cmu.edu)
doug@xdos.UUCP (Doug Merritt) (11/28/89)
In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>Even with floating point numbers, a divide by 2 can be extremely fast
>if coded such that the exponent is simply decremented by one. I would hope
>that many compilers do this optimization automatically, but I don't know.

This breaks when the number is not an exact power of two, because you're not doing anything to the mantissa. So there'd better not be any compilers that do this.

>On the other hand, if you want to take the square root of an integer, you
>could simply keep subtracting successive odd numbers until the next odd
>number is larger than your accumulator. The number of odd numbers thus
>subtracted is the greatest integer function of the square root. i.e.:

This is O(sqrt(N)), whereas e.g. binary search is O(log(N)). I would think this might be a deoptimization on machines with an efficient multiply. Binary search requires an inner-loop multiplication, but yours requires many more worst-case iterations.

For 16-bit integers, your algorithm's worst case is 255 iterations, where binary search's is 16. So the break-even point is a machine where 16-bit multiplication is 15 to 16 times slower than addition. (I've misplaced my 68000 manual, but from memory I think the ratio of times for addition versus multiplication makes it approximately break-even for that processor.)

For 32-bit integers, your worst case is 65535 iterations, which couldn't possibly beat binary search's 32 iterations on any machine, even with mildly inefficient software-synthesized multiplication.

But if your inputs are limited to, say, 2^10 or less, I guess it could be a win. Clever algorithm for the right circumstances, but not a general-case optimization.

Doug
--
Doug Merritt                {pyramid,apple}!xdos!doug
Member, Crusaders for a Better Tomorrow
Professional Wildeyed Visionary
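The two integer methods being compared here can be sketched as follows. This is my illustration of the trade-off, not code from either post; the function names are mine.

```c
/* Odd-number subtraction: floor(sqrt(n)) falls out of the identity
   1 + 3 + 5 + ... + (2k-1) = k^2. Only add/subtract/compare in the
   loop, but O(sqrt(n)) iterations -- up to 65535 for a 32-bit input. */
unsigned isqrt_odd(unsigned n) {
    unsigned odd = 1, count = 0;
    while (n >= odd) {
        n -= odd;       /* subtract the next odd number */
        odd += 2;
        count++;        /* count of odd numbers subtracted so far */
    }
    return count;
}

/* Binary search on the root: O(log n) iterations, one multiply each.
   Invariant: lo*lo <= n < hi*hi. 64-bit arithmetic avoids overflow
   of mid*mid for inputs near the top of the 32-bit range. */
unsigned isqrt_bin(unsigned n) {
    unsigned long long lo = 0, hi = 65536;   /* 65536^2 > any 32-bit n */
    while (hi - lo > 1) {
        unsigned long long mid = (lo + hi) / 2;
        if (mid * mid <= n)
            lo = mid;
        else
            hi = mid;
    }
    return (unsigned)lo;
}
```

Both return the greatest-integer square root; which wins depends entirely on the relative cost of multiplication versus addition on the target machine, as the post argues.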
CMH117@PSUVM.BITNET (Charles Hannum) (11/28/89)
In article <542@xdos.UUCP>, doug@xdos.UUCP (Doug Merritt) says:
>In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>>Even with floating point numbers, a divide by 2 can be extremely fast
>>if coded such that the exponent is simply decremented by one. I would hope
>>that many compilers do this optimization automatically, but I don't know.
>
>This breaks when the number is not an exact power of two, because you're
>not doing anything to the mantissa. So there'd better not be any compilers
>that do this.

We don't need to do anything to the mantissa. The exponent represents a power of two; this, multiplied by the mantissa (with the implicit leading 1 restored if necessary), yields the number. Simply incrementing or decrementing the exponent will multiply or divide by a power of 2.

>>On the other hand, if you want to take the square root of an integer, you
>>could simply keep subtracting successive odd numbers until the next odd
>>number is larger than your accumulator. The number of odd numbers thus
>>subtracted is the greatest integer function of the square root. i.e.:
>
>This is O(sqrt(N)), whereas e.g. binary search is O(log(N)). I would think
>that this might be a deoptimization on machines with an efficient
>multiply. Binary search requires an inner loop multiplication, but yours
>requires lots more worst-case iterations.
>
>For 16 bit integers, your algorithm's worst case is 255 iterations,
>where binary search is 16. So the break-even point is a machine where
>16-bit multiplication is 15 to 16 times slower than addition. (I've
>misplaced my 68000 manual, but from memory I think the ratio of times for
>addition versus multiplication make it approximately break-even for
>that processor.)
>
>For 32-bit integers, your worst case is 65535 iterations, which couldn't
>possibly beat binary search's 32 iterations on any machine, even with
>mildly inefficient software-synthesized multiplication.
>
>But if your inputs are limited to say, 2^10 or less, I guess it could
>be a win.
>Clever algorithm for the right circumstances, but not a general case
>optimization.

It wasn't meant to work in the general case. Hell, it would be ridiculous to use it for even 16-bit integers if your processor has a fast divide. It's really only good for 8-bit numbers ... and then you might as well use a lookup table. The iterative method is a much better solution.
--
- Charles Martin Hannum II      "Klein bottle for sale ... inquire within."
(That's Charles to you!)        "To life immortal!"
cmh117@psuvm.{bitnet,psu.edu}   "No noozzzz izzz netzzzsnoozzzzz..."
c9h@psuecl.{bitnet,psu.edu}     "Mem'ry, all alone in the moonlight ..."
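The exponent-decrement claim is easy to check by operating on the bit pattern of a double directly. A sketch only: it assumes IEEE 754 binary64 layout (sign in bit 63, exponent in bits 52-62) and a normal, nonzero input; zero, denormals, infinities, and NaNs would still need the ordinary divide. The function name is mine.

```c
#include <stdint.h>
#include <string.h>

/* Halve a double by decrementing its biased exponent field.
   The mantissa is untouched, so the result is exact for any
   normal input -- the point made above. */
double halve(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret bits, no conversion */
    bits -= (uint64_t)1 << 52;        /* exponent -= 1 (bits 52..62) */
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```

Because halving a binary float only changes the exponent, `halve(x)` agrees exactly with `x / 2.0` for every normal double, mantissa and all; the "breaks for non-powers-of-two" objection upthread does not apply.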
tuna@athena.mit.edu (Kirk 'UhOh' Johnson) (11/29/89)
In article <542@xdos.UUCP> doug@xdos.UUCP (Doug Merritt) writes:
>In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>>Even with floating point numbers, a divide by 2 can be extremely fast
>>if coded such that the exponent is simply decremented by one. I would hope
>>that many compilers do this optimization automatically, but I don't know.
>
>This breaks when the number is not an exact power of two, because you're
>not doing anything to the mantissa. So there'd better not be any compilers
>that do this.

untrue. consider a number N of the form:

    N = M * 2^E

clearly, N/2 _can_ be obtained by simply decrementing E, for:

    N/2 = M * 2^(E-1)

kirk
-------------------------------------------------------------------------------
`For a long time I felt without                    kirk johnson
style and grace, wearing shoes
jep@oink.UUCP (James E. Prior) (11/29/89)
In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>Even with floating point numbers, a divide by 2 can be extremely fast
>if coded such that the exponent is simply decremented by one. I would hope
>that many compilers do this optimization automatically, but I don't know.

FYI, some machines have floating-point shifts (a virtual *2 and /2) in their instruction set. As I recall (I might be wrong, please no flames), the IBM 370s had floating-point shifts. A bit of an oddity: their floating-point format was base 16, not base 2. That complicates the shifts a little, but the idea's the same.
--
Jim Prior    jep@oink    osu-cis!n8emr!oink!jep    N8KSM
turk@Apple.COM (Ken Turkowski) (12/16/89)
You can calculate the square root of a fixed-point number to any precision with an algorithm similar to that of division, generating one bit of precision each time through the loop. Look at any text on computer arithmetic. -- Ken Turkowski @ Apple Computer, Inc., Cupertino, CA Internet: turk@apple.com Applelink: TURKOWSKI1 UUCP: sun!apple!turk
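The digit-by-digit method described here, specialized to binary, is the standard "restoring" integer square root. The sketch below is my illustration of that textbook algorithm, not Turkowski's code; extending it to fixed point is a matter of pre-shifting the operand by an even number of bits.

```c
/* Binary digit-by-digit square root: produces one bit of the result
   per iteration using only shifts, adds, and compares -- structurally
   just like binary long division. Returns floor(sqrt(v)). */
unsigned isqrt(unsigned v) {
    unsigned root = 0;
    unsigned bit = 1u << 30;   /* highest even power of two in 32 bits */

    while (bit > v)
        bit >>= 2;             /* align the first trial bit with v */

    while (bit != 0) {
        if (v >= root + bit) {
            v -= root + bit;             /* this result bit is a 1 */
            root = (root >> 1) + bit;
        } else {
            root >>= 1;                  /* this result bit is a 0 */
        }
        bit >>= 2;             /* next lower bit pair of the operand */
    }
    return root;
}
```

Each pass decides one bit of the root by a trial subtraction, exactly as each pass of long division decides one quotient bit, so the loop runs a fixed 16 times for a full 32-bit operand regardless of the input's value.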