mitchell@cbmvax.commodore.com (Fred Mitchell - PA) (12/30/89)
I've noticed that there's some interest in integer square root algorithms.
Here's one that is short, fast, and sweet. It depends on the bit-shifting
facilities of C. This one is geared for finding the square-root of a
32-bit unsigned, resulting in a 16-bit root. It should be easy to adapt
to other languages and/or integer sizes. Enjoy!
ULONG Isqrt(n)
ULONG n;
{
register i, r, rb, b;
for (i = 15, r = 0; i >= 0; --i)
if ( (rb = r+(b=1<<i)) * rb <= n)
r += b;
return (ULONG) r;
}
-Mitchell
mitchell@cbmvax.UUCP
There are evils on this rock greater than even YOU
can imagine.
chris@mimsy.umd.edu (Chris Torek) (12/31/89)
In article <9170@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes: >Here's one that is short, fast, and sweet. ... and wrong: the line if ( (rb = r+(b=1<<i)) * rb <= n) uses and sets rb in an unknown order. (Mr. Mitchell's compiler clearly does the assignment, then the evaluation of the second `rb', then the multiplication, but a compiler is free to evaluate the second `rb', then compute r+..., assign that to rb, multiply that by the old value of rb, and compare with n.) The simplest fix is to break up the three embedded statements (b=, rb=, if...), which also leads to more easily understood code. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
mitchell@cbmvax.commodore.com (Fred Mitchell - PA) (12/31/89)
In article <21550@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: >In article <9170@cbmvax.commodore.com> mitchell@cbmvax.commodore.com >(Fred Mitchell - PA) writes: >>Here's one that is short, fast, and sweet. > >... and wrong: the line > > if ( (rb = r+(b=1<<i)) * rb <= n) > >uses and sets rb in an unknown order. (Mr. Mitchell's compiler clearly >does the assignment, then the evaluation of the second `rb', then the >multiplication, but a compiler is free to evaluate the second `rb', then >compute r+..., assign that to rb, multiply that by the old value of rb, >and compare with n.) The simplest fix is to break up the three embedded >statements (b=, rb=, if...), which also leads to more easily understood >code. NOT WRONG- Let's analyze this line more closely (exploded IF, here we come!)... if ( ( rb = r + (b=1<<i) ) * rb <= n) Now the one stipulation of the C standard is that INNER PARENTHESIS GETS EVALUATED FIRST!!!! So the first thing that MUST be evaluated first is: (b = 1 << i) then: (rb = r + b) and finally: rb * rb <= n Don't fault Chris however. It is very terse and a bit hard to understand, especially for neophyte C programmers. But I wrote this for SPEED and COMPACTNESS. This arrangement is sure to work on any properly-implemented C compiler, and if not- then get your money back! Sure you can expand it if you like- but that defeats the purpose. I have used this simple routine for years and on several different compilers- even buggy ones- and have acheived the expected results. This was also originally done in FORTH and took up twice as many lines of code! (If any are interested in the FORTH version, beg and plead and I just may dig it up! :-) >In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) >Domain: chris@cs.umd.edu Path: uunet!mimsy!chris -Mitchell mitchell@cbmvax.UUCP "What do you mean I'm wrong? I've been doing this stuff for twelve years!!!" - Yours Truly.
spl@mcnc.org (Steve Lamont) (12/31/89)
In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes: >In article <21550@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: >>In article <9170@cbmvax.commodore.com> mitchell@cbmvax.commodore.com >>(Fred Mitchell - PA) writes: >>>Here's one that is short, fast, and sweet. >> >>... and wrong: [...] ... and (apparently) *slower* Here's the result of a test I ran on a VAX: Using Isqrt = 3.277e-05 sec Using sqrt = 1.923e-05 sec This was including the overhead of converting the argument to a double for sqrt() and then converting it back to an int for assignment. spl (the p stands for pretty skeptical) -- Steve Lamont, sciViGuy EMail: spl@ncsc.org NCSC, Box 12732, Research Triangle Park, NC 27709 "Reality involves a square root" Thomas Palmer
jep@oink.UUCP (James E. Prior) (01/01/90)
In article <9170@cbmvax.commodore.com> mitchell@cbmvax.UUCP (Fred Mitchell - PA) writes: >... > for (i = 15, r = 0; i >= 0; --i) > if ( (rb = r+(b=1<<i)) * rb <= n) ^^ ^^ !! !! > r += b; > return (ULONG) r; >... Beware, C doesn't guarantee the order of evaluation. Fred's code will work on some compilers, but not on others. The following code fragment should overcome that particular problem. for (i=15,r=0;i>=0;--i) if ((rb=r+(b=1<<i)) , rb*rb<=n) r+=b; return (ULONG)=r; And the following code should be a little quicker. for (i=15,r=0,b=1<<i;i>=0;i--,b>>=1) { rb=r+b; if (rb*rb<=n) r+=b; return (ULONG)r; -- Jim Prior jep@oink osu-cis!n8emr!oink!jep N8KSM
chris@mimsy.umd.edu (Chris Torek) (01/01/90)
>In article <21550@mimsy.umd.edu> I wrote: >> if ( (rb = r+(b=1<<i)) * rb <= n) >>uses and sets rb in an unknown order. ... a compiler is free to evaluate >>the second `rb', then compute r+..., assign that to rb, multiply that by >>the old value of rb, and compare with n. In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes: >... Let's analyze this line more closely (exploded IF, here we come!)... > > if ( > ( rb = r + > (b=1<<i) ) > * rb > <= n) > >Now the one stipulation of the C standard is that INNER PARENTHESIS >GETS EVALUATED FIRST!!!! As whoever-it-was said, `That turns out not to be the case.' The forthcoming C standard, which is not yet a standard, says that parentheses must be observed when evaluating expressions where rearrangement might affect the result. For instance, the floating point expression (a - 1.0) * b must be computed that way, and not as `a * b - b'---an expression that is mathematically equivalent, but not equivalent in the strange world of floating point (finite fields anyone?). Compilers remain free to rearrange expressions in which the rearrangement is not observable (except via examination of the object code). The C standard specifically does *not* say that parentheses control the order of the evaluation of the variables contained in the expression. Their affects are limited to suppressing rearrangement. That is, the compiler can still evaluate (a - 1.0) * b as `fetch b, fetch a, subtract 1.0, multiply'. A compiler does not have to `fetch a, subtract 1.0, fetch b, multiply'. Now: >So the first thing that MUST be evaluated >first is: > (b = 1 << i) This need not be evaluated first. This expression tells the compiler that it is to compute `1 << i', and eventually (by the next `sequence point') store the result in the location determined by evaluating the name `b' in object context. The result of this expression is to be the value that would be obtained if b were examined after `1 << i' was stored in it. (The implication of the latter is that if `b' is a small storage type, such as `char', and has a large value put in it, such as 32768, and the machine does not trap on overflow, the result will be a small value, and perhaps a negative one. In this case b has the longest type available in C, so there is no need to worry about truncation.) >then: > (rb = r + b) [written as (rb = r + (b = ...))] This expression tells the compiler that it is to compute `r + <the value resulting from the inner expression>' and eventually (again, by the next sequence point) store the result in the variable rb. Again, the result is the value rb would have after the store. >and finally: > rb * rb <= n [written as `(rb = ...) * rb'] This expression tells the compiler that it is to compute `<the value resulting from the inner expression> * rb', and compare that with the value stored in the object named `n'. There are no sequence points in any of the above expressions. This affects only one thing, and that is the value produced by the second name `rb', in `(rb = ...) * rb'. The compiler is free to obtain that value either before assigning the new value (r + (b = ...)), or after assigning the new value. >Don't fault Chris however. It is very terse and a bit hard to understand, >especially for neophyte C programmers. (I can only assume that Mr. Mitchell means to include me in the group `neophyte C programmers'. I dare say he has not read comp.lang.c for the last five years.) It is indeed difficult. Fortunately, I have some experience with this particular explication. >But I wrote this for SPEED and COMPACTNESS. You have mistaken `compact source representation' for `compact object code'. I imagine that most compilers will produce identical object code for the three-statement version as for the one-statement version--- provided, of course, that they happen to evaluate ((rb = ...) * rb) such that the assignment is done before the second evaluation of rb. The 4BSD VAX PCC, at least, produces the same code, since its notion of expression tree evaluation is such that it does stores immediately, so as to free up registers that may otherwise be tied down holding address locations. (It is worth noting that the proposed C standard is not yet widespread ---not merely because it is not yet a standard---and many compilers are not even constrained to avoid expression rearrangement of the sort that breaks (a-1.0)*b.) Incidentally, if you want real speed, you should consider using Rokicki's algorithm, with expansion (the loop runs only 16 times, so might as well go all the way to 16 expansions and no loop) and perhaps coded in assembly (you can probably manage r>>=1 maybe-or-t in fewer instructions than your compiler uses). Then too, if the machine has a `find most significant one bit' instruction (some floating point conversion instructions will do this for you), you can skip some of the 16 steps as well. In any case, avoiding multiplies will help, since integer multiply is fairly difficult. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
chris@mimsy.umd.edu (Chris Torek) (01/01/90)
In article <5964@alvin.mcnc.org> spl@mcnc.org (Steve Lamont) writes: >[Fred Mitchell's Isqrt is] (apparently) *slower* [than the 4.3BSD VAX sqrt]. >This was including the overhead of converting the argument to a double for >sqrt() and then converting it back to an int for assignment. 4.3BSD's VAX sqrt uses some fancy VAX code (due to Kahan and his grad students). It is MUCH better than the old `portable' double-precision sqrt, being not only many times faster but also more accurate. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
ksbooth@watcgl.waterloo.edu (Kelly Booth) (01/01/90)
In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes: >NOT WRONG- Let's analyze this line more closely (exploded IF, here we come!)... > > if ( > ( rb = r + > (b=1<<i) ) > * rb > <= n) > >Now the one stipulation of the C standard is that INNER PARENTHESIS >GETS EVALUATED FIRST!!!! People say this about Fortran. They are wrong. The rules on parentheses are intended to over-ride the default precedence rules of the language. They do no force a particular order of evaluation on the compiler as you assume. All the rules say is that the result of the inner parenthesized expression is to be treated as a single value within the larger expression. The relative order of evaluation of subexpressions can be anything the compiler wants as long as it is consistent with this (very mild) constraint. Most (if not all) of the problems that arise from side effects in functions are due to this problem (or variants of it).
kchen@Apple.COM (Kok Chen) (01/01/90)
mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes: >In article <21550@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: >>In article <9170@cbmvax.commodore.com> mitchell@cbmvax.commodore.com >>(Fred Mitchell - PA) writes: >>>Here's one that is short, fast, and sweet. >> >>... and wrong: the line >> >> if ( (rb = r+(b=1<<i)) * rb <= n) >> >>uses and sets rb in an unknown order. (Mr. Mitchell's compiler clearly >>... > >NOT WRONG- ... > >Now the one stipulation of the C standard is that INNER PARENTHESIS >GETS EVALUATED FIRST!!!! So the first thing that MUST be evaluated >first is: > (b = 1 << i) > >then: > (rb = r + b) > >and finally: > rb * rb <= n > >Don't fault Chris however. It is very terse and a bit hard to understand, >especially for neophyte C programmers. First, a comment and then another suggestion for doing integer square roots. I am afraid I have to agree with Mr. Torek on this one. The K&R definition only states that the PRIORITY of evaluation is as stated by Mr. Mitchell, not the TEMPORAL ORDER in which each node of the syntax tree is evaluated. I.e., the "rb" in the "* rb" branch of the expression could be evaluated BEFORE or AFTER the "rb =" branch is evaluated. The only thing a compiler writer has to make sure of is that the "*" operator is applied AFTER the "rb =". I wonder if other compiler writers agree on this one (I am not sure how KCC would have handled it, either; it's been too many eons ago. But, I suspect it would not yield the answer Mr. Mitchell desires. The reason is that KCC would always try to swap branches of a commutative operator so that the left branch has smaller Sethi weight than the right branch, before doing a left-to-right tree-walk, so as to minimize register usage. By doing so, it would first evaluate rb before going to the branch that assigns to it. Anyway, that topic belongs to a different news group :-). In any case, it is all quite moot, since there are other algorithms that have N (= number of bits) iterations that do not even involve multiplications. Such algorithms could be faster on machines that do not multiply as fast as they add or shift. One that I came up with a few years back (when I had need of a fast square root evaluator) is based on successive approximation and the fact that (a+b)^2 = (a^2 + 2ab + b^2). Each iteration achieves an extra bit of accuracy, but instead of having to square a number for the test, the (a+b)^2 trick is used. /* * Example of (16 bit) integer square root based * on (a+b)^2 = (a^2 + 2ab + b^2). Yields largest integer * whose square is less than or equal n. * * binValue is a table of the numbers 2^0, 2^1, 2^2, etc. * binSquare is a table of the square of 2^0, 2^1, 2^2, etc. * * At all times, succApproximation = iSqrt^2. */ int binValue[] = { 0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1 } ; int binSquare[] = { 0x4000, 0x1000, 0x400, 0x100, 0x40, 0x10 , 0x4, 0x1 } ; int approxSquareRoot( n ) register int n ; { register int succApproximation, iSqrt, i, test, bitPosition ; succApproximation = iSqrt = 0 ; bitPosition = 8 ; /* 2.2^7 */ for ( i = 0; i < 8; i++ ) { /* * Let test be the square of the the sum * of the current approximation and the next * bit value, i.e., (iSqrt+binValue)^2. */ test = succApproximation /* a^2 */ + ( iSqrt << bitPosition ) /* 2.a.b */ + binSquare[i] ; /* b^2 */ /* * If test is no greater than n, binValue[i] can be added * to the current approximation of the square root. */ if ( test <= n ) { succApproximation = test ; iSqrt += binValue[i] ; } --bitPosition ; } return ( iSqrt ) ; } The above is written freehand and I don't know for sure it will compile :-). I am just trying to convey the flavour of the algorithm. The algorithm is so obvious that I am sure it has been written up somewhere already. My apologies to any prior discoverers for any lack of attribution to their works. In fact, I would not at all be surprised if this is in some piece of hardware somewhere since the two tables can just as well be generated by some shift register working in real time (indeed, if your machine can shift faster than it can index an element of a table, it would be faster to generate the table elements iteratively too). I am equally sure that there are faster ways to do square roots than this. (By the way, loonggg ago [like almost 25 years ago, when we were all programming in FORTRAN :-)], when I first got interested in algorithms, I looked up what the algorithm used by Abacus practitioners were, to obtain the square root of a number. Well, no big revalation - the algorithm is precisely the one we all learnt in primary school where you gather two digits at a time. Sigh.) Today? I try looking at a higher level - for algorithms that minimize the use of square roots. There ain't such a thing as a FAST square root. :-) Anytime you have to iterate 8 to 16 times to reach your results... Regards, Kok Chen kchen@apple.COM Apple Computer, Inc.
chris@mimsy.umd.edu (Chris Torek) (01/01/90)
Aw heck. :-) Here is my version of Rokicki's code, complete with a main() to test it. /* * Integer square root routine, good for up to 32-bit values. * Note that the largest square root (that of 0xffffffff) is * 0xffff, so the result fits in a regular unsigned and need * not be `long'. * * Original code from Tomas Rokicki (using a well known algorithm). * This version by Chris Torek, University of Maryland. * * This code is in the public domain. */ unsigned root(v) register unsigned long v; { register long t = 1L << 30, r = 0, s; #define STEP(k) \ s = t + r; \ r >>= 1; \ if (s <= v) { \ v -= s; \ r |= t; \ } STEP(15); t >>= 2; STEP(14); t >>= 2; STEP(13); t >>= 2; STEP(12); t >>= 2; STEP(11); t >>= 2; STEP(10); t >>= 2; STEP(9); t >>= 2; STEP(8); t >>= 2; STEP(7); t >>= 2; STEP(6); t >>= 2; STEP(5); t >>= 2; STEP(4); t >>= 2; STEP(3); t >>= 2; STEP(2); t >>= 2; STEP(1); t >>= 2; STEP(0); return r; } #ifdef MAIN #include <stdio.h> #include <math.h> main() { unsigned long l; char buf[100]; for (;;) { (void) printf("gimme a number> "); if (fgets(buf, sizeof buf, stdin) == NULL) break; /* should use strtoul here but some do not have it */ if (sscanf(buf, " 0x%lx", &l) != 1 && sscanf(buf, " 0%lo", &l) != 1 && sscanf(buf, "%lu", &l) != 1 && sscanf(buf, "%lx", &l) != 1) (void) printf("that was not a number\n"); else (void) printf("root(%lu) => %u; sqrt(%lu) => %.17g\n", l, root(l), l, sqrt((double)l)); } exit(0); } #endif -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
tim@nucleus.amd.com (Tim Olson) (01/01/90)
In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes: | NOT WRONG- Let's analyze this line more closely (exploded IF, here we come!)... | | if ( | ( rb = r + | (b=1<<i) ) | * rb | <= n) | | Now the one stipulation of the C standard is that INNER PARENTHESIS | GETS EVALUATED FIRST!!!! No. Parenthesis only change the grouping of an expression, not its order of evaluation (this is true in K&R "classic" C, as well as the proposed ANSI standard). The standard explicitly states: The grouping of an expression does not completely determine its evaluation. In the following fragment #include <stdio.h> int sum; char *p; /*...*/ sum = sum * 10 - '0' + ((*(p++)) = (getchar()))); the expression statement is groupsed as if it were written as sum = (((sum * 10) - '0') + ((*(p++)) = (getchar()))); but the actual increment of p can occur at any time between the previous sequence point and the next sequence point (the ;), and the call to getchar can occur at any point prior to the need of its returned value. The key here is the concept of a "sequence point". Any side-effects (such as assignment) in an expression can occur at any time as long as they are complete at the sequence point. This leads to the restriction in section 3.3: Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The expression (rb = r + (b=1<<i)) * rb <= n) violates this restriction, because the second reference to rb is not to determine the previous value of rb to use in the assignment. Because the standard allows the assignment side-effect to occur at anytime prior to the sequence point, the second reference to rb could either be the old or new value of rb, depending upon the implementation. -- Tim Olson Advanced Micro Devices (tim@amd.com)
mitchell@cbmvax.commodore.com (Fred Mitchell - PA) (01/02/90)
Well, it seems that this has started quite a controversy. This algorithm works well on 8- and 16-bit machines that has no hardware floating point available. It was something that was sitting around for years and I thought I'd pull it out. Even though there is a question regarding order of evaluation vs. order of precedence (and I've just check K&R on it), for every compiler that I've used it on, it has worked. Perhaps order of evaluation should also be insured. But this is the wrong news group for this discussion. I suppose debates about this sort of thing can go on forever, ad nauseam :-) So let us get back to what we came here for, anyway- graphics! -Mitchell mitchell@cbmvax.UUCP "You, too can spend a lifetime squabbling. Instead, let's open our eyes and see."
ge@kunivv1.sci.kun.nl (Ge' Weijers) (01/03/90)
Another integer square root algorithm, from Mr. Otto Peter from Innsbruck, Austria (published in C'T 1-90) copied verbatim: #define N_BITS 32 #define MAX_BIT ((N_BITS+1)/2-1) unsigned long int sqrt_5(x) unsigned long int x; { register unsigned long int xroot, m2, x2; xroot = 0; m2 = 1<< MAX_BIT * 2; do { x2 = xroot + m2; xroot >>= 1; if(x2 <= x) { x -= x2; xroot += m2; } } while(m2 >>= 2); if(xroot < x) return (xroot+1); return(xroot); } The algorithm does not use multiplication, which is a benefit on things like a 68000. Ge' Weijers Ge' Weijers Internet/UUCP: ge@cs.kun.nl Faculty of Mathematics and Computer Science, (uunet.uu.net!cs.kun.nl!ge) University of Nijmegen, Toernooiveld 1 6525 ED Nijmegen, the Netherlands tel. +3180612483 (UTC-2)
hallett@pet16.uucp (Jeff Hallett x5163 ) (01/05/90)
In article <37525@apple.Apple.COM> kchen@Apple.COM (Kok Chen) writes: >the square root of a number. Well, no big revalation - the algorithm is >precisely the one we all learnt in primary school where you gather two digits >at a time. Sigh.) Could someone relate this algorithm? I remember learning it, but then promptly forgot it. sigh. I always thought it was kinda fun. BTW, I've seen Kok's algorithm before and kinda liked it. Thanks for publishing it. -- Jeffrey A. Hallett, PET Software Engineering GE Medical Systems, W641, PO Box 414, Milwaukee, WI 53201 (414) 548-5163 : EMAIL - hallett@gemed.ge.com "Non Sequitor - Your facts are uncoordinated"; "I am Nomad: I am perfect."
rudy@mips.COM (Rudy Wang) (01/11/90)
I've collected all 4 algorithms from chris, rokicki, mitchell and kchen and ran them under the Mips' profiler on a Mips M2000 (w/ 64M memory). Here's the result: Chris' inlined routine was clearly the winner! It was 20% faster than the original rokicki algorithm. =========================================================================== Profile listing generated Wed Jan 10 21:08:30 1990 with: prof -pixie -proc isqrt ---------------------------------------------------------------------------- * -p[rocedures] using basic-block counts; * * sorted in descending order by the number of cycles executed in each * * procedure; unexecuted procedures are excluded * ---------------------------------------------------------------------------- 319597 cycles cycles %cycles cum % cycles bytes procedure (file) /call /line 109500 34.26 34.26 219 11 isqrt4 (isqrt.c) 86000 26.91 61.17 172 9 isqrt3 (isqrt.c) 56000 17.52 78.69 112 6 isqrt2 (isqrt.c) 46500 14.55 93.24 93 24 isqrt1 (isqrt.c) ... ... many irrelevant procedures deleted... ... =========================================================================== Source code follows: #define ULONG unsigned long ULONG isqrt1 ( /* chris' algorithm */ register ULONG v ) { register long t = 1L << 30, r = 0, s; #define STEP(k) s = t + r; r >>= 1; if (s <= v) {v -= s; r |= t;} STEP(15); t >>= 2; STEP(14); t >>= 2; STEP(13); t >>= 2; STEP(12); t >>= 2; STEP(11); t >>= 2; STEP(10); t >>= 2; STEP(9); t >>= 2; STEP(8); t >>= 2; STEP(7); t >>= 2; STEP(6); t >>= 2; STEP(5); t >>= 2; STEP(4); t >>= 2; STEP(3); t >>= 2; STEP(2); t >>= 2; STEP(1); t >>= 2; STEP(0); return (r); } /* isqrt1 */ ULONG isqrt2 ( /* rokicki's (O. Peter's?) algorithm */ register ULONG v ) { register ULONG t, r, tmp; t = 0x40000000; r = 0; do { tmp = t + r; r >>= 1; if (tmp <= v) { v -= tmp; r |= t; } t >>= 2; } while (t != 0); return (r); } /* isqrt2 */ ULONG isqrt3 ( /* mitchell's algorithm */ register ULONG n ) { register int i, r, rb, b; i = 15; r = 0; do { b = 1 << i; rb = r + b; if (rb * rb <= n) r += b; i--; } while (i >= 0); return ((ULONG) r); } /* isqrt3 */ /* *------------------------------------------------------------------------ * kchen's algorithm - slightly modified to use bit-shift, since it's * quite a bit (:-)) faster than member access on a Mips machine. *------------------------------------------------------------------------ * * Example of (16 bit) integer square root based * on (a+b)^2 = (a^2 + 2ab + b^2). Yields largest integer * whose square is less than or equal n. * * binValue is a table of the numbers 2^0, 2^1, 2^2, etc. * binSquare is a table of the square of 2^0, 2^1, 2^2, etc. * * At all times, succApproximation = iSqrt^2. */ #if 0 int binValue [] = { 0x8000, 0x4000, 0x2000, 0x1000, 0x800, 0x400, 0x200, 0x100, 0x80, 0x40, 0x20, 0x10, 0x8, 0x400, 0x2, 0x1 }; int binSquare [] = { 0x40000000, 0x10000000, 0x4000000, 0x1000000, 0x400000, 0x100000 , 0x40000, 0x10000, 0x4000, 0x1000, 0x400, 0x100, 0x40, 0x10 , 0x4, 0x1 }; #endif 0 int isqrt4 ( register int n ) { register int guess, iSqrt, i, test; guess = iSqrt = 0; for (i = 16; i != 0; i--) { /* * Let test be the square of the the sum of the current * approx. and the next bit value, i.e., (iSqrt+binValue)^2. */ test = guess /* a^2 */ + (iSqrt << i) /* 2*a*b */ + (1 << ((i - 1) << 1)); /* b^2 */ /* * If test is no greater than n, binValue[i] can be added * to the current approximation of the square root. */ if (test <= n) { guess = test; iSqrt += (1 << (i - 1)); } } return (iSqrt); } /* isqrt4 */ main ( register int argc, register char **argv ) { register ULONG i, j, k; i = atoi (argv [1]); for (j = 0; j < 500; j++) k = isqrt1 (i); printf ("sqrt (%0d) = %0d\n", i, k); for (j = 0; j < 500; j++) k = isqrt2 (i); printf ("sqrt (%0d) = %0d\n", i, k); for (j = 0; j < 500; j++) k = isqrt3 (i); printf ("sqrt (%0d) = %0d\n", i, k); for (j = 0; j < 500; j++) k = isqrt4 (i); printf ("sqrt (%0d) = %0d\n", i, k); } /* main */ -- UUCP: {ames,decwrl,prls,pyramid}!mips!rudy (or rudy@mips.com) DDD: 408-991-0247 Rudy Wang (or 408-720-1700, Ext. 247) USPS: MIPS Computer Systems, 930 Arques, Sunnyvale, CA 94086-3650 Quote: I think they're for 1 AM - Descartes, about his midnight snacks