[comp.graphics] Integral Square Root

mitchell@cbmvax.commodore.com (Fred Mitchell - PA) (12/30/89)

I've noticed that there's some interest in integer square root algorithms.
Here's one that is short, fast, and sweet. It depends on the bit-shifting
facilities of C. This one is geared for finding the square-root of a
32-bit unsigned, resulting in a 16-bit root. It should be easy to adapt
to other languages and/or integer sizes. Enjoy!

ULONG  Isqrt(n)
ULONG  n;
{
   register i, r, rb, b;

   for (i = 15, r = 0; i >= 0; --i)
       if ( (rb = r+(b=1<<i)) * rb <= n)
           r += b;
   return (ULONG) r;
}


    -Mitchell
     mitchell@cbmvax.UUCP

     There are evils on this rock greater than even YOU
     can imagine.

chris@mimsy.umd.edu (Chris Torek) (12/31/89)

In article <9170@cbmvax.commodore.com> mitchell@cbmvax.commodore.com
(Fred Mitchell - PA) writes:
>Here's one that is short, fast, and sweet.

... and wrong: the line

       if ( (rb = r+(b=1<<i)) * rb <= n)

uses and sets rb in an unknown order.  (Mr. Mitchell's compiler clearly
does the assignment, then the evaluation of the second `rb', then the
multiplication, but a compiler is free to evaluate the second `rb', then
compute r+..., assign that to rb, multiply that by the old value of rb,
and compare with n.)  The simplest fix is to break up the three embedded
statements (b=, rb=, if...), which also leads to more easily understood
code.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

mitchell@cbmvax.commodore.com (Fred Mitchell - PA) (12/31/89)

In article <21550@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>In article <9170@cbmvax.commodore.com> mitchell@cbmvax.commodore.com
>(Fred Mitchell - PA) writes:
>>Here's one that is short, fast, and sweet.
>
>... and wrong: the line
>
>       if ( (rb = r+(b=1<<i)) * rb <= n)
>
>uses and sets rb in an unknown order.  (Mr. Mitchell's compiler clearly
>does the assignment, then the evaluation of the second `rb', then the
>multiplication, but a compiler is free to evaluate the second `rb', then
>compute r+..., assign that to rb, multiply that by the old value of rb,
>and compare with n.)  The simplest fix is to break up the three embedded
>statements (b=, rb=, if...), which also leads to more easily understood
>code.

NOT WRONG- Let's analyze this line more closely (exploded IF, here we come!)...

	if (
		( rb = r + 
			(b=1<<i) )
				* rb
					<= n)

Now the one stipulation of the C standard is that INNER PARENTHESIS
GETS EVALUATED FIRST!!!! So the first thing that MUST be evaluated
first is:
	(b = 1 << i)

then:
	(rb = r + b)

and finally:
	rb * rb <= n

Don't fault Chris however. It is very terse and a bit hard to understand,
especially for neophyte C programmers. But I wrote this for SPEED and
COMPACTNESS. This arrangement is sure to work on any properly-implemented
C compiler, and if not- then get your money back! Sure you can expand it
if you like- but that defeats the purpose. I have used this simple
routine for years and on several different compilers- even buggy ones-
and have acheived the expected results. This was also originally done
in FORTH and took up twice as many lines of code! (If any are interested
in the FORTH version, beg and plead and I just may dig it up! :-)

>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
>Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris


	-Mitchell
	 mitchell@cbmvax.UUCP
	 "What do you mean I'm wrong? I've been doing this stuff for
 	 twelve years!!!" - Yours Truly.

spl@mcnc.org (Steve Lamont) (12/31/89)

In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes:
>In article <21550@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>>In article <9170@cbmvax.commodore.com> mitchell@cbmvax.commodore.com
>>(Fred Mitchell - PA) writes:
>>>Here's one that is short, fast, and sweet.
>>
>>... and wrong: [...]

... and (apparently) *slower*

Here's the result of a test I ran on a VAX:

  Using Isqrt = 3.277e-05 sec
  Using  sqrt = 1.923e-05 sec

This was including the overhead of converting the argument to a double for
sqrt() and then converting it back to an int for assignment.

							spl (the p stands for
							pretty skeptical)
-- 
Steve Lamont, sciViGuy			EMail:	spl@ncsc.org
NCSC, Box 12732, Research Triangle Park, NC 27709
"Reality involves a square root"
			Thomas Palmer

jep@oink.UUCP (James E. Prior) (01/01/90)

In article <9170@cbmvax.commodore.com> mitchell@cbmvax.UUCP (Fred Mitchell - PA) writes:
>...
>   for (i = 15, r = 0; i >= 0; --i)
>       if ( (rb = r+(b=1<<i)) * rb <= n)
              ^^                 ^^
              !!                 !!
>           r += b;
>   return (ULONG) r;
>...

Beware, C doesn't guarantee the order of evaluation.  Fred's code will 
work on some compilers, but not on others.

The following code fragment should overcome that particular problem.  
   for (i=15,r=0;i>=0;--i)
      if ((rb=r+(b=1<<i)) , rb*rb<=n)
         r+=b;
   return (ULONG)=r;

And the following code should be a little quicker.  
   for (i=15,r=0,b=1<<i;i>=0;i--,b>>=1) {
      rb=r+b;
      if (rb*rb<=n)
         r+=b;
   return (ULONG)r;

-- 
Jim Prior    jep@oink    osu-cis!n8emr!oink!jep    N8KSM

chris@mimsy.umd.edu (Chris Torek) (01/01/90)

>In article <21550@mimsy.umd.edu> I wrote:
>>       if ( (rb = r+(b=1<<i)) * rb <= n)
>>uses and sets rb in an unknown order. ... a compiler is free to evaluate
>>the second `rb', then compute r+..., assign that to rb, multiply that by
>>the old value of rb, and compare with n.

In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com
(Fred Mitchell - PA) writes:
>... Let's analyze this line more closely (exploded IF, here we come!)...
>
>	if (
>		( rb = r + 
>			(b=1<<i) )
>				* rb
>					<= n)
>
>Now the one stipulation of the C standard is that INNER PARENTHESIS
>GETS EVALUATED FIRST!!!!

As whoever-it-was said, `That turns out not to be the case.'

The forthcoming C standard, which is not yet a standard, says that
parentheses must be observed when evaluating expressions where
rearrangement might affect the result.  For instance, the floating
point expression

	(a - 1.0) * b

must be computed that way, and not as `a * b - b'---an expression that
is mathematically equivalent, but not equivalent in the strange world
of floating point (finite fields anyone?).  Compilers remain free to
rearrange expressions in which the rearrangement is not observable
(except via examination of the object code).

The C standard specifically does *not* say that parentheses control
the order of the evaluation of the variables contained in the expression.
Their affects are limited to suppressing rearrangement.  That is, the
compiler can still evaluate (a - 1.0) * b as `fetch b, fetch a, subtract
1.0, multiply'.  A compiler does not have to `fetch a, subtract 1.0,
fetch b, multiply'.

Now:

>So the first thing that MUST be evaluated
>first is:
>	(b = 1 << i)

This need not be evaluated first.  This expression tells the compiler
that it is to compute `1 << i', and eventually (by the next `sequence
point') store the result in the location determined by evaluating the
name `b' in object context.  The result of this expression is to be
the value that would be obtained if b were examined after `1 << i' was
stored in it.  (The implication of the latter is that if `b' is a
small storage type, such as `char', and has a large value put in it,
such as 32768, and the machine does not trap on overflow, the result
will be a small value, and perhaps a negative one.  In this case b has
the longest type available in C, so there is no need to worry about
truncation.)

>then:
>	(rb = r + b)
[written as (rb = r + (b = ...))]

This expression tells the compiler that it is to compute `r + <the
value resulting from the inner expression>' and eventually (again,
by the next sequence point) store the result in the variable rb.
Again, the result is the value rb would have after the store.

>and finally:
>	rb * rb <= n
[written as `(rb = ...) * rb']

This expression tells the compiler that it is to compute `<the value
resulting from the inner expression> * rb', and compare that with the
value stored in the object named `n'.

There are no sequence points in any of the above expressions.  This
affects only one thing, and that is the value produced by the second
name `rb', in `(rb = ...) * rb'.  The compiler is free to obtain that
value either before assigning the new value (r + (b = ...)), or after
assigning the new value.

>Don't fault Chris however. It is very terse and a bit hard to understand,
>especially for neophyte C programmers.

(I can only assume that Mr. Mitchell means to include me in the group
`neophyte C programmers'.  I dare say he has not read comp.lang.c for the
last five years.)  It is indeed difficult.  Fortunately, I have some
experience with this particular explication.

>But I wrote this for SPEED and COMPACTNESS.

You have mistaken `compact source representation' for `compact object
code'.  I imagine that most compilers will produce identical object
code for the three-statement version as for the one-statement version---
provided, of course, that they happen to evaluate ((rb = ...) * rb)
such that the assignment is done before the second evaluation of rb.
The 4BSD VAX PCC, at least, produces the same code, since its notion of
expression tree evaluation is such that it does stores immediately, so
as to free up registers that may otherwise be tied down holding address
locations.

(It is worth noting that the proposed C standard is not yet widespread
---not merely because it is not yet a standard---and many compilers are
not even constrained to avoid expression rearrangement of the sort that
breaks (a-1.0)*b.)

Incidentally, if you want real speed, you should consider using Rokicki's
algorithm, with expansion (the loop runs only 16 times, so might as well
go all the way to 16 expansions and no loop) and perhaps coded in assembly
(you can probably manage r>>=1 maybe-or-t in fewer instructions than your
compiler uses).  Then too, if the machine has a `find most significant one
bit' instruction (some floating point conversion instructions will do this
for you), you can skip some of the 16 steps as well.  In any case, avoiding
multiplies will help, since integer multiply is fairly difficult.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.umd.edu (Chris Torek) (01/01/90)

In article <5964@alvin.mcnc.org> spl@mcnc.org (Steve Lamont) writes:
>[Fred Mitchell's Isqrt is] (apparently) *slower* [than the 4.3BSD VAX sqrt].
>This was including the overhead of converting the argument to a double for
>sqrt() and then converting it back to an int for assignment.

4.3BSD's VAX sqrt uses some fancy VAX code (due to Kahan and his grad
students).  It is MUCH better than the old `portable' double-precision
sqrt, being not only many times faster but also more accurate.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

ksbooth@watcgl.waterloo.edu (Kelly Booth) (01/01/90)

In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes:
>NOT WRONG- Let's analyze this line more closely (exploded IF, here we come!)...
>
>	if (
>		( rb = r + 
>			(b=1<<i) )
>				* rb
>					<= n)
>
>Now the one stipulation of the C standard is that INNER PARENTHESIS
>GETS EVALUATED FIRST!!!!

People say this about Fortran.  They are wrong.  The rules on parentheses are intended to over-ride
the default precedence rules of the language.  They do no force a particular order of evaluation on
the compiler as you assume.  All the rules say is that the result of the inner parenthesized expression
is to be treated as a single value within the larger expression.  The relative order of evaluation of
subexpressions can be anything the compiler wants as long as it is consistent with this (very mild)
constraint.

Most (if not all) of the problems that arise from side effects in functions are due to this problem
(or variants of it).

kchen@Apple.COM (Kok Chen) (01/01/90)

mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes:

>In article <21550@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>>In article <9170@cbmvax.commodore.com> mitchell@cbmvax.commodore.com
>>(Fred Mitchell - PA) writes:
>>>Here's one that is short, fast, and sweet.
>>
>>... and wrong: the line
>>
>>       if ( (rb = r+(b=1<<i)) * rb <= n)
>>
>>uses and sets rb in an unknown order.  (Mr. Mitchell's compiler clearly
>>...
>
>NOT WRONG- ...
>
>Now the one stipulation of the C standard is that INNER PARENTHESIS
>GETS EVALUATED FIRST!!!! So the first thing that MUST be evaluated
>first is:
>	(b = 1 << i)
>
>then:
>	(rb = r + b)
>
>and finally:
>	rb * rb <= n
>
>Don't fault Chris however. It is very terse and a bit hard to understand,
>especially for neophyte C programmers. 


First, a comment and then another suggestion for doing integer square
roots.
 
I am afraid I have to agree with Mr. Torek on this one.  The K&R definition
only states that the PRIORITY of evaluation is as stated by Mr. Mitchell,
not the TEMPORAL ORDER in which each node of the syntax tree is evaluated.

I.e., the "rb" in the "* rb" branch of the expression could be evaluated 
BEFORE or AFTER the "rb =" branch is evaluated.  The only thing a compiler
writer has to make sure of is that the "*" operator is applied AFTER the
"rb =".

I wonder if other compiler writers agree on this one (I am not sure how
KCC would have handled it, either; it's been too many eons ago.  But, I
suspect it would not yield the answer Mr. Mitchell desires.  The reason is
that KCC would always try to swap branches of a commutative operator so 
that the left branch has smaller Sethi weight than the right branch, before
doing a left-to-right tree-walk, so as to minimize register usage.  By doing 
so, it would first evaluate rb before going to the branch that assigns to it.
Anyway, that topic belongs to a different news group :-).

In any case, it is all quite moot, since there are other algorithms that 
have N (= number of bits) iterations that do not even involve multiplications.
Such algorithms could be faster on machines that do not multiply as fast 
as they add or shift.

One that I came up with a few years back (when I had need of a fast square
root evaluator) is based on successive approximation and the fact that
(a+b)^2 = (a^2 + 2ab + b^2).  Each iteration achieves an extra bit of
accuracy, but instead of having to square a number for the test, the (a+b)^2
trick is used.


/*
 *    Example of (16 bit) integer square root based
 *    on (a+b)^2 = (a^2 + 2ab + b^2).  Yields largest integer
 *    whose square is less than or equal n.
 *
 *    binValue  is a table of the numbers 2^0, 2^1, 2^2, etc.
 *    binSquare is a table of the square of 2^0, 2^1, 2^2, etc.
 *
 *    At all times, succApproximation =  iSqrt^2.
 */

int binValue[]  = { 0x80,   0x40,   0x20,  0x10,  0x8,  0x4,   0x2, 0x1 } ;
int binSquare[] = { 0x4000, 0x1000, 0x400, 0x100, 0x40, 0x10 , 0x4, 0x1 } ;


int approxSquareRoot( n )
register int n ;
{
    register int succApproximation, iSqrt, i, test, bitPosition ;

    succApproximation = iSqrt = 0 ;
    bitPosition = 8 ;                            /* 2.2^7 */

    for ( i = 0; i < 8; i++ ) {

        /*
         *    Let test be the square of the the sum
         *    of the current approximation and the next
         *    bit value, i.e., (iSqrt+binValue)^2.
         */

        test = succApproximation                /* a^2   */
		+ ( iSqrt << bitPosition )      /* 2.a.b */
	        + binSquare[i] ;                /* b^2   */

        /*
         *    If test is no greater than n, binValue[i] can be added
         *    to the current approximation of the square root.
         */

        if ( test <= n ) {
            succApproximation = test ;
            iSqrt += binValue[i] ;
        }
        --bitPosition ;
    }
    return ( iSqrt ) ;
}


The above is written freehand and I don't know for sure it will compile :-). 
I am just trying to convey the flavour of the algorithm.  

The algorithm is so obvious that I am sure it has been written up somewhere 
already.  My apologies to any prior discoverers for any lack of attribution 
to their works.  In fact, I would not at all be surprised if this is in some 
piece of hardware somewhere since the two tables can just as well be generated
by some shift register working in real time (indeed, if your machine can
shift faster than it can index an element of a table, it would be faster
to generate the table elements iteratively too).

I am equally sure that there are faster ways to do square roots than this.

(By the way, loonggg ago [like almost 25 years ago, when we were all 
programming in FORTRAN :-)], when I first got interested in algorithms, 
I looked up what the algorithm used by Abacus practitioners were, to obtain 
the square root of a number. Well, no big revalation - the algorithm is 
precisely the one we all learnt in primary school where you gather two digits
at a time.  Sigh.)

Today?  I try looking at a higher level - for algorithms that minimize the
use of square roots.  There ain't such a thing as a FAST square root. :-)
Anytime you have to iterate 8 to 16 times to reach your results...


Regards,

Kok Chen				kchen@apple.COM
Apple Computer, Inc.

chris@mimsy.umd.edu (Chris Torek) (01/01/90)

Aw heck. :-)

Here is my version of Rokicki's code, complete with a main() to test it.

/*
 * Integer square root routine, good for up to 32-bit values.
 * Note that the largest square root (that of 0xffffffff) is
 * 0xffff, so the result fits in a regular unsigned and need
 * not be `long'.
 *
 * Original code from Tomas Rokicki (using a well known algorithm).
 * This version by Chris Torek, University of Maryland.
 *
 * This code is in the public domain.
 */
unsigned
root(v)
	register unsigned long v;
{
	register long t = 1L << 30, r = 0, s;

#define STEP(k) \
	s = t + r; \
	r >>= 1; \
	if (s <= v) { \
		v -= s; \
		r |= t; \
	}
	STEP(15); t >>= 2;
	STEP(14); t >>= 2;
	STEP(13); t >>= 2;
	STEP(12); t >>= 2;
	STEP(11); t >>= 2;
	STEP(10); t >>= 2;
	STEP(9); t >>= 2;
	STEP(8); t >>= 2;
	STEP(7); t >>= 2;
	STEP(6); t >>= 2;
	STEP(5); t >>= 2;
	STEP(4); t >>= 2;
	STEP(3); t >>= 2;
	STEP(2); t >>= 2;
	STEP(1); t >>= 2;
	STEP(0);
	return r;
}

#ifdef MAIN
#include <stdio.h>
#include <math.h>

main()
{
	unsigned long l;
	char buf[100];

	for (;;) {
		(void) printf("gimme a number> ");
		if (fgets(buf, sizeof buf, stdin) == NULL)
			break;
		/* should use strtoul here but some do not have it */
		if (sscanf(buf, " 0x%lx", &l) != 1 &&
		    sscanf(buf, " 0%lo", &l) != 1 &&
		    sscanf(buf, "%lu", &l) != 1 &&
		    sscanf(buf, "%lx", &l) != 1)
			(void) printf("that was not a number\n");
		else
			(void) printf("root(%lu) => %u; sqrt(%lu) => %.17g\n",
			    l, root(l), l, sqrt((double)l));
	}
	exit(0);
}
#endif
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

tim@nucleus.amd.com (Tim Olson) (01/01/90)

In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes:
| NOT WRONG- Let's analyze this line more closely (exploded IF, here we come!)...
| 
| 	if (
| 		( rb = r + 
| 			(b=1<<i) )
| 				* rb
| 					<= n)
| 
| Now the one stipulation of the C standard is that INNER PARENTHESIS
| GETS EVALUATED FIRST!!!!

No.  Parenthesis only change the grouping of an expression, not its
order of evaluation (this is true in K&R "classic" C, as well as the
proposed ANSI standard).  The standard explicitly states:

	 The grouping of an expression does not completely determine
	 its evaluation.  In the following fragment

	      #include <stdio.h>
	      int sum;
	      char *p;
	      /*...*/
	      sum = sum * 10 - '0' + ((*(p++)) = (getchar())));

	 the expression statement is groupsed as if it were written as

	      sum = (((sum * 10) - '0') + ((*(p++)) = (getchar())));

	 but the actual increment of p can occur at any time between
	 the previous sequence point and the next sequence point (the
	 ;), and the call to getchar can occur at any point prior to
	 the need of its returned value.

The key here is the concept of a "sequence point".  Any side-effects
(such as assignment) in an expression can occur at any time as long as
they are complete at the sequence point.  This leads to the
restriction in section 3.3:

	    Between the previous and next sequence point an object
	    shall have its stored value modified at most once by the
	    evaluation of an expression.  Furthermore, the prior value
	    shall be accessed only to determine the value to be
	    stored.


The expression

	(rb = r + (b=1<<i)) * rb <= n)

violates this restriction, because the second reference to rb is not
to determine the previous value of rb to use in the assignment.

Because the standard allows the assignment side-effect to occur at
anytime prior to the sequence point, the second reference to rb could
either be the old or new value of rb, depending upon the
implementation.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

mitchell@cbmvax.commodore.com (Fred Mitchell - PA) (01/02/90)

Well, it seems that this has started quite a controversy. This algorithm
works well on 8- and 16-bit machines that has no hardware floating point
available. It was something that was sitting around for years and I thought
I'd pull it out. Even though there is a question regarding order of evaluation
vs. order of precedence (and I've just check K&R on it), for every compiler
that I've used it on, it has worked. Perhaps order of evaluation should
also be insured. But this is the wrong news group for this discussion.

I suppose debates about this sort of thing can go on forever, ad nauseam :-)
So let us get back to what we came here for, anyway- graphics!

	-Mitchell
	 mitchell@cbmvax.UUCP
	 "You, too can spend a lifetime squabbling. Instead, let's open
	  our eyes and see."

ge@kunivv1.sci.kun.nl (Ge' Weijers) (01/03/90)

Another integer square root algorithm, from Mr. Otto Peter from Innsbruck,
Austria (published in C'T 1-90) copied verbatim:

#define	N_BITS	32
#define	MAX_BIT	((N_BITS+1)/2-1)

unsigned long int sqrt_5(x)
unsigned long int x;
{
	register unsigned long int xroot, m2, x2;

	xroot = 0; m2 = 1<< MAX_BIT * 2;

	do {
		x2 = xroot + m2;
		xroot >>= 1;
		if(x2 <= x) {
			x -= x2;
			xroot += m2;
		}
	} while(m2 >>= 2);
	if(xroot < x) return (xroot+1);
	return(xroot);
}

The algorithm does not use multiplication, which is a benefit on things like
a 68000.

Ge' Weijers


Ge' Weijers                                    Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,   (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1         
6525 ED Nijmegen, the Netherlands              tel. +3180612483 (UTC-2)

hallett@pet16.uucp (Jeff Hallett x5163 ) (01/05/90)

In article <37525@apple.Apple.COM> kchen@Apple.COM (Kok Chen) writes:
>the square root of a number. Well, no big revalation - the algorithm is 
>precisely the one we all learnt in primary school where you gather two digits
>at a time.  Sigh.)

Could someone relate this algorithm?  I remember learning it, but then
promptly forgot it. sigh.  I always thought it was kinda fun.

BTW, I've seen Kok's algorithm before and kinda liked  it.  Thanks for
publishing it.


--
	     Jeffrey A. Hallett, PET Software Engineering
      GE Medical Systems, W641, PO Box 414, Milwaukee, WI  53201
	    (414) 548-5163 : EMAIL -  hallett@gemed.ge.com
"Non Sequitor - Your facts are uncoordinated"; "I am Nomad: I am perfect."

rudy@mips.COM (Rudy Wang) (01/11/90)

I've collected all 4 algorithms from chris, rokicki, mitchell and kchen and
ran them under the Mips' profiler on a Mips M2000 (w/ 64M memory).  Here's
the result:  Chris' inlined routine was clearly the winner!  It was 20%
faster than the original rokicki algorithm.

===========================================================================
Profile listing generated Wed Jan 10 21:08:30 1990 with:
   prof -pixie -proc isqrt
----------------------------------------------------------------------------
*  -p[rocedures] using basic-block counts;                                 *
*  sorted in descending order by the number of cycles executed in each     *
*  procedure; unexecuted procedures are excluded                           *
----------------------------------------------------------------------------

319597 cycles

    cycles %cycles  cum %     cycles  bytes procedure (file)
                               /call  /line

    109500   34.26  34.26        219     11 isqrt4 (isqrt.c)
     86000   26.91  61.17        172      9 isqrt3 (isqrt.c)
     56000   17.52  78.69        112      6 isqrt2 (isqrt.c)
     46500   14.55  93.24         93     24 isqrt1 (isqrt.c)
      ...
      ... many irrelevant procedures deleted...
      ...
===========================================================================
Source code follows:

#define ULONG	unsigned long

ULONG
isqrt1 (			/* chris' algorithm */
    register ULONG v
    )
{
    register long t = 1L << 30, r = 0, s;

#define STEP(k) s = t + r; r >>= 1; if (s <= v) {v -= s; r |= t;}

    STEP(15); t >>= 2; STEP(14); t >>= 2; STEP(13); t >>= 2;
    STEP(12); t >>= 2; STEP(11); t >>= 2; STEP(10); t >>= 2;
    STEP(9);  t >>= 2; STEP(8);  t >>= 2; STEP(7);  t >>= 2;
    STEP(6);  t >>= 2; STEP(5);  t >>= 2; STEP(4);  t >>= 2;
    STEP(3);  t >>= 2; STEP(2);  t >>= 2; STEP(1);  t >>= 2;
    STEP(0);
    return (r);

}  /* isqrt1 */


ULONG
isqrt2 (			/* rokicki's (O. Peter's?) algorithm */
    register ULONG v
    )
{
    register ULONG t, r, tmp;

    t = 0x40000000;
    r = 0;
    do
    {
	tmp = t + r;
	r >>= 1;
	if (tmp <= v)
	{
	    v -= tmp;
	    r |= t;
	}
	t >>= 2;
    }
    while (t != 0);

    return (r);

}  /* isqrt2 */


ULONG
isqrt3 (			/* mitchell's algorithm */
    register ULONG n
    )
{
    register int i, r, rb, b;

    i = 15;
    r = 0;
    do
    {
	b = 1 << i;
	rb = r + b;
	if (rb * rb <= n)
	    r += b;
	i--;
    }
    while (i >= 0);

    return ((ULONG) r);

}  /* isqrt3 */


/*
 *------------------------------------------------------------------------
 *	kchen's algorithm - slightly modified to use bit-shift, since it's
 *	quite a bit (:-)) faster than member access on a Mips machine.
 *------------------------------------------------------------------------
 *
 *    Example of (16 bit) integer square root based
 *    on (a+b)^2 = (a^2 + 2ab + b^2).  Yields largest integer
 *    whose square is less than or equal n.
 *
 *    binValue  is a table of the numbers 2^0, 2^1, 2^2, etc.
 *    binSquare is a table of the square of 2^0, 2^1, 2^2, etc.
 *
 *    At all times, succApproximation =  iSqrt^2.
 */

#if 0
int binValue [] = {
    0x8000, 0x4000, 0x2000, 0x1000, 0x800, 0x400, 0x200, 0x100,
    0x80,   0x40,   0x20,   0x10,   0x8,   0x400, 0x2,   0x1
    };
int binSquare [] = {
    0x40000000, 0x10000000, 0x4000000, 0x1000000, 0x400000, 0x100000 , 0x40000, 0x10000,
    0x4000,     0x1000,     0x400,     0x100,     0x40,     0x10 ,     0x4,     0x1
    };
#endif 0

int
isqrt4 (
    register int n
    )
{
    register int guess, iSqrt, i, test;

    guess = iSqrt = 0;

    for (i = 16; i != 0; i--)
    {
	/*
	 * Let test be the square of the the sum of the current
	 * approx. and the next bit value, i.e., (iSqrt+binValue)^2.
	 */
	test = guess				/* a^2   */
	     + (iSqrt << i)			/* 2*a*b */
	     + (1 << ((i - 1) << 1));		/* b^2   */
	/*
	 *    If test is no greater than n, binValue[i] can be added
	 *    to the current approximation of the square root.
	 */
	if (test <= n)
	{
	    guess = test;
	    iSqrt += (1 << (i - 1));
	}
    }
    return (iSqrt);

}  /* isqrt4 */


main (
    register int argc,
    register char **argv
    )
{
    register ULONG i, j, k;

    i = atoi (argv [1]);

    for (j = 0; j < 500; j++)
	k = isqrt1 (i);
    printf ("sqrt (%0d) = %0d\n", i, k);

    for (j = 0; j < 500; j++)
	k = isqrt2 (i);
    printf ("sqrt (%0d) = %0d\n", i, k);

    for (j = 0; j < 500; j++)
	k = isqrt3 (i);
    printf ("sqrt (%0d) = %0d\n", i, k);

    for (j = 0; j < 500; j++)
	k = isqrt4 (i);
    printf ("sqrt (%0d) = %0d\n", i, k);

}  /* main */
-- 
UUCP:	{ames,decwrl,prls,pyramid}!mips!rudy	(or rudy@mips.com)
DDD:	408-991-0247 Rudy Wang			(or 408-720-1700, Ext. 247)
USPS:	MIPS Computer Systems, 930 Arques, Sunnyvale, CA 94086-3650
Quote:	I think they're for 1 AM - Descartes, about his midnight snacks