[comp.lang.c] Integer square root routine needed.

utoddl@ecsvax.UUCP (Todd M. Lewis) (07/31/89)

This may be the wrong place to post this, but I need C source
to a reasonably fast integer square root routine.  (Actually
a clear description of an efficient algorithm would do--I can
write the code myself.)

The wheel I'm inventing is a library of matrix operations for
3-D graphics for Yet Another Ultimate Space Game.  I've got 
everything working with longs, but there's this one place where
I just can't seem to get around doing a square root.  Right now
I'm converting to a double and back, and that works Ok, but
I'd rather have no floating point operations.

So, How does one efficiently go about taking the square root
of a long?

Thanks,

Todd M. Lewis, utoddl@ecsvax.uncecs.edu, utoddl@ecsvax.bitnet
  To quote Eugene H. Spafford, "Crisis and Aftermath", Communications of
  the ACM, vol. 32, no. 6, (June 1989), p. 684:
  "It is curious that this many people [...] would assume to know the
   intent of the author based on the observed behavior of the program."

cliff@ficc.uu.net (cliff click) (08/01/89)

In article <7415@ecsvax.UUCP>, utoddl@ecsvax.UUCP (Todd M. Lewis) writes:
> This may be the wrong place to post this, but I need C source
> to a reasonably fast integer square root routine.  (Actually
> a clear description of an efficient algorithm would do--I can
> write the code myself.) 

Here's mine:

/********* C code to perform integer square root *************/
long sqrt();

main()
{
  register long i;

  for(i=0; i < 100; i++)
    printf("%ld->%ld\t",i,sqrt(i));
  printf("\n32767=%ld, 16000=%ld",sqrt(32767L),sqrt(16000L));
  printf("\n2000000000=%ld\n",sqrt(2000000000L));
  printf(  " 123456789=%ld\n",sqrt( 123456789L));
}

long sqrt(a)                    /* Integer square root */
long a;
{
register unsigned long b,c;

  if( a <= 1 ) return a;        /* Can't handle zero, one */
  c = b = a;                    /* Start at start */
  while( b ) c>>=1,b>>=2;       /* Get sqrt accurate to 1 bit in c */
  for( b = 0; b < 4; b++ )      /* 4 iterations of newtons */
    c = ((a+c*c)>>1)/c;         /* Newtons method */
/*  c = ((a>>1)+((c*c)>>1))/c;  /* Slightly slower, prevents overflow, loses 1 bit */
  return c;                     /* Return it */
}

-- 
Cliff Click, Software Contractor at Large
Business: uunet.uu.net!ficc!cliff, cliff@ficc.uu.net, +1 713 274 5368 (w).
Disclaimer: lost in the vortices of nilspace...       +1 713 568 3460 (h).

paul@moncam.co.uk (Paul Hudson) (08/01/89)

Here you are. This came with much sweat & tears about 2 years ago. No
comments, and bad style, sorry.  A real non-quiche-eater program.

/*
 *	@(#)sqroot.c	1.1	6/26/87
 *	Paul's magic square root routine
 */

main()
{
  unsigned i;
  printf("testing them all!\n");
  
  for (i = 1; i != 65536; i += 1)
  {
    if ((i & 255) == 0) printf("%d\r", i);
    if (i != sqroot(i*i))
      printf(" i %d sqroot %d\n", i, sqroot(i*i));
  }
}

sqroot(n)
register unsigned n;
{
  unsigned  result, acc;
  int  j;

 /* find the highest non-zero pair of bits */
  acc = n;
  for (j = -2; acc > 3; j += 2) 
    acc >>=2;
  result = 1;
  acc -= 1;
  for ( ; j >= 0; j -= 2)
  {
    result <<= 1;
    acc = (acc << 2) + ((n >> j) & 03);
    if ((result << 1) < acc)
    {
      acc -= (result << 1) + 1;
      result += 1;
    }
  }
  return result;
  
}


--
Paul Hudson	 MAIL: Monotype ADG, Science Park, Cambridge, CB4 4FQ, UK.
		PHONE: +44 (223) 420018	  EMAIL: paul@moncam.co.uk,
	;"	  FAX: +44 (223) 420911		 ...!ukc!acorn!moncam!paul
 `"";";"        These opinions void where prohibited by law.

bww@hpqtdla.HP.COM (Brian Woodroffe) (08/01/89)

The following `facts' may be of help in designing an algorithm:-
1/ A perfect square is the square of an integer.
2/ All perfect squares are the sum of a sequence of odd numbers.
3/ The sum of the sequence is the square of the length of the sequence.

eg 	n	seq(n)	sqr(n)
	2	1+3	4
	3	1+3+5	9
	4	1+3+5+7	16
etc

The proof by induction can be derived from (x+1)**2 == x**2 + 2*x + 1


An algorithm based on a binary chop should converge quickly, too!
   
+--------------------------------+-----------------------------------+
| Brian Woodroffe                | HPDESK: Brian Woodroffe/HP1400/B1 |
| Hewlett Packard Ltd            | ARPA:   bww@hpsqf                 |
| Queensferry Telecomms Division |         bww%hpsqf@hplabs.hp.com   |
| South Queensferry              | UUCP:   ..!hplabs!hpqtdla!bww     |
| West Lothian                   | JANET:  bww%hpqtdla@hpl.hp.co.uk  |
| Scotland EH30 9TG.             | PHONE:  +44-31-331-7234           |
+--------------------------------+-----------------------------------+

leichter@CS.YALE.EDU (Jerry Leichter) (08/01/89)

In article <7415@ecsvax.UUCP>, utoddl@ecsvax.UUCP (Todd M. Lewis) writes...
> 
>This may be the wrong place to post this, but I need C source
>to a reasonably fast integer square root routine.  (Actually
>a clear description of an efficient algorithm would do--I can
>write the code myself.)
> 
>The wheel I'm inventing is a library of matrix operations for
>3-D graphics for Yet Another Ultimate Space Game.  I've got 
>everything working with longs, but there's this one place where
>I just can't seem to get around doing a square root.  Right now
>I'm converting to a double and back, and that works Ok, but
>I'd rather have no floating point operations.
> 
>So, How does one efficiently go about taking the square root
>of a long?

A lot depends on what the expected range of inputs is.  A classic algorithm,
which is appropriate when you are usually taking square roots of small num-
bers, is to use the observation that the sum of the first n odd numbers is
n^2, so you can extract a square root using just subtraction and counting.

You might combine this with range reduction - divide through by a couple of
large squares up front.  In particular, removing the largest even power of two
is trivial on a binary machine.  Again, whether this is worth while depends a
bit on what you know about the numbers involved.  If they are likely to have
small factors, then range reduction can be easy and very effective.

Without some constraints on the inputs, it's probably hard to beat a carefully
implemented Newton-Raphson algorithm.  Since it sounds as if you are actually
computing Euclidean distances (sqrt(a^2 + b^2)), you can make an excellent
starting guess - actually, abs(a) + abs(b) is probably pretty good.

							-- Jerry

shap@bunker.UUCP (Joseph D. Shapiro) (08/02/89)

In article <7415@ecsvax.UUCP> utoddl@ecsvax.UUCP (Todd M. Lewis) writes:
>
>This may be the wrong place to post this, but I need C source
>to a reasonably fast integer square root routine.  (Actually
>a clear description of an efficient algorithm would do--I can
>write the code myself.)
>

this routine makes better and better guesses until it gets the same
answer twice.  This takes at most 20 iterations.  You could also limit
the iterations to 10 and get real close.

#include <stdlib.h>

#define print(xxx) printf(#xxx " = %ld\n",xxx);
#define MAXROOT 46340 /* square 46341 and you blow the long range */

long root(long value)
{
	long hi;
	long lo;
	long mid;
	long last;

	lo = 1;
	hi = value;
	if (hi>MAXROOT) hi=MAXROOT;
	mid = value/4;
	if (mid>MAXROOT) mid=MAXROOT;
	last = 0;

	while(mid != last)
	{
		last=mid;
		if (mid * mid > value)
		{
			hi = mid;
		}
		else
		{
			lo = mid;
		}
		mid = (hi+lo)/2L;
	}
	return mid;
}

main(int argc, char **argv)
{
	long value;

	value=atol(argv[1]);
	print(value);
	print(root(value));
	return 0;
}
-- 
__--__--__--__--__--__--__--__--__--__--__--__--__--__--__--__--__--__--__--__
Joe Shapiro					"My other car is a turbo...
ISC-Bunker Ramo     				 ...too."
{decvax,yale,philabs,oliveb}!bunker!shap

scottf@telxon.UUCP (Scott Fluhrer) (08/02/89)

In article <5392@ficc.uu.net> cliff@ficc.uu.net (cliff click) writes:
>In article <7415@ecsvax.UUCP>, utoddl@ecsvax.UUCP (Todd M. Lewis) writes:
>> I need C source to a reasonably fast integer square root routine.
>Here's mine:
>
>
>long sqrt(a)                    /* Integer square root */
>long a;
>{
>register unsigned long b,c;
>
>  if( a <= 1 ) return a;        /* Can't handle zero, one */
>  c = b = a;                    /* Start at start */
>  while( b ) c>>=1,b>>=2;       /* Get sqrt accurate to 1 bit in c */
>  for( b = 0; b < 4; b++ )      /* 4 iterations of newtons */
>    c = ((a+c*c)>>1)/c;         /* Newtons method */
>/*  c = ((a>>1)+((c*c)>>1))/c;  /* Slightly slower, prevents overflow, loses 1 bit */
>  return c;                     /* Return it */
>}

Here's a 'better' version:

/* 'unsigned' == 'we don't have to worry about negative values' :-) */
long sqrt(a)                    /* Integer square root */
unsigned long a;
{
	unsigned long b,c;

	if ( a <= 1 ) return a; /* Can't handle 0.  Our initial guess */
				/* can't handle 1 either */
  	c = b = a;              /* Start at start */
	while (b) c>>=1,b>>=2;  /* Get sqrt accurate to 1 bit in c */
	b = (c + a/c)>>1;	/* Do 1 iteration of newtons */
	do {			/* Now keep on doing iterations until */
		c = b;		/* the value ceases to decrease */
		b = (c + a/c)>>1;
	} while (b < c);
	return c;		/* Return the minimum value */
}

By 'better', I mean:

* Always returns the 'right' answer, with 'right' being 'the squareroot rounded
  down to the next lowest integer' (it is an interesting algebraic exercise
  to verify this).  The previous version got it wrong occassionally (like at
  values 80 and 99) and could overflow

Now, you don't say if exactness (except for overflow, the other routine is only
off by 1) is required, but since it is not difficult, and doesn't cost much
time (my version may even go faster, since it doesn't do any multiplies), you
might as well get it right...

BTW: followup's probably shouldn't go to comp.lang.c...
------
Poncho, the Faithful Sidekick
aka scottf@telxon

t-rayc@microsoft.UUCP (Raymond Chen) (08/02/89)

In article <7415@ecsvax.UUCP>, utoddl@ecsvax.UUCP (Todd M. Lewis) writes:
> This may be the wrong place to post this, but I need C source
> to a reasonably fast integer square root routine.  (Actually
> a clear description of an efficient algorithm would do--I can
> write the code myself.) 

After seeing a Newton's method, I had to go write up this one again.
This basic algorithm was used in a floating point package I wrote
a long long time ago when I was a wee lad.  (For those who are curious, 
it used a 16-bit exponent and a 16-bit mantissa.  This means you can 
represent numbers in the range 10^+-9800 to three significant digits.
Very useful for games, where accuracy takes a back seat to speed.
End of digression.)

Okay, so here's the code.  I wrote it from memory in a few minutes,
so it must've been pretty obvious :-)  If you like it, dislike it,
or find something wrong with it, let me know.
--

#define BITSPERLONG 32

#define TOP2BITS(x) ((x & (3 << (BITSPERLONG-2))) >> (BITSPERLONG-2))


/* usqrt:
    ENTRY x: unsigned long
    EXIT  returns floor(sqrt(x) * pow(2, BITSPERLONG/2))

    Since the square root never uses more than half the bits
    of the input, we use the other half of the bits to contain
    extra bits of precision after the binary point.

    EXAMPLE
        suppose BITSPERLONG = 32
        then    usqrt(144) = 786432 = 12 * 65536
                usqrt(32) = 370727 = 5.66 * 65536

    NOTES
        (1) change BITSPERLONG to BITSPERLONG/2 if you do not want
            the answer scaled.  Indeed, if you want n bits of
            precision after the binary point, use BITSPERLONG/2+n.
	    The code assumes that BITSPERLONG is even.
        (2) This is really better off being written in assembly.
            The line marked below is really a "arithmetic shift left"
            on the double-long value with r in the upper half
            and x in the lower half.  This operation is typically
            expressible in only one or two assembly instructions.
        (3) Unrolling this loop is probably not a bad idea.

    ALGORITHM
        The calculations are the base-two analogue of the square
        root algorithm we all learned in grammar school.  Since we're
        in base 2, there is only one nontrivial trial multiplier.

        Notice that absolutely no multiplications or divisions are performed.
        This means it'll be fast on a wide range of processors.
*/

unsigned long usqrt(unsigned long x)
{
    unsigned long a = 0;                    /* accumulator */
    unsigned long r = 0;                    /* remainder */
    unsigned long e = 0;                    /* trial product */

    int i;

    for (i = 0; i < BITSPERLONG; i++) {      /* NOTE 1 */
        r = (r << 2) + TOP2BITS(x); x <<= 2; /* NOTE 2 */
        a <<= 1;
        e = (a << 1) + 1;
        if (r >= e)
            r -= e, a++;
    }

return a;
}
--
Raymond Chen, mathematician by training			...!microsoft!t-rayc

ken@aiai.ed.ac.uk (Ken Johnson) (08/02/89)

>In article <7415@ecsvax.UUCP>, utoddl@ecsvax.UUCP (Todd M. Lewis) writes...
>> 
>>This may be the wrong place to post this, but I need C source
>>to a reasonably fast integer square root routine.
>>The wheel I'm inventing is a library of matrix operations for
>>3-D graphics for Yet Another Ultimate Space Game.

In article <68259@yale-celray.yale.UUCP> leichter@CS.YALE.EDU (Jerry
Leichter) writes:

> computing Euclidean distances (sqrt(a^2 + b^2)), ...

If all you're doing is of the form

  if (distance_between(point1,point2) < threshhold)
  {
   ...
  }

  distance_between(point1,point2)
  struct point *point1, *point2;
  {
	return(sqrt((point1->x - point2->x)^2 + (point1->y - point2->y)^2));
  }

and the square root is only needed to compute distance_between
by Pythagoras, you may be able to substitute

  threshhold_squared = threshhold * threshhold;

  if (squared_distance_between(point1,point2) < threshhold_squared)
  {
   ...
  }

  squared_distance_between(point1,point2)
  struct point *point1, *point2;
  {
	return((point1->x - point2->x)^2 + (point1->y - point2->y)^2);
  }

and get around the problem that way. (If the numbers involved are big,
shift both X and Y differences before squaring.)
-- 
Ken Johnson, AI Applications Institute, 80 South Bridge, Edinburgh EH1 1HN
E-mail ken@aiai.ed.ac.uk, phone 031-225 4464 extension 212
`I have read your article, Mr.  Johnson, and I am no wiser than when I
started.' -- `Possibly not, sir, but far better informed.'

bright@Data-IO.COM (Walter Bright) (08/03/89)

In article <7415@ecsvax.UUCP> utoddl@ecsvax.UUCP (Todd M. Lewis) writes:
>This may be the wrong place to post this, but I need C source
>to a reasonably fast integer square root routine.

/***************************
 * Compute the square root using Newton's method.
 * Donated to Public Domain.
 */

unsigned long ulsqrt(l)
unsigned long l;
{	unsigned long c1;

	/* Perhaps we could do better than this for a first guess	*/
	c1 = (l >> 1) + 1;
	while (c1 * c1 > l)
	{
		c1 = (c1 + l / c1) >> 1;
		/*printf("c1 = %ld\n",c1);*/
	}
	return c1;
}

If you can improve on this, please let me know.

rec@elf115.uu.net (Roger Critchlow) (08/03/89)

/*
** ALGORITHM 650:
** Efficient Square Root Implementation on the 68000
** Kenneth C. Johnson
** ACM Transactions on Mathematical Software,
** Vol 13, No. 2, June 1987, Pages 138-151.
**
** The result is rounded to the nearest integer
** unless TRUNCATE is #defined.
*/

typedef unsigned long uint;	/* 32 bit int for obsolete compiler :-) */

#define two_to_the(x)	((uint)1 << (x))

uint isqrt(x) uint x;
{
  register uint u, v, q, w;

  u = x;
  q = two_to_the(16) * (two_to_the(16)-1);

  if (u > q)
    return two_to_the(16);

  for (w = two_to_the(31); w > 0; w >>= 2) {
    v = (q - w) >> 1;
    if (u > v) {
      u -= v;
      q = v + w;
    } else
      q = v;
  }

  q >>= 1;

#if TRUNCATE
  return (u < q) ? q-1 : q;
#else
  return q;
#endif
}

kcc@wucs1.wustl.edu (Kenneth Charles Cox) (08/03/89)

In article <7415@ecsvax.UUCP> utoddl@ecsvax.UUCP (Todd M. Lewis) writes:
>This may be the wrong place to post this, but I need C source
>to a reasonably fast integer square root routine.  (Actually
>a clear description of an efficient algorithm would do--I can
>write the code myself.)

>So, How does one efficiently go about taking the square root
>of a long?

The following is the base-2 version of the square-root extraction
algorithm in which pairs of digits are brought down at each step.
I once thought of entering this in the Obfuscated C Code contest.

Advantages:
	It is exact, in the sense that isqrt(n) = k means k*k <= n
	and (k+1)*(k+1) > n, for n >= 0.
	Uses only subtraction, shifting, and bitwise logical ops.
Disadvantages:
	Requires BITS/2-1 iterations; so if your longs are 32 bits,
	it uses 15 iterations.

I don't have any comparisons of this with other methods.  It would
depend a lot on the hardware; if * and / are expensive, this may
be faster than (for example) Newton-Raphson despite the greater
number of iterations.  I would be interested in the results of a
timing comparison if you make one.


#define BITS (8*sizeof(long))	/* bits in a long */
long
isqrt(n)
register long n;
{
register long root,rem,t1,i;

	if (n < 0) return(-1);
	if (n == 0) return(0);

	for (i = BITS-2; !(n&(3 << i)); i -= 2);
	rem = ((n>>i)&3)-1;
	root = 1;

	for (i -= 2; i >= 0; i -= 2) {
		rem = (rem << 2) | ((n>>i)&3);
		t1 = root<<2;
		if (rem >= t1) t1 |= 1;
		root <<= 1;
		if (rem >= t1) {
			root |= 1;
			rem -= t1;
		}
	}
	return(root);
}

--------------
Ken Cox
kcc@wucs1.wustl.edu

msb@sq.sq.com (Mark Brader) (08/04/89)

I would like to remind people that code like
> long sqrt(a)                    /* Integer square root */

does not conform to the proposed ANSI standard, assuming a hosted
environment (which is the usual situation).  The names of all of the
standard library functions are reserved and your program is non-
conforming if it uses any of them for its own purposes.  The solution
is of course to call it lsqrt() or some such name.

We just had this topic once, when someone wanted to define their
own sin() function.

Please, no discussion about the names being reserved; this has been hashed
over many times.

-- 
Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb@sq.com
	A standard is established on sure bases, not capriciously but with
	the surety of something intentional and of a logic controlled by
	analysis and experiment. ... A standard is necessary for order
	in human effort.				-- Le Corbusier

This article is in the public domain.