[comp.lang.c] math function speed

ggs@ulysses.UUCP (02/17/87)

> mouse@mcgill-vision.UUCP (der Mouse):
> >I would really like a C compiler that produces anything that could
> >*touch* the "Kahan's magic square root" code of the 4.3 VAX libm.

Actually, I would too.  But most of the advantages of assembly language
come in the first few lines, where a few deft type puns compute a first
approximation of the sqrt by dividing the exponent by two.  The rest
would be straight C but for the mixing of single-precision with
double-precision.

In article <5622@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> The UNIX System V Release 2.0 sqrt routine, written in C, runs about
> half as fast as the 4.3BSD assembly code, and is about twice as accurate.

But the deft type pun in assembly language is still there; it has been
moved to frexp.s.
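In portable C the same exponent surgery can be done with frexp/ldexp; the
following is a sketch of the idea, not the SVR2 source:

```c
#include <math.h>

/* Sketch of a frexp-based sqrt: frexp() does the exponent surgery that
   the assembly pun did, so everything else stays in portable C.  This
   is an illustration of the idea, not the SVR2 routine itself. */
double sqrt_frexp(double x)
{
    int e;
    double m, y;

    if (x <= 0.0)
        return 0.0;                 /* error handling omitted for the sketch */
    m = frexp(x, &e);               /* x = m * 2^e, m in [0.5, 1) */
    if (e & 1) {                    /* make the exponent even ...        */
        m *= 2.0;                   /* ... so that 2^(e/2) is exact      */
        e--;
    }
    y = 0.5 + 0.5 * m;              /* crude linear first guess */
    for (int i = 0; i < 5; i++)     /* Newton: y <- (y + m/y)/2 */
        y = 0.5 * (y + m / y);
    return ldexp(y, e / 2);         /* sqrt(x) = sqrt(m) * 2^(e/2) */
}
```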

What's your definition of twice as accurate?  In 100,000 tests of random
arguments distributed over the positive range, I counted 8634 cases
with a 1-bit error and none worse.  Compared with most other math
functions, this is quite good.  For my reference I used a slow
algorithm that computes the "exact" answer to 1 extra bit and then
rounds.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{allegra|ihnp4}!ulysses!ggs
Internet:	ggs@ulysses.uucp

gwyn@brl-smoke.UUCP (02/18/87)

In article <2050@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com (Griff Smith) writes:
>What's your definition of twice as accurate?

In a test where the result of "sqrt" was compared to known correct
answers, for the following arguments:

	0	1/5	1/3	1/2	1	2	e
	3	pi	4	5	9	10	16
	49	100	10000

I got the following results:

SVR2
function "sqrt":

    max abs error:	  2.7755576e-17
    max rel error:	  2.4037034e-17

4.3BSD
function "sqrt":

    max abs error:	  5.5511151e-17
    max rel error:	  3.3669215e-17

ggs@ulysses.UUCP (02/21/87)

In article <5640@brl-smoke.ARPA>, gwyn@brl-smoke.UUCP writes:
> 
> In a test where the result of "sqrt" was compared to known correct
> answers, for the following arguments:
> 
> 	0	1/5	1/3	1/2	1	2	e
> 	3	pi	4	5	9	10	16
> 	49	100	10000
> 
> I got the following results:
> 
> SVR2 function "sqrt":
> 
>     max abs error:	  2.7755576e-17
>     max rel error:	  2.4037034e-17
> 
> 4.3BSD function "sqrt":
> 
>     max abs error:	  5.5511151e-17
>     max rel error:	  3.3669215e-17

I thought the usual measure of accuracy was in units of "least
significant bits".  I repeated my test that compared a "best" solution
with a library version, this time with the sqrt that I got from the BRL
System V emulation package (sorry, I don't have a real VAX System V to
try, and the 3B20 is a different beast).  My result, using 100000 random
positive doubles:

SVR2 function "sqrt"

    one bit errors:	20356
    two bit errors:	0

4.3BSD function "sqrt"

    one bit errors:	8634
    two bit errors:	0

I question your choice of test numbers: any sqrt function that misses
0 is broken.  Seven of the seventeen numbers are perfect squares,
and six are powers of two.  They are excellent boundary value tests, but
statistically unusual.

I would like to add a cheer to the group at Berkeley who re-worked the
math library for 4.3BSD.  It looks like a good start, and some people
seem to have put a lot of thought into it.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{allegra|ihnp4}!ulysses!ggs
Internet:	ggs@ulysses.uucp

mccaugh@uiucdcsm.UUCP (02/21/87)

 A numerical analyst named McConnell makes chronic contributions to the APL
 Conference Proceedings... in particular, I believe the 1983 edition has an
 article by him definitively addressing the "square-root" problem.  It is
 really quite informative.
 Our own Prof. Gear presents a convincing strategy in his "Computer
 Organization and Programming" (2nd edition) for speeding up square-root
 computation on the IBM 360/370 (he presents the strategy without proof; I
 have a proof in case anyone is interested); not that it is necessarily
 optimal, but it is a "speed-up".
 No, Newton's (actually, the Newton-Raphson) method is not "optimal": it
 only converges quadratically; for a really nice cubically-converging
 version, look at Franz's "Elements of Numerical Analysis".  However,
 fewer iterations are bought at the price of more computations per
 iteration.

 

mccaugh@uiucdcsm.UUCP (02/21/87)

 The author of "Elements of Numerical Analysis" I was referring to is
 Henrici, not Franz.

ark@alice.UUCP (02/21/87)

In article <2096@ulysses.homer.nj.att.com>, ggs@ulysses.UUCP writes:
-> In article <5640@brl-smoke.ARPA>, gwyn@brl-smoke.UUCP writes:
-> > 
-> > In a test where the result of "sqrt" was compared to known correct
-> > answers, for the following arguments:
-> > 
-> > 	0	1/5	1/3	1/2	1	2	e
-> > 	3	pi	4	5	9	10	16
-> > 	49	100	10000
-> > 
-> > I got the following results:
-> > 
-> > SVR2 function "sqrt":
-> > 
-> >     max abs error:	  2.7755576e-17
-> >     max rel error:	  2.4037034e-17
-> > 
-> > 4.3BSD function "sqrt":
-> > 
-> >     max abs error:	  5.5511151e-17
-> >     max rel error:	  3.3669215e-17
-> 
-> I thought the usual measure of accuracy was in units of "least
-> significant bits".  I repeated my test that compared a "best" solution
-> with a library version, this time with the sqrt that I got from the BRL
-> System V emulation package (sorry, I don't have a real VAX System V to
-> try, and the 3B20 is a different beast).  My result, using 100000 random
-> positive doubles:
-> 
-> SVR2 function "sqrt"
-> 
->     one bit errors:	20356
->     two bit errors:	0
-> 
-> 4.3BSD function "sqrt"
-> 
->     one bit errors:	8634
->     two bit errors:	0
-> 
-> I question your choice of test numbers: any sqrt function that misses
-> 0 is broken.  Seven of the seventeen numbers are perfect squares,
-> and six are powers of two.  They are excellent boundary value tests, but
-> statistically unusual.

The IEEE P754 definition of square root requires that the result be
the most accurate possible.  That is, what you get must be exactly
equal to the infinite-precision square root, correctly rounded.

This is not as difficult as it may sound, because the rounding can
never result in a tie.  That is, the infinite-precision square root
of a floating-point number can never fall exactly between two
adjacent floating-point numbers, so you only need one extra bit
of precision to get the right answer.

To see that this is true, consider Y = sqrt(X), after rounding, and
scale Y by a power of 2 (and X by that power of 2 squared)
until the low-order bit of Y represents 1.  Now, assume that
the rounding of Y came after a tie.  In other words, in infinite
precision, X = (Y + 0.5) ** 2.  Can this be possible if both X
and Y are integers?  (That's why I did the scaling.)

No, it can't be.  (Y + 0.5) ** 2 is Y**2 + Y + 0.25, which can't be
an integer when Y is.

So for floating-point square root, it is entirely meaningful
to ask whether or not the results are correct, and looking at
relative error just conceals the real issue.