firth@sei.cmu.edu (Robert Firth) (08/22/89)
For those interested in this hackery, here's a very simple square root approximation. We assume the number is non-negative and in standard binary floating-point representation, with exponent and mantissa juxtaposed and with one hidden bit. Thus the bit pattern k|bbb... means 2^k * (0.1bbb...).

Take the machine word holding the exponent and high mantissa, shift it right logically one place, and add (using integer addition) the bit pattern 0000|100...

For example, a starting pattern of

	k0|0000  (ie 2^2k * 1/2)

becomes

	k|1000   (ie 2^k * 3/4)

(If the exponent is represented in excess-2^n form, then of course the bit pattern you add also contains the correction to the exponent, but that's a negligible detail.)

The other three cases of interest are

	k0|1111  (2^2k * (1-e))      =>  k|1111      (2^k * (1-e))
	k1|0000  (2^(2k+1) * 1/2)    =>  (k+1)|0000  (2^(k+1) * 1/2)
	k1|1111  (2^(2k+1) * (1-e))  =>  (k+1)|0111  (2^(k+1) * (3/4 - e))

The third case is exact (good thing, too), and the second is nearly so. The first and fourth cases have the greatest inaccuracy: in effect they take 1.5 as the first approximation to sqrt(2), which is an error of 86 in 1414, or less than 1 in 16. So we have at worst three good bits in this first approximation, whence three iterations of Newton's rule will give us at worst 31 bits.

The initial approximation is one shift and one add, which is almost certainly faster than a lookup table. Moreover, the time to get the square root is quite predictable - within a factor of 2, surely, even allowing for data-dependent division instructions.

[Found in the PDP-11 Fortran IV mathlib; credit probably due to some anonymous person at DEC]

However, note that a lookup table based on the last bit of the exponent and the first three bits of the mantissa can give you five bits' initial accuracy, which probably lets you save an iteration.