[comp.arch] Distance to next bit

pmontgom@euphemia.math.ucla.edu (Peter Montgomery) (05/12/91)

In article <3409@spim.mips.COM> zalman@mips.com (Zalman Stern) writes:
>A Count leading zeros (or ffs if you prefer) instruction is not a big deal
>other than that you have to design the hardware to do it. (I.e. it doesn't
>place unreasonable demands on the register file, it doesn't eat lots of
>opcode space.) However its not clear that this single instruction does the
>job. You can either look for a 1 or a 0 bit and you can look from most
>significant to least significant or vice-versa. Herman probably wants all
>of these options...

	It suffices to have a CLZ (count leading zeros) instruction;
the others are easily synthesized using the first.  Assume a 32-bit,
twos complement architecture.

	CLZ(n) 	returns the number of leading zeros of n
		(assumed to be a hardware primitive, result 0 to 32).
		For example, CLZ(negatives) = 0, CLZ(5) = 29, CLZ(0) = 32.

	CLZ(~n)  returns the number of leading ones of n (~ = bitwise NOT)

	31 - CLZ (n ^ (n-1)) counts the number of trailing zero bits of
		a nonzero integer n (^ = bitwise exclusive OR)

	31 - CLZ((n ^ (n+1)) counts the number of trailing one bits of
		an integer n != -1

	The latter two can also be synthesized from a COUNT function
(population count, i.e. number of one bits) function, via
COUNT(~(n | -n)) = COUNT((n-1) & ~n)  and  via COUNT(n & ~(n+1)).

	Although it is not a bottleneck in my programs,
I frequently take the truncated base 2 logarithm (LOG2)
of a positive integer: LOG2(n) = 31 - CLZ(n). 
I don't understand why language designers omit this primitive.
For example, I need LOG2 when determining where to begin
the loop for the left-to-right binary method of exponentiation.
In my multiple precision division routine, I use LOG2 when 
normalizing the divisor; I also use it when normalizing the
operand to a square root.  I count zeros from the right (instead of the left)
when I need to know the power of 2 dividing an integer,
such as in the binary GCD algorithm and in the algorithm
for evaluating a Jacobi symbol (quadratic reciprocity).

> (someone else asked)
>>(Is 'find next 1' similar enough to floating-point normalization that
>>the same hardware assistance can be used for both?"
>
>Putting it in the FP unit makes it harder to uses the result for shifts and
>such on machines with separate FP and integer units. (Hooking the integer
>register file up to the FP normalization hardware is going to be very
>painful.)
>
	Yes, my most frequent uses for the result are as a shift or
as a loop count.  Also, the FP hardware may look at only the mantissa
rather than the whole word (e.g., the NXi Xj,Bk normalize instruction
on the CDC CYBER series returned the shift count in Bk but
checked only at the lower 48 bits and the sign bit of Bj).

--
        Peter L. Montgomery 
        pmontgom@MATH.UCLA.EDU 
        Department of Mathematics, UCLA, Los Angeles, CA 90024-1555
If I spent as much time on my dissertation as I do reading news, I'd graduate.