[comp.parallel] Clever programming tricks wanted

gvw@uunet.UU.NET (MOBY) (01/11/89)

I'm looking for "clever" programming tricks.  For example,
consider the following problem:

    Given a computer word W in which one or more bits are set,
    change the lowest set bit from 1 to 0, and set the corresponding
    bit in another word B to 1.  Eg. if the input is W = 0x0860,
    return W' = 0x0840 and B' = 0x0020.

The obvious technique uses shifting and a bit mask.  A much slicker
approach is:

    Procedure getNextBit( W, B )
      W, B : Integer
    {
      B := W
      W := W MINUS 1    -- modulo minus (i.e., ignore overflow)
      W := W AND B      -- bit-wise AND
      B := B XOR W      -- bit-wise XOR
    }

I would like to hear about similar tricks that people have invented/
discovered.  In particular, I'm looking for fast ways to count the
number of 1 bits in a word (on machines that don't have this as a
machine instruction :-), and trying to find a description of a
mythical technique for creating doubly-linked lists using single
pointers and XOR operations.

Please send all correspondence to comp.lang.misc, or directly to
me (gvw@uk.ac.ed.etive).

Cheers,

Greg Wilson
-- 
Edinburgh Concurrent Supercomputer Project

dave@vax1.cc.uakron.edu (David Stoutamire) (01/19/89)

In article <4061@hubcap.UUCP> you write:
>I'm looking for "clever" programming tricks.  For example,
>...
>I would like to hear about similar tricks that people have invented/
>discovered.  In particular, I'm looking for fast ways to count the
>number of 1 bits in a word (on machines that don't have this as a
>machine instruction :-), and trying to find a description of a

I have invented a technique for counting one bits that I used
in a bit representation of a board game to do lickety-split
move generation.  It has some overhead but for long words
is efficient since it takes log time.  It may be less
efficient for sparse words with few bits to count,
but is much faster when the word becomes more
than half full.

The basic idea is to consider grouping the bits.  For example,
to count the ones in a 32 bit word (using unsigned arithmetic
operations):

Form two words, one consisting of the original word with all
even bits masked out, the other with all odds.  Shift so
that the bits are aligned, and do an addition.  We have now
converted the format from 32 single bit records to 16
double bit records.  Each double bit record is safe from
its rightmost neighbor's carry because the highest number
possible in the double bit record is 10.

Once again, mask and shift (by two bits now), and do the
addition.  Now you have 8 four-bit records.

Mask and shift by four bits to get 4 byte-size records.
Mask and shift by eight bits to get two 16-bit records.
Mask and shift by sixteen bits to get one 32-bit record
containing the number of bits in the original word.

I did this without looping for speed.

Example: (16 bits)

0110100010110101 -->    00 01 01 00 01 01 00 00
                       +01 00 00 00 00 01 01 01
			-- -- -- -- -- -- -- --
			01 01 01 00 01 10 01 01

0101010001100101 -->    0001 0001 0001 0001
		       +0001 0000 0010 0001
			---- ---- ---- ----
			0010 0001 0011 0010

0010000100110010 -->    00000010 00000011
		       +00000001 00000010
			-------- --------
			00000011 00000101

0000001100000101 -->    0000000000000011
		       +0000000000000101
			----------------
			0000000000001000 = number of ones bits

4 steps = log 16
	     2

				-= David Stoutamire =-

landman@Sun.COM (Howard A. Landman) (01/20/89)

In article <4131@hubcap.UUCP> dave@vax1.cc.uakron.edu (David Stoutamire) writes:
>I have invented a technique for counting one bits that I used
>in a bit representation of a board game to do lickety-split
>move generation.

This is essentially the same as the code I posted earlier.  I should point out,
however, that for EXTREMELY high performance applications it is much faster
to have lookup tables.  For 32-bit words this gets impractical, but it is
quite reasonable to break it into 16-bit halves, look up the two counts, and
sum them together.  This requires only 5 instructions (mask, lookup, shift,
lookup, add).  The counts can't get larger than 5 bits, so the table takes
no more than 64KB.

Your mileage may vary depending on cache size ...

	Howard A. Landman
	landman@hanami.sun.com