thor@stout.ucar.edu (Rich Neitzel) (01/11/89)
Recently jim@athsys.uucp (Jim Becker) wrote: > The problem is the most efficient method to simply count the >number of on bits in a sixteen bit short. This is a really easy >exercise, but one that has a variety of different approaches. I would >be interested in the opinions of programmers out there, an efficient >algorithm that is understandable. > > Assume that this is a basic sample algorithm: > > int > BitCount( num ) > short num; > { > int count = 0; > > while( num ) { > > count += (num & 1); > num = (num >> 1); > } > > return count; > } > > Is this efficient? Is this understandable? What is a better >algorithm? First a comment on this, what I call the 'classical shift' version. As listed above, it will not function on many cpus, since it uses right shifts. Since many cpus propagate the high bit, the loop will never terminate. Second, I do not consider this to be an efficient algorithm. While it appears to be good, since it terminates when there are no more set bits, it does not function well if there are many set bits or the only set bit is in the last position tested. A common solution is to use a lookup table and parse the word into nibbles or bytes. This is faster, but you must be certain that the table is correct. I do not relish the thought of entering 256 table entries. Efficiency suffers from the fact that the table lookup requires pointer indexing. I have found that the most efficient algorithm is one involving 'bit pair' addition. Attached are two routines that implement this algorithm. They rely on 'parallel' addition. The word involved is added as successively larger bit pairs. Both a 16 bit and 32 bit form are provided. I have chosen not to attempt to apply it to any other word lengths. This is because the algorithm is not truely 'portable', that is, the idea is portable, but the implemenation is dependent on the word size. The larger the word size the more pairs are involved. These routines have been tested in two environments. The first was on a PDP-11 under RSX-11M-Plus and the second on a Motorola MV133 (20 MHz 68020) under VxWorks. In both cases the test routine was the only job. They easily out performed the other algorithms tested. The completeing algorithms were the 'classical' shift and nibble lookup tables. The results were: Classical shift lut pair addition RSX 12.3 7.4 4.6 VxWorks 25 12 8 RSX times are seconds/65K iterations, VxWorks times are microseconds per iterations. The lookup table code would increase in speed if a byte table were used, but I doubt it would increase enough to beat the pair addition time. Supplied here are both the 16 and 32 bit routines, as well as the classic and lut routines. --------------------------------------------------------------------- /* Module: s_bitcnt.c Author: Richard E. K. Neitzel revision history ---------------- 1.0,9jan89,rekn written Explanation: This routine takes an integer and returns the number of bits it contains which are set. It does this via 'parallel' addition, turning the integer into sets of bit pairs of increasing size. WARNING! This assumes that shorts are 16 bits! */ int s_bitcnt(word) short word; { register short i, j, k; /* our work area */ if (!word) /* Why bother? */ return(0); j = word; k = j & 0x5555; /* Clear every other bit */ j >>= 1; /* Repeat for other bits */ j &= 0x5555; j += k; /* 1st pairs */ k = j & 0x3333; /* Clear every two bits */ j >>= 2; /* shift and repeat */ j &= 0x3333; j += k; /* 2nd pairs */ k = j & 0xff; /* Get lower pairs */ j &= 0xff00; /* Get upper pairs */ j >>= 8; j += k; /* 3rd pairs */ k = j & 0xf; /* Get last pairs */ j >>= 4; j += k; return(j); /* bye */ } --------------------------------------------------------------------- /* Module: l_bitcnt.c Author: Richard E. K. Neitzel revision history ---------------- 1.0,9jan89,rekn written Explanation: This routine takes an integer and returns the number of bits it contains which are set. It does this via 'parallel' addition, turning the integer into sets of bit pairs of increasing size. WARNING! This assumes that integers are 32 bits! */ int l_bitcnt(word) int word; { register int j, k; /* Our work area */ if (!word) /* Why bother? */ return(0); j = word; k = j & 0x55555555; /* Clear every other bit */ j >>= 1; /* Do again for cleared bits */ j &= 0x55555555; j += k; /* 1st pairs done */ k = j & 0x33333333; /* Clear every two bits */ j >>= 2; j &= 0x33333333; j += k; /* 2nd pairs done */ k = j & 0xffff; /* Only need last 4 sets */ j &= 0xffff0000; /* and top 4 here */ j = j >> 16; /* ready - */ j += k; /* add 3rd pairs */ k = j & 0xf0f; /* Clear bits */ j >>= 4; /* Repeat */ j &= 0xf0f; j += k; /* add 4th pairs */ k = j & 0xff; /* Now add the 8 bit pairs */ j >>= 8; j += k; return(j); /* bye */ } --------------------------------------------------------------------- int bitCount( num ) short num; { register short count = 0; register short tmp; tmp = num; while(tmp) /* Note the fix to avoid looping always */ { count += (tmp & 0x8000); tmp <<= 1; } return(count); } --------------------------------------------------------------------- short lut[] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4}; int luts(word) short word; { register short j, k,l; if (!word) return(0); j = word; k = j & 0xf; l += lut[k]; j >>= 4; k = j & 0xf; l += lut[k]; j >>= 4; k = j & 0xf; l += lut[k]; j >>= 4; k = j & 0xf; l += lut[k]; return(l); } ------------------------------------------------------------------------------- Richard Neitzel National Center For Atmospheric Research Box 3000 Boulder, CO 80307-3000 303-497-2057 thor@thor.ucar.edu Torren med sitt skjegg Thor with the beard lokkar borni under sole-vegg calls the children to the sunny wall Gjo'i med sitt shinn Gjo with the pelts jagar borni inn. chases the children in. -------------------------------------------------------------------------------
aeb@cwi.nl (Andries Brouwer) (01/16/89)
In article <1207@ncar.ucar.edu> thor@thor.ucar.edu (Rich Neitzel) gave a number of algorithms of which his `bit pair addition' was the fastest. My favourite routine is the following: /* find number of 1-bits in a */ bitcnt(a) register int a; { register int ct = 0; while(a){ a &= (a-1); ct++; } return(ct); } It is twice as fast on the average as the usual bit count versions, and comparable in speed to Neitzels routine (but much shorter). An important advantage is that it does not depend on the word size. Timing results: this routine Neitzels VAX 200000 iterations 12 sec 10 sec HARRIS 800000 iterations 7 sec 12 sec -- Andries Brouwer -- CWI, Amsterdam -- uunet!mcvax!aeb -- aeb@cwi.nl
gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/17/89)
In article <7825@boring.cwi.nl> aeb@cwi.nl (Andries Brouwer) writes: > a &= (a-1); Such algorithms should use unsigned integers, to avoid possible overflow problems on certain bit patterns.
leo@philmds.UUCP (Leo de Wit) (01/17/89)
In article <7825@boring.cwi.nl> aeb@cwi.nl (Andries Brouwer) writes: |In article <1207@ncar.ucar.edu> thor@thor.ucar.edu (Rich Neitzel) |gave a number of algorithms of which his `bit pair addition' was |the fastest. My favourite routine is the following: | | /* find number of 1-bits in a */ | bitcnt(a) register int a; { | register int ct = 0; | while(a){ | a &= (a-1); | ct++; | } | return(ct); | } | |It is twice as fast on the average as the usual bit count versions, |and comparable in speed to Neitzels routine (but much shorter). |An important advantage is that it does not depend on the word size. |Timing results: this routine Neitzels | VAX 200000 iterations 12 sec 10 sec | HARRIS 800000 iterations 7 sec 12 sec But the speed DOES depend on the number of bits set in the integer! Each iteration of the loop removes and counts the least significant bit, so this means 32 iterations on a unsigned int, 32 bits wide, with all bits set. In this particular situation, it may well end up being slower than an ordinary shift-&-count-until-zero. What input data were used for those timing results??? Leo.