kruger@16bits.dec.com (I've got 50nS memory. What did you say?) (02/27/88)
A one-bit per integer implementation is very poor. In this case, you can use full word width of the machine by making effective use of table lookup. This cuts the amount of work by a factor of 8 or so for the bit rearrangement, and using other tables, you can do each piece of the tabled expansions/compressions in one shot. The whole process can be sped up by at least an order of magnitude, maybe as much as two. (The larger the word width, the better -- a 64 bit machine will be very efficient. Consider the bit rearrangement. Each byte is broken up and the bits distributed across the others. So two 32-bit tables, each with 256 entries, can be used to OR in the correct bits into the other sections. In other words: current = (L0, R0) next = (L1, R1) L1 = Table1[byte0(R0)] | Table2[byte1(R0)] | Table3[byte2(R0)] | Table3[byte3(R0)] | Table4[byte0(L0)] | Table5[byte1(L0)] | Table6[byte2(L0)] | Table7[byte3(L0)] R1 is computed in the same way. Effectively, each table lookup does 4 bits on a 32-bit machine, or 8 bits on a 64 bit machine. Compare this to the horribly slow shuffle that they do in Numerical Recipes! Ugh! Admittedly, you need a byte addressable machine to do this efficiently. Similar approaches on other parts of the algorithm yield even bigger performance increases. dov ======================================================================== Received: by decwrl.dec.com (5.54.4/4.7.34) id AA13798; Fri, 26 Feb 88 10:24:24 PST