[sci.crypt] Implementing DES Poorly

kruger@16bits.dec.com (I've got 50nS memory. What did you say?) (02/27/88)
A one-bit per integer implementation is very poor. In this case, you can use 
full word width of the machine by making effective use of table lookup. This
cuts the amount of work by a factor of 8 or so for the bit rearrangement, and
using other tables, you can do each piece of the tabled expansions/compressions
in one shot. The whole process can be sped up by at least an order of magnitude,
maybe as much as two. (The larger the word width, the better -- a 64 bit machine
will be very efficient.
 
Consider the bit rearrangement. Each byte is broken up and the bits distributed
across the others. So two 32-bit tables, each with 256 entries, can be used to
OR in the correct bits into the other sections. In other words:
 
current = (L0, R0)
next = (L1, R1)
 
L1 = Table1[byte0(R0)] | Table2[byte1(R0)] | Table3[byte2(R0)] |
     Table3[byte3(R0)] |
     Table4[byte0(L0)] | Table5[byte1(L0)] | Table6[byte2(L0)] |
     Table7[byte3(L0)]
 
R1 is computed in the same way. Effectively, each table lookup does 4 bits on
a 32-bit machine, or 8 bits on a 64 bit machine. Compare this to the horribly
slow shuffle that they do in Numerical Recipes! Ugh! Admittedly, you need a byte
addressable machine to do this efficiently.

Similar approaches on other parts of the algorithm yield even bigger performance
increases.

dov
 
========================================================================
Received: by decwrl.dec.com (5.54.4/4.7.34)
	id AA13798; Fri, 26 Feb 88 10:24:24 PST