[sci.crypt] turning a plaintext source into a good "random" key

kruger@16bits.dec.com (I've got 50nS memory. What did you say?) (03/02/88)

While I agree that plaintext has to much regularity to use as a key, it also
has a tremendous amount of variety. The trick is to correctly extract the
random element, while somehow 'chopping up' the regularities. The basic concept
of the book method is good -- you've got a large, effectively one-time pad (justkeep a list of books if you send LOTS of messages!)

As a good start, I would propose a finite state machine. Given the current
letter, and the position in the word, do a table lookup using the next word to
(excuse me, letter) to determine what the next character should be. Even before
this, it is probably good to store the letters in a six-bit code, as there is
obviously way too many zeros in the high bits of a byte. Now given this finite-
state code, do some sort of scrambling (exchange bits at various intervals,
for example) and you're well on the way to reducing the pattern of the key.

If there is still too much 'pattern' in the code, taking the low-order bits and
leaving the high-order ones would probably kill most of it. And for that matter,removing the vowels in the original plaintext key would also up the randomness.

Well, on with the criticism!
dov

jmm@thoth15.berkeley.edu (03/02/88)

Would removing all words of less than x characters significantly improve the
quality of one-time book pads?  And what would a useful value of x be?
(I would suggest x=5 at a minimum to eliminate all instances of 'this,' 'that,'
etc.  Would this be enough?)

/ James Moore		/    	|  jmm@bartleby.berkeley.edu
/ 			/	|--------------------------------------------|
/ Ma ta Gaeilge agut,	/	|   Nil aon speis ag Ollscoile na	     |
/ scriobh i!		/	|   California im bharulacha fein.           |

mlm@NL.CS.CMU.EDU (Michael Mauldin) (03/02/88)

In article <8803011848.AA05787@decwrl.dec.com>, kruger@16bits.dec.com writes:
> The trick is to correctly extract the random element, while somehow
> 'chopping up' the regularities.

For something more demonstrably "random" why not use your favorite
redundancy reducer (read: file compressor) to destroy the pattern?
Huffman code based on bigrams, or use Lempel-Ziv compress.  The better
the compress routine, the less redundancy there will be in key stream.
You may still have to deal with some regularities in the resulting bits.

Your suggestion of ignoring vowels and using the low order bits doesn't
seem to work well at all (did you test it?).  Here are histograms for the
text of the US Constitution (50,090 characters):

Character counts (ignoring punctuation and digits < 100 occurences):

SP   10397   ****************************************************************
,      642   ***
.      377   **
a     2779   *****************
b      659   ****
c     1276   *******
d     1307   ********
e     5385   *********************************
f     1077   ******
g      454   **
h     2041   ************
i     2607   ****************
j      100   
k       46   
l     1496   *********
m      792   ****
n     2708   ****************
o     2856   *****************
p      834   *****
q       52   
r     2325   **************
s     2831   *****************
t     3913   ************************
u      883   *****
v      481   **
w      385   **
x      129   
y      553   ***
z       31   

Low 4 bits of all letters except vowels:

0000 11248   ****************************************************************
0001  2943   ****************
0010  3037   *****************
0011  4149   ***********************
0100  5241   *****************************
0101  6287   ***********************************
0110  1584   *********
0111   862   ****
1000  2198   ************
1001  3200   ******************
1010   160   
1011   128   
1100  2166   ************
1101   918   *****
1110  3113   *****************
1111  2856   ****************

So you see there is still quite a lot of structure left.

Michael L. Mauldin (Fuzzy)		Department of Computer Science
ARPA: Michael.Mauldin@NL.CS.CMU.EDU	Carnegie Mellon University
Phone: (412) 268-3065			Pittsburgh, PA  15213-3890