[net.crypt] Code Breaking

sanand@radha.UUCP (Sanand Patel) (04/25/86)

Forgive my naivette, but someone mentioned that the U.S. Somebody or Other,
has essentially no problems in breaking codes (grossly paraphrased) ...
he mentioned the 'KAL' vs. Russia case.

[naive mode]

Does this apply to the simple scheme of "exclusive or" against a large
key (I know almost nothing about cryptology ...) 
e.g.

plaintext--> Some very important text that must be kept secret
My Key ----> ThisIsMySecretKeyThisIsMySecretKeyThisIsMySecretK

Coded-------> Exclusive-Or of S+T, o+h, m+i etc.

plaintext-->  Exclusive-Or of c1+T, c2+h , c3+i etc.
on other side

Now, not withstanding how the key is passed, is the above scheme breakable,
especially with very large keys (say 100 or 200 letters) ?

---
--- {ihnp4,decvax,allegra,linus}!utzoo!dciem!radha!sanand
--- 416-293-9722 ext248

henry@utzoo.UUCP (Henry Spencer) (04/27/86)

> Does this apply to the simple scheme of "exclusive or" against a large
> key (I know almost nothing about cryptology ...) 
> ...
> Now, not withstanding how the key is passed, is the above scheme breakable,
> especially with very large keys (say 100 or 200 letters) ?

Exclusive-or with the key is the usual implementation method for substitution
encryption techniques today.  (The alternative is transposition, which
rearranges the order of characters rather than changing them; complex
systems often incorporate both substitution and transposition.)	 The key
(pun intended) question is how the key sequence for exclusive-oring is
generated.  It is usually desirable to transmit a relatively short key by
some secure means, and then mechanically transform that into a quasi-infinite
sequence that can be exclusive-ored with an arbitrarily long message.

If, as in your example, you merely repeat the short key over and over to
generate the key sequence, a knowledgeable high-school kid with a computer
to help can read your messages, if they are reasonably long.  This is a
"simple polyalphabetic" cipher; it is not terribly hard to solve.  Simply
put:  one guesses at the length of the short key, groups together all
characters enciphered with the same letter of the short key, and then sees
whether the resulting groups follow English-like letter-frequency patterns.
(One needs a reasonable number of letters per group, hence the requirement
for a substantial message.)  If the patterns look right, one has guessed
the key length right (or at least, one has a multiple of the right one).
Then one solves each group as a simple one-for-one substitution cipher.

Guessing key length may mean exhaustive search (or statistical methods that
give the same effect), but it is more common than you would think to have
words in the message repeat, lined up with the same part of the key.  The
result will be repetitions in the encrypted message, which will appear at
multiples of the key length.  These can offer strong hints; of course, one
*can* get repetitions by pure chance...

If you use the same key for several messages, then the codebreaker can
lump all the messages together for solution, giving larger groups and
quicker and more positive results.

Running the message through two successive encryptions with different keys
does not help much, because the result is equivalent to a single encryption
with a key whose length is the least common multiple of the key lengths.

If you use a very long or effectively infinite (e.g. the Encyclopedia
Brittanica) key, but it is English text, the patterns found in English
text can still be exploited to break your cipher.  It *is* harder.

The ultimate encryption scheme is the "one-time pad", in which you use
a non-repeating purely random key sequence which is never re-used.  (One
pre-computer way of implementing this is a pad of pre-printed key sheets,
which are torn off and destroyed as they are used; hence the name.)  The
one-time pad is provably unbreakable:  there is not enough information in
the encrypted stuff to recover the message, unless you have the key.  Its
drawback is that there is no "short key", and you need as much key text
as message text (remember, you can't re-use it).  This practical difficulty
has limited its use.

Most modern encryption schemes essentially use very complex transformations
to go from the short key to the key sequence, and often stir in some messy
transposition operations as well.  Beware that complexity doesn't necessarily
equal security, as witness the double-encryption case I mentioned above.
There is no substitute for advice from an expert.  (NB, I am not an expert.)
-- 
Support the International
League For The Derision		Henry Spencer @ U of Toronto Zoology
Of User-Friendliness!		{allegra,ihnp4,decvax,pyramid}!utzoo!henry

john@anasazi.UUCP (John Moore) (04/27/86)

In article <113@radha.UUCP> sanand@radha.UUCP (Sanand Patel) writes:
>Forgive my naivette, but someone mentioned that the U.S. Somebody or Other,
>has essentially no problems in breaking codes (grossly paraphrased) ...
>he mentioned the 'KAL' vs. Russia case.

That was a mistaken reference. In the KAL case, the transmissions were
in plaintext. The important point there was that the US in fact was
monitoring the transmissions at all. The US maintains an extensive
network of listening stations, and essentially records the entire
useful electromagnetic spectrum at each of these points. It is then an easy
matter to play back the tapes after an incident and extract the
suitable intelligence.
>
>[naive mode]
>
>Does this apply to the simple scheme of "exclusive or" against a large
>key (I know almost nothing about cryptology ...) 
>e.g.
>
>plaintext--> Some very important text that must be kept secret
>My Key ----> ThisIsMySecretKeyThisIsMySecretKeyThisIsMySecretK
>
>Coded-------> Exclusive-Or of S+T, o+h, m+i etc.
>
>plaintext-->  Exclusive-Or of c1+T, c2+h , c3+i etc.
>on other side
>
>Now, not withstanding how the key is passed, is the above scheme breakable,
>especially with very large keys (say 100 or 200 letters) ?
>
This is a classic and well studied cryptosystem.
In modern cryptology, your examples  would be considered small keys and,
given enough ciphertext, they are trivially breakable by examining the
letter frequency statistics: once the key length has been determined 
(by simply assuming each possible length and then gathering letter f
requency  statistics, and chosing the least random length, the
letter frequency plus a good knowledge of the plaintext language
and a talent for crypography (or... a big computer) then solves
the cryptogram.
   Most modern crypto schemes attempt to do the equivalent of generating
a very large (millions of bits) key from a small one via a concealed
algorithm. Part of this key can be thought of as exclusive OR'ed
against the plaintext, and part of this can be thought of as a
permutation matrix to scramble the order of bits in the plaintext
(although lots of schemes don't bother with the permuting of
the bits). Even these schemes can be broken for surprisingly long
sequences (and who knows how long... those are extremely well guarded
secrets).

For a nice overview book on the subject, with lots of history, read
"The Codebreakers" by Kahn (available in paperback). There are also
a number of higher level texts available through your local university
bookstore. The DES algorithm, which is a US Government standard ,although
not considered secure enough for classified information, is used
for high-value electronic funds transfer. This algorithm is PUBLIC 
and in fact you can read the exact algorithm in modern texts or buy
a copy of the algorithm from the National Bureau of Standards. All that
is necessary to break this algorithm is to guess a 56 bit key, but whether
that can be done in real time without gigantic special purpose computers
is under debate. Take a look at it if you want to see a tough (to laymen)
modern cryptosystem.


-- 
John Moore (NJ7E/XE1HDO)
{decvax|ihnp4|hao}!noao!terak!anasazi!john
{hao!noao|decvax|ihnp4|seismo}!terak!anasazi!john
terak!anasazi!john@SEISMO.CSS.GOV
(602) 861-7607 (day or evening)
7525 Clearwater Pkwy, Paradise Valley, AZ, 85253 (Home Address)

The opinions expressed here are obviously not mine, so they must be
someone else's.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/28/86)

In article <113@radha.UUCP> sanand@radha.UUCP (Sanand Patel) writes:
>plaintext--> Some very important text that must be kept secret
>My Key ----> ThisIsMySecretKeyThisIsMySecretKeyThisIsMySecretK
>
>Coded-------> Exclusive-Or of S+T, o+h, m+i etc.
>
>plaintext-->  Exclusive-Or of c1+T, c2+h , c3+i etc.
>on other side
>
>Now, not withstanding how the key is passed, is the above scheme breakable,
>especially with very large keys (say 100 or 200 letters) ?

How hard it is to break this depends on how much ciphertext is
available and how long the key is.  Step by step:

A simple form of Fourier analysis can fairly reliably determine the
period of the repeating key.  This allows us to "stack" statistics
for each cell of the cycle (this method is credited to Kasiski):

ThisIsMySecretKey	(key unknown except for length)

Some very importa	(actually, ciphertext is used here
nt text that must	 since the plaintext isn't yet known)
 be kept secret..

Each column gives a uniliteral frequency distribution.  Further,
if there is enough text, one may be able to determine that some
columns are encrypted under the same key (by cross-correlations,
known as a "chi" test after Friedman).  Such columns can be
merged into a super-stacked frequency distribution:
(For simplicity, I assume monocase text & key here.)

ThisMyecrK

Someer imr
nt tt hatu
 be ptsect
o. v.ap...
m.ex.t ...
e.ke..r...
...y..t...
...t..s...
... ......

If the statistics are good enough, you can assume the most frequent
letter in each column is "E", etc.  Or, assuming some simple method
such as cyclic (Caesar) addition or XOR was used to combine the key
with the plaintext, one may be able to check each column distribution
against the small number of possibilities (more correlation), and
thereby automatically determine the column keys.

In other words, this method is not very secure except for very short
messages per key.

Supposing one were to try to defeat Kasiski analysis by using a non-
repeating key?  There are two possibilities:  (1) random key, as in
the one-time pad; (2) nonrandom key, such as text from an agreed-upon
book.  In the latter case, one tries to simultaneously reconstruct
the key text and the plaintext; if you can guess a good crib (the
standard example is to assume the plaintext starts off "REFERENCE
YOUR MESSAGE ..."), then you can tell quickly that you got it right,
since the recovered key is readable.  It then is fairly easy to
simultaneously unravel the rest of the key and plaintext.

The second most hilarious encryption scheme I ever heard of used just
a short key to get started, then when the key ran out it used the
already-encrypted plaintext to provide the rest of the key, as in:

ThisIsMySecretKeySome very important text that mu.. (KEY)
Some very important text that must be kept secret.. (PLAINTEXT)

(The most hilarious scheme was similar, but instead used the
previous ciphertext as key for the remainder of the message!
Exercise for the student:  Why is this even funnier than the other?)

To be really secure without using a one-time key, you have to make
sure that the key is unintelligible AND that any structure it has
is insufficient to exploit by algebraic statistics.  For example,
a shift-register (CRC style) pseudo-random sequence has plenty of
internal structure and is unsafe for long message encryption.

os848@homxb.UUCP (M.AJEMIAN) (04/29/86)

There are several very good elementary (and not so elementary texts) on
cryptography:

D. Kahn, The Codebreakers, Macmillan Co.
	More of a historical book on the origins of modern cryptography,
	including a fascinating discussion of how the German's Enigma was
	broken during WWII.

S.M.Matyas and C.H.Meyer, Cryptography: A New Dimension in Computer Data
	Security, John Wiley & Sons, 1982.
	Very well-written intro to cryptographic techniques, including one
	of the best introductions to DES and RSA around.

D.W.Davies and W.L.Price, Security for Computer Networks, John Wiley & Sons,
	1984.
	A well-written book on network security, including an excellent
	discussion on key distribution.

Pat Wood
Pipeline Associates, Inc.

ihnp4!whuxn!phw5!phw
bellcore!phw5!phw

smb@ulysses.UUCP (Steven Bellovin) (04/29/86)

> plaintext--> Some very important text that must be kept secret
> My Key ----> ThisIsMySecretKeyThisIsMySecretKeyThisIsMySecretK
> 
> Now, not withstanding how the key is passed, is the above scheme breakable,
> especially with very large keys (say 100 or 200 letters) ?

As has been pointed out, this scheme is very breakable for keys as *short*
as 200 characters...  The first general solution was derived circa 1860,
by a Prussian named Kasicki; William Friedman devised a better solution
around 1920.  For security, the key must be comparable in length to the
message itself, and without any pattern.  In the case you cite, for example,
A cryptanalyst who had made a partial entry into the key could look for
key letters that made that phrase legal English.

By far the best introduction to cryptology is David Kahn's "The Codebreakers".
If you want technical details, be sure to get the hardcover edition; the
paperback is seriously abridged.  Another good introduction (though of course
much less comprehensive) is the December 1979 issue of ACM Computer Surveys.
For a detailed mathematical treatment, see Beker and Piper's book (whose title
escapes me at the moment).

henry@utzoo.UUCP (Henry Spencer) (05/04/86)

> The second most hilarious encryption scheme I ever heard of used just
> a short key to get started, then when the key ran out it used the
> already-encrypted plaintext to provide the rest of the key...

This and variations on it are known as "auto-key" systems, as I recall.
They actually aren't bad if the bad guy has *no* *idea* that you might
be using such a scheme.  If he once suspects it -- either from hearing
about your scheme, or from breaking part of a message and noting the
self-similarity -- it's all over.

In a not-really-related vein, one way to make your messages hard to break
is to do data compression on them first.  Cryptanalysis exploits redundancy;
data compression considerably reduces redundancy, and makes the patterns
of the remaining redundancy much more subtle.  HOWEVER... don't use a
compression method that starts its output with a distinctive header, or
you've just handed any smart snoopers the keys to the vault.
-- 
Join STRAW: the Society To	Henry Spencer @ U of Toronto Zoology
Revile Ada Wholeheartedly	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

dhenson@islenet.UUCP (Donald D. Henson) (05/04/86)

> Now, not withstanding how the key is passed, is the above scheme breakable,
> especially with very large keys (say 100 or 200 letters) ?
> 
> ---
> --- {ihnp4,decvax,allegra,linus}!utzoo!dciem!radha!sanand
> --- 416-293-9722 ext248

Yes, easily.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/06/86)

In article <6649@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>They actually aren't bad if the bad guy has *no* *idea* that you might
>be using such a scheme.  If he once suspects it -- either from hearing
>about your scheme, or from breaking part of a message and noting the
>self-similarity -- it's all over.

As Henry probably knows, in practical cryptography the conservative
assumption is that the "enemy" knows or can find out the general
encryption system used; therefore, all security resides in the key.

davids@cit-vax.Caltech.Edu (David Schweizer) (05/07/86)

Organization : California Institute of Technology
Keywords: 

In article <13436@ucbvax.BERKELEY.EDU> desj@brahms.UUCP (David desJardins)
writes:
>In article <113@radha.UUCP> sanand@radha.UUCP (Sanand Patel) writes:
>>[naive mode]
>>
>>plaintext--> Some very important text that must be kept secret
>>My Key ----> ThisIsMySecretKeyThisIsMySecretKeyThisIsMySecretK
>>Coded-------> Exclusive-Or of S+T, o+h, m+i etc.
>>
>>Now, not withstanding how the key is passed, is the above scheme breakable,
>>especially with very large keys (say 100 or 200 letters) ?
>
>   Yes.  This is *extremely* easy to break, with only a very moderate amount
>of ciphertext, and *no* plaintext.  
>[more crypanalytic remarks]
>   No offense intended, but cryptography is a real field.  Why would anyone
>bother with complicated encryption algorithms if trivial things like this
>would do?  I suggest an elementary book on cryptography and cryptanalysis;
>I don't happen to know which ones are good, but maybe someone can suggest one?
>
>   -- David desJardins

Start with Kahn, of course, as other people have suggested.  Then, before
going on to recent Computer Science oriented texts on cryptography, you
might want to learn to break such ciphers *by hand*.  My favorite
introductory texts on cryptanalysis are:

Abraham Sinkov
Elementary Cryptanalysis   A Mathematical Approach
Mathematical Association of America, New Mathematical Library #22  (1966)

Helen Fouche Gaines
Cryptanalysis   A Study of Ciphers and Their Solution
Dover  (1956)

Both discuss polyalphabetic substitution.  The original copyright on 
Gaines' book is 1939, well before modern electronic aids.
-- 
                               --David Schweizer
ARPA:  davids@csvax.caltech.edu
UUCP:  ...!seismo!cit-vax!davids

paul@proper.UUCP (paul) (05/09/86)

I understand that compression into non-byte chunks makes it ***very*** hard
to decrypt with non-secure algorithms. Is this true?