[sci.crypt] how do you tell encrypted data from random data

leather@osiris.cso.uiuc.edu (01/09/88)

> /* Written 11:36 pm  Jan  5, 1988 by leonard@bucket.UUCP ...   */
> /* ---------- "how do you tell encrypted data from random data" ---------- */
> An interesting question has crossed my mind. If someone presents you with
> an allegedly encrypted message, how can you tell if it really is encrypted
> as opposed to being a bunch of random characters?
> 
> [...] 
> 
	Although a _suspiciously_ even distribution of letters may be a
give-away sign of *JUNK* messages from amateur and "semi-pro"
encryption schemes, it cannot be absolute proof.

	I have dabbled some in the cryptanalysis field myself.  It is not
difficult to think of methods for creating *SMOOTH* distribution messages
(a rough sketch in C follows the list):
	1.  massage the message with your existing encryption scheme,
	2.  perform a distribution analysis on the encoded result,
	3.  obtain the complementary distribution analysis - for *SMOOTHNESS*,
	4.  apply a padding of additional characters to the previously
 	    encoded result, yielding the smoothed encrypted message:
		a.  padding could be in the form of "encasing" the previously
		    encoded result (either/both ends of message by line, block,
		    ...),
		b.  or "enbedding" the characters in some simple/complex 
                    pattern.
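
	A rough sketch in C of steps 2 through 4, restricted to a 26-letter
A..Z alphabet (that restriction, and the sample ciphertext, are illustrative
assumptions on my part):

    /* Count each letter of the already-encrypted text, then append
     * enough extra copies of every letter to bring each count up to
     * the maximum -- "encasing" the padding at one end of the message. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char msg[4096] = "LXFOPVEFRNHR";     /* illustrative ciphertext */
        int count[26] = {0}, max = 0, i, j;
        size_t len = strlen(msg);

        for (i = 0; msg[i] != '\0'; i++)     /* step 2: analysis */
            count[msg[i] - 'A']++;
        for (i = 0; i < 26; i++)
            if (count[i] > max) max = count[i];
        for (i = 0; i < 26; i++)             /* steps 3-4: pad   */
            for (j = count[i]; j < max; j++)
                msg[len++] = 'A' + i;
        msg[len] = '\0';
        puts(msg);    /* every letter now occurs exactly max times */
        return 0;
    }

	A real scheme would, of course, need some convention for the
receiver to strip the padding off again.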

	The real question is why bother, except to annoy or distract?
If you have a sufficiently effective cipher to begin with, why add complexity
to its encoding/decoding process?  If you are going to hide a *REAL* message
in a diversionary barrage of *JUNK* messages, you could just as well hide it
in *JUNK* made of ordinary English, French, Russian, Chinese, ... sentences,
in which case the _normal_ distribution patterns are going to lead to equally
wrong conclusions.  The point is, there are so many ways of going about this
business of encryption that one analysis technique alone is insufficient to
determine what *IS* or *IS_NOT* encrypted data.  Even if one goes to the
trouble of using a less complex technique for encryption of some message,
who is to say that the _message_ itself was not the *JUNK*?  GIGO (garbage
in, garbage out)?  It may be *JUNK*, but it _was_ encrypted!  But then, ....

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  David A. Leatherman                  ~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  (ARPA: leather@osiris.cso.uiuc.edu)  ~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~         leathrmn@uxe.cso.uiuc.edu     ~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

gordon@sneaky.UUCP (01/10/88)

> An interesting question has crossed my mind. If someone presents you with
> an allegedly encrypted message, how can you tell if it really is encrypted
> as opposed to being a bunch of random characters?

You can't.  And if I wanted to really screw things up, and had plenty of
transmission bandwidth, I could use an encoding method that was mostly,
but not completely, noise.  For example, each netnews article I post could 
have encoded in it one bit, which is either 1 or 0 depending on whether the
number of lines in it plus the number of newsgroups it is cross-posted to
is odd or even.  I have to post about 200 netnews articles to transmit one 
sentence.  I defy anyone to prove that there is really a message!
For a less blatant waste of transmission bandwidth, how about using only the
"parity" bits to carry the message, while the lower 7 bits contain
random quotes from "talk.abortion" and "talk.bizarre"?
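
Gordon gives no code; a hypothetical sketch of that parity-bit channel might
look like this (the idea of one message bit per cover byte is his, but all
names and the cover text are mine, purely for illustration):

    /* Hide each message bit in the top ("parity") bit of a cover-text
     * byte, leaving the low 7 bits to carry innocent-looking filler.
     * Eight cover bytes carry one message byte. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *msg = "Hi";
        const char *cover = "random quotes from talk.bizarre .....";
        unsigned char stego[64], c;
        size_t i, bit, n = strlen(msg);

        for (i = 0; i < n; i++)                  /* embed   */
            for (bit = 0; bit < 8; bit++) {
                size_t j = i * 8 + bit;
                unsigned char b = (msg[i] >> (7 - bit)) & 1;
                stego[j] = (cover[j] & 0x7F) | (b << 7);
            }

        for (i = 0; i < n; i++) {                /* extract */
            for (c = 0, bit = 0; bit < 8; bit++)
                c = (c << 1) | (stego[i * 8 + bit] >> 7);
            putchar(c);
        }
        putchar('\n');
        return 0;
    }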

> I know that transposition and *simple* substitution can be detected by
> letter frequency analysis. But is a "flat" distribution evidence of random
> data?

No.  Among other things, crypt(1) output gives a fairly flat distribution
in its 256-character output alphabet.  (The version of crypt I am thinking of
is based on a WW-II rotor machine, and uses DES only to scramble the user's
typed key; you don't need to crack DES to crack the encryption.  This
version was in System III, among others.)  In fact, it gives an absolutely
flat distribution for any message consisting of one character repeated a
multiple of 256 times, with any key.  (If not a multiple of 256, the
distribution is as flat as you can get with that message length.)
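
Checking flatness is easy enough, for what it's worth; a minimal
byte-frequency counter (an illustrative utility of mine, not any
historical tool) might look like:

    /* Count how often each of the 256 byte values occurs in a file
     * (or stdin) and print the spread of the per-value counts. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        unsigned long count[256] = {0}, min, max, total = 0;
        int c, i;
        FILE *fp = argc > 1 ? fopen(argv[1], "rb") : stdin;

        if (fp == NULL) { perror(argv[1]); return 1; }
        while ((c = getc(fp)) != EOF) { count[c]++; total++; }

        min = max = count[0];
        for (i = 1; i < 256; i++) {
            if (count[i] < min) min = count[i];
            if (count[i] > max) max = count[i];
        }
        printf("%lu bytes; counts range %lu..%lu\n", total, min, max);
        return 0;
    }

For an "absolutely flat" stream the minimum and maximum counts are equal.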

You also get a flat distribution, more or less, from encoding techniques such
as "encode the i-th character of the message by rotating it i positions in
the alphabet", assuming the message is reasonably long.  This doesn't even
have an encryption key.  A "real" encryption might use a keyword J characters
long, and encode the i-th character of the message by rotating it by its
position in the message (i) plus an amount determined from the (i mod J)-th
character of the key.  For messages that are long relative to the key length
and the alphabet size, this will give a fairly flat distribution.  You will
be able to detect distribution patterns by taking every (J * alphabet size)-th
character and looking at the frequency distribution for that.
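
In C, that keyed-rotation scheme might be sketched as follows, restricted to
an uppercase A..Z alphabet (the restriction and the sample key are my
assumptions, not Gordon's):

    /* Rotate the i-th character by i plus an amount taken from the
     * (i mod J)-th character of the key; non-letters pass through. */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    void rot_encode(const char *key, const char *in, char *out)
    {
        size_t i, J = strlen(key);
        for (i = 0; in[i] != '\0'; i++) {
            int shift = (int)(i % 26) + (key[i % J] - 'A');
            out[i] = isupper((unsigned char)in[i])
                   ? 'A' + (in[i] - 'A' + shift) % 26
                   : in[i];
        }
        out[i] = '\0';
    }

    int main(void)
    {
        char out[64];
        rot_encode("KEY", "ATTACKATDAWN", out);
        puts(out);
        return 0;
    }

Every (J * 26)-th position gets the identical total rotation, which is why
sampling at that stride restores a frequency distribution you can analyze.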

> For my purposes, both "one-time pad" ciphers and anything that operates on
> units other than characters can be considered random! If it is that complex,
> then I'm not likely to crack it!

I'm not quite sure how to interpret this.  Would this mean that running the
message through uuencode(1) (which maps every 3 characters of 8 bits each
into 4 characters out of a 6-bit 64-character alphabet) is too complex?  It 
doesn't even have an encryption key!  Running stuff through uuencode does tend 
to flatten out the distribution somewhat, but a message consisting mostly of
the letter x will still have several high-frequency characters.
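
The 3-to-4 expansion itself is simple; a stripped-down sketch (using the
classic value-plus-32 mapping, with uuencode's line lengths, length bytes,
and header/trailer lines all omitted):

    /* Map 3 input bytes (8 bits each) onto 4 printable characters
     * drawn from a 64-character alphabet (6 bits each). */
    #include <stdio.h>

    void uu3to4(const unsigned char in[3], char out[4])
    {
        out[0] = 32 + ((in[0] >> 2) & 0x3F);
        out[1] = 32 + (((in[0] << 4) | (in[1] >> 4)) & 0x3F);
        out[2] = 32 + (((in[1] << 2) | (in[2] >> 6)) & 0x3F);
        out[3] = 32 + (in[2] & 0x3F);
    }

    int main(void)
    {
        unsigned char in[3] = { 'x', 'x', 'x' };   /* the "mostly x" case */
        char out[5] = {0};
        uu3to4(in, out);
        printf("%s\n", out);
        return 0;
    }

The same input triple always maps to the same output quadruple, so
high-frequency plaintext still shows up as high-frequency ciphertext.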

To REALLY flatten out the distribution, there are several popular UNIX and
DOS utilities to do this, under the classification "data compression programs".
These include pack, compress, ARC, and compact.  If you don't like the
use of all 8 bits in the output, run the result through a filter to transform
it to a smaller alphabet.  These include things like uuencode/uudecode, 
btoa/atob (distributed with compress 4.0), encode/decode (distributed with B 
news 2.11), and, if you're really desperate, od, the octal dump program.
(A few years back, someone posted a program to take an octal dump and 
re-create the file it was a dump of).  

These programs do not encrypt (that is, there is no encryption key, and just
knowing which programs are used and in what order is sufficient to recover
the plaintext).  To get reasonable security, transform the text in several
stages:

plaintext -> compress -> your favorite encryption -> uuencode -> ciphertext

Compress first, then encrypt.  You save on message transmission costs and
you mess up some clues used to attack the encryption - such as "all characters
going into the encryption probably have the high-order bit turned off" -
because the compression removes much of the redundancy.

					Gordon Burditt
					...!ihnp4!sys1!sneaky!gordon

truett@cup.portal.com (01/12/88)

Randomness or flatness of the probability distribution of the result is neither
necessary nor sufficient for a good encryptor.

First, it is not necessary.  Consider the one-time pad.  In a binary stream, a
one-time pad will convert "random" streams to "nonrandom" ones just as often as
it converts "nonrandom" (presumably plaintext) streams to apparently "random"
ones.  The only way to camouflage this effect is to increase the size of the
apparent alphabet or state space of the message channel (somebody correct me
if I'm wrong here).
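
On a binary stream the pad is simply XORed in, byte by byte; a minimal
sketch (the pad bytes are placeholders -- a real pad must be truly random
and never reused):

    #include <stdio.h>

    int main(void)
    {
        unsigned char pad[5] = { 0x3A, 0x91, 0x5C, 0x07, 0xEE };
        unsigned char buf[5] = { 'h', 'e', 'l', 'l', 'o' };
        size_t i;

        for (i = 0; i < 5; i++) buf[i] ^= pad[i];   /* encrypt */
        for (i = 0; i < 5; i++) buf[i] ^= pad[i];   /* decrypt */
        printf("%.5s\n", buf);
        return 0;
    }

If the input stream happens to equal the pad, the output is all zero
bytes -- as "nonrandom" as it gets, which is exactly the direction of
mapping described above.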

It isn't sufficient either.  Consider the linear congruential pseudorandom
number generator:  y = (ax + b) mod n.  For suitable a, b, n, you can use the
successive characters of the message as values of x and the resulting stream of
y values will be quite random.  Another way is to use a data compression scheme
like the ARC system used on many electronic bulletin board systems.  The
reduction of redundancy in a data stream makes it look more like noise than the
original.
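
As a filter, the congruential mapping might be sketched like this, with
n = 256 so everything stays byte-sized (the particular a and b are
illustrative; a must be odd so that the map is invertible mod 256):

    #include <stdio.h>

    #define A 167   /* odd, hence coprime to 256 */
    #define B 94
    #define N 256

    int main(void)
    {
        int c;
        while ((c = getchar()) != EOF)
            putchar((A * c + B) % N);   /* y = (ax + b) mod n */
        return 0;
    }

Decryption applies the inverse map, x = a'(y - b) mod n, where a' is the
modular inverse of a.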

Actually, good random number generators make quite good encryptors.  So do
error-correcting code (ECC) generators.  Remember that Shannon demonstrated
that the closer a noisy channel gets to zero error, the more the data stream
has to look like noise.  Ergo:  apply a really good ECC to the data stream
(thus adding redundant data, see my comment above) and "BINGO" you have a
good encryptor.

Again, the answer is NO, you can't tell noise and encrypted data apart unless
you know the encryption system.  If you do know the system, you tell by
decrypting.

Truett Lee Smith, Sunnyvale, CA
UUCP:  truett@cup.portal.com

leonard@bucket.UUCP (Leonard Erickson) (01/15/88)

In article <-63293657@sneaky> gordon@sneaky.UUCP writes:
<> I know that transposition and *simple* substitution can be detected by
<> letter frequency analysis. But is a "flat" distribution evidence of random
<> data?
<
<No.  Among other things, crypt(1) output gives a fairly flat distribution
<in its 256-character output alphabet.  (The version of crypt I am thinking of
<is based on a WW-II rotor machine, and uses DES only to scramble the user's
<typed key; you don't need to crack DES to crack the encryption.  This
<version was in System III, among others.)  In fact, it gives an absolutely
<flat distribution for any message consisting of one character repeated a
<multiple of 256 times, with any key.  (If not a multiple of 256, the
<distribution is as flat as you can get with that message length.)
<
<You also get a flat distribution, more or less, from encoding techniques such
<as "encode the i-th character of the message by rotating it i positions in
<the alphabet", assuming the message is reasonably long.  This doesn't even
<have an encryption key.  A "real" encryption might use a keyword J characters
<long, and encode the i-th character of the message by rotating it by its
<position in the message (i) plus an amount determined from the (i mod J)-th
<character of the key.  For messages that are long relative to the key length
<and the alphabet size, this will give a fairly flat distribution.  You will
<be able to detect distribution patterns by taking every (J * alphabet size)-th
<character and looking at the frequency distribution for that.
 
All the above examples, including the crypt that simulates a rotor machine,
are polyalphabetic substitutions. I have a program that analyzes a text by
calculating the variance for all key lengths up to a user-specified limit.
It outputs the key length with the "best fit".

On both random text and the suspect text, it always outputs the limit length
as the best fit, and the fit is none too good.
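
Leonard's program isn't shown; one common way to build such a test (this
scoring scheme is my assumption, in the spirit of the index of coincidence)
is, for each candidate key length k, to split the text into k cosets of
every k-th character and score how "lumpy" the letter counts in each coset
are:

    #include <stdio.h>
    #include <ctype.h>

    /* Average, over the k cosets, of the variance of letter counts. */
    double score(const char *text, int k)
    {
        int coset, i, j;
        double total = 0.0;
        for (coset = 0; coset < k; coset++) {
            double count[26] = {0}, n = 0, mean, var = 0;
            for (i = coset; text[i] != '\0'; i += k)
                if (isupper((unsigned char)text[i])) {
                    count[text[i] - 'A'] += 1;
                    n += 1;
                }
            if (n == 0) continue;
            mean = n / 26.0;
            for (j = 0; j < 26; j++)
                var += (count[j] - mean) * (count[j] - mean);
            total += var / n;
        }
        return total / k;
    }

    int main(void)
    {
        const char *text = "SUSPECTCIPHERTEXTGOESHERE";
        int k, best = 1, limit = 10;
        double s, bestscore = -1.0;

        for (k = 1; k <= limit; k++)
            if ((s = score(text, k)) > bestscore) { bestscore = s; best = k; }
        printf("best fit key length: %d\n", best);
        return 0;
    }

Language-like cosets are lumpy (high variance); random or unbroken
polyalphabetic text stays flat, so no candidate length stands out -- which
matches the behavior Leonard describes.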

<I'm not quite sure how to interpret this.  Would this mean that running the
<message through uuencode(1) (which maps every 3 characters of 8 bits each
<into 4 characters out of a 6-bit 64-character alphabet) is too complex?  It 
<doesn't even have an encryption key!  Running stuff through uuencode does tend 
<to flatten out the distribution somewhat, but a message consisting mostly of
<the letter x will still have several high-frequency characters.
<
<To REALLY flatten out the distribution, there are several popular UNIX and
<DOS utilities to do this, under the classification "data compression programs".
<These include pack, compress, ARC, and compact.  If you don't like the
<use of all 8 bits in the output, run the result through a filter to transform
<it to a smaller alphabet.  These include things like uuencode/uudecode, 
<btoa/atob (distributed with compress 4.0), encode/decode (distributed with B 
<news 2.11), and, if you're really desparate, od, the octal dump program.
<(A few years back, someone posted a program to take an octal dump and 
<re-create the file it was a dump of).  

Well, I've never even seen any _hints_ on how to attack a cipher where the 
plaintext characters can't be "localized". Since you can't do statistical
analysis on the characters (at least not analysis whose results would _mean_ 
anything!! :-) it would appear that you are reduced to trial and error.

Uuencode and atob are at the extreme edge of the ability range expected of
the probable originators of those old messages. Compress and ARC were 
out of the question! (We're talking C-64s and BASIC here!)

A favorite that has a fair amount of popularity is XORing the plaintext
with a key on a byte-by-byte basis. It is simple, and you decode by running
the ciphertext through the encryptor again. Of course, it is only as secure
as the key.
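
A minimal C version of that scheme (the key is just an example):

    #include <stdio.h>
    #include <string.h>

    /* XOR each plaintext byte with the next key byte, repeating the
     * key; running ciphertext through the same filter decodes it. */
    int main(void)
    {
        const char *key = "SECRET";
        size_t i = 0, klen = strlen(key);
        int c;

        while ((c = getchar()) != EOF) {
            putchar(c ^ key[i]);
            i = (i + 1) % klen;
        }
        return 0;
    }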

You would also be surprised at how long you can stump even a skilled amateur
with something silly like switching character sets (e.g., using EBCDIC where
ASCII is expected).
-- 
Leonard Erickson		...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker. Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."

gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/15/88)

In article <2415@cup.portal.com> truett@cup.portal.com writes:
>In a binary stream, a one-time pad will convert "random" streams to "nonrandom"
>ones just as often as it converts "nonrandom" (presumably plaintext) streams to
>apparently "random" ones.  The only way to camouflage this effect ...

This is quite misleading.  There is no effect to "camouflage", since
the so-called effect being described is the purely theoretical one
where the entire space of possible messages is considered.  Useful
messages are a tiny fraction of this space.  In practice, the cipher
stream does not really exhibit "nonrandomness" (on the average) when
the one-time pad is applied to random plaintext.

>Actually, good random number generators make quite good encryptors.

Actually, assuming you're still talking about such pseudo-random
sequences as are produced by linear congruential generators, linear
feedback shift registers, or other such simple schemes, they make TERRIBLE
encryptors.  They have far too much structure with far too small a key.
Therefore they're quite easy to break, using the proper tools.
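
Doug doesn't spell out the attack, but as a toy illustration of how little
key there is to find: for an affine byte cipher y = (ax + b) mod 256 (the
kind sketched earlier in this thread), two known plaintext/ciphertext pairs
recover the entire key, provided the two plaintext bytes differ by an odd
amount:

    #include <stdio.h>

    /* Brute-force modular inverse mod 256 (fine for a toy). */
    static int inv256(int v)
    {
        int i;
        v = ((v % 256) + 256) % 256;
        for (i = 1; i < 256; i++)
            if ((v * i) % 256 == 1)
                return i;
        return -1;                    /* even values have no inverse */
    }

    int main(void)
    {
        int a = 167, b = 94;          /* the "secret" key            */
        int x1 = 'H', x2 = 'i';       /* two known plaintext bytes   */
        int y1, y2, d, ra, rb;

        y1 = (a * x1 + b) % 256;      /* matching ciphertext bytes   */
        y2 = (a * x2 + b) % 256;
        d = inv256(x1 - x2);
        if (d < 0) return 1;          /* need an odd difference      */
        ra = (((y1 - y2) * d) % 256 + 256) % 256;
        rb = ((y1 - ra * x1) % 256 + 256) % 256;
        printf("recovered a = %d, b = %d\n", ra, rb);
        return 0;
    }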

truett@cup.portal.com (01/18/88)

The idea that the set of possible plaintext messages is small relative to the
set of possible messages is not always valid.  How about the encryption of
binary object codes or the vectorized files of the plans of a classified
mechanical device?  A good cryptographer must assume that any possible message
has a chance of eventually appearing for encryption.

So long as the encryption process is one-to-one between the spaces of possible
plaintext messages and ciphertext messages, then "nonrandom" (whatever that
means!) messages are encrypted into "random" messages just as often as the
"random" ones are encrypted into "nonrandom" ones.  What *is* rare, so long as
the nonrandom messages form a smaller space than the space of all messages, is
the event in which a "nonrandom" message is encrypted into a "nonrandom"
message.  There are exceptions:  remember what the Japanese thought of Navajo?

Truett Lee Smith, Sunnyvale, CA
UUCP:  truett@cup.portal.com