leather@osiris.cso.uiuc.edu (01/09/88)
> /* Written 11:36 pm Jan 5, 1988 by leonard@bucket.UUCP ... */
> /* ---------- "how do you tell encrypted data from" ---------- */
> An interesting question has crossed my mind. If someone presents you with
> an allegedly encrypted message, how can you tell if it really is encrypted
> as opposed to being a bunch of random characters?
> [...]

Although the _suspicious_ nature of an even distribution of letters may be
a give-away indication of *JUNK* messages for amateur and "semi-pro"
encryption schemes, it cannot be absolute proof. I have dabbled some in the
cryptanalysis field myself. It is not difficult to think of some methods
for creating *SMOOTH* distribution messages:

  1. massage the message with your existing encryption scheme,
  2. perform a distribution analysis on the encoded result,
  3. obtain the complementary distribution analysis - for *SMOOTHNESS*,
  4. apply a padding of additional characters to the previously encoded
     result, yielding the smoothed encrypted message:
     a. padding could be in the form of "encasing" the previously encoded
        result (either/both ends of message by line, block, ...),
     b. or "embedding" the characters in some simple/complex pattern.

The real question is why bother, except to annoy or distract? If you have a
sufficiently effective cipher to begin with, why add complexity to its
encoding/decoding process? If you are going to hide a *REAL* message in a
diversionary barrage of *JUNK* messages, you could just as well hide it in
*JUNK* that used ordinary English, French, Russian, Chinese, ... sentence
packages, in which case the _normal_ distribution patterns are going to
lead to equally wrong conclusions. The point is, there are ever so many
ways of going about this business of encryption that one analysis technique
alone is insufficient to determine what *IS* or *IS_NOT* encrypted data.
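[Editor's sketch: the "encasing" padding recipe in steps 1-4 above can be
illustrated in Python. The function name and details are mine, not from the
post; it pads a ciphertext so every byte value occurs equally often.]

```python
from collections import Counter

def smooth_pad(ciphertext: bytes) -> bytes:
    """Append padding bytes so that every value 0..255 occurs equally
    often -- the 'complementary distribution' step described above."""
    counts = Counter(ciphertext)
    target = max(counts.values())        # flatten up to the most frequent byte
    pad = bytearray()
    for b in range(256):
        # each byte value gets enough copies to reach the target count
        pad.extend([b] * (target - counts.get(b, 0)))
    # 'encasing' style: real ciphertext first, padding appended after
    return ciphertext + bytes(pad)
```

The receiver must of course know (or be able to detect) where the real
ciphertext ends; the padding carries no information, it only defeats a
naive frequency count.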
Even if one goes to the trouble of using a less complex technique for
encryption of some message, who is to say that the _message_ was not the
*JUNK*? GIGO (garbage in, garbage out)? It may be *JUNK*, but it _was_
encrypted! But then, ....

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ David A. Leatherman                   ~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (ARPA: leather@osiris.cso.uiuc.edu)   ~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~        leathrmn@uxe.cso.uiuc.edu      ~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gordon@sneaky.UUCP (01/10/88)
> An interesting question has crossed my mind. If someone presents you with
> an allegedly encrypted message, how can you tell if it really is encrypted
> as opposed to being a bunch of random characters?

You can't. And if I wanted to really screw things up, and had plenty of
transmission bandwidth, I could use an encoding method that was mostly, but
not completely, noise. For example, each netnews article I post could have
encoded in it one bit, which is either 1 or 0 depending on whether the
number of lines in it plus the number of newsgroups it is cross-posted to
is odd or even. I have to post about 200 netnews articles to transmit one
sentence. I defy anyone to prove that there is really a message! For a
less blatant waste of transmission bandwidth, how about using the "parity"
bits only to contain the message, while the lower 7 bits contain random
quotes from "talk.abortion" and "talk.bizarre"?

> I know that transposition and *simple* substitution can be detected by
> letter frequency analysis. But is a "flat" distribution evidence of random
> data?

No. Among other things, crypt(1) output gives a fairly flat distribution
in its 256-character output alphabet. (The version of crypt I am thinking
of is based on a WW-II rotor machine, and uses DES only to scramble the
user's typed key; you don't need to crack the DES to crack the encryption.
This version was in System III, among others.) In fact, it gives an
absolutely flat distribution for any message consisting of one character
repeated a multiple of 256 times, with any key. (If not a multiple of 256,
the distribution is as flat as you can get with that message length.)

You also get a flat distribution, more or less, by encoding techniques such
as "encode the i th character of the message by rotating it i positions in
the alphabet", assuming the message is reasonably long. This doesn't even
have an encryption key.
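[Editor's sketch: the keyless "rotate the i th character i positions"
encoding above, over a 256-byte alphabet. Name and details are mine.]

```python
def rot_by_position(msg: bytes) -> bytes:
    """Keyless encoding: rotate the i-th byte by i positions through a
    256-character alphabet, as described in the post above."""
    return bytes((b + i) % 256 for i, b in enumerate(msg))
```

A message of one character repeated 256 times comes out with every byte
value exactly once, i.e. a perfectly flat distribution, despite there being
no key at all.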
A "real" encryption might use a keyword J characters long, and encode the
i th character of the message by rotating it by its position in the message
(i) plus an amount determined from the (i mod J)th character of the key.
For messages that are long relative to the key length and the alphabet
size, this will give a fairly flat distribution. You will be able to
detect distribution patterns by taking every (J * alphabet size)th
character, and looking at the frequency distribution for that.

> For my purposes, both "one-time pad" ciphers and anything that operates on
> units other than characters can be considered random! If it is that complex,
> then I'm not likely to crack it!

I'm not quite sure how to interpret this. Would this mean that running the
message through uuencode(1) (which maps every 3 characters of 8 bits each
into 4 characters out of a 6-bit, 64-character alphabet) is too complex?
It doesn't even have an encryption key! Running stuff through uuencode
does tend to flatten out the distribution somewhat, but a message
consisting mostly of the letter x will still have several high-frequency
characters.

To REALLY flatten out the distribution, there are several popular UNIX and
DOS utilities to do this, under the classification "data compression
programs". These include pack, compress, ARC, and compact. If you don't
like the use of all 8 bits in the output, run the result through a filter
to transform it to a smaller alphabet. These include things like
uuencode/uudecode, btoa/atob (distributed with compress 4.0), encode/decode
(distributed with B news 2.11), and, if you're really desperate, od, the
octal dump program. (A few years back, someone posted a program to take an
octal dump and re-create the file it was a dump of.) These programs do not
encrypt; that is, there is no encryption key, and just knowing which
programs were used, and in what order, is sufficient to recover the
plaintext.
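[Editor's sketch: the keyed rotation scheme described at the top of this
article, over a 256-byte alphabet. Function names are mine.]

```python
def keyed_rot(msg: bytes, key: bytes) -> bytes:
    """Rotate the i-th byte by i plus the (i mod J)th key character,
    where J is the key length."""
    J = len(key)
    return bytes((b + i + key[i % J]) % 256 for i, b in enumerate(msg))

def keyed_unrot(ct: bytes, key: bytes) -> bytes:
    """Invert keyed_rot by subtracting the same rotations."""
    J = len(key)
    return bytes((b - i - key[i % J]) % 256 for i, b in enumerate(ct))
```

The detection trick in the post works because positions i and
i + (J * 256) use the same key character and the same position rotation
mod 256, so that subsample is effectively a monoalphabetic substitution
and ordinary frequency analysis applies to it.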
To get reasonable security, transform the text in several stages:

    plaintext -> compress -> your favorite encryption -> uuencode -> ciphertext

Compress first, then encrypt. You save on message transmission costs and
you mess up some clues used to attack the encryption - such as "all
characters going into the encryption probably have the high-order bit
turned off" - because the compression removed much of the redundancy.

                        Gordon Burditt
                        ...!ihnp4!sys1!sneaky!gordon
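[Editor's sketch of the staged pipeline above, in Python. A trivial
additive stream cipher stands in for "your favorite encryption" and base64
stands in for uuencode; all names are illustrative, not from the post.]

```python
import base64
import zlib
from itertools import cycle

def stream_encrypt(data: bytes, key: bytes) -> bytes:
    # stand-in for "your favorite encryption": add a repeating key byte-wise
    return bytes((d + k) % 256 for d, k in zip(data, cycle(key)))

def stream_decrypt(data: bytes, key: bytes) -> bytes:
    return bytes((d - k) % 256 for d, k in zip(data, cycle(key)))

def to_ciphertext(plaintext: bytes, key: bytes) -> bytes:
    # plaintext -> compress -> encrypt -> encode
    return base64.b64encode(stream_encrypt(zlib.compress(plaintext), key))

def from_ciphertext(ct: bytes, key: bytes) -> bytes:
    # invert the stages in reverse order
    return zlib.decompress(stream_decrypt(base64.b64decode(ct), key))
```

Note the ordering: compressing after encrypting would gain nothing, since
good ciphertext has little redundancy left to compress.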
truett@cup.portal.com (01/12/88)
Randomness or flatness of the probability distribution of the result is
neither necessary nor sufficient for a good encryptor.

First, it is not necessary. Consider the one-time pad. In a binary
stream, a one-time pad will convert "random" streams to "nonrandom" ones
just as often as it converts "nonrandom" (presumably plaintext) streams to
apparently "random" ones. The only way to camouflage this effect is to
increase the size of the apparent alphabet or state space of the message
channel (somebody correct me if I'm wrong here).

It isn't sufficient either. Consider the linear congruential pseudorandom
number generator: y = (ax + b) mod n. For suitable a, b, n, you can use
the successive characters of the message as values of x, and the resulting
stream of y values will be quite random. Another way is to use a data
compression scheme like the ARC system used on many electronic bulletin
board systems. The reduction of redundancy in a data stream makes it look
more like noise than the original.

Actually, good random number generators make quite good encryptors. So do
ECC generators. Remember that Shannon demonstrated that the closer a
noisy channel gets to zero error, the more the data stream has to look
like noise. Ergo: apply a really good ECC to the data stream (thus adding
redundant data, see my comment above) and "BINGO" you have a good
encryptor.

Again, the answer is NO, you can't tell noise and encrypted data apart
without a known encryption system. In the latter case, you tell by
decrypting.

Truett Lee Smith, Sunnyvale, CA
UUCP: truett@cup.portal.com
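[Editor's sketch: the linear congruential construction y = (ax + b) mod n
applied per character, as described above. Names and constants are mine;
note that applied byte-by-byte this is really a fixed affine substitution,
so a must be odd to be invertible mod 256, and it remains vulnerable to
ordinary frequency analysis.]

```python
def lcg_encrypt(msg: bytes, a: int = 75, b: int = 74, n: int = 256) -> bytes:
    """y = (a*x + b) mod n, each message character used as x."""
    return bytes((a * x + b) % n for x in msg)

def lcg_decrypt(ct: bytes, a: int = 75, b: int = 74, n: int = 256) -> bytes:
    """Invert via the modular inverse of a (requires gcd(a, n) == 1)."""
    a_inv = pow(a, -1, n)          # modular inverse, Python 3.8+
    return bytes((a_inv * (y - b)) % n for y in ct)
```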
leonard@bucket.UUCP (Leonard Erickson) (01/15/88)
In article <-63293657@sneaky> gordon@sneaky.UUCP writes:
<> I know that transposition and *simple* substitution can be detected by
<> letter frequency analysis. But is a "flat" distribution evidence of random
<> data?
<
<No. Among other things, crypt(1) output gives a fairly flat distribution
<in its 256-character output alphabet. (the version of crypt I am thinking of
<is based on a WW-II rotor machine, and uses DES only to scramble the user's
<typed key. (You don't need to crack the DES to crack the encryption). This
<version was in System III, among others). In fact, it gives an absolutely
<flat distribution for any message consisting of one character repeated a
<multiple of 256 times and any key. (If not a multiple of 256, the
<distribution is as flat as you can get with that message length).
<
<You also get a flat distribution, more or less, by encoding techniques such
<as "encode the i th character of the message by rotating it i positions in
<the alphabet", assuming the message was reasonably long. This doesn't even
<have an encryption key. A "real" encryption might use a keyword J characters
<long, and encode the i th character of the message by rotating it by its
<position in the message (i) plus an amount determined from the (i mod J)th
<character of the key. For messages that are long relative to the key length
<and the alphabet size, this will give a fairly flat distribution. You will
<be able to detect distribution patterns by taking every (J * alphabet size)th
<character, and looking at the frequency distribution for that.
All the above examples, including the crypt that simulates a rotor machine,
are polyalphabetic substitutions. I have a program that will analyze a
text by calculating the variance for all key lengths up to a user-specified
limit. It outputs the key length with the "best fit".
On both random text and the suspect text, it always output the limit length
as the best fit. And the fit was none too good.
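[Editor's sketch of the kind of variance test described above; the exact
scoring Leonard's program uses isn't given, so this is a guess at the
method: for each candidate key length k, split the text into the columns
text[i::k] and average the variance of each column's byte-frequency
distribution. A true polyalphabetic key length shows up as a peak, while
random text stays flat everywhere.]

```python
from collections import Counter

def keylen_scores(text: bytes, max_key_len: int) -> dict:
    """Return {key length: average column frequency variance}."""
    scores = {}
    for k in range(1, max_key_len + 1):
        col_vars = []
        for i in range(k):
            col = text[i::k]
            counts = Counter(col)
            # variance of observed byte frequencies around the flat 1/256
            var = sum((counts.get(b, 0) / len(col) - 1 / 256) ** 2
                      for b in range(256)) / 256
            col_vars.append(var)
        scores[k] = sum(col_vars) / k
    return scores
```

The "best fit" is the k with the highest score; on random input no k
stands out, which matches the behavior reported above.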
<I'm not quite sure how to interpret this. Would this mean that running the
<message through uuencode(1) (which maps every 3 characters of 8 bits each
<into 4 characters out of a 6-bit 64-character alphabet) is too complex? It
<doesn't even have an encryption key! Running stuff through uuencode does tend
<to flatten out the distribution somewhat, but a message consisting mostly of
<the letter x will still have several high-frequency characters.
<
<To REALLY flatten out the distribution, there are several popular UNIX and
<DOS utilities to do this, under the classification "data compression programs".
<These include pack, compress, ARC, and compact. If you don't like the
<use of all 8 bits in the output, run the result through a filter to transform
<it to a smaller alphabet. These include things like uuencode/uudecode,
<btoa/atob (distributed with compress 4.0), encode/decode (distributed with B
<news 2.11), and, if you're really desperate, od, the octal dump program.
<(A few years back, someone posted a program to take an octal dump and
<re-create the file it was a dump of).
Well, I've never even seen any _hints_ on how to attack a cipher where the
plaintext characters can't be "localized". Since you can't do statistical
analysis on the characters (at least not and get results that _mean_
anything!! :-) it would appear that you are reduced to trial and error.
Uuencode and atob are at the extreme edge of the ability range expected of
the probable originators of those old messages. Compress and ARC were
out of the question! (we're talking c-64s and BASIC here!).
A favorite that has a fair amount of popularity is XORing the plaintext
with a key on a byte-by-byte basis. It is simple, and you decode by running
the ciphertext thru the encryptor again. Of course, it is only as secure as
the key.
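[Editor's sketch of the XOR favorite just described; since
(p ^ k) ^ k == p, the same routine both encodes and decodes.]

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key; self-inverse."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))
```

And "only as secure as the key" is literal: XORing a known stretch of
plaintext against the ciphertext hands the attacker the key stream for
that stretch.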
You would also be surprised at how long you can stump even a skilled amateur
with something silly like switching character sets (e.g. using EBCDIC where
ASCII is expected).
--
Leonard Erickson ...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker. Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/15/88)
In article <2415@cup.portal.com> truett@cup.portal.com writes:
>In a binary stream, a one time pad will convert "random" streams to "nonrandom"
>ones just as often as it converts "nonrandom" (presumably plaintext) streams to
>apparently "random" ones. The only way to camouflage this effect ...

This is quite misleading. There is no effect to "camouflage", since the
so-called effect being described is the purely theoretical one where the
entire space of possible messages is considered. Useful messages are a
tiny fraction of this space. In practice, the cipher stream does not
really exhibit "nonrandomness" (on the average) when the one-time pad is
applied to random plaintext.

>Actually, good random number generators make quite good encryptors.

Actually, assuming you're still talking about such pseudo-random sequences
as are produced by linear congruential generators, feedback shift
registers, or other such simple schemes, they make TERRIBLE encryptors.
They have far too much structure with far too small a key. Therefore
they're quite easy to break, using the proper tools.
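[Editor's sketch of the "too much structure, too small a key" point: a
per-byte affine cipher y = (a*x + b) mod 256 (like the LCG construction
suggested earlier in the thread) falls to just two known
plaintext/ciphertext byte pairs. Names are mine.]

```python
def recover_affine(x1: int, y1: int, x2: int, y2: int, n: int = 256):
    """Solve y = (a*x + b) mod n for a and b, given two known
    plaintext/ciphertext pairs (x1, y1) and (x2, y2).
    Assumes x1 - x2 is odd, hence invertible mod 256."""
    a = ((y1 - y2) * pow((x1 - x2) % n, -1, n)) % n
    b = (y1 - a * x1) % n
    return a, b
```

Two bytes of known plaintext recover the whole "key"; a cipher worth the
name should not collapse from so little.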
truett@cup.portal.com (01/18/88)
The idea that the set of possible plaintext messages is small relative to
the set of possible messages is not always valid. How about the encryption
of binary object codes, or the vectorized files of the plans of a
classified mechanical device? A good cryptographer must assume that any
possible message has a chance of eventually appearing for encryption.

So long as the encryption process is one-to-one between the spaces of
possible plaintext messages and ciphertext messages, then "nonrandom"
(whatever that means!) messages are encrypted into "random" messages just
as often as the "random" ones are encrypted into "nonrandom" ones. What
*is* rare, so long as the nonrandom messages are a smaller space than the
space of all messages, is the event in which a "nonrandom" message is
encrypted into a "nonrandom" message.

There are exceptions: remember what the Germans thought of Navajo?

Truett Lee Smith, Sunnyvale, CA
UUCP: truett@cup.portal.com