leonard@bucket.UUCP (Leonard Erickson) (01/06/88)
An interesting question has crossed my mind. If someone presents you with an allegedly encrypted message, How can you tell if it really is encrypted as opposed to being a bunch of random characters? I know that transposition and *simple* substitution can be detected by letter frequency analysis. But is a "flat" distibution evidence of random data? For my purposes, both "one-time pad" ciphers and anything that operates on units other than characters can be considered random! If it is that complex, then I'm not likely to crack it! This is inspired by my fiinding some notes I made a year or so ago when I and some friends were exchanging encrypted msgs over a public BBS. A few of our friends cracked or first efforts and a typical "arms race" of better ciphers followed by better analytic techniques ensued. We finally came (after about 5 changes) came up with a scheme that only one outsiders could read part of. Some of the other users started using their own ciphers. The sysop didn't mind until someone started posting HUGE messages which he suspected were just garbage. But the challenge was, were they garbage or real messages? Under his rules, messages (even encrypted!) were ok, junk wasn't. I made a few attempts to analyze them, but while I was fairly certain, I could never be sure. (when an 8k msg uses the 26 letters so evenly that the spread better most used and least used is 12, you get *real* suspicious :-) -- Leonard Erickson ...!tektronix!reed!percival!bucket!leonard CIS: [70465,203] "I used to be a hacker. Now I'm a 'microcomputer specialist'. You know... I'd rather be a hacker."
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/08/88)
In article <660@bucket.UUCP> leonard@bucket.UUCP (Leonard Erickson) writes: >An interesting question has crossed my mind. If someone presents you with >an allegedly encrypted message, How can you tell if it really is encrypted >as opposed to being a bunch of random characters? The only foolproof way is to decrypt it. >I know that transposition and *simple* substitution can be detected by >letter frequency analysis. But is a "flat" distibution evidence of random >data? No, most really good encryption systems pretty much randomizes the cipher stream, insofar as simple statistical tests can determine. Conversely, just because some putative cipher stream appears nonrandom does not mean that it has meaningful underlying plaintext. It could be a hoax. There are those who think the two undeciphered Beale cryptograms are hoaxes. >For my purposes, both "one-time pad" ciphers and anything that operates on >units other than characters can be considered random! If it is that complex, >then I'm not likely to crack it! You can't crack a true one-time pad system by analysis anyway, no matter how competent a cryptanalyst you are. >(when an 8k msg uses the 26 letters so evenly that the >spread better most used and least used is 12, you get *real* suspicious :-) Flatter-than-random distributions are not necessarily hoaxes. They're actually pretty easy to obtain; Mil Cryp has an example.
mjr@osiris.UUCP (Marcus J. Ranum) (01/10/88)
Reminds me of a wonderful concept in a book by Stanitslaw Lem (Memoirs Found in a Bathtub): The underground of the Pentagon is all that remains after the end of the world, but the spooks are sure they have been infiltrated. One of the best tools they have is a super-dooper computer than can decrypt *ANYTHING* if it tries hard enough. In fact, it can take plain text, and find the secret and subversive messages hidden in it - even if there weren't any to begin with... Another problem with Gywn's suggestion that the only way to tell if it is encrypted (decrypt it) - how can you tell that it is really "meaningful" ? Anyone who has listened to first year latin students trying to "decrypt" latin will realize that garbage in does not always imply the same garbage out. --mjr(); -- ------------------------------------------------------------------------------ ...ich bin in einem dusenjet ins jahr 53 vor chr...ich lande im antiken Rom... einige gladiatoren spielen scrabble...ich rieche PIZZA...
hes@ecsvax.UUCP (Henry Schaffer) (01/10/88)
These days it is quite common to compress text (and other files too.) The various compressions schemes such as Huffman encoding and Lempel- Ziv-Welch tend to even out (smooth out) the character distribution (i.e., they hope to maximize the information transmitted per character, or per byte). If this file is then encrypted, the smoothness may end up in the encrypted data. Conversly, random data need not have a smooth character distribution. (It is common to confuse "random" with "equi-probable". However, even though most examples given in courses of "random" are "equi-probable there is no necessary connection.) --henry schaffer n c state univ
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/11/88)
In article <1499@osiris.UUCP> mjr@osiris.UUCP (Marcus J. Ranum) writes: >Another problem with Gywn's suggestion that the only way to tell if it >is encrypted (decrypt it) - how can you tell that it is really "meaningful"? It's not that hard in practice. Witness the following, which have been uniquely cryptanalyzed into plaintext which all involved agreed was correct (and in cases where it could be checked with the encryptor, its correctness was verified): (a) the Chaucer transposition with spelling variations that was posted to this newsgroup not long ago; (b) "challenger" cryptograms Some of the latter are extremely short, and have plaintext that nobody in his right mind would really use. I've personally broken quite short messages, some 25 letters long, encrypted by some of the simpler methods. Usually, student cryppies would all get essentially the same result (sometimes off in one or two letters) for cryptanalysis of practice cryptograms, if they were able to crack them at all. It is extremely hard to make up a cryptogram of modest length that has two valid but different decryptions. In fact, I don't know of any of length more than a couple of dozen letters, although I can imagine some ways to try to do this for somewhat longer messages. This would make an interesting puzzle for someone. On the other hand, there are people such as those who find Baconian ciphers in Shakespeare's works, who put the message there themselves. It's pretty easy to tell when this happens; either the decipherment method is ambiguous, so the message could never have been realistically expected to be read even by its intended recipient, or so many "corrections" and otherwise unjustified special twists on the decipherment scheme have been introduced that the information content of the decipherment rivals that of the supposed message being extracted. It is interesting to note that William F. Friedman, perhaps the greatest cryptanalyst of all time, and his future wife got their start in the field by being employed at "Col." Fabyan's Riverbank Labs, which was trying to find just such hidden messages in Shakespeare. The Friedmans showed how the same methodology could be used to produce messages with precisely opposite meaning, i.e. the methodology was making up the whole content of the messages based on the analyst's preconceptions about what was to be found. On the other side of this issue, several rum-runner codes (generally much more ambiguous to crack than ciphers) were broken by the Friedmans, and they were able to demonstrate their methodology satisfactorily to the federal courts.
karn@faline.bellcore.com (Phil R. Karn) (01/12/88)
Re random and encrypted messages: There is a strong connection between randomness and cryptography. "Randomness" is in the eye of the beholder. All it means for you to say that something is "random" is that *you* can't predict what the next element in the sequence is, not that it doesn't contain some underlying redundancy that someone else with additional information (or computing power) might discover. It therefore follows that any strong encryption algorithm would necessarily produce what appeared to be totally random ciphertext. Feed a counter into DES and you have an excellent random number generator; unfortunately it is usually too slow to be practical. > On the other hand, there are people such as those who find Baconian > ciphers in Shakespeare's works, who put the message there themselves. > It's pretty easy to tell when this happens; either the decipherment > method is ambiguous, so the message could never have been > realistically expected to be read even by its intended recipient, > or so many "corrections" and otherwise unjustified special twists > on the decipherment scheme have been introduced that the > information content of the decipherment rivals that of the supposed > message being extracted. Quite true. I know I'm going to get flamed for this, but this is *precisely* what people do all the time with the Bible. Phil
chapman@sco.COM (Brian Chapman Mx321) (01/19/88)
In article <660@bucket.UUCP> leonard@bucket.UUCP (Leonard Erickson) writes:
< An interesting question has crossed my mind. If someone presents you with
< an allegedly encrypted message, How can you tell if it really is encrypted
< as opposed to being a bunch of random characters?
It is the goal of any good encryption technique, in this class
of techniques, to make the data look random.
Any sort of pattern in the encrypted data is a crack you can
drive a decryption wedge into.
< I know that transposition and *simple* substitution can be detected by
< letter frequency analysis. But is a "flat" distibution evidence of random
< data?
< For my purposes, both "one-time pad" ciphers and anything that operates on
< units other than characters can be considered random! If it is that complex,
< then I'm not likely to crack it!
Yet UNIX crypt is a simple substitution code with the extra twists
that the offset in the file is added to the character (mod 256)
and you change substution tables every 256 characters.
This tends to flatten out a naively taken distrubution.
< (when an 8k msg uses the 26 letters so evenly that the spread
< better most used and least used is 12, you get *real* suspicious :-)
^^^^^^ you mean "between"?
You mean the character counts vary between 309 and 321.
Sounds about right to me for random looking data.
Artificialy flat distributions mean that the data is
either a fraud or the encryption method pads with extra
characters.
--
uunet!-\
Brian Chapman microsof!-->sco!chapman
ihnp4!-/
Pay no attention to the man behind the curtain! -- The Great & Powerfull Oz
hes@ecsvax.UUCP (Henry Schaffer) (01/20/88)
In article <275@sysco>, chapman@sco.COM (Brian Chapman Mx321) writes: > In article <660@bucket.UUCP> leonard@bucket.UUCP (Leonard Erickson) writes: > < An interesting question has crossed my mind. If someone presents you with > < an allegedly encrypted message, How can you tell if it really is encrypted > < as opposed to being a bunch of random characters? > ... > You mean the character counts vary between 309 and 321. > Sounds about right to me for random looking data. I'm taking "random" as meaning multinomial with equi-probability. In this case - looking at any one character should be found about 8k/26=307 times, with a standard deviation of about 17. That makes a spread of 12 between the most and least frequent of the 26 look quite small, indeed. > Artificialy flat distributions mean that the data is > either a fraud or the encryption method pads with extra > characters. Hmm, is this correct? I don't know how flat the output of something like Huffman encoding can be. I know it isn't encryption - but perhaps something like this in conjunction with encryption would give a flatter-than-random character distribution without character padding. > -- > Brian Chapman microsof!-->sco!chapman --henry schaffer n c state univ
dunc%moria@Sun.COM (duncs home) (01/20/88)
In article <660@bucket.UUCP> leonard@bucket.UUCP (Leonard Erickson) writes: > ... >I know that transposition and *simple* substitution can be detected by >letter frequency analysis. But is a "flat" distibution evidence of random >data? No. Consider this simple substitution cipher: 'a' = 'abcdefghijklmnopqrstuvwxyz' 'b' = 'bcdefghijklmnopqrstuvwxyza' 'c' = 'cdefghijklmnopqrstuvwxyzab' ... 'z' = 'zabcdefghijklmnopqrstuvwxy' The frequency distribution of the letters will be absolutely flat, yet it does contain a message. This shows that a flat distribution does not mean absence of a message (and also that flat distribution, by itself, does not guarantee a good cipher!)
mlm@NL.CS.CMU.EDU (Michael Mauldin) (01/21/88)
In article <275@sysco>, chapman@sco.COM (Brian Chapman Mx321) writes: > Artificialy flat distributions mean that the data is > either a fraud or the encryption method pads with extra > characters. The encryption algorithm might very easily generate purposefully "flat" distributions by a variety of methods. You could switch tables after every block to shuffle "overused" bytes to less commonly used meanings. You can even guarantee perfectly flat counts by not altering the "meaning" of each byte. This turns out to require extra bytes, since once you make a decision that "I've used \035 enough, now", the remaining bytes encode fewer bits. Still, the result would be neither fraudulent nor padded with extra characters (the extra bytes would be "necessary"). If you got a really good encryption algorithm, this would be a pointless add-on feature, but that doesn't mean that seomone else isn't doing it. Michael L. Mauldin (Fuzzy) Department of Computer Science ARPA: Michael.Mauldin@NL.CS.CMU.EDU Carnegie-Mellon University Phone: (412) 268-3065 Pittsburgh, PA 15213-3890
jackm@devvax.JPL.NASA.GOV (Jack Morrison) (01/23/88)
In article <39394@sun.uucp> dunc@sun.UUCP (duncs home) writes: >In article <660@bucket.UUCP> leonard@bucket.UUCP (Leonard Erickson) writes: >> ... >>I know that transposition and *simple* substitution can be detected by >>letter frequency analysis. But is a "flat" distibution evidence of random >>data? > >No. Consider this simple substitution cipher: > >'a' = 'abcdefghijklmnopqrstuvwxyz' >'b' = 'bcdefghijklmnopqrstuvwxyza' >'c' = 'cdefghijklmnopqrstuvwxyzab' > ... >'z' = 'zabcdefghijklmnopqrstuvwxy' > >The frequency distribution of the letters will be absolutely flat, yet it >does contain a message. This shows that a flat distribution does not mean >absence of a message (and also that flat distribution, by itself, does not >guarantee a good cipher!) Cute, but keep in mind that a "distribution" can be more than just a histogram of letter frequencies; the example cipher would have a very lopsided distibution in other measurements, for example digram frequencies. (I realize the proposal was not meant as a good cipher, and I'm just adding to the discussion; this is NOT a flame, I just don't see where to stick the :-) !) [What? [The [followup [has [to [be [longer [than [the [original? [Than [eat [this!]-- Jack C. Morrison Jet Propulsion Laboratory (818)354-1463 jackm@jpl-devvax.jpl.nasa.gov "The paycheck is part government property, but the opinions are all mine."