bet@duke.UUCP (Bennett E. Todd III) (07/04/85)
Trivial encryption algorithms are attacked by analysis of frequency distributions, more or less. Wouldn't a simpleminded substitution or transposition algorithm be beefed up to the point of requiring search of the key space by applying a good compression program to the plaintext first? The whole point to data compression is the redundancy of tokens, and removing it by recoding the data to make all tokens equally likely. If I were to use a simple substitution cypher with an arbitrary premutation of bytes on the output of a good compression program how could the resulting file be attacked? -Bennett-- Bennett Todd -- Duke Computation Center, Durham, NC 27706 -- +1 919 684 3695 UUCP - bet@ecsvax, bet@duke, ...{decvax,ihnp4,akgua}!mcnc!ecsvax!bet
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/05/85)
> If I were to use a simple substitution cypher with an > arbitrary premutation of bytes on the output of a good compression > program how could the resulting file be attacked? The general idea of removing redundancy to make cryptanalysis harder is a good one. However, most data compression schemes amount to fairly simple encryptors (e.g. each 8-bit input chunk might be taken to a constant output chunk of width from 2 to 8 bits, with a prepended mapping table). Methods for dealing with this form of variable-length encryption exist and could be re-invented if necessary. The weak point of the proposed scheme seems to be in conveying the permutation and compression keys; if the permutation is not changed for each message, then it can be broken through by a "stacking" scheme. In any case, you have to expect that any constant elements of an encryption scheme are potentially known to the cryptanalyst. So, if you enclose permutation information with the ciphertext, it will be a simple matter for the analyst to extract it and untranspose the text the same way the intended recipient would. Then he just has to work on the compression scheme through an overlaid simple substitution. Although he won't be able to apply frequency distribution tests to strip the substitution, these are not usually heavily used in breaking simple substitution anyhow. If I had to analyze such a message, I'd probably try to break out the compression table at this point and analyze possible substitutions against a "crib" assumed to be present in the message (like, "From: ..." at the front of the plaintext). It should be possible to rule out all but a handful of possible substitutions this way. Then the rest of the task is just extending the meaning of the plaintext a bit at a time while checking for consistency against what is already known. In other words, it would be a lot of work but most cryptanalysis is like that.
bet@duke.UUCP (Bennett E. Todd III) (07/06/85)
Assuming of course that the algorithm is public knowledge, then in any encryption scheme not based on a "trapdoor" (and all of those I am aware of are computationally abusive to the *users* of the system by the time they are intricate enough to be secure), the key must be communicated separately from the message, and securely. If I were to take the output of the LZ-compression program compress(1) which has been posted to the net recently (no frequency tables), and permute the bytes in it with a systematic shuffle, and then maybe apply a substitution throughout to ward off any brute force efforts, then I would consider the shuffle and substitution to compose the key -- I visit the friend in question, put on some Police real loud, and whisper to him/her the shuffle that I will use. Has some part of this procedure violated the current rules about what constitutes an encryption system worthy of note? I realize I haven't addressed the key distribution problem, but then if I meet my friend *once*, then I can include a shuffle definition and substitution in the plaintext, and the friend will know to use them to decrypt the *next* message. This all seems far more simple-minded than the intricate number-theoretical approaches currently being used in cryptology, and for the life of me I can't see what even the NSA and a flock of Crays could do about it. Think about it: I give you a binary file, and tell you that if you shuffle the bytes in the file in some fashion, and apply some substitution to them, then you can run the result through compress(1) and read the document. The transposition and substitution (neither adequate to ward of determined attack alone) are sitting on top of a uniform distribution of bytes, +/- some epsilon relatively independent of the source text. Given a permutation chosen to keep bytes within say 32K chunks, none of the operations involved is particularly slow. Am I missing something? (besides any backgound whatsoever in cryptology, thanks:-). -Bennett-- Bennett Todd -- Duke Computation Center, Durham, NC 27706 -- +1 919 684 3695 UUCP - bet@ecsvax, bet@duke, ...{decvax,ihnp4,akgua}!mcnc!ecsvax!bet
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/07/85)
> ... I realize I haven't addressed the > key distribution problem, but then if I meet my friend *once*, then I > can include a shuffle definition and substitution in the plaintext, > and the friend will know to use them to decrypt the *next* message. No, no! Once the "bad guy" breaks ANY message, all subsequent ones would come unraveled like a chain stitch. Key distribution really is a major problem in data cryptosystems. The simpler the enciphering algorithm, the more key that has to be supplied to maintain reasonable security.
jim@randvax.UUCP (Jim Gillogly) (07/08/85)
In article <5992@duke.UUCP> bet@ecsvax.UUCP (Bennett E. Todd III) writes: > Wouldn't a simpleminded substitution or >transposition algorithm be beefed up to the point of requiring search >of the key space by applying a good compression program to the >plaintext first? The algorithm would certainly be strengthened (although probably not to the point of requiring search of the entire key space). However, you need to be a little careful. Some compression programs (e.g. "pack", a Huffman-coding program) will take a first pass through the data to assign the variable-length codes, put out the decoding tree at the beginning of the output file, then follow with the compressed data. To attack a simple subsititution of this (or perhaps even the result of running an XOR'ed shift register (or other "random number" stream) across it), the structure of the decoding tree could be observed from the program, and compared with the result. > If I were to use a simple substitution cypher with an >arbitrary premutation of bytes on the output of a good compression >program how could the resulting file be attacked? I'm not sure what you mean by an "arbitrary" permutation of the bytes. If you mean a fixed permutation like the initial and final (bitwise) permutations of the DES, there wouldn't be any additional security in the permutation, since we assume it's known to the cryptanalyst. If you mean picking a permutation based on some key-controlled "random number" stream, that will certainly add strength. Let's take the simple sub first. Simple substitution alone might be crackable given some assumptions about the underlying text. For example, if we assume English written in ASCII we can try a number of decoding trees based on standard English, including branches for situations where there are close choices. I believe there are a lot fewer likely decoding trees than possible keys. Spaces, for example, would have a short bit string and would happen frequently, so when we start getting close on the high frequency letters and digraphs we'd start getting reasonable-looking output. There will be more choices for low-freq letters and digraphs, but they will also show up in the text less often to mess us up; and when they do, they're likely to have similar-length encoding strings. A transposition of the kind you describe would probably be attackable with a chosen-plaintext attack, or maybe even known-plaintext. The chosen- plaintext approach is the nastiest possible attack for the cryptanalyst to make, since it assumes that not only does he know the correspondence between plaintext and ciphertext, but he can also control what plaintext is to be enciphered. This is sometimes the case in a database application, for example: I send an invoice to somebody who will be putting it into a database, and I include in my address (perhaps) some information whose encryption will tell me what I need to know. In _The Codebreakers_ somewhere David Kahn tells about a code system that was giving trouble ... the cryptanalysts produced a memo that included some words that were in doubt, leaked it to the target agents, and then read the encryption as it got sent verbatim to home base. All this is to point out that chosen-plaintext is not out of the question as an attack. In any case, if one is trying to figure out how a transposition works and has the luxury of chosen-plaintext, one can put zeroes everywhere except in one location, and see where it goes; put it everywhere possible and you've unwound the transposition for that block. If all blocks use the same transposition, you're done. If not, knowing how the transposition is produced may give some insight into the random number stream, which may be broken by as few numbers as have been used to produce this particular transposition. So compression alone won't be enough to turn a weak system into an "unbreakable" (modulo exhaustive key search) one. Note that the DES is not known (by me, anyway) to be subject to any of these attacks (including the most powerful chosen-plaintext attack). -- Jim Gillogly {decvax, vortex}!randvax!jim jim@rand-unix.arpa
tli@oberon.UUCP (Tony Li) (07/09/85)
The problem with your beautiful theory is that you haven't changed the frequency of the characters in your text. By running compress on the plaintext, you've simply inserted a substitution cipher before your shuffle cipher. Once the natural language of the plaintext becomes known, it should quickly become obvious to the cryptographer that your character frequencies are incorrect. This leads immediately to the idea of a substitution. May I suggest Gaines as an excellent place to begin your reading? Even the introductory chapters and a pocket calculator are useful for simple cryptograms. Cheers, Tony ;-) -- Tony Li ;-) Usc Computer Science Uucp: ...!{{decvax,ucbvax}!sdcsvax,hplabs,allegra,trwrb}!sdcrdcf!uscvax!tli Bitnet: tli@uscvaxq, tli@jaxom, tli@ramoth Csnet: tli@usc-cse.csnet Arpa: tli@usc-ecl.arpa
jim@randvax.UUCP (Jim Gillogly) (07/10/85)
In article <78@oberon.UUCP> tli@oberon.UUCP (Tony Li) writes: >The problem with your beautiful theory is that you haven't changed the >frequency of the characters in your text. By running compress on the >plaintext, you've simply inserted a substitution cipher before your shuffle >cipher. Once the natural language of the plaintext becomes known, it should >quickly become obvious to the cryptographer that your character frequencies >are incorrect. This leads immediately to the idea of a substitution. Doug Gwyn and I posted suggestions on how to attack this kind of cipher, but the solution here isn't as easy as he makes it look. Compress does "only" insert a simple substitution before the transposition, but it is a variable-length substitution. Therefore (if the permutation is "random", as the poster specified) the permuted ciphertext elements contain PIECES of letters, not permuted letters themselves. As the original poster pointed out, the distribution of a compressed file will look very even on a byte-wise frequency count -- the individual letter frequencies are suppressed. Doug suggested that it would be difficult to pass the information about which transposition to use, and correctly points out that if the transposition is fixed it is useless. It seems to me that the problem is no more difficult than the usual key-exchange problem: whenever two correspondants need to communicate securely they need to find a common key. Several methods have been published for doing this (fairly) securely. Depending on how paranoid you are you can phone it in... Given the keyword, use it to initialize a random number generator (not a linear congruential one, please!), which you then use for a stream of pseudo-random bytes. Then it's off to Knuth vol. 3 to find out how to generate a random permutation (exchange each element of your block in turn with some randomly selected element of the block). The recipient uses the inverse permutation and you're in business. -- Jim Gillogly {decvax, vortex}!randvax!jim jim@rand-unix.arpa
markb@sdcrdcf.UUCP (Mark Biggar) (07/11/85)
Another thing that could be done would be to scramble the initial subistution table that compress uses (compress starts out with a table giving subistutions for the ascii char set and add subs for longer strings as it is compressing) so that even if the final subistution and translation are known there are still 127! possible decompressions. Of course distributing the initial table has all the key transmission problems as well. Mark Biggar {allegra,burdvax,cbosgd,hplabs,ihnp4,akgua,sdcsvax}!sdcrdcf!markb