kent@xanth.UUCP (Kent Paul Dolan) (01/01/70)
I am currently involved (for a communications package) in a FORTRAN implementation of the recently published (CACM V30N6 June 1987 p520-540) adaptive arithmetic data compression algorithm. It works fine; I needed to talk across a line that would only accept printable ASCII characters, so I adapted the algorithm to encode to 13 bits and then pass 6.5 bits per message byte. (A pair of printable ASCII characters gives 95*95 = 9025 combinations, which holds 13 bits of information quite conveniently.)

This seems to be even more interesting than adaptive Huffman encoding, for purposes of confusing decrypters, because individual bit boundaries don't match up with individual character boundaries, NOR VICE VERSA! A particular, non-standard start-up encounter-count list for the characters (in the adaptive form, which is the only interesting one) might be a sufficient encryption scheme by itself to "lend confusion to the enemy".

In any case, I'm pretty sure that every bit in the compressed message depends at least slightly on every character up to that point in the uncompressed message. I don't have the math skills to analyze this, but I'd sure like to see comments on this one from the net!

Kent, the man from xanth.
--
Kent Paul Dolan, LCDR, NOAA, Retired; ODU MSCS grad student    // Yet
UUCP  : kent@xanth.UUCP or ...{sun,harvard}!xanth!kent        //  Another
CSNET : kent@odu.csnet    ARPA : kent@xanth.cs.odu.edu    \\  //   Happy
USPost: P.O. Box 1559, Norfolk, Virginia 23501-1559        \ //    Amigan!
Voice : (804) 587-7760
-=][> Last one to Ceres is a rotten egg! -=][>
I code reactor power plant control in C.  I add "count_of_recent_alarms"
to "count_of_rods_to_lift".  C has weak type checking; the compiler
doesn't notice.  A major coolant valve sticks, a spate of alarms occurs.
All die.  Oh, the embarrassment!
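Kent doesn't post his FORTRAN, but the "6.5 bits per message byte" framing he describes can be sketched in a few lines of Python. The names below are illustrative, not from his implementation: each 13-bit chunk of the compressor's output rides in a pair of printable ASCII characters.

```python
# Sketch of the framing Kent describes: 95*95 = 9025 two-character
# pairs can carry any of the 2**13 = 8192 possible 13-bit values.
# All names here are invented for illustration.
PRINTABLE = 95      # printable ASCII: ' ' (32) through '~' (126)
BASE = 32           # code of the first printable character

def pack13(value):
    """Encode a 13-bit value (0..8191) as a two-character group."""
    assert 0 <= value < 2 ** 13
    hi, lo = divmod(value, PRINTABLE)
    return chr(BASE + hi) + chr(BASE + lo)

def unpack13(pair):
    """Recover the 13-bit value from a two-character group."""
    return (ord(pair[0]) - BASE) * PRINTABLE + (ord(pair[1]) - BASE)
```

Since 8192/9025 of the pair space is used, the channel overhead of staying printable is small.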
palmer@tybalt.caltech.edu (David Palmer) (08/07/87)
If you are going to encrypt a message by one of the standard breakable techniques (Caesar cypher, Ultra, Purple, NBS DES, whatever technique the NSA is selling to our allies :-), is decryption more difficult if you use Huffman coding (including Huffman coding of common combinations such as 'th') on the plaintext before encipherment? I am assuming that the people doing the attack know that you are doing Huffman coding and know what the code is.

(For those who are not familiar with the technique, Huffman coding uses variable-length characters, the more common characters requiring fewer bits. This usually decreases the total length of a message (in bits), assuming that it is non-pathological and not overly concerned with zebu and oxen in Zanzibar and Qom.)

It seems to me that Huffman coding makes the plaintext look more 'random' and, so, less amenable to statistical attack. Is this so? I don't know how to crack anything harder than a simple substitution cypher, so details would be welcome.

		David Palmer
		palmer@tybalt.caltech.edu
		...rutgers!cit-vax!tybalt.caltech.edu!palmer
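For readers unfamiliar with the construction Palmer summarizes, here is a minimal Python sketch of building a Huffman code; the symbols and weights are invented for illustration, and this is not anyone's production coder.

```python
import heapq

# Minimal Huffman construction, to make "common characters require
# fewer bits" concrete.  Heap entries are (weight, tiebreak, codes)
# so that ties never fall through to comparing the dicts.
def huffman_codes(freqs):
    """Build a {symbol: bit-string} Huffman code from symbol weights."""
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes({'e': 127, 't': 91, 'a': 82, 'q': 1, 'z': 1})
assert len(codes['e']) < len(codes['q'])   # frequent letter, fewer bits
```

The resulting code is prefix-free, so a decoder who has the tree can split the bit stream unambiguously; without the tree, the character boundaries are invisible, which is the property the thread goes on to debate.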
bs@augusta.UUCP (Burch Seymour) (08/10/87)
in article <3515@cit-vax.Caltech.Edu>, palmer@tybalt.caltech.edu (David Palmer) says:
>
> It seems to me that Huffman coding makes the plaintext look more 'random'
> and, so, less amenable to statistical attack.  Is this so?

And in a similar vein, it would seem that preceding the encryption with a simpleminded substitution, say to 17-bit characters, which flattened out the distribution of alphabet use (i.e. mapping more code characters to 'e' than to 'q') would also make the results difficult to attack. Any comments?

	-Burch Seymour-
jgy@hropus.UUCP (John Young) (08/10/87)
David Palmer writes:
> If you are going to encrypt a message by the standard breakable techniques,
> (Caesar cypher, Ultra, Purple, NBS DES, whatever technique the NSA is
> selling to our allies :-)  Is decryption more difficult if you use
> Huffman coding (including Huffman coding of common combinations such as 'th')
> on the plaintext before encipherment?  I am assuming that the people doing
> the attack know that you are doing Huffman coding and know what the code is.
> .
> .
> .
> It seems to me that Huffman coding makes the plaintext look more 'random'
> and, so, less amenable to statistical attack.  Is this so?

My first reaction would be to say "yes, it would be less amenable to statistical attack", but then I wondered: if it's known you're doing Huffman coding (you say it is), then that algorithm has already done the frequency analysis for the code breaker, and the partial cypher may now be in a "stricter" form.

Comments from experts??? (I'm not one either.)
jim@randvax.UUCP (08/10/87)
In article <3515@cit-vax.Caltech.Edu> palmer@tybalt.caltech.edu.UUCP (David Palmer) writes:
>If you are going to encrypt a message by the standard breakable techniques,
>(Caesar cypher, Ultra, Purple, NBS DES, whatever technique the NSA is
>selling to our allies :-)  Is decryption more difficult if you use
>Huffman coding (including Huffman coding of common combinations such as 'th')
>on the plaintext before encipherment?  I am assuming that the people doing
>the attack know that you are doing Huffman coding and know what the code is.

Data compression methods such as Huffman coding will normally make the cryptanalysis more difficult: the variable-length strings of the underlying plaintext will make it hard for the cryptanalyst to spot the repetitions that are important for breaking in.

However, a common implementation of Huffman coding involves computing the optimal tree of assignments of bit strings to ASCII characters before encoding, then prepending this tree to the compressed text. The tree is in a standard format, which we assume is available to the cryptanalyst; this might result in half of several hundred bytes being known. So rather than improving the security, this implementation would actually give the cracker a leg up...

Better would be the adaptive flavor of Huffman coding, which starts with a standard English (or C code, or assembler) decoding tree and changes the tree on the fly as the symbol frequencies move away from the starting tree. This would eliminate the initial cribs, leaving the cryptanalyst with only the usual tools, such as guessing or discovering pieces of the underlying plaintext, Huffmanizing them with a standard tree, and testing the result for consistency.
--
	Jim Gillogly
	{hplabs, ihnp4}!sdcrdcf!randvax!jim
	jim@rand-unix.arpa
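The key property of the adaptive flavor Gillogly describes is that no tree ever travels with the message. A deliberately trivial Python illustration of just the synchronization idea (the table and update rule are stand-ins, not his actual scheme):

```python
import string

# Encoder and decoder start from the same standard frequency table and
# apply the same update after every symbol, so the decoding tree never
# has to be transmitted -- and never appears as a known-plaintext crib.
def initial_counts():
    # A real implementation would seed this with measured English
    # (or C source, or assembler) frequencies, not all-ones.
    return {ch: 1 for ch in string.ascii_lowercase + " "}

def update(counts, symbol):
    counts[symbol] += 1    # identical rule on both ends keeps them in sync

encoder = initial_counts()
decoder = initial_counts()
for ch in "attack at dawn":
    # (the encoder would emit ch's current code here; the decoder,
    # having the identical table, would read it back unambiguously)
    update(encoder, ch)
    update(decoder, ch)
assert encoder == decoder          # the two tables never diverge
```

In a full implementation the Huffman tree would be rebuilt (or incrementally rebalanced) from these counts after each symbol on both sides.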
karn@faline.bellcore.com (Phil R. Karn) (08/10/87)
> Is decryption more difficult if you use Huffman coding...?
Yes, it is well known that increasing the entropy ("randomness") of the
plaintext before encryption is an excellent way to thwart cryptanalysis.
It's easy to see this. Suppose you have LOTS of CPU cycles available,
maybe even a parallel array of a million DES chips, but you need to know
when you've found the answer. If you've got matched ciphertext and
plaintext, no problem. But if you've only got ciphertext, you'll need to
guess the expected properties of the plaintext to know when you've found
the right answer. If the plaintext is ASCII-encoded English text, it
will leap right out at you with even the simplest statistical analysis
(all parity bits off, typical English character distribution, etc).
However, if the plaintext has been compressed, these statistics go away
(assuming the compression algorithm is doing its job). You simply don't
know when you stumble on the right answer.
I've heard from someone who should know that "knowing when you've found
the answer" is indeed perhaps THE hardest part of cryptanalysis.
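Concretely, the "leap right out at you" test Phil describes might look like the rough Python sketch below. The 0.95 and 0.4 thresholds are invented for illustration, not tuned values.

```python
# A crude recognizer for "this trial decryption is ASCII English":
# clear high (parity) bits, mostly printable characters, and a large
# share of high-frequency letters.  Well-compressed or still-encrypted
# data should fail at least one of the three checks.
def looks_like_ascii_english(data):
    if not data or any(b & 0x80 for b in data):   # high/parity bits set?
        return False
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in data)
    if printable < 0.95 * len(data):              # mostly printable?
        return False
    common = sum(b in b" etaoinshr" for b in data)
    return common > 0.4 * len(data)               # English letter mix?

assert looks_like_ascii_english(b"the quick brown fox ate the hen")
assert not looks_like_ascii_english(bytes(range(256)))  # looks random
```

A million-DES-chip search would run such a test on every trial decryption; compression is exactly what starves all three checks of signal.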
Practical point: many compression utilities put "magic numbers" at the
beginning of their compressed output so they can tell if they're being
fed random garbage when decompressing. The cryptanalyst could look for
these magic numbers, though if they're small compared to the block size
of the encryption algorithm he would get a very large number of false
alarms. (E.g. a 1-byte magic number with 8-byte DES blocks would yield
one false alarm every 256 trial keys, on average.)
Phil
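Phil's false-alarm arithmetic is easy to check with a quick Monte Carlo. The magic value and seed below are arbitrary choices for the sketch:

```python
import random

# A wrong trial key turns a ciphertext block into effectively random
# bytes, so a 1-byte magic number matches with probability 1/256 and a
# false alarm fires about once per 256 trial keys, as Phil says.
MAGIC = 0x1F                       # hypothetical 1-byte magic number
rng = random.Random(42)            # arbitrary seed, for reproducibility
TRIALS = 100_000
false_alarms = sum(rng.randrange(256) == MAGIC for _ in range(TRIALS))
trials_per_alarm = TRIALS / false_alarms
assert 180 < trials_per_alarm < 360    # close to the predicted 256
```

With a magic number as long as the 8-byte block the false-alarm rate would be 1 in 2^64 and the cryptanalyst's recognition problem would be solved for him, which is why short magic numbers are the lesser evil.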
biep@cs.vu.nl (J. A. "Biep" Durieux) (08/12/87)
In article <1034@faline.bellcore.com> karn@faline.bellcore.com (Phil R. Karn) writes:
>> Is decryption more difficult if you use Huffman coding...?
>
>Yes, it is well known that increasing the entropy ("randomness") of the
>plaintext before encryption is an excellent way to thwart cryptanalysis.
>I've heard it from someone that should know that "knowing when you've found
>the answer" is indeed perhaps THE hardest part of cryptanalysis.

That sounds as if it's true, but I don't think Huffman encoding will make that so difficult (that is, Huffman encoding on a letter-by-letter basis), since English is also rather easily recognisable (from random bits) by its inter-character statistics, i.e. the relative frequencies of groups of letters. It will be a bit more work, though, and a *real good* compression algorithm will take advantage of these regularities too (if it's there for multiple uses; otherwise telling the recipient how to decode might be more work than sending the plaintext).
--
	Biep.  (biep@cs.vu.nl via mcvax)
	To be the question or not to be the question, that is.
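Biep's point about inter-character statistics can be made concrete with a digram counter. The digram list and sample strings below are illustrative only:

```python
# Even if single-letter frequencies are flattened, English still shows
# through in its digram statistics ("th", "he", "in", ...).
COMMON_DIGRAMS = {"th", "he", "in", "er", "an",
                  "re", "on", "at", "en", "nd"}

def digram_score(text):
    """Fraction of adjacent character pairs that are common English digrams."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    return sum(p in COMMON_DIGRAMS for p in pairs) / max(len(pairs), 1)

english = "then the other man went in and sat on the end"
gibberish = "xqzj kvvw pq zzq gfkx bmw"
assert digram_score(english) > digram_score(gibberish)
```

A letter-by-letter Huffman code preserves these pair correlations in the underlying symbol stream, which is why a code that also compresses common digrams and trigrams removes more of the recognizable structure.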
jim@randvax.UUCP (Jim Gillogly) (08/13/87)
>In article <1034@faline.bellcore.com> karn@faline.bellcore.com (Phil R. Karn) writes:
>>... it is well known that increasing the entropy ("randomness") of the
>>plaintext before encryption is an excellent way to thwart cryptanalysis.
>>I've heard it from someone that should know that "knowing when you've found
>>the answer" is indeed perhaps THE hardest part of cryptanalysis.

In article <851@klipper.cs.vu.nl> biep@cs.vu.nl (J. A. "Biep" Durieux) responds:
>That sounds as if true, but I don't think Huffman encoding will make that
>so difficult (that is, Huffman encoding on a letter-by-letter basis), since
>English is also rather easily recognisable (from random bits) by its inter-
>character statistics, i.e. the relative frequencies of groups of letters.

I agree with Phil, assuming the decoding tree is not known. Huffman coding (letter-by-letter or not) results in variable-length bit strings, so that looking at a random piece of output you can't tell where the letters (or groups of letters) begin and end. This gives considerably more room for error. It's not terribly easy to decrypt Huffman-coded stuff (even before a superencryption) without the decoding tree.

------

If you know or guess that a particular word appears at a certain place in the output, you can try to use that to build up the decryption. That's how I'd try to attack it anyway. But I wouldn't expect to have it done before lunch.

	Jim Gillogly
--
	Jim Gillogly
	{hplabs, ihnp4}!sdcrdcf!randvax!jim
	jim@rand-unix.arpa
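The word-guessing attack Gillogly sketches can be caricatured in a few lines of Python. The code book and bit stream below are invented for the example; a real attack would juggle many competing partial hypotheses:

```python
# Crib attack caricature: guess that a word ("the") occurs in the
# plaintext, hypothesize Huffman codes for its letters, and look for
# offsets where that bit pattern fits the intercepted stream.
def crib_offsets(bitstream, crib, hypothesis):
    """Offsets where the crib's hypothesized encoding matches the stream."""
    pattern = "".join(hypothesis[ch] for ch in crib)
    return [i for i in range(len(bitstream) - len(pattern) + 1)
            if bitstream.startswith(pattern, i)]

guess = {"t": "00", "h": "010", "e": "11"}      # analyst's hypothesis
stream = "10" + "00" + "010" + "11" + "0110"    # "the" hidden at offset 2
assert 2 in crib_offsets(stream, "the", guess)
```

Each surviving offset pins down part of the tree, and the fixed bits it implies are then checked against other guessed words for consistency; that is the slow "build up the decryption" loop that won't be done before lunch.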