[net.crypt] Encryption using compression

bet@duke.UUCP (Bennett E. Todd III) (07/04/85)

Trivial encryption algorithms are attacked by analysis of frequency
distributions, more or less. Wouldn't a simpleminded substitution or
transposition algorithm be beefed up to the point of requiring search
of the key space by applying a good compression program to the
plaintext first? The whole point to data compression is the redundancy
of tokens, and removing it by recoding the data to make all tokens
equally likely. If I were to use a simple substitution cypher with an
arbitrary premutation of bytes on the output of a good compression
program how could the resulting file be attacked?

-Bennett-- 

Bennett Todd -- Duke Computation Center, Durham, NC 27706 -- +1 919 684 3695
UUCP - bet@ecsvax, bet@duke, ...{decvax,ihnp4,akgua}!mcnc!ecsvax!bet

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/05/85)

> If I were to use a simple substitution cypher with an
> arbitrary premutation of bytes on the output of a good compression
> program how could the resulting file be attacked?

The general idea of removing redundancy to make cryptanalysis
harder is a good one.  However, most data compression schemes
amount to fairly simple encryptors (e.g. each 8-bit input
chunk might be taken to a constant output chunk of width from
2 to 8 bits, with a prepended mapping table).  Methods for
dealing with this form of variable-length encryption exist and
could be re-invented if necessary.

The weak point of the proposed scheme seems to be in conveying
the permutation and compression keys; if the permutation is not
changed for each message, then it can be broken through by a
"stacking" scheme.  In any case, you have to expect that any
constant elements of an encryption scheme are potentially
known to the cryptanalyst.  So, if you enclose permutation
information with the ciphertext, it will be a simple matter for
the analyst to extract it and untranspose the text the same way
the intended recipient would.  Then he just has to work on the
compression scheme through an overlaid simple substitution.
Although he won't be able to apply frequency distribution tests
to strip the substitution, these are not usually heavily used
in breaking simple substitution anyhow.  If I had to analyze such
a message, I'd probably try to break out the compression table
at this point and analyze possible substitutions against a
"crib" assumed to be present in the message (like, "From: ..."
at the front of the plaintext).  It should be possible to rule
out all but a handful of possible substitutions this way.
Then the rest of the task is just extending the meaning of the
plaintext a bit at a time while checking for consistency against
what is already known.

In other words, it would be a lot of work but most cryptanalysis
is like that.

bet@duke.UUCP (Bennett E. Todd III) (07/06/85)

Assuming of course that the algorithm is public knowledge, then in
any encryption scheme not based on a "trapdoor" (and all of those I
am aware of are computationally abusive to the *users* of the system
by the time they are intricate enough to be secure), the key must be
communicated separately from the message, and securely. If I were to
take the output of the LZ-compression program compress(1) which has
been posted to the net recently (no frequency tables), and permute
the bytes in it with a systematic shuffle, and then maybe apply a
substitution throughout to ward off any brute force efforts, then I
would consider the shuffle and substitution to compose the key -- I
visit the friend in question, put on some Police real loud, and
whisper to him/her the shuffle that I will use.  Has some part of
this procedure violated the current rules about what constitutes an
encryption system worthy of note? I realize I haven't addressed the
key distribution problem, but then if I meet my friend *once*, then I
can include a shuffle definition and substitution in the plaintext,
and the friend will know to use them to decrypt the *next* message.
This all seems far more simple-minded than the intricate
number-theoretical approaches currently being used in cryptology, and
for the life of me I can't see what even the NSA and a flock of Crays
could do about it.

Think about it: I give you a binary file, and tell you that if you
shuffle the bytes in the file in some fashion, and apply some
substitution to them, then you can run the result through compress(1)
and read the document. The transposition and substitution (neither
adequate to ward of determined attack alone) are sitting on top of a
uniform distribution of bytes, +/- some epsilon relatively independent
of the source text. Given a permutation chosen to keep bytes within
say 32K chunks, none of the operations involved is particularly slow.
Am I missing something? (besides any backgound whatsoever in
cryptology, thanks:-).

-Bennett-- 

Bennett Todd -- Duke Computation Center, Durham, NC 27706 -- +1 919 684 3695
UUCP - bet@ecsvax, bet@duke, ...{decvax,ihnp4,akgua}!mcnc!ecsvax!bet

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/07/85)

> ... I realize I haven't addressed the
> key distribution problem, but then if I meet my friend *once*, then I
> can include a shuffle definition and substitution in the plaintext,
> and the friend will know to use them to decrypt the *next* message.

No, no!  Once the "bad guy" breaks ANY message, all subsequent ones
would come unraveled like a chain stitch.

Key distribution really is a major problem in data cryptosystems.
The simpler the enciphering algorithm, the more key that has to
be supplied to maintain reasonable security.

jim@randvax.UUCP (Jim Gillogly) (07/08/85)

In article <5992@duke.UUCP> bet@ecsvax.UUCP (Bennett E. Todd III) writes:
>                             Wouldn't a simpleminded substitution or
>transposition algorithm be beefed up to the point of requiring search
>of the key space by applying a good compression program to the
>plaintext first?

The algorithm would certainly be strengthened (although probably not to
the point of requiring search of the entire key space).  However, you need
to be a little careful.  Some compression programs (e.g. "pack", a
Huffman-coding program) will take a first pass through the data to assign
the variable-length codes, put out the decoding tree at the beginning of
the output file, then follow with the compressed data.  To attack a simple
subsititution of this (or perhaps even the result of running an XOR'ed
shift register (or other "random number" stream) across it), the structure
of the decoding tree could be observed from the program, and compared with
the result.

>                If I were to use a simple substitution cypher with an
>arbitrary premutation of bytes on the output of a good compression
>program how could the resulting file be attacked?

I'm not sure what you mean by an "arbitrary" permutation of the bytes.
If you mean a fixed permutation like the initial and final (bitwise)
permutations of the DES, there wouldn't be any additional security in
the permutation, since we assume it's known to the cryptanalyst.  If you
mean picking a permutation based on some key-controlled "random number"
stream, that will certainly add strength.  Let's take the simple sub first.

Simple substitution alone might be crackable given some assumptions about
the underlying text.  For example, if we assume English written in ASCII
we can try a number of decoding trees based on standard English, including
branches for situations where there are close choices.  I believe there are
a lot fewer likely decoding trees than possible keys.  Spaces, for example,
would have a short bit string and would happen frequently, so when we
start getting close on the high frequency letters and digraphs we'd start
getting reasonable-looking output.  There will be more choices for low-freq
letters and digraphs, but they will also show up in the text less often to
mess us up; and when they do, they're likely to have similar-length encoding
strings.

A transposition of the kind you describe would probably be attackable with
a chosen-plaintext attack, or maybe even known-plaintext.  The chosen-
plaintext approach is the nastiest possible attack for the cryptanalyst to
make, since it assumes that not only does he know the correspondence between
plaintext and ciphertext, but he can also control what plaintext is to
be enciphered.  This is sometimes the case in a database application, for
example:  I send an invoice to somebody who will be putting it into a
database, and I include in my address (perhaps) some information whose
encryption will tell me what I need to know.  In _The Codebreakers_ somewhere
David Kahn tells about a code system that was giving trouble ... the
cryptanalysts produced a memo that included some words that were in doubt,
leaked it to the target agents, and then read the encryption as it got
sent verbatim to home base.  All this is to point out that chosen-plaintext
is not out of the question as an attack.

In any case, if one is trying to figure out how a transposition works and
has the luxury of chosen-plaintext, one can put zeroes everywhere except
in one location, and see where it goes; put it everywhere possible and
you've unwound the transposition for that block.  If all blocks use the
same transposition, you're done.  If not, knowing how the transposition
is produced may give some insight into the random number stream, which may
be broken by as few numbers as have been used to produce this particular
transposition.

So compression alone won't be enough to turn a weak system into an
"unbreakable" (modulo exhaustive key search) one.  Note that the DES is
not known (by me, anyway) to be subject to any of these attacks (including
the most powerful chosen-plaintext attack).
-- 
	Jim Gillogly
	{decvax, vortex}!randvax!jim
	jim@rand-unix.arpa

tli@oberon.UUCP (Tony Li) (07/09/85)

The problem with your beautiful theory is that you haven't changed the
frequency of the characters in your text.  By running compress on the
plaintext, you've simply inserted a substitution cipher before your shuffle
cipher.  Once the natural language of the plaintext becomes known, it should
quickly become obvious to the cryptographer that your character frequencies
are incorrect.  This leads immediately to the idea of a substitution.

May I suggest Gaines as an excellent place to begin your reading?  Even the
introductory chapters and a pocket calculator are useful for simple
cryptograms.

Cheers, 
Tony ;-)
-- 
Tony Li ;-)		Usc Computer Science
Uucp: ...!{{decvax,ucbvax}!sdcsvax,hplabs,allegra,trwrb}!sdcrdcf!uscvax!tli
Bitnet: tli@uscvaxq, tli@jaxom, tli@ramoth
Csnet: tli@usc-cse.csnet
Arpa: tli@usc-ecl.arpa

jim@randvax.UUCP (Jim Gillogly) (07/10/85)

In article <78@oberon.UUCP> tli@oberon.UUCP (Tony Li) writes:
>The problem with your beautiful theory is that you haven't changed the
>frequency of the characters in your text.  By running compress on the
>plaintext, you've simply inserted a substitution cipher before your shuffle
>cipher.  Once the natural language of the plaintext becomes known, it should
>quickly become obvious to the cryptographer that your character frequencies
>are incorrect.  This leads immediately to the idea of a substitution.

Doug Gwyn and I posted suggestions on how to attack this kind of cipher, but
the solution here isn't as easy as he makes it look.  Compress does "only"
insert a simple substitution before the transposition, but it is a
variable-length substitution.  Therefore (if the permutation is "random",
as the poster specified) the permuted ciphertext elements contain PIECES of
letters, not permuted letters themselves.  As the original poster pointed
out, the distribution of a compressed file will look very even on a byte-wise
frequency count -- the individual letter frequencies are suppressed.

Doug suggested that it would be difficult to pass the information about which
transposition to use, and correctly points out that if the transposition is
fixed it is useless.  It seems to me that the problem is no more difficult
than the usual key-exchange problem: whenever two correspondants need to
communicate securely they need to find a common key.  Several methods have
been published for doing this (fairly) securely.  Depending on how paranoid
you are you can phone it in...

Given the keyword, use it to initialize a random number generator (not
a linear congruential one, please!), which you then use for a stream of
pseudo-random bytes.  Then it's off to Knuth vol. 3 to find out how to
generate a random permutation (exchange each element of your block in
turn with some randomly selected element of the block).  The recipient
uses the inverse permutation and you're in business.
-- 
	Jim Gillogly
	{decvax, vortex}!randvax!jim
	jim@rand-unix.arpa

markb@sdcrdcf.UUCP (Mark Biggar) (07/11/85)

Another thing that could be done would be to scramble the initial subistution
table that compress uses (compress starts out with a table giving
subistutions for the ascii char set and add subs for longer strings as it
is compressing) so that even if the final subistution and translation are
known there are still 127! possible decompressions.  Of course distributing
the initial table has all the key transmission problems as well.

Mark Biggar
{allegra,burdvax,cbosgd,hplabs,ihnp4,akgua,sdcsvax}!sdcrdcf!markb