[comp.compression] Security of PKZIP's encryption

is@athena.cs.uga.edu (Bob Stearns) (03/26/91)

While I commonly recommend PKZIP (tm) for saving space on a hard disk, I have
been asked how strong its encryption (-spassword) option is. I have no way of
testing this and wonder if anyone out there in net land has investigated it.
If you have investigated the actual code and can explain the general algorithm
I can form my own opinion of its strength. If you have investigated by attempts
to crack it and have (succeeded|failed), explanation of your techniques and
results would assist. Any assistance would be greatly appreciated. Either
E-mail or net replies are acceptable. I will summarize if sufficient interest
appears either as "me too" or information replies to this message. 

jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (03/27/91)

From article <1991Mar26.150049.20882@athena.cs.uga.edu>,
by is@athena.cs.uga.edu (Bob Stearns):

> While I commonly recommend PKZIP (tm) for saving space on a hard disk,
> I have been asked how strong its encryption (-spassword) option is.

As I pointed out in my CACM article on splay-tree based compression, the
initial state of just about any adaptive compression algorithm can be
used as a key for encryption/decryption of the compressed stream.  This
applies to my splay-tree based algorithm as well as applying to LZW,
adaptive arithmetic codes and various adaptive Huffman codes.  I have
enquired periodically about the security of the -p password option on my
splay-tree based compressor, but other than one E-mail response from
Moscow (of all places), I've heard nothing.  The response from Moscow
was that they were investigating this issue but had no results yet.

So, Bob Stearns, pleasy summarize what you learn to the net, and
netlanders, please send E-mail to Bob if you know anything about this.

					Doug Jones
					jones@cs.uiowa.edu

Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) (03/28/91)

I've used the technique you mentioned in your article in my own arithmetic 
compressor, but like you don't really know how secure it is (hmmm, email 
from Moscow...I've been trying to mail someone there for over a month 
with no success).  Anyway, about the PKZIP encryption:  From memory 
this encrypts some short checksum using the CRC tables, which is then 
decrypted to check that the supplied password is correct.  IMHO this 
is a major flaw, since it does away with the need for any sort of fancy 
attack on the encryption security (eg known plaintext).  All you need 
to do is use the (very fast) encryption technique to do a brute-force 
attack on the checksum until the password drops out.

Disclaimer:  The details of the ZIP encryption are somewhat hazy at 
the moment...

Peter.

d88-jwa@cyklop.nada.kth.se (Jon W{tte) (04/01/91)

In article <1991Mar28.111130.1092@kcbbs> Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:


   with no success).  Anyway, about the PKZIP encryption:  From memory 
   this encrypts some short checksum using the CRC tables, which is then 
   decrypted to check that the supplied password is correct.  IMHO this 
   is a major flaw, since it does away with the need for any sort of fancy 
   attack on the encryption security (eg known plaintext).  All you need 
   to do is use the (very fast) encryption technique to do a brute-force 
   attack on the checksum until the password drops out.

One might argue that this can be good at some times.

Word Perfect has an encryption scheme that uses XOR between
the text and the data you enter as password. It took me one
night of brute force to break three six-letter passwords for
a bankruptcy investigation (though the E.O. that had encrypted
hos illegal business letters might have lost his trust in
computer security :-)

It's always a trade-off between a) how important the data is
if you forget the key b) speed and c) how long the information
has to be secure. A) and C) obviously contradict each other...

Hmm. Maybe followups should go to sci.crypt...

						h+@nada.kth.se
						Jon W{tte
--
"The IM-IV file manager chapter documents zillions of calls, all of which
seem to do almost the same thing and none of which seem to do what I want
them to do."  --  Juri Munkki in comp.sys.mac.programmer

parsons@matt.ksu.ksu.edu (Ghost in the Machine) (04/02/91)

is@athena.cs.uga.edu (Bob Stearns) writes:

>While I commonly recommend PKZIP (tm) for saving space on a hard disk, I have
>been asked how strong its encryption (-spassword) option is. I have no way of
>testing this and wonder if anyone out there in net land has investigated it.
>If you have investigated the actual code and can explain the general algorithm
>I can form my own opinion of its strength.

Here's what I found floating around on my roomates computer in the 
PKzip archive file.  This should help you to pass judgement on the security
of the password encryption.

DISCLAIMER:  This is part of the application notes for PKZIP.  Much
has been deleted about the actual compression algorithm, and only the
relevent part about password encryption remains.


Decryption
----------

The encryption used in PKZIP was generously supplied by Roger
Schlafly.  PKWARE is grateful to Mr. Schlafly for his expert
help and advice in the field of data encryption.

PKZIP encrypts the compressed data stream.  Encrypted files must
be decrypted before they can be extracted.

Each encrypted file has an extra 12 bytes stored at the start of
the data area defining the encryption header for that file.  The
encryption header is originally set to random values, and then
itself encrypted, using 3, 32-bit keys.  The key values are 
initialized using the supplied encryption password.  After each byte
is encrypted, the keys are then updated using psuedo-random number
generation techniques in combination with the same CRC-32 algorithm 
used in PKZIP and described elsewhere in this document.

The following is the basic steps required to decrypt a file:

1) Initialize the three 32-bit keys with the password.
2) Read and decrypt the 12-byte encryption header, further
   initializing the encryption keys.
3) Read and decrypt the compressed data stream using the
   encryption keys.


Step 1 - Initializing the encryption keys
-----------------------------------------

Key(0) <- 305419896
Key(1) <- 591751049
Key(2) <- 878082192

loop for i <- 0 to length(password)-1
    update_keys(password(i))
end loop


Where update_keys() is defined as:


update_keys(char):
  Key(0) <- crc32(key(0),char)
  Key(1) <- Key(1) + (Key(0) & 000000ffH)
  Key(1) <- Key(1) * 134775813 + 1
  Key(2) <- crc32(key(2),key(1) >> 24)
end update_keys


Where crc32(old_crc,char) is a routine that given a CRC value and a 
character, returns an updated CRC value after applying the CRC-32 
algorithm described elsewhere in this document.


Step 2 - Decrypting the encryption header
-----------------------------------------

The purpose of this step is to further initialize the encryption
keys, based on random data, to render a plaintext attack on the
data ineffective.


Read the 12-byte encryption header into Buffer, in locations
Buffer(0) thru Buffer(11).

loop for i <- 0 to 11
    C <- buffer(i) ^ decrypt_byte()
    update_keys(C)
    buffer(i) <- C
end loop


Where decrypt_byte() is defined as:


unsigned char decrypt_byte()
    local unsigned short temp
    temp <- Key(2) | 2
    decrypt_byte <- (temp * (temp ^ 1)) >> 8
end decrypt_byte


After the header is decrypted, the last two bytes in Buffer
should be the high-order word of the CRC for the file being
decrypted, stored in Intel low-byte/high-byte order.  This can
be used to test if the password supplied is correct or not.


Step 3 - Decrypting the compressed data stream
----------------------------------------------

The compressed data stream can be decrypted as follows:


loop until done
    read a charcter into C
    Temp <- C ^ decrypt_byte()
    update_keys(temp)
    output Temp
end loop


In addition to the above mentioned contributors to PKZIP and PKUNZIP, 
I would like to extend special thanks to Robert Mahoney for suggesting 
the extension .ZIP for this software.


References:

    Storer, James A. "Data Compression, Methods and Theory",
       Computer Science Press, 1988
    
    Held, Gilbert  "Data Compression, Techniques and Applications,
		    Hardware and Software Considerations"
       John Wiley & Sons, 1987

karn@epic.bellcore.com (Phil R. Karn) (04/03/91)

In article <1991Apr2.070810.10812@maverick.ksu.ksu.edu>,
parsons@matt.ksu.ksu.edu (Ghost in the Machine) describes the PKZIP
encryption algorithm.

At first glance, it doesn't look all that strong to me since all of
the operations appear to be linear. Although the PKZIP feature is fast
and convenient, when I want real security I encrypt the entire .ZIP
archive with DES.

Phil

madler@nntp-server.caltech.edu (Mark Adler) (04/03/91)

>> At first glance, it doesn't look all that strong to me since all of
>> the operations appear to be linear.

Linear?  In what field?  I have an implementation of it in C if anyone
would like to take a crack at it (pun intended).  I have put some
thought into it (not much, but some), and I can't see how to reverse
the pseudo-random sequence, given the encrypted and the clear.

>> Although the PKZIP feature is fast
>> and convenient, when I want real security I encrypt the entire .ZIP
>> archive with DES.

Sounds pretty secure.  Then again, I'm not sure I'd trust any encryption
method that was approved by the NSA.  Especially since they will not say
how the various arbitrary-looking bit flipping was derived.

Is there any source out there for RSA encryption?

Mark Adler
madler@pooh.caltech.edu

karn@epic.bellcore.com (Phil R. Karn) (04/04/91)

In article <1991Apr3.070045.22296@nntp-server.caltech.edu>, madler@nntp-server.caltech.edu (Mark Adler) writes:
|> 
|> >> At first glance, it doesn't look all that strong to me since all of
|> >> the operations appear to be linear.
|> 
|> Linear?  In what field?

Well, most of the operations seem to be additions and CRC calculations.
CRCs are certainly linear, as are additions. I don't see any nonlinear
substitutions and permutations going on.

I am only a layman cryptographer, but I am familiar with many of the
design principles of ciphers: nonlinearity and non-affineness are
essential properties for the building blocks, repeated permutation and
substitution operations are much stronger than the individual
operations themselves, and so on. This cipher does not seem to follow
those principles. I'll shut up on this point and let the experts
comment further if they want.

|> Sounds pretty secure.  Then again, I'm not sure I'd trust any encryption
|> method that was approved by the NSA.  Especially since they will not say
|> how the various arbitrary-looking bit flipping was derived.

At the risk of stirring up an old debate, I can say that I personally
believe DES to be well designed, at least for a cipher with only 56
bits in the key. The "Differential Cryptanalysis of DES" paper from
last year's Crypto conference sheds a lot of light on this subject.

I also take some heart from the NSA's strong resistance to the lifting
of export controls on DES. One might claim that this was merely a ploy
on their part to make us trust a cipher that they have cracked, but I
think that gives them too much credit.

Phil

madler@nntp-server.caltech.edu (Mark Adler) (04/04/91)

In article <1991Apr3.212713.18209@bellcore.bellcore.com> karn@thumper.bellcore.com writes:
>In article <1991Apr3.070045.22296@nntp-server.caltech.edu>, madler@nntp-server.caltech.edu (Mark Adler) writes:
>|> Linear?  In what field?
>Well, most of the operations seem to be additions and CRC calculations.
>CRCs are certainly linear, as are additions. I don't see any nonlinear
>substitutions and permutations going on.

Ah, so you think it's linear on GF(2) polynomials.  It's not.  There is
an "or" operation snuck in there for precisely the reason you mention--
to foil an algebraic approach to inverting the pseudo-random sequence.

This is not to say the scheme is secure, of course.  But I do think
that there was some thought put into it by someone familiar with the
field.

Mark Adler
madler@pooh.caltech.edu