[comp.sys.ibm.pc] PKWARE "UnZIPs" New Compression Software

raf@cup.portal.com (Robert A Freed) (02/02/89)

Phil Katz (PKWARE Inc.) is about to release the first versions of his
new file compression software for MS-DOS.  According to a bulletin
posted this week on the PKWARE BBS, the approximate release date for
the new shareware programs, PKZIP and PKUNZIP, is February 6, 1989.

Under the terms of last year's lawsuit settlement agreement with SEA
Inc., as of January 31, 1989, PKWARE may no longer distribute its
ARC-compatible programs PKPAK and PKUNPAK (PKARC and PKUNARC).
Although functionally similar to the earlier products, the new PKZIP
and PKUNZIP programs are completely redesigned.  These utilize a new
file format and improved data compression methods.

Complete technical details have been distributed to software developers
and Beta testers of the new programs.  I have received permission from
Phil Katz to post this documentation on USENET, but I'll defer to
public opinion due to the length of the material (approximately 25K
bytes of text).

The following is an excerpt from the distributed file DISCLAIM.DOC,
dated 11 Jan 1989:

> Dedication
> ----------
>
> The file format of the files created by these programs, which file format
> is original with the first release of this software, is hereby dedicated to
> the public domain.  Further, the filename extension of ".ZIP", first used in
> connection with data compression software on the first release of this
> software, is also hereby dedicated to the public domain, with the fervent
> and sincere hope that it will not be attempted to be appropriated by anyone
> else for their exclusive use, but rather that it will be used to refer to
> data compression and librarying software in general, of a class or type
> which creates files having a format generally compatible with this
> software.

For additional information contact:

     PKWARE Inc.
     7545 North Port Washington Road
     Suite 205
     Glendale, WI 53217
     Voice: 414-352-3670  (9am to 5pm Central Time)

     PKWARE BBS
     Data:  414-352-7176  (Up 24 hours)
     300/1200/2400 Baud, MNP class 5 available
     8 Data bits, No parity, 1 Stop bit

DISCLAIMER:  I have no association with Phil Katz and/or PKWARE Inc.,
except as an enthusiastic supporter of the new ZIP file format.

Robert A. Freed                     raf@cup.portal.com
Newton Centre, MA                   ...!sun!portal!cup.portal.com!raf

raf@cup.portal.com (Robert A Freed) (02/09/89)

In response to numerous requests, attached are the document files from
Phil Katz' Software Development Kit for his new PKZIP and PKUNZIP
shareware file compression programs.  (Phil requested specifically that
the Beta test versions of the .EXE files not be distributed.)

I am sure the first official release of the software will be posted to
comp.sys.ibm.pc as soon as it becomes available (this week, according
to a bulletin on the PKWARE BBS), so please do not fill my mailbox with
requests.

DISCLAIMER:  I have no association with Phil Katz and/or PKWARE Inc.,
except as an enthusiastic supporter of the new ZIP file format.

Robert A. Freed                     raf@cup.portal.com
Newton Centre, MA                   ...!sun!portal!cup.portal.com!raf

     Files attached:

     README   1ST      637   1-10-89  21:05:02
     SDK      DOC     1372   1-11-89  18:38:44
     DISCLAIM DOC     1622   1-11-89   0:15:02
     BETA     DOC    12420   1-10-89  19:39:46
     FORMAT   DOC     7395   1-11-89   0:15:34
     EXTRACT  DOC     5382   1-11-89   0:25:58
     BUGREP   DOC     1270   1-11-89   0:12:34

----- cut here --------------- README.1ST --------------- cut here -----

Files included in the Software Development Kit:

    BETA.DOC		Description of the command line arguments
			for PKZIP and PKUNZIP.

    BUGREP.DOC		Form to complete if reporting bugs in the
			software.

    DISCLAIM.DOC	Disclaimers and license of information.

    EXTRACT.DOC		Technical description of the algorithms used in
			the software and how to extract files created
			by PKZIP.

    FORMAT.DOC		Technical description of the .ZIP file format
			used by the software.

    PKUNZIP.EXE		
    PKZIP.EXE		PKUNZIP/PKZIP Beta software.

    SDK.DOC		General documentation for the Software Development
			Kit.

----- cut here ---------------- SDK.DOC ----------------- cut here -----


    Welcome to the Software Development Kit and Beta software release for 
PKZIP and PKUNZIP data compression software.  If you wish to report any 
bugs in the software, please use the enclosed form in the file BUGREP.DOC.
The estimated release date for this software is January 31, 1989, so please
return any bug reports before that date if possible.

    The file README.1ST contains a complete description of all the files 
included in the development kit.

    This is NOT information on: 1) Full-screen, menu-driven versions
of the software or 2) Libraries of compression routines for incorporation
into other applications.  These things are under development and you will 
be notified as information becomes available.


    I would like to expressly thank David Schwaderer who has generously 
contributed his expertise in CRC calculations to the software.  I would 
also like to thank Graeme McRae whose principles for repeated string 
elimination, as used in his SCRNCH (tm) program, were licensed and
incorporated into the software and Bill Tullis, who coined the name "ZIP".

    In addition, I would like to thank all those (too numerous to mention)
that have supported PKWARE in the past through letters, messages, and 
registrations, and will embrace this new, next-generation in compression 
technology.



					>Phil Katz>

----- cut here -------------- DISCLAIM.DOC -------------- cut here -----

Disclaimer
----------

Although PKWARE will attempt to supply current and accurate information 
relating to its file formats, algorithms, and the subject programs, the 
possibility of error can not be eliminated.  PKWARE therefore expressly 
disclaims any warranty that the information contained in the associated 
materials relating to the subject programs and/or the format of the files 
created or accessed by the subject programs and/or the algorithms used by 
the subject programs, or any other matter, is current, correct or accurate 
as delivered.  Any risk of damage due to any possible inaccurate 
information is assumed by the user of the information.  Furthermore, the 
information relating to the subject programs and/or the file formats 
created or accessed by the subject programs and/or the algorithms used by 
the subject programs is subject to change without notice.


Dedication
----------

The file format of the files created by these programs, which file format 
is original with the first release of this software, is hereby dedicated to 
the public domain.  Further, the filename extension of ".ZIP", first used in 
connection with data compression software on the first release of this 
software, is also hereby dedicated to the public domain, with the fervent 
and sincere hope that it will not be attempted to be appropriated by anyone 
else for their exclusive use, but rather that it will be used to refer to 
data compression and librarying software in general, of a class or type 
which creates files having a format generally compatible with this 
software.

----- cut here ---------------- BETA.DOC ---------------- cut here -----

Explanation of options for PKZIP and PKUNZIP
--------------------------------------------

General:

      The PKZIP and PKUNZIP software was written from scratch, and
      not derived from PKPAK and PKUNPAK.  While some of the
      functions in the new software are designed to emulate those
      of PKPAK or PKUNPAK on a functional level, they are implemented
      differently, so there can (and will) be some differences in
      operation.

      All options for the software must be preceded by a '-'
      character or the MS-DOS switch character (usually '/')
      and generally can be placed *anywhere* on the command line.
      Most options, except where noted, can be combined (e.g. "-x -y"
      or "-xy").  Each program has a default action, and can be run
      without any options or commands.

      Some functions are not fully implemented in the Beta software,
      and I have attempted to note this.  Also, the release software
      may sport additional features not listed here.


Some features that didn't make it into Beta 0.80 release:

      Spanning diskettes.  As detailed in FORMAT.DOC, the ZIP file
      format is designed to span multiple disks.  However, in order
      to meet time constraints this functionality will not be added
      until later, perhaps in the full-screen version of the software.
      
      Self-extracting ZIPfiles.
      
      Configuration files.  PKZIP (and possibly PKUNZIP) will
      eventually support configuration options to select certain
      defaults for the software.  Currently, there are some
      selections made arbitrarily in the Beta software that will
      be user configurable in configuration files in later releases
      of the software.
      
      Encryption.
      
      Individual file comments.

    See specific options listed below for more information.


PKZIP:

    general command format:
    
	pkzip [options] zipfilename [filespec...]
	
    The default extension for the Zipfile is .ZIP if no
    extension is specified.

    If no filespecs are specified, "*.*" is assumed.  Any DOS
    wildcarded filespecs (with paths) are allowed.  List files
    can be used, and are specified by preceding the filename
    with a '@' symbol.  Entries within a list file can contain
    paths and wildcards.  List files should work for all options
    that take command line file specifications.
    
    Files are added to the ZIP in the precise order they are
    specified.  They are extracted in the exact same order.
    No sorting of files is performed when being added to the
    Zipfile.

    
    Options are:
    
	-a		Add files to ZIP.  This is the default
			if no other options are specified.

	-b[path]	Create temporary zipfile on an alternate
			drive and path.  If no path is specified,
			the current drive is used.  Useful for
			updating zipfiles larger than half the
			size of a floppy etc.

	-c		Add file comments to individual files.
			Not implemented in Beta 0.80.

	-d		Delete the specified files from the Zipfile.

	-e[a,b][n]	Extra compression.  Use a slower but more
			efficient compression algorithm.  The 'a' and 'b'
			options specify that extra compression is to
			be applied to either ASCII or BINARY files.
			"-e" is the same as "-eb".  The optional n is
			the 'compression factor', from 1 to 4.  1 provides
			the fastest operation, and 4 (usually) the best
			compression.  The default factor is 2.  For binary
			files, a value of "-eb2" usually provides good
			performance and significantly better compression
			over the default algorithm.  The a and b options
			can not be mixed in the same command.  If you
			want to enable extra compression for both ASCII
			and BINARY files, two options are needed (e.g.
			"-ea3 -eb2").

	-f		Freshen files in ZIP.  Add files to the ZIP only
			if the file already is in the ZIP and the files
			are also dated later than those within the ZIP.

	-g		Encrypt files.  Not implemented in 0.80.
	
	-h		Help.

	-i		Incremental add.  Add files to the ZIP only
			if the DOS archive directory attribute is set.
			The archive bit is then cleared after being
			added to the ZIP.

	-l		License screen.

	-m		Move files to ZIP.  Delete the specified files
			after adding to ZIP.  Can be used in conjunction
			with the Add, Freshen, and Update options.

	-p		Store relative paths with filenames in ZIPfile.
			Meaningful only if used with the R option below.

	-r		Recurse subdirectories from the specified
			directories.  For example:
			"pkzip -r source d:\*.c e:\headers\*.h" will
			search the entire D: drive for *.c files, and
			will search E:\HEADERS and all directories below
			E:\HEADERS for *.h files.  An entire directory
			tree can be zipped and restored using the P and
			R options.  For example "pkzip -r -p stuff" will
			zip all files in the current subdirectory, and
			all directories below the current subdirectory.
			PKUNZIP can then restore this directory tree
			either in the same directory, or at any place
			in the directory tree.  (See the D option for
			PKUNZIP below)
			
			By default, the filename only will be stored,
			unless the P option is specified as above.  If
			the P option is used, then the relative path
			will be stored.  For example, if the file
			"e:\headers\prog\startup\xyz.h" was found in the
			first example, "prog/startup/xyz.h" will be stored
			in the ZIP.

	-u		Update.  Add files to the ZIP only if they are
			not currently within the ZIP or are dated later
			than those within the ZIP.

	-v[t]		View files in the ZIP.  "-vt" lists the files
			in a long format with extra technical information.
			NOTE: Although this option in the Beta 0.80
			software displays a sorted listing by filename,
			the files within the ZIP are not sorted and most
			likely not in the order listed.  The order in
			which the files are listed is completely arbitrary.
			The release software might have user specifiable
			options to view the Zipfile with different sort
			or nosort options.

	-z		Zipcomment.  Add a comment for the Zipfile.
			This comment is automatically displayed by
			PKZIP or PKUNZIP when processing the Zipfile.



PKUNZIP
-------

    general command format:
    
	pkunzip [options] zipfilename [filespec...] [output-path]
	
    The default extension for the Zipfile is .ZIP if no
    extension is specified.  The Zipfilename can contain
    wildcards.  For example, "pkunzip -t *" will test all
    Zipfiles in the current directory.

    If no filespecs are specified, "*.*" is assumed.  Any DOS
    wildcarded filespecs (with paths) are allowed.  List files
    can be used, and are specified by preceding the filename
    with a '@' symbol.  Entries within a list file can contain
    paths and wildcards.  List files should work for all options
    that take command line file specifications.
    
    The output-path specifies the drive and directory that files
    should be extracted to.
    
    
    Options are:
    
	-c[m]		Extract to console [with more].  More
			is not implemented in Beta 0.80.

	-d		Use pathnames stored in the ZIP and create
			them if necessary upon extraction.  Will also
			create the output-path if it does not exist.
			For example, say drive C: has the following
			directory tree:
			
			root----+-abc--+-dir1-
				|      |
				|      +-dir2-+-dir3-
				|             |
				+-xyz--       +-dir4-
				|
				+-pdq--
			
			Suppose you were to execute
			"pkzip -r -p a:stuff c:\abc\*.*".  You could then
			execute "pkunzip -d a:stuff c:\pdq\newabc".
			Afterwards, drive C: would look like:

			root----+-abc--+-dir1-
				|      |
				|      +-dir2-+-dir3-
				|             |
				+-xyz--       +-dir4-
				|
				+-pdq--+-newabc-+-dir1-
						|
						+-dir2-+-dir3-
						       |
						       +-dir4-

			All the files originally in C:\ABC and
			its subdirectories will have been restored
			to C:\PDQ\NEWABC, and the directory tree
			recreated.

			If this option is not specified, only the
			filenames stored in the ZIP will be used, and
			any pathnames will be ignored.
			
	-e		Execute file from ZIP.  Not implemented in
			Beta 0.80.

	-g		Decryption.  Not implemented in Beta 0.80.

	-h		Help.

	-l		License screen.

	-n		Newer.  Extract files from the ZIP only if they
			are newer than the ones on the disk.
			
	-o		Overwrite existing files without query.  By
			default the software will prompt if existing
			files should be overwritten.

	-p[a,b,c][1,2,3,4]	Extract to printer.  The A and B
			options specify that the print device should
			be placed explicitly in either ASCII or BINARY
			mode.  The C specifies that the data should
			be sent to the COM port instead.  The port
			number (LPT or COM) can also be specified.

			If neither ASCII nor BINARY mode is specified,
			or ASCII mode is specified, the software will
			send a formfeed and carriage return to the
			print device after each file.  If no mode is
			specified, whatever the default mode is for
			the device will be used.  Most DOS character
			devices are in translated ASCII (cooked) I/O mode
			by default, but other software can place them
			in untranslated BINARY (raw) I/O mode.
			
			The default device used is PRN.  If C is given
			without a port number, COM1 will be used.

			Example: "pkunzip stuff *.doc -p3" extracts
			the .DOC files in STUFF.ZIP to LPT3.

			Example: "pkunzip fonts *.fon -pbc2" extracts
			the .FON files in FONTS.ZIP to COM2.  The COM2
			device is placed into untranslated BINARY (raw)
			I/O mode before extraction.
	
	-t		Test.  The specified files are extracted to
			the NUL device, and the 32 bit CRC value for
			the file is calculated.

	-v		View.  View the files in the ZIP.  See note
			for the View option for PKZIP.

	-x		Extract files.  This is the default action if
			no other option is specified.



Misc
----

PKZIP will look in the DOS environment for the string "PKTMP=path"
and will use the specified drive/path for temporary files if present.
Under DOS 3.0 or higher, a unique file name will be used for all
temporary files.

Both PKZIP and PKUNZIP when opening files for read-only type access
will open files in "Share Deny Write" mode under DOS 3.0 or higher.
Also, under DOS 3.0 or higher DOS "Critical Errors" are intercepted
and interrogated.  If an error occurs with a Locus of Network, and a
suggested action of Retry or Delayed Retry, the software will perform
the suggested action.  If after several retries the operation still fails,
the default DOS error prompt will be executed.


Errorlevels
-----------

The software returns the following exit codes:

    PKZIP:	0	No error.
		1	Bad file name or file specification.
	 	2,3	Error in Zipfile.
		4-11	Insufficient Memory.
		12	No files were found to add to the ZIP,
		        or no files were specified for deletion.
		13	File not found.  The specified Zipfile
			or list file was not found.
		14	Disk full.
		15	Zipfile is read-only and can not be modified.
		16	Bad or illegal parameters specified.

    PKUNZIP:	0	No error.
    		1	Warning error (such as failed CRC check)
		2,3	Error in Zipfile.
		4-8	Insufficient Memory.
		9	File not found.  No Zipfiles found.
		10	Bad or illegal parameters specified.
		50	Disk Full.
		51	Unexpected EOF in Zipfile.


    Please note that when testing errorlevels in a batch file,
    DOS tests the errorlevel not for equality, but for
    greater than or equal to.  For example, if the software
    exits with an exit code of 10, errorlevel 10 will be true,
    and so will errorlevel 9, errorlevel 8, and so on.
    >>Errorlevel 0 is always true<<.  Therefore, errorlevels should
    be tested in descending order.  For example:
    
    pkunzip stuff -d d:\temp
    if errorlevel 51 goto err51
    if errorlevel 50 goto err50
    if errorlevel 10 goto err10
    if errorlevel 9 goto err9
    if errorlevel 4 goto err4
    if errorlevel 2 goto err2
    if errorlevel 1 goto err1
    echo No Error
    goto exit
    :err51
    echo Unexpected EOF
    goto exit
    :err50
    echo Disk Full
    goto exit
    .
    .
    .
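
The "greater than or equal to" rule can be modelled outside DOS as well.
The following Python sketch is only an illustration: the table is copied
from the PKUNZIP exit codes listed above, while the names PKUNZIP_LEVELS
and describe_exit_code are hypothetical and not part of the software.

```python
# Thresholds taken from the PKUNZIP exit-code table, highest first,
# mirroring the descending "if errorlevel N" chain in the batch file.
PKUNZIP_LEVELS = [
    (51, "Unexpected EOF in Zipfile"),
    (50, "Disk full"),
    (10, "Bad or illegal parameters specified"),
    (9, "File not found"),
    (4, "Insufficient memory"),
    (2, "Error in Zipfile"),
    (1, "Warning error"),
    (0, "No error"),
]


def describe_exit_code(code: int) -> str:
    """Return the message a descending 'if errorlevel N' chain would reach."""
    for threshold, message in PKUNZIP_LEVELS:
        if code >= threshold:  # the same comparison DOS makes
            return message
    raise ValueError("exit codes are expected to be non-negative")
```

Because the comparison is "at least", a code that is not in the table
(say 11) lands on the nearest lower threshold, exactly as it would in
the batch file.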

----- cut here --------------- FORMAT.DOC --------------- cut here -----
General Format
--------------

  Files are stored in arbitrary order.  Large zipfiles can span multiple
  diskette media.

  Overall zipfile format:

    [local file header+file data] . . .
    [central directory] end of central directory record


  A.  Local file header:
  
	local file header signature	4 bytes  (0x04034b50)
	version needed to extract	2 bytes
	general purpose bit flag	2 bytes
	compression method		2 bytes
	last mod file time 		2 bytes
	last mod file date		2 bytes
	crc-32   			4 bytes
	compressed size			4 bytes
	uncompressed size		4 bytes
	filename length			2 bytes
	extra field length		2 bytes

	filename (variable size)
	extra field (variable size)
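
As an illustration of the layout above, here is a minimal Python sketch
that unpacks a local file header.  The field order and sizes come from
the table; the function name, the dictionary keys and the use of Python's
struct module are choices made for this example only.

```python
import struct

# Fixed part of the local file header: 30 bytes, little-endian, no padding.
# I = 4-byte unsigned field, H = 2-byte unsigned field.
LOCAL_HEADER = struct.Struct("<I5H3I2H")
LOCAL_SIGNATURE = 0x04034B50


def parse_local_header(buf: bytes, offset: int = 0) -> dict:
    """Decode one local file header starting at 'offset' in 'buf'."""
    (signature, version_needed, flags, method, mod_time, mod_date,
     crc32, compressed_size, uncompressed_size,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if signature != LOCAL_SIGNATURE:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len
    return {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "filename": buf[name_start:extra_start],  # raw bytes, not decoded
        "extra": buf[extra_start:data_start],
        "data_offset": data_start,                # where the file data begins
    }
```

The filename is returned as raw bytes because the document does not
specify a character encoding for it.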
      

  B.  Central directory structure:

      [file header] . . .  end of central dir record

      File header:

	central file header signature	4 bytes  (0x02014b50)
	version made by			2 bytes
	version needed to extract	2 bytes
	general purpose bit flag	2 bytes
	compression method		2 bytes
	last mod file time 		2 bytes
	last mod file date		2 bytes
	crc-32   			4 bytes
	compressed size			4 bytes
	uncompressed size		4 bytes
	filename length			2 bytes
	extra field length		2 bytes
	file comment length		2 bytes
	disk number start		2 bytes
	internal file attributes	2 bytes
	external file attributes	4 bytes
	relative offset of local header	4 bytes

	filename (variable size)
	extra field (variable size)
	file comment (variable size)

      End of central dir record:

	end of central dir signature	4 bytes  (0x06054b50)
	number of this disk		2 bytes
	number of the disk with the
	start of the central directory	2 bytes
	total number of entries in
	the central dir on this disk	2 bytes
	total number of entries in
	the central dir			2 bytes
	size of the central directory   4 bytes
	offset of start of central
	directory with respect to
	the starting disk number	4 bytes
	zipfile comment length		2 bytes
	zipfile comment (variable size)
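
The central directory structure can be walked with a few lines of code.
The Python sketch below is an illustration under stated assumptions: it
handles a single-disk zipfile only, and it locates the end of central dir
record by searching backwards for its signature, which is a common
approach but not something this document prescribes.  All names in it
are hypothetical.

```python
import struct

END_RECORD = struct.Struct("<I4H2IH")         # 22 bytes before the comment
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")  # 46 bytes before the filename
END_SIGNATURE = 0x06054B50
CENTRAL_SIGNATURE = 0x02014B50


def read_central_directory(buf: bytes) -> list:
    """Return (filename, compression method, local header offset) per entry."""
    # Assumption: single-disk zipfile whose comment does not itself contain
    # the signature bytes, so the last match is the real record.
    end_pos = buf.rfind(struct.pack("<I", END_SIGNATURE))
    if end_pos < 0:
        raise ValueError("end of central dir record not found")
    (_sig, _this_disk, _start_disk, _entries_here, total_entries,
     _dir_size, dir_offset, _comment_len) = END_RECORD.unpack_from(buf, end_pos)

    entries = []
    pos = dir_offset
    for _ in range(total_entries):
        fields = CENTRAL_HEADER.unpack_from(buf, pos)
        if fields[0] != CENTRAL_SIGNATURE:
            raise ValueError("bad central file header signature")
        method = fields[4]
        name_len, extra_len, comment_len = fields[10:13]
        local_offset = fields[16]
        name_start = pos + CENTRAL_HEADER.size
        name = buf[name_start:name_start + name_len]
        entries.append((name, method, local_offset))
        # Skip the three variable-size fields to reach the next header.
        pos = name_start + name_len + extra_len + comment_len
    return entries
```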
      



  C.  Explanation of fields:

      version made by
      
	  The upper byte indicates the host system (OS) for the
	  file.  Software can use this information to determine
	  the line record format for text files etc.  The current
	  mappings are:
	  
	  0 - IBM (MS-DOS)	1 - Amiga	2 - VMS
	  3 - *nix		4 thru 255 - unused
	  
	  The lower byte indicates the version number of the 
	  software used to encode the file.  The value/10 
	  indicates the major version number, and the value 
	  mod 10 is the minor version number.
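
A short Python sketch of this decoding follows.  The host mapping is
copied from the list above; the function and table names are made up for
the example.

```python
# Host systems as listed for the upper byte of 'version made by'.
HOSTS = {0: "IBM (MS-DOS)", 1: "Amiga", 2: "VMS", 3: "*nix"}


def decode_version_made_by(value: int) -> tuple:
    """Split the 2-byte field into (host system, major, minor)."""
    host = (value >> 8) & 0xFF   # upper byte: host system
    version = value & 0xFF       # lower byte: software version
    return HOSTS.get(host, "unused"), version // 10, version % 10
```

For instance, a field value of 10 decodes to host "IBM (MS-DOS)" and
version 1.0.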

      version needed to extract
      
	  The minimum software version needed to extract the 
	  file, mapped as above.

      general purpose bit flag:

	  The lowest bit, if set, indicates that the file is 
	  encrypted.  The upper three bits are reserved and 
	  used internally by the software when processing the 
	  zipfile.  The remaining bits are unused in version 
	  1.0.

      compression method:
      
	  (see accompanying documentation for algorithm 
	  descriptions)
      
	  0 - The file is stored (no compression)
	  1 - The file is Shrunk
	  2 - The file is Reduced with compression factor 1
	  3 - The file is Reduced with compression factor 2
	  4 - The file is Reduced with compression factor 3
	  5 - The file is Reduced with compression factor 4

      date and time fields:

	  The date and time are encoded in standard MS-DOS 
	  format.
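
This document does not spell out the MS-DOS format.  The Python sketch
below uses the bit layout of MS-DOS directory entries (date: 7 bits of
year since 1980, 4 bits of month, 5 bits of day; time: 5 bits of hour,
6 bits of minute, 5 bits of seconds divided by two).  That layout is
supplied here from the MS-DOS convention, not from the SDK files, and
the function name is hypothetical.

```python
def decode_dos_datetime(date: int, time: int) -> tuple:
    """Decode 16-bit MS-DOS date and time words into calendar fields."""
    year = 1980 + (date >> 9)        # bits 15-9: years since 1980
    month = (date >> 5) & 0x0F       # bits 8-5
    day = date & 0x1F                # bits 4-0
    hour = time >> 11                # bits 15-11
    minute = (time >> 5) & 0x3F      # bits 10-5
    second = (time & 0x1F) * 2       # bits 4-0, stored in 2-second units
    return year, month, day, hour, minute, second
```

Because seconds are stored in two-second units, odd second values cannot
be represented.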

      CRC-32:
      
	  The CRC-32 algorithm was generously contributed by 
	  David Schwaderer and can be found in his excellent 
	  book "C Programmers Guide to NetBIOS" published by
	  Howard W. Sams & Co. Inc.  The 'magic number' for 
	  the CRC is 0xdebb20e3.  The proper CRC pre and post 
	  conditioning is used, meaning that the CRC register 
	  is pre-conditioned with all ones (a starting value 
	  of 0xffffffff) and the value is post-conditioned by 
	  taking the one's complement of the CRC residual.
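
The conditioning described above can be sketched in Python as follows.
The polynomial constant 0xEDB88320 (the bit-reversed form of the usual
CRC-32 polynomial) is not stated in this document; it is assumed here
because it is the standard choice and it reproduces the quoted magic
number.  The function names are hypothetical.

```python
def _make_table() -> list:
    """Build the 256-entry lookup table for the reflected CRC-32."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc_register(data: bytes, register: int = 0xFFFFFFFF) -> int:
    """Run the CRC over 'data', starting from the all-ones register."""
    for byte in data:
        register = _TABLE[(register ^ byte) & 0xFF] ^ (register >> 8)
    return register


def crc32(data: bytes) -> int:
    """Post-condition the register by taking its one's complement."""
    return crc_register(data) ^ 0xFFFFFFFF
```

The 'magic number' appears when the CRC is run over the data followed by
its own CRC-32 value (stored low byte first): the register, before
post-conditioning, then holds 0xdebb20e3 regardless of the data.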
	  
      compressed size:
      uncompressed size:

	  The size of the file compressed and uncompressed, 
	  respectively.
      
      filename length:
      extra field length:
      file comment length:

	  The length of the filename, extra field, and comment 
	  fields respectively.  The combined length of any
	  directory record and these three fields should not
	  generally exceed 65,535 bytes.

      disk number start:

	  The number of the disk on which this file begins.

      internal file attributes:

	  The lowest bit of this field indicates, if set, that 
	  the file is apparently an ASCII or text file.  If not
	  set, the file apparently contains binary data.
	  The remaining bits are unused in version 1.0.

      external file attributes:

	  The mapping of the external attributes is 
	  host-system dependent (see 'version made by').  For 
	  MS-DOS, the low order byte is the MS-DOS directory 
	  attribute byte.

      relative offset of local header:

	  This is the offset from the start of the first disk on
	  which this file appears, to where the local header should
	  be found.

      filename:

	  The name of the file, with optional relative path.  
	  The path stored should not contain a drive or 
	  device letter, or a leading slash.  All slashes 
	  should be forward slashes '/' as opposed to 
	  backwards slashes '\' for compatibility with Amiga
	  and Unix file systems etc.
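
A minimal Python sketch of these filename rules, assuming an MS-DOS style
input path.  The helper name is hypothetical, and the sketch only strips
the drive and leading slashes; it does not compute a path relative to a
search directory the way the P option does.

```python
def to_zip_name(dos_path: str) -> str:
    """Convert an MS-DOS path into the form stored in the zipfile."""
    path = dos_path.replace("\\", "/")        # forward slashes only
    if len(path) >= 2 and path[1] == ":":     # drop a drive letter such as "e:"
        path = path[2:]
    return path.lstrip("/")                   # no leading slash
```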

      extra field:

	  This is for future expansion.  If additional information
	  needs to be stored in the future, it should be stored
	  here.  Earlier versions of the software can then safely
	  skip this file, and find the next file or header.  This
	  field will be 0 length in version 1.0.

      file comment:

	  The comment for this file.


      number of this disk:
      
	  The number of this disk, which contains the central 
	  directory end record.
    
      number of the disk with the start of the central directory:
      
	  The number of the disk on which the central 
	  directory starts.

      total number of entries in the central dir on this disk:
      
	  The number of central directory entries on this disk.
	  
      total number of entries in the central dir:
      
	  The total number of files in the zipfile.
      

      size of the central directory:
      
	  The size (in bytes) of the entire central directory.

      offset of start of central directory with respect to
      the starting disk number:
      
	  Offset of the start of the central directory on the 
	  disk on which the central directory starts.
      
      zipfile comment length:
      
	  The length of the comment for this zipfile.
      
      zipfile comment:
      
	  The comment for this zipfile.


  D.  General notes:

      1)  All fields unless otherwise noted are unsigned and stored
	  in Intel low-byte:high-byte, low-word:high-word order.

      2)  String fields are not null terminated, since the
	  length is given explicitly.

      3)  Local headers should not span disk boundaries.  Also, even
	  though the central directory can span disk boundaries, no
	  single record in the central directory should be split
	  across disks.

      4)  The entries in the central directory may not necessarily
	  be in the same order that files appear in the zipfile.

----- cut here -------------- EXTRACT.DOC --------------- cut here -----

UnShrinking
-----------

Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm 
with partial clearing.  The initial code size is 9 bits, and 
the maximum code size is 13 bits.  Shrinking differs from 
conventional Dynamic Ziv-Lempel-Welch implementations in several 
respects:

1)  The code size is controlled by the compressor, and is not 
    automatically increased when codes larger than the current 
    code size are created (but not necessarily used).  When 
    the decompressor encounters the code sequence 256 
    (decimal) followed by 1, it should increase the code size 
    read from the input stream to the next bit size.  No 
    blocking of the codes is performed, so the next code at 
    the increased size should be read from the input stream 
    immediately after where the previous code at the smaller 
    bit size was read.  Again, the decompressor should not 
    increase the code size used until the sequence 256,1 is 
    encountered.

2)  When the table becomes full, total clearing is not 
    performed.  Rather, when the compressor emits the code 
    sequence 256,2 (decimal), the decompressor should clear 
    all leaf nodes from the Ziv-Lempel tree, and continue to 
    use the current code size.  The nodes that are cleared 
    from the Ziv-Lempel tree are then re-used, with the lowest 
    code value re-used first, and the highest code value 
    re-used last.  The compressor can emit the sequence 256,2
    at any time.
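
The partial clearing step in item 2 can be sketched in Python as follows.
This is an illustration only: it assumes the decompressor keeps its table
as a mapping from each defined code above 256 to the code of its prefix
string, which is one possible representation and not necessarily the one
PKUNZIP uses.  The function name is hypothetical.

```python
def partial_clear(parent: dict) -> list:
    """Remove all leaf nodes from the table and return the freed codes.

    'parent' maps each defined code (above 256) to its prefix code.
    A leaf is a code that is not the prefix of any other code.
    """
    used_as_prefix = set(parent.values())
    leaves = sorted(code for code in parent if code not in used_as_prefix)
    for code in leaves:
        del parent[code]
    # Ascending order: the lowest code value is to be re-used first.
    return leaves
```

Only the codes that were leaves at the moment of clearing are removed;
nodes that become leaves as a result stay in the table until the next
256,2 sequence.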



Expanding
---------

The Reducing algorithm is actually a combination of two 
distinct algorithms.  The first algorithm compresses repeated 
byte sequences, and the second algorithm takes the compressed
stream from the first algorithm and applies a probabilistic 
compression method.  

The probabilistic compression stores an array of 'follower 
sets' S(j), for j=0 to 255, corresponding to each possible 
ASCII character.  Each set contains between 0 and 32 
characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.  
The sets are stored at the beginning of the data area for a 
Reduced file, in reverse order, with S(255) first, and S(0) 
last.  

The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] }, 
where N(j) is the size of set S(j).  N(j) can be 0, in which 
case the follower set for S(j) is empty.  Each N(j) value is 
encoded in 6 bits, followed by N(j) eight bit character values 
corresponding to S(j)[0] to S(j)[N(j)-1] respectively.  If 
N(j) is 0, then no values for S(j) are stored, and the value 
for N(j-1) immediately follows.
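
Reading the follower sets might look like the Python sketch below.  One
point is an assumption rather than something stated in this document:
bits are taken from each byte starting with the least significant bit.
The class and function names are hypothetical.

```python
class BitReader:
    """Reads bit fields from a byte string.

    Assumption: bits are consumed least significant bit first.
    """

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits from the start of 'data'

    def read(self, count: int) -> int:
        value = 0
        for i in range(count):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def read_follower_sets(reader: BitReader) -> list:
    """Return a list where element j is the follower set S(j)."""
    sets = [[] for _ in range(256)]
    for j in range(255, -1, -1):      # stored in reverse order, S(255) first
        n = reader.read(6)            # N(j), encoded in 6 bits
        sets[j] = [reader.read(8) for _ in range(n)]
    return sets
```

After the call, the reader is positioned at the first bit of the
compressed data stream.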

The compressed data stream immediately follows the follower 
sets.  It can be interpreted for the probabilistic 
decompression as follows:


let Last-Character <- 0.
loop until done
    if the follower set S(Last-Character) is empty then
	read 8 bits from the input stream, and copy this
	value to the output stream.
    otherwise if the follower set S(Last-Character) is non-empty then
	read 1 bit from the input stream.
	if this bit is not zero then
	    read 8 bits from the input stream, and copy this
	    value to the output stream.
	otherwise if this bit is zero then
	    read B(N(Last-Character)) bits from the input 
	    stream, and assign this value to I.
	    Copy the value of S(Last-Character)[I] to the 
	    output stream.
	
    assign the last value placed on the output stream to 
    Last-Character.
end loop


B(N(j)) is defined as the minimal number of bits required to 
encode the value N(j)-1.
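
The loop and the definition of B can be sketched in Python as follows.
Two points are assumptions made for this example: a set with a single
follower is given a 1-bit index (the definition leaves the width for the
value 0 open), and the loop stops after a caller-supplied number of
output bytes, since "until done" is not defined further here.  The
read_bits argument is any function returning the next n bits of the
input as an integer; all names are hypothetical.

```python
def follower_index_bits(n: int) -> int:
    """B(N): minimal number of bits needed to encode the value N-1.

    Assumption: at least one bit is used, even when N is 1.
    """
    return max(1, (n - 1).bit_length())


def probabilistic_decode(read_bits, follower_sets, count: int) -> bytes:
    """Produce 'count' bytes of the intermediate stream."""
    out = bytearray()
    last = 0                                   # Last-Character starts at 0
    while len(out) < count:
        followers = follower_sets[last]
        if not followers:
            value = read_bits(8)               # empty set: literal byte
        elif read_bits(1) != 0:
            value = read_bits(8)               # flag bit set: literal byte
        else:
            index = read_bits(follower_index_bits(len(followers)))
            value = followers[index]           # flag bit clear: follower
        out.append(value)
        last = value
    return bytes(out)
```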


The decompressed stream from above can then be expanded to 
re-create the original file as follows:


let State <- 0.

loop until done
    read 8 bits from the input stream into C.
    case State of
	0:  if C is not equal to DLE (144 decimal) then
		copy C to the output stream.
	    otherwise if C is equal to DLE then
		let State <- 1.

	1:  if C is non-zero then
		let V <- C.
		let Len <- L(V)
		let State <- F(Len).
	    otherwise if C is zero then
		copy the value 144 (decimal) to the output stream.
		let State <- 0

	2:  let Len <- Len + C
	    let State <- 3.
    
	3:  move backwards D(V,C) bytes in the output stream 
	    (if this position is before the start of the output 
	    stream, then assume that all the data before the 
	    start of the output stream is filled with zeros).
	    copy Len+3 bytes from this position to the output stream.
	    let State <- 0.
    end case
end loop


The functions F,L, and D are dependent on the 'compression 
factor' (see FORMAT.DOC), 1 through 4, and are defined as follows:

For compression factor 1:
    L(X) equals the lower 7 bits of X.
    F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
    D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
For compression factor 2:
    L(X) equals the lower 6 bits of X.
    F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
    D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
For compression factor 3:
    L(X) equals the lower 5 bits of X.
    F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
    D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
For compression factor 4:
    L(X) equals the lower 4 bits of X.
    F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
    D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
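
Putting the state machine and the functions L, F and D together, the
expansion step might be sketched in Python as shown below.  The function
name is hypothetical, and the sketch simply runs until the input is
exhausted, which is one reading of "loop until done".

```python
DLE = 144  # escape byte that introduces a repeated-string reference


def expand(data: bytes, factor: int) -> bytes:
    """Expand the intermediate stream for compression factor 1 to 4."""
    if factor not in (1, 2, 3, 4):
        raise ValueError("compression factor must be 1 through 4")
    low_bits = 8 - factor              # width of L(X): 7, 6, 5 or 4 bits
    mask = (1 << low_bits) - 1         # also the value that makes F(X) = 2
    out = bytearray()
    state = 0
    v = 0
    length = 0
    for c in data:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c
                length = v & mask                     # L(V)
                state = 2 if length == mask else 3    # F(Len)
            else:
                out.append(DLE)                       # escaped literal 144
                state = 0
        elif state == 2:
            length += c
            state = 3
        else:
            distance = (v >> low_bits) * 256 + c + 1  # D(V,C)
            for _ in range(length + 3):
                src = len(out) - distance
                # Data before the start of the output counts as zeros.
                out.append(out[src] if src >= 0 else 0)
            state = 0
    return bytes(out)
```

The copy is done one byte at a time so that a reference may overlap the
bytes it is producing, as happens when the distance is shorter than the
length.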

----- cut here --------------- BUGREP.DOC --------------- cut here -----

If you encounter a bug in the software, please complete and
return the following form.  Return forms to:

PKWARE Inc.
7545 North Port Washington Road
Suite 205
Glendale, WI 53217

or via BBS to:

PKWARE BBS
414-352-7176
300/1200/2400 Baud, MNP class 5 available
8 Data bits, No parity, 1 Stop bit
Up 24 hours

or by voice at:

414-352-3670  9am to 5pm Central Time


    Name: _____________________________

 Address: _____________________________

City, ST: _____________________________
                                    ZIP
Day phone: ____________________________

Eve phone: ____________________________


Serial # of your Software Development Kit diskette __________________

Make & Model number of your computer ________________________________

DOS Version and OEM _______________________  MS-DOS or PC-DOS? (circle one)

Please list ALL memory resident programs used, including device drivers,
network drivers, and terminate and stay resident type software:






Please describe as accurately as possible the bug or anomaly encountered,
including what commands were executed, and if at all possible (please!?)
how to re-create the problem.  Attach additional sheets, printouts, or
diskettes if necessary:

----- cut here -------------- end of files -------------- cut here -----

Robert A. Freed                     raf@cup.portal.com
Newton Centre, MA                   ...!sun!portal!cup.portal.com!raf

raf@cup.portal.com (Robert A Freed) (02/14/89)

CORRECTION:  In my previous posting of the document files from Phil
Katz' Software Development Kit (article <14480@cup.portal.com>), there
appeared in the first file, SDK.DOC, a credit to an individual

> who coined the name "ZIP".

I have been informed by Phil Katz that this credit did not appear in
his original SDK.DOC file, which I received second-hand.  I apologize
for the error, which was unintentional.  With the exception of that
single phrase, the posted material is completely accurate.

According to Phil, credit for the name "ZIP" belongs to Bob Mahoney,
sysop of the 75-line Exec-PC BBS in Milwaukee, WI.

-- Bob Freed

keithe@tekgvs.LABS.TEK.COM (Keith Ericson) (02/16/89)

In article <14623@cup.portal.com> raf@cup.portal.com (Robert A Freed) writes:
>
>> ...who coined the name "ZIP".
>
>
>According to Phil [Katz], credit for the name "ZIP" belongs to Bob Mahoney,
>sysop of the 75-line Exec-PC BBS in Milwaukee, WI.
>

Except - how about the ZIP program that 'zips' files back and forth
between IBM/Compatibles over the serial ports at ~115kbaud...?


-->     ZIP and its documentation are (c)1988 E. Meyer, all rights reserved.  
--> They may be freely distributed, but not modified or sold for profit without 
--> my written consent.  The user takes full responsibility for any damages 
-->

And don't forget the Post Office's Zone Improvement Plan (ZIP) codes, eh.

I mean, see just how ridiculous this can all get?!?!?

kEITH (I'm sticking with ZOO to avoid the whole SEA/PK mess) eRICSON

akk2@uhura.cc.rochester.edu (Atul Kacker) (02/16/89)

In article <14623@cup.portal.com> raf@cup.portal.com (Robert A Freed) writes:
>
>> who coined the name "ZIP".
>
>According to Phil, credit for the name "ZIP" belongs to Bob Mahoney,
>sysop of the 75-line Exec-PC BBS in Milwaukee, WI.

I just happened to be going through my collection of PC software and found
that I have a program called 'ZIP' that bears the copyright

Copyright (c) 1985,1986 by Edward V. Dong

that among other things has a squeeze/unsqueeze feature.  I wonder if Mr. Katz
is aware of said program.  I don't know much about the legal aspects but I'm
sure Mr. Katz would not want another name recognition fiasco.



-- 
Atul Kacker  |     Internet: akk2@uhura.cc.rochester.edu
             |     UUCP: {ames,cmcl2,decvax,rutgers}!rochester!ur-cc!akk2
-------------------------------------------------------------------------------

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (02/16/89)

In article <868@ur-cc.UUCP> akk2@uhura.cc.rochester.edu (Atul Kacker) writes:

| I just happened to be going through my collection of PC software and found
| that I have a program called 'ZIP' that bears the copyright
| 
| Copyright (c) 1985,1986 by Edward V. Dong
| 
| that among other things has a squeeze/unsqueeze feature.  I wonder if Mr. Katz
| is aware of said program.  I don't know much about the legal aspects but I'm
| sure Mr. Katz would not want another name recognition fiasco.

  I don't think Phil has a sense of self-preservation. He could have
used .PKA (Phil Katz Archive) format and not interfered with anything
existing. Instead he chose ZIP, which has been used for at least two
programs, one of which you mentioned does compression.

  I thought Dean Cooper was working on this, and he has better sense.
His archive program produces .DWC files. If the DWC archiver had
continued to evolve and been made more portable it might have pushed ARC
and zoo and zip into the background.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

amlovell@phoenix.Princeton.EDU (Anthony M Lovell) (02/19/89)

  Why did he not use Katz' N Jammer (.JAM) as a program name and
extension?
-- 
amlovell@phoenix.princeton.edu     ...since 1963.

raf@cup.portal.com (Robert A Freed) (02/19/89)

In article <4650@tekgvs.LABS.TEK.COM> keithe@tekgvs.LABS.TEK.COM
(Keith Ericson) writes:
> Except - how about the ZIP program that 'zips' files back and forth
> between IBM/Compatibles over the serial ports at ~115kbaud...?
> [...]
> And don't forget the Post Office's Zone Improvement Plan (ZIP) codes, eh.

These comments remind me of something I read in a court complaint not
too long ago. ;-)  I'll never understand this obsession with names.
Personally, I'd prefer to discuss the technical merits of Phil Katz'
new programs, data compression methods, and file format.

> I mean, see just how ridiculous this can all get?!?!?

Indeed.

> kEITH (I'm sticking with ZOO to avoid the whole SEA/PK mess) eRICSON

I'm switching to PKZIP because it saves my time, space on my disks, and
charges on my phone bills.  Also, I prefer to support the one author
(of the three programs referenced in the above parenthetical comment)
who has consistently contributed original improvements to the data
compression efficiency of his archiving software.

Bob Freed                           raf@cup.portal.com
Newton Centre, MA                   ...!sun!portal!cup.portal.com!raf

john@stiatl.UUCP (John DeArmond) (02/20/89)

In article <6490@phoenix.Princeton.EDU> amlovell@phoenix.Princeton.EDU (Anthony M Lovell) writes:
>
>  Why did he not use Katz' N Jammer (.JAM) as a program name and
>extension?

Because "JAM" is the registered name of a screen handling system published
by Jyacc.  See?  Those pesky intellectual property lawyers DO do some good 
sometimes :-)

John

-- 
John De Armond, WD4OQC                     | Manual? ... What manual ?!? 
Sales Technologies, Inc.    Atlanta, GA    | This is Unix, My son, You 
...!gatech!stiatl!john                     | just GOTTA Know!!!