[net.sources] Compress for MS-DOS, part 1/2

bet@ecsvax.UUCP (Bennett E. Todd III) (07/07/86)

The responses were overwhelming; everybody wanted this. Herewith the
port of compress(1) to MS-DOS under MSC 3.0: the documentation and
makefile are in this posting, and the C sources follow in a second.
I would expect this version to port to other micros more easily than
the original.
Executables for this can be found in net.micro.pc. Read the C source for
the credits; I had nothing to do with the development of this excellent
piece of software, and credit for this work shouldn't be lost.

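A note on the -i (image mode) option described in compress.doc below:
without -i, text lines are converted from CR-LF to LF form on compression
and back again on decompression. The fragment here is only a sketch of
that pair of transformations, not code lifted from compress.c. The
function names crlf_to_lf and lf_to_crlf are made up for illustration,
and the real program may well do the conversion a character at a time
inside its I/O loop.

```c
#include <stddef.h>

/* Hypothetical helper (not from compress.c): copy n bytes from src to
 * dst, dropping every CR that is immediately followed by LF.
 * dst must have room for n bytes.  Returns the number of bytes written. */
size_t crlf_to_lf(const char *src, size_t n, char *dst)
{
    size_t i, out = 0;

    for (i = 0; i < n; i++) {
        if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
            continue;               /* skip the CR, keep the LF */
        dst[out++] = src[i];
    }
    return out;
}

/* Hypothetical inverse: put a CR in front of every LF.
 * dst must have room for 2*n bytes.  Returns the number of bytes written. */
size_t lf_to_crlf(const char *src, size_t n, char *dst)
{
    size_t i, out = 0;

    for (i = 0; i < n; i++) {
        if (src[i] == '\n')
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```

With -i given, neither step would be applied and the bytes pass through
untouched, which is what you want for binaries.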
#!/bin/sh
# Cut above the preceding line, or cut here if you must.
# This is a shar archive.  Extract with sh, not csh.
# The rest of this file will extract:
# makefile compress.doc
sed 's/^X//' > makefile << '/*EOF'
Xcomsmlpc.exe: compress.c
X	msc compress,compress/FPi/Ze/Gs/Ot;
X	link compress,comsmlpc/NOI;
X
Xcombigpc.exe: compress.c
X	msc compress,compress/DBIG/FPi/Ze/Gs/Ot;
X	link compress,combigpc.big/NOI;
X	exepack combigpc.big combigpc.exe
X
Xcomsmlat.exe: compress.c
X	msc compress,compress/FPi/Ze/Gs/Ot/G2;
X	link compress,comsmlat/NOI;
X
X
Xcombigat.exe: compress.c
X	msc compress,compress/DBIG/FPi/Ze/Gs/Ot/G2;
X	link compress,combigat.big/NOI;
X	exepack combigat.big combigat.exe
/*EOF
ls -l makefile
sed 's/^X//' > compress.doc << '/*EOF'
X       COMPRESS(1)   MS-DOS Programmer's Manual   COMPRESS(1)
X       
X       NAME
X            compress, uncompress, zcat - compress or expand data
X       
X       SYNOPSIS
X            compress [-cdfivV] [-b bits] [name ...]
X            uncompress [-cfivV] [name ...]
X            zcat [-iV] [name ...]
X       
X       DESCRIPTION
X            Compress reduces the size of the named files using
X            adaptive Lempel-Ziv coding.  Whenever possible, each 
X            file is replaced by one with the extension .Z or XZ, 
X            while keeping the same modification times.  If no 
X            files are specified, the standard input is 
X            compressed to the standard output.  Compressed files 
X            can be restored to their original form using 
X            uncompress or zcat.
X       
X            The -c option makes compress/uncompress write to the
X            standard output; no files are changed.  The 
X            nondestructive behavior of zcat is identical to that
X            of uncompress -c.
X       
X            The -d (decompress) option makes compress restore
X            its input files to their normal form.  Uncompress is
X            identical to compress with the -d option specified.
X       
X            The -f option will force compression of "name". This
X            is useful for compressing an entire directory, even 
X            if some of the files do not actually shrink.  If -f
X            is not given, the user is prompted as to whether an 
X            existing file should be overwritten. 
X       
X            The -i (image mode) option suppresses the
X            transformation of text lines from MS-DOS (CR-LF 
X            delimited) form to UNIX (LF delimited) form during 
X            compression, and suppresses the reverse 
X            transformation during decompression. 
X       
X            The -v (verbose) option causes a message to be
X            printed, yielding the percentage of reduction for 
X            each file compressed. 
X       
X            The -V option causes the current version and compile
X            options to be printed on stderr. 
X       
X            Compress uses the modified Lempel-Ziv algorithm
X            popularized in "A Technique for High Performance 
X            Data Compression", Terry A. Welch, IEEE Computer,
X            vol. 17, no. 6 (June 1984), pp. 8-19.  Common 
X            substrings in the file are first replaced by 9-bit 
X            codes 257 and up.  When code 512 is reached, the 
X            algorithm switches to 10-bit codes and continues to 
X            use more bits until the limit specified by the -b
X            flag is reached (default is the maximum for which 
X            the program was built). 
X       
X            "Bits" must be between 9 and the lesser of 16 and 
X            the limit imposed at compile time.  The MS-DOS 
X            version of compress comes in two sizes.  One has a
X            12-bit limit, and will run in a machine with 128K 
X            bytes of available user memory.  The other has a 
X            16-bit limit, and requires about 450K bytes to run. 
X       
X            After the "bits" limit is attained, compress
X            periodically checks the compression ratio.  If it is 
X            increasing, compress continues to use the existing
X            code dictionary.  However, if the compression ratio 
X            decreases, compress discards the table of substrings
X            and rebuilds it from scratch.  This allows the 
X            algorithm to adapt to the next "block" of the file. 
X       
X            Note that the -b flag is omitted for uncompress,
X            since the "bits" parameter specified during 
X            compression is encoded within the output, along with 
X            a magic number to ensure that neither decompression 
X            of random data nor recompression of compressed data 
X            is attempted. 
X       
X            The amount of compression obtained depends on the 
X            size of the input, the number of "bits" per code, 
X            and the distribution of common substrings.  
X            Typically, text such as source code or English is 
X            reduced by 50-60%.  Compression is generally much 
X            better than that achieved by Huffman coding (as used 
X            in SQ), and takes less time to compute. 
X       
X            Exit status is normally 0; if the last file is 
X            larger after (attempted) compression, the status is 
X            2; if an error occurs, exit status is 1. 
X       
X       SEE ALSO
X            SQ(1) 
X       
X       DIAGNOSTICS
X            Usage: compress [-cdfivV] [-b maxbits] [file ...] 
X                Invalid options were specified on the command 
X                line. 
X            Missing maxbits
X                Maxbits must follow -b. 
X            file: not in compressed format
X                The file specified to UNCOMPRESS has not been 
X                compressed. 
X            file: compressed with xx bits, can only handle yy bits 
X                "File" was compressed by a program that could 
X                deal with more "bits" than the compress code on 
X                this machine.  Recompress the file with smaller 
X                "bits". 
X            file: already has xx suffix -- no change
X                The file is assumed to be already compressed 
X                because the last two characters of its extension 
X                are ".Z" or "XZ".  Rename the file and try 
X                again. 
X            fn: part of filename extension will be replaced by XZ
X                File name, fn, contains at least two characters 
X                in the "extension" field.  The second and third 
X                will be replaced by "XZ" in the compressed 
X                file's name. 
X            fn already exists; do you wish to overwrite fn?
X                Respond "y" if you want the output file, fn, to 
X                be replaced; "n" if not. 
X            Compression: xx.xx%
X                Percentage of the input saved by compression. 
X                (Relevant only for -v.) 
X            -- file unchanged
X                No savings are achieved by compression.  The 
X                input file is left unchanged. 
X       
X       BUGS
X            Although compressed files are compatible between 
X            machines with large memory, -b12 should be used for 
X            file transfer to architectures with a small process 
X            data space (64KB or less, as exhibited by the DEC 
X            PDP series, or the small MS-DOS version, for 
X            example). 
X       
X            MS-DOS version 2 does not permit a program to 
X            determine the name used to call it.  As a result, 
X            the aliases, uncompress and zcat, cannot be used.
X            They can be used under MS-DOS version 3, though the 
X            actual file name for uncompress will be
X            "uncompre.exe". 
X       
X            MS-DOS does not support UNIX-style file links.  As a 
X            result, even though compress, uncompress and zcat
X            are all the same program, it (they) will have to be 
X            stored three times, once under each of the three 
X            names, in order to use them under MS-DOS version 3.  
X            As explained in the previous paragraph, this is not 
X            an option under MS-DOS version 2.
/*EOF
ls -l compress.doc
exit
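
A note on the code sizes described in compress.doc above: codes start
at 9 bits, grow to 10 bits when code 512 is reached, and keep growing
until the -b limit. The fragment below is a rough sketch of that rule,
plus the table size implied by the 12-bit and 16-bit builds. It is not
taken from compress.c; the names INIT_BITS, code_width and table_entries
are invented here, and the real program tracks the width incrementally
rather than recomputing it for each code.

```c
#define INIT_BITS 9     /* assumed name; the doc says codes start at 9 bits */

/* Hypothetical helper: number of bits used to write `code` when the
 * limit given with -b is `maxbits`.  The width starts at 9 and goes up
 * by one each time the code no longer fits, but never past maxbits. */
int code_width(long code, int maxbits)
{
    int bits = INIT_BITS;

    while (bits < maxbits && code >= (1L << bits))
        bits++;
    return bits;
}

/* Hypothetical helper: number of distinct codes available at a given
 * limit, i.e. how many string-table entries the program must hold. */
long table_entries(int maxbits)
{
    return 1L << maxbits;
}
```

So code_width(511, 16) is 9 and code_width(512, 16) is 10, matching the
doc. table_entries(12) is 4096 against table_entries(16) at 65536, which
goes some way to explaining the 128K versus 450K memory figures.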
-- 

Bennett Todd -- Duke Computation Center, Durham, NC 27706-7756; (919) 684-3695
UUCP: ...{decvax,seismo,philabs,ihnp4,akgua}!mcnc!ecsvax!duccpc!bet