[net.sources] Fast portable version of compress.c

joe@petsd.UUCP (Joe Orost) (08/05/84)

<>
#!/bin/sh
cat >README <<'------ EOF ------'
Enclosed is a modified version of compress.c, along with scripts to make it
run identically to pack(1), unpack(1), an pcat(1).  Here is what I
(petsd!joe) and a colleague (petsd!peora!srd) did:

1. Removed VAX dependencies.
2. Changed the struct to separate arrays; saves mucho memory.
3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
4. Sorted the character next chain and changed the search to stop
prematurely.  This saves a lot on the execution time when compressing.

This version is totally compatible with the original version.  Even though
lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
machine, due to the size of the arrays.

Here is the README file from the original author:
 
>Well, with all this discussion about file compression (for news batching
>in particular) going around, I decided to implement the text compression
>algorithm described in the June Computer magazine.  The author claimed
>blinding speed and good compression ratios.  It's certainly faster than
>compact (but, then, what wouldn't be), but it's also the same speed as
>pack, and gets better compression than both of them.  On 350K bytes of
>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>seconds, and compress (herein) also took 80 seconds.  But, compact and
>pack got about 30% compression, whereas compress got over 50%.  So, I
>decided I had something, and that others might be interested, too.
>
>As is probably true of compact and pack (although I haven't checked),
>the byte order within a word is probably relevant here, but as long as
>you stay on a single machine type, you should be ok.  (Can anybody
>elucidate on this?)  There are a couple of asm's in the code (extv and
>insv instructions), so anyone porting it to another machine will have to
>deal with this anyway (and could probably make it compatible with Vax
>byte order at the same time).  Anyway, I've linted the code (both with
>and without -p), so it should run elsewhere.  Note the longs in the
>code, you can take these out if you reduce BITS to <= 15.
>
>Have fun, and as always, if you make good enhancements, or bug fixes,
>I'd like to see them.
>
>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)

					regards,
					joe

--
Full-Name:  Joseph M. Orost
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
Phone:      (201) 870-5844
------ EOF ------
ls -l README
cat >Pack <<'------ EOF ------'
FILES="$@"
for FILE in $FILES
do
	case $FILE in
	*.Z)	echo "$FILE - Already packed, skipped";;
	?????????????? | ?????????????)	echo "$FILE - Name too long, skipped";;
	*)	echo "$FILE - \c"
		if test -f $FILE.Z; then
			echo "$FILE.Z Already exists, skipped"
			continue
		fi
		STATO=`ls -l $FILE`
		set .$STATO
		NLINKS=$2
		STATO=$4
		if test $NLINKS -ne 1; then
			echo "Has links, skipped"
			continue
		fi
		compress $FILE > $FILE.Z
		STAT=$?
		if test $STAT -ne 0; then
			echo "Bad status from compress, skipped"
			continue
		fi
		STATN=`ls -l $FILE.Z`
		set .$STATN
		STATN=$4
		if test $STATO -le $STATN; then
			echo "$FILE - No savings, skipped"
			rm $FILE.Z
			continue
		fi
		copystat $FILE $FILE.Z
		rm $FILE;;
	esac
done
------ EOF ------
chmod +x Pack
ls -l Pack
cat >Pack.l <<'------ EOF ------'
.TH PACK 1 local
.SH NAME
Pack \- compress files
.SH SYNOPSIS
.B Pack
name .\|.\|.
.SH DESCRIPTION
.I Pack\^
attempts to store the specified files in a compressed form.
Wherever possible (and useful), each input file
.I name\^
is replaced by a packed file
.IB name .Z
with the same access modes, access and modified dates, and owner as those of
.IR name .
This is performed by running compress(l) and copystat(l) on each file.
If
.I Pack\^
is successful,
.I name\^
will be removed.
Packed files
can be restored to their original form using
.IR Unpack (l)
or
.IR Pcat (l).
.PP
The amount of compression obtained depends on the size of the
input file and the common substring frequency distribution.
.PP
No packing will occur if:
.PP
.RS
the filename ends in .Z;
.br
the file name has more than 12 characters;
.br
the file has links;
.br
the file is a directory;
.br
the file cannot be opened;
.br
no disk storage blocks will be saved by packing;
.br
a file called
.IB name .Z
already exists;
.br
the
.B .Z
file cannot be created;
.br
an I/O error occurred during processing.
.PP
.RE
The last segment of the file name must contain no more than 12
characters to allow space for the appended
.B .Z
extension.
Directories cannot be compressed.
.SH EXAMPLE
.IP
Pack file1
.PP
will pack file "file1" into "file1.Z" and removes "file1"
if packing is successful.
.SH "SEE ALSO"
Unpack(l), compress(l), Pcat(l)
------ EOF ------
ls -l Pack.l
cat >Pcat <<'------ EOF ------'
for FILE
do
	case $FILE in
	*.Z)
		if test -f $FILE; then
			compress -d $FILE
		else
			echo "$FILE: not found"
		fi;;
	????????????? | ??????????????)
		echo "$FILE.Z: not found";;
	*)
		if test -f $FILE.Z; then
			compress -d $FILE.Z
		else
			echo "$FILE.Z: not found"
		fi;;
	esac
done
------ EOF ------
chmod +x Pcat
ls -l Pcat
cat >Pcat.l <<'------ EOF ------'
.TH PCAT 1 local
.SH NAME
Pcat \- expand files and write to standard output
.SH SYNOPSIS
.B Pcat
name .\|.\|.
.SH DESCRIPTION
.I Pcat\^
does for packed files what
.IR cat (1)
does for ordinary files.
The specified files are unpacked and written to the standard output.
Thus to view a packed file named
.IB name .Z
use:
.PP
.RS
pcat name.Z
.RE
or just:
.RS
pcat name
.PP
.RE
To create an unpacked copy, named
.IR nnn ,
of a packed file named
.IB name .Z
(without destroying
\f2name\^\fP\f3.Z\fP)
use the command:
.PP
.RS
Pcat name >nnn
.PP
.RE
Failure may occur if:
.PP
.RS
the file name (exclusive of the
.BR .Z )
has more than 12 characters;
.br
the file cannot be opened.
.RE
.SH EXAMPLE
.IP
Pcat file1
.PP
will display the packed file "file1.Z" for viewing in unpacked form.
------ EOF ------
ls -l Pcat.l
cat >Unpack <<'------ EOF ------'
for FILE
do
	case $FILE in
	*.Z)
		if test -f $FILE; then
			OUT=`expr $FILE : '\(.*\)\.Z'`
			compress -d $FILE > $OUT
			STAT=$?
			if test $STAT -ne 0; then
				echo "$FILE: Bad status from compress, skipped"
				continue
			fi
			copystat $FILE $OUT
			echo "$OUT: Unpacked"
			rm $FILE
		else
			echo "$FILE: not found"
		fi;;
	????????????? | ??????????????)
		echo "$FILE.Z: not found";;
	*)
		if test -f $FILE.Z; then
			compress -d $FILE.Z > $FILE
			STAT=$?
			if test $STAT -ne 0; then
				echo "$FILE.Z: Bad status from compress, skipped"
				continue
			fi
			copystat $FILE.Z $FILE
			echo "$FILE: Unpacked"
			rm $FILE.Z
		else
			echo "$FILE.Z: not found"
		fi;;
	esac
done
------ EOF ------
chmod +x Unpack
ls -l Unpack
cat >Unpack.l <<'------ EOF ------'
.TH UNPACK 1 local
.SH NAME
Unpack \- expand files 
.SH SYNOPSIS
.B Unpack
name .\|.\|.
.SH DESCRIPTION
.I Unpack\^
expands files created by
.IR Pack .
For each file
.I name\^
specified in the command, a search is made for a file called
.IB name .Z
(or just
.IR name ,
if
.I name\^
ends in
.BR .Z ).
This file is replaced by its expanded version by running compress(l) with
the '-d' option and copystat(l).
The new file has the
.B .Z
suffix stripped from its name, and has the same access modes,
access and modification dates, and owner as those of the packed file.
.PP
Failure may occur for the same reasons that it may in
.IR Pcat ,
as well as for the following:
.PP
.RS
a file with the ``unpacked'' name already exists;
.br
if the unpacked file cannot be created.
.PP
.RE
.SH EXAMPLE
.IP
Unpack file1
.PP
will replace the packed file "file1.Z" with its expanded version,
"file1".
.SH "SEE ALSO"
Pack(l), compress(l), Pact(l) 
------ EOF ------
ls -l Unpack.l
cat >compress.c <<'------ EOF ------'
#define STATS
#undef STATS
#define DEBUG
#undef DEBUG
/* 
 * compress.c - File compression ala IEEE Computer June 1984.
 * 
 * Author:	Spencer W. Thomas
 * 		Computer Science Dept.
 * 		University of Utah
 * Date:	Wed Jul  4 1984
 * Copyright (c) 1984 Spencer W. Thomas
 * 
 * WARNING: due to cc -O bug (dealing with ext instruction), do not
 *	compile this program with -O (4.2bsd, at least).
 * 
 * $Header: compress.c,v 1.6 84/08/01 22:08:00 joe Exp $
 * $Log:	compress.c,v $
 * Revision 1.6  84/08/01  22:08:00  joe
 * Sped up algorithm significantly by sorting the compress chain.
 *
 * Revision 1.5  84/07/13  13:11:00  srd
 * Added C version of vax asm routines.  Changed structure to arrays to
 * save much memory.  Do unsigned compares where possible (faster on
 * Perkin-Elmer)
 *
 * Revision 1.4  84/07/05  03:11:11  thomas
 * Clean up the code a little and lint it.  (Lint complains about all
 * the regs used in the asm, but I'm not going to "fix" this.)
 * 
 * Revision 1.3  84/07/05  02:06:54  thomas
 * Minor fixes.
 * 
 * Revision 1.2  84/07/05  00:27:27  thomas
 * Add variable bit length output.
 * 

 */
static char rcs_ident[] = "$Header: compress.c,v 1.6 84/08/01 22:08:00 joe Exp $";

#include "stdio.h"
#include "ctype.h"

#define	BITS	16			/* maximum number of bits/code */
int maxbits = BITS;			/* user settable max # bits/code */
long int maxmaxcode = 1 << BITS;
#define INIT_BITS 9
int n_bits = INIT_BITS;			/* initial number of bits/code */
long int maxcode = 1 << INIT_BITS - 1;	/* 1 << n_bits - 1 */
long int bytes_out = 0;			/* count how many are written */

/* 
 * One code could conceivably represent (1<<BITS) characters, but
 * to get a code of length N requires an input string of at least
 * N*(N-1)/2 characters.  With 10000 chars in the stack, an input
 * file would have to contain a 50Mb string of a single character.
 * This seems unlikely.
 */
#define MAXSTACK    10000		/* size of output stack */

unsigned short tab_next[1<<BITS];	/* chain of entries with same prefix */
unsigned short tab_chain[1<<BITS];	/* chain prefixed with this entry */
unsigned short tab_prefix[1<<BITS];	/* prefix code for this entry */
	 char  tab_suffix[1<<BITS];	/* last char in this entry */

long int free_ent = 0;			/* first unused entry */

long int getcode();

#ifdef DEBUG
int debug = 0;
#endif

#define NO_SCANARGS
/*****************************************************************
 * TAG( main )
 * 
 * Algorithm from "A Technique for High Performance Data Compression",
 * Terry A. Welch, IEEE Computer Vol 17, No 6 (June 1984), pp 8-19.
 * 
 * Usage: compress [-d] [file]
 * Inputs:
 *	-d:	    If given, decompression is done instead.
 * 
 * 	file:	    File to be compressed.  If none specified, stdin
 *		    is used.
 * Outputs:
 * 	stdout:	    Compressed form of file.
 * 
 * Assumptions:
 * 	File is better off compressed.  Goes ahead and compresses even
 * if the result is bigger.
 * Algorithm:
 * 	Modified Lempel-Ziv method (LZW).  Basically finds common
 * substrings and replaces them with a variable size code.  This is
 * deterministic, and can be done on the fly.  Thus, the decompression
 * procedure needs no input table, but tracks the way the table was
 * built.
 */


main( argc, argv )
register int argc; char **argv;
{
    register FILE * infile;
    register char * fname = NULL;
    int decomp_flg = 0;
#ifdef DEBUG
    int verbose = 0;
#endif
	extern int optind;
	extern char *optarg;
	register c;

#ifdef BSD4.2
    /* 4.2BSD dependent - take it out if not */
    setlinebuf( stderr );
#endif
#ifndef NO_SCANARGS
    if ( scanargs( argc, argv, "compress D%- d%- v%- b%-maxbits!d file%s",
		    &debug, &decomp_flg, &verbose,
		    &maxbits, &maxbits, &fname ) == 0 )
	exit( 1 );
#else
    /* 
     * Roll your own here using getopt or whatever.  All flags are
     * optional.
     * -D => debug
     * -d => decomp_flg
     * -v => verbose
     * -b maxbits => maxbits.  If -b is specified, then maxbits MUST be
     *	    given also.
     * if a string is left, must be filename.
     */
#ifdef DEBUG
	while ((c=getopt(argc, argv, "Dvdb:")) != EOF)
#else
	while ((c=getopt(argc, argv, "db:")) != EOF)
#endif
		switch (c) {
#if DEBUG
		case 'D': debug = 1; break;
		case 'v': verbose = 1; break;
#endif
		case 'd': decomp_flg = 1; break;
		case 'b':
			maxbits = atoi(optarg);
			if (maxbits > 0)
				break;
		case '?':
#ifdef DEBUG
			fprintf(stderr,"usage: %s [-Ddv -b maxbits] [fname]\n",argv[0]);
#else
			fprintf(stderr,"usage: %s [-d] [-b maxbits] [fname]\n",argv[0]);
#endif
			exit( 1 );
		}
#if 0
	fprintf(stderr,"argc = %d optind = %d   %s\n", argc, optind, argv[optind]);
#endif
	if (argc > 1 && optind < argc)
		fname = argv[optind];
#endif

    if ( maxbits > BITS )
	maxbits = BITS;
    maxmaxcode = 1 << maxbits;

    if ( fname != NULL )
    {
	if ( ( infile = fopen( fname, "r" ) ) == NULL )
	{
	    perror( fname );
	    exit( 1 );
	}
    }
    else
	infile = stdin;

    if ( decomp_flg == 0 )
	compress( infile );
    else
#ifdef DEBUG
	if ( debug == 0 )
#endif
	decompress( infile );
#ifdef DEBUG
    else
    {
	/* 
	 * Just print out codes from input file.  Mostly for debugging.
	 */
	long int code;
	int col = 0, bits = n_bits;

	free_ent = 255;
	while ( ( code = getcode( infile ) ) >= 0 )
	{
	    if ( free_ent < maxmaxcode )
		free_ent++;
	    if ( bits != n_bits )
	    {
		printf( "\nChange to %d bits\n", n_bits );
		bits = n_bits;
		col = 0;
	    }
	    printf( "%5d%c", code, (col+=6) >= 74 ? (col = 0, '\n') : ' ' );
	}
	putchar( '\n' );
	exit( 0 );
    }

    /* If requested (verbose), dump string table */
    if ( verbose )
	dump_tab();
#endif
}


/*****************************************************************
 * TAG( compress )
 * 
 * Actually does the compression.
 * Inputs:
 * 	infile:	    File to compress.
 * Outputs:
 * 	Writes compressed file to stdout.
 * Assumptions:
 *	[None]
 * Algorithm:
 * 	See above.
 */

compress( infile )
register FILE *infile;
{
    register c;
    register int ent, n_ent;
    register long int in_count = 1;
#ifdef STATS
    int out_count = 0;
#endif

    /* 
     * Initialize the compression table to contain all 8-bit values
     * initially.  These don't need to be chained, they can be looked
     * up directly.
     */
    for ( ent = 0; ent < 256; ent++ )
    {
	tab_next[ent] = tab_chain[ent] = NULL;
	tab_suffix[ent] = ent;
    }

    free_ent = 256;

    ent = getc( infile );		/* initial entry */
    while ( !feof( infile ) && (c = getc(infile)) != (unsigned)EOF )
    {
	in_count++;
	/* 
	 * Find the entry corresponding to the current entry suffixed
	 * with this char.  Since the entries are sorted, stop when
	 * the suffix >= c.
	 */
	for (n_ent = tab_chain[ent]; n_ent != NULL; n_ent = tab_next[n_ent])
	    if ( tab_suffix[n_ent] >= (unsigned)c )
		goto found;			/* found it */

not_found:
	/* 
	 * If no such entry, do 2 things:
	 * 1. Put out code for current prefix string.
	 * 2. Add the new string to the table.
	 *    If the table is full, just start over with current input
	 *    character.
	 */
	    output( (long)ent );
#ifdef STATS
	    out_count++;
#endif
	    if ( (n_ent=free_ent) < maxmaxcode )
	    {
		/* Chain the new entry in 'c' order, so the aborted check
		   above works */

		register p_ent;

		tab_chain[n_ent] = NULL;
		tab_suffix[n_ent] = c;
		if((p_ent=tab_chain[ent]) == NULL ||
		   ((unsigned) c) < tab_suffix[p_ent]) {
			tab_next[n_ent] = p_ent;
			tab_chain[ent] = n_ent;
		} else {
			for(;;) {
				ent = tab_next[p_ent];
				if(ent == NULL ||
				   ((unsigned) c) < tab_suffix[ent]) break;
				p_ent = ent;
			}
			tab_next[n_ent] = ent;
			tab_next[p_ent] = n_ent;
		}
		free_ent = n_ent+1;
	    }
	    n_ent = c;
cont:
	ent = n_ent;
    }
    /* 
     * Put out the final code.
     */
    output( (long)ent );
#ifdef STATS
    out_count++;
#endif
    output( -1L );			/* finish up output if necessary */

    /* 
     * Print out stats on stderr
     */
#ifdef STATS
    fprintf( stderr,
	"%ld chars in, %ld codes (%ld bytes) out, compression factor %g\n",
		in_count, out_count, bytes_out,
		(double)in_count / (double)bytes_out );
    fprintf( stderr, "\tCompression as in compact: %5.2f%%\n",
		100.0 * ( in_count - bytes_out ) / (double) in_count );
    fprintf( stderr, "\tLargest code was %d (%d bits)\n", free_ent - 1, n_bits );
#else
    fprintf( stderr, "Compression: %5.2f%%\n",
		100.0 * ( in_count - bytes_out ) / (double) in_count );
#endif
    return;

found: 
	/* either we found the entry, or we skipped past it */

	if(tab_suffix[n_ent] == (unsigned) c) {
		goto cont;
	} else {
		goto not_found;
	}
}


/*****************************************************************
 * TAG( output )
 * 
 * Output the given code.
 * Inputs:
 * 	code:	A n_bits-bit integer.  If == -1, then EOF.  This assumes
 *		that n_bits =< (long)wordsize - 1.
 * 
 * Outputs:
 * 	Outputs code to the file.
 * Assumptions:
 *	Chars are 8 bits long.
 * Algorithm:
 * 	Maintain a BITS character long buffer (so that 8 codes will
 * fit in it exactly).  Use the VAX insv instruction to insert each
 * code in turn.  When the buffer fills up empty it and start over.
 */

#if vax
    static char buf[BITS];
#else
    static int buf[(BITS+sizeof(int)-sizeof(char))/sizeof(int)];
#endif
    static int offset = 0;

output( code )
long int  code;
{
#ifdef DEBUG
    static int col = 0;
#endif

#ifdef vax
    /* 
     * On the VAX, it is important to have the register declarations
     * in exactly the order given, or the asm will break.
     */
    register int r_off = offset, bits= n_bits;
    register char * bp = buf;
#endif

    if ( code >= 0 )
    {
#ifdef vax
	/* VAX DEPENDENT!! Implementation on other machines may be
	 * difficult.
	 * 
	 * Translation: Insert BITS bits from the argument starting at
	 * offset bits from the beginning of buf.
	 */
	asm( "insv	4(ap),r11,r10,(r9)" );
#else
	insert_bit(code);
#endif
	offset += n_bits;
	if ( offset == (n_bits << 3) )
	{
	    fwrite( buf, 1, n_bits, stdout );
	    offset = 0;
	    bytes_out += n_bits;
	}
#ifdef DEBUG
	if ( debug )
	    fprintf( stderr, "%5d%c", code,
		    (col+=6) >= 74 ? (col = 0, '\n') : ' ' );
#endif

	/* 
	 * If the next entry is going to be too big for the code size,
	 * then increase it, if possible.
	 */
	if ( free_ent > maxcode )
	{
	    /* 
	     * Write the whole buffer, because the input side won't
	     * discover the size increase until after it has read it.
	     */
	    if ( offset > 0 )
	    {
		fwrite( buf, 1, n_bits, stdout );
		bytes_out += n_bits;
	    }
	    offset = 0;
		
	    n_bits++;
	    if ( n_bits == maxbits )
		maxcode = maxmaxcode;
	    else
		maxcode = (1 << n_bits) - 1;
#ifdef DEBUG
	    if ( debug )
	    {
		fprintf( stderr, "\nChange to %d bits\n", n_bits );
		col = 0;
	    }
#endif
	}
    }
    else
    {
	/* 
	 * At EOF, write the rest of the buffer.
	 */
	if ( offset > 0 )
	    fwrite( buf, 1, (offset + 7) / 8, stdout );
	bytes_out += (offset + 7) / 8;
	offset = 0;
	fflush( stdout );
#ifdef DEBUG
	if ( debug )
	    fprintf( stderr, "\n" );
#endif
    }
}


/*****************************************************************
 * TAG( decompress )
 * 
 * Decompress the input file.
 * 
 * Inputs:
 * 	infile:	    File to be decompressed.
 * Outputs:
 * 	Writes decompressed file to stdout.
 * Assumptions:
 * 	Input file was created with compress (could use a magic #, I
 *	guess).
 * Algorithm:
 * 	See article cited above.
 */

decompress( infile )
register FILE *infile;
{
    register int stack_top = MAXSTACK;
    register long int code, oldcode, incode;
    register int finchar;
    char stack[MAXSTACK];

    /* 
     * As above, initialize the first 256 entries in the table.  
     */
    for ( free_ent = 0; free_ent < 256; free_ent++ )
    {
	tab_next[free_ent] = tab_chain[free_ent] = NULL;
	tab_prefix[free_ent] = 0;
	tab_suffix[free_ent] = free_ent;
    }

    finchar = oldcode = getcode( infile );
    putchar( (char)finchar );		/* first code must be 8 bits = char */

    while ( ( code = getcode( infile ) ) != -1 )
    {
	incode = code;
	/* 
	 * Special case for KwKwK string.
	 */
	if ( code >= free_ent )
	{
	    stack[--stack_top] = finchar;
	    code = oldcode;
	}

	/* 
	 * Generate output characters in reverse order
	 */
	while ( ((unsigned long)code) >= ((unsigned long)256) )
	{
	    stack[--stack_top] = tab_suffix[code];
	    code = tab_prefix[code];
	}
	stack[--stack_top] = finchar = tab_suffix[code];

	/* 
	 * And put them out in forward order
	 */
	fwrite( &stack[stack_top], 1, MAXSTACK - stack_top, stdout );
	stack_top = MAXSTACK;

	/* 
	 * Generate the new entry.
	 */
	if ( (code=free_ent) < maxmaxcode )
	{
	    tab_prefix[code] = oldcode;
	    tab_suffix[code] = finchar;
	    free_ent = code+1;
	}
	/* 
	 * Remember previous code.
	 */
	oldcode = incode;
    }
    fflush( stdout );
}


/*****************************************************************
 * TAG( getcode )
 * 
 * Read one code from the input file.  If EOF, return -1.
 * Inputs:
 * 	infile:	    Input file.
 * Outputs:
 * 	code or -1 is returned.
 * Assumptions:
 *	[None]
 * Algorithm:
 * 	For now, use scanf to read ascii input.
 */

long int
getcode( infile )
FILE *infile;
{
#ifdef vax
    /* 
     * On the VAX, it is important to have the register declarations
     * in exactly the order given, or the asm will break.
     */
    register long int code;
    static int offset = 0, size = 0;
    static char buf[BITS];
    register int r_off, bits;
    register char * bp = buf;
#else
    register long code;
    static int size = 0;
#endif

    if ( offset >= size || free_ent > maxcode )
    {
	/* 
	 * If the next entry will be too big for the current code
	 * size, then we must increase the size.  This implies reading
	 * a new buffer full, too.
	 */
	if ( free_ent > maxcode )
	{
	    n_bits++;
	    if ( n_bits == maxbits )
		maxcode = maxmaxcode;	/* won't get any bigger now */
	    else
		maxcode = (1 << n_bits) - 1;
	}
	size = fread( buf, 1, n_bits, infile );
	if ( size <= 0 )
	    return -1;			/* end of file */
	offset = 0;
	/* Round size down to integral number of codes */
	size = (size << 3) - (n_bits - 1);
    }
#ifdef vax
    /* 
     * Get it into a register for the following VAX dependent code.
     */
    r_off = offset;
    bits = n_bits;
    asm( "extzv   r10,r9,(r8),r11" );
#else
    code = fetch();
#endif
    offset += n_bits;

    return code;
}

#ifdef DEBUG

/*****************************************************************
 * TAG( dump_tab )
 * 
 * Dump the string table.
 * Inputs:
 *	[None]
 * Outputs:
 *	[None]
 * Assumptions:
 *	[None]
 * Algorithm:
 *	[None]
 */

dump_tab()
{
    register int i;
    register ent;
    char stack[4 * MAXSTACK];	/* \nnn makes it 4 times bigger */
    int stack_top = 4 * MAXSTACK;

    for ( i = 0; i < free_ent; i++ )
    {
	ent = i;
	if ( isascii(tab_suffix[ent]) && isprint(tab_suffix[ent]) )
	    fprintf( stderr, "%5d: %5d/'%c'  \"",
			ent, tab_prefix[ent], tab_suffix[ent] );
	else
	    fprintf( stderr, "%5d: %5d/\\%03o \"",
			ent, tab_prefix[ent], tab_suffix[ent] );
	stack[--stack_top] = '\n';
	stack[--stack_top] = '"';
	for ( ; ent != NULL;
		ent = (ent >= 256 ? tab_prefix[ent] : NULL) )
	{
	    if ( isascii(tab_suffix[ent]) && isprint(tab_suffix[ent]) )
		stack[--stack_top] = tab_suffix[ent];
	    else
	    {
		switch( tab_suffix[ent] )
		{
		case '\n': stack[--stack_top] = 'n'; break;
		case '\t': stack[--stack_top] = 't'; break;
		case '\b': stack[--stack_top] = 'b'; break;
		case '\f': stack[--stack_top] = 'f'; break;
		case '\r': stack[--stack_top] = 'r'; break;
		default:
		    stack[--stack_top] = '0' + tab_suffix[ent] % 8;
		    stack[--stack_top] = '0' + (tab_suffix[ent] / 8) % 8;
		    stack[--stack_top] = '0' + tab_suffix[ent] / 64;
		    break;
		}
		stack[--stack_top] = '\\';
	    }
	}
	fwrite( &stack[stack_top], 1, 4 * MAXSTACK - stack_top, stderr );
	stack_top = 4 * MAXSTACK;
    }
}
#endif

static unsigned sizemask[] = { 0,
	0x00000001, 0x00000003, 0x00000007, 0x0000000F,
	0x0000001F, 0x0000003F, 0x0000007F, 0x000000FF,
	0x000001FF, 0x000003FF, 0x000007FF, 0x00000FFF,
	0x00001FFF, 0x00003FFF, 0x00007FFF, 0x0000FFFF,
	0x0001FFFF
	};

#undef DEBUG_INSERT

#define base buf

/* insert a value of "size" bits at "offset" bits from base */

insert_bit(value) register long int value; {

	register offs = offset;
	register size = n_bits;
	register  word, shift;

	word = offs >> 5;
	offs &= 31;

	if (((unsigned)offs + size) <= ((unsigned)32)) {
		shift = (32 - size) - offs;

#ifdef DEBUG_INSERT
fprintf(stderr,"offs=%2d size=%d value=%2x shift=%2d oword=%x",offs, size, value, shift, base[word]);
#endif
		base[word] = (base[word] & ~(sizemask[size] << shift)) |
				(value  << shift);
#ifdef DEBUG_INSERT
fprintf(stderr," nword=%x\n", base[word]);
#endif

	} else {

		register size1 = 32 - offs;

#ifdef DEBUG_INSERT
fprintf(stderr,"offs=%d size=%d value=%x oword1=%x",offs, size, value, base[word]);
#endif
		base[word] = (base[word] & ~sizemask[size1]) |
				((unsigned)value >> (size-size1));
		size -= size1;
		shift = 32 - size;
#ifdef DEBUG_INSERT
fprintf(stderr," nword1=%x oword2=%x", base[word], base[word+1]);
#endif
		base[word+1] = (base[word+1] & ~(sizemask[size] << shift)) |
				     (value << shift);
#ifdef DEBUG_INSERT
fprintf(stderr," nword2=%x\n", base[word+1]);
#endif
	}
}

#define DEBUG_FETCH
#undef DEBUG_FETCH

fetch() {

	register offs = offset;
	register size = n_bits;
	register word;

#ifdef slower
	word = offs / 32;
	offs %= 32;
#else
	word = offs >> 5;
	offs &= 31;
#endif
	if (((unsigned)offs + size) <= ((unsigned)32)) {

#ifdef DEBUG_FETCH
fprintf(stderr,"offs=%2d size=%d word=%x return=%x\n",offs, size, base[word], ((unsigned)base[word] >> ((32-size) - offs)) & sizemask[size]);
#endif
		return( ((unsigned)base[word] >> ((32-size) - offs)) & sizemask[size]);

	} else {
		register size2 = size - (32 - offs);

#ifdef DEBUG_FETCH
fprintf(stderr,"offs=%2d size=%d size2=%d word1=%x word2=%x return=%x\n", offs, size, size2, base[word], base[word+1], (((base[word] << size2) | ((unsigned)base[word+1] >> (32-size2))) & sizemask[size]) );
#endif
		return( ((base[word] << size2) | ((unsigned)base[word+1] >> (32-size2))) & sizemask[size]);
	}
}

------ EOF ------
ls -l compress.c
cat >compress.l <<'------ EOF ------'
.PU
.TH COMPRESS 1 local
.SH NAME
compress  \-  compress files
.SH SYNOPSIS
.B compress
.B [\-d]
[filename]
.br
.B uncompress
[filename]
.SH DESCRIPTION
Compresses the specified file or standard input.
The result is written to standard output.
Compressed files can be restored
to their original form by specifying the
.B \-d
option, or by running
.I uncompress.
.PP
.I Compress
uses the modified Lempel-Ziv algorithm described in
"A Technique for High Performance Data Compression",
Terry A. Welch,
.I IEEE Computer
Vol 17, No 6 (June 1984), pp 8-19.
.PP
The amount of compression obtained depends on the size of the
input file, and the distribution of character sequences.
Typically, text files, such as C programs,
are reduced to 40\-50% of their original size;
.SH BUGS
Even if the file gets bigger after compression, the output is still produced.
.SH SEE ALSO
Pack(l), Pcat(l), Unpack(l)
------ EOF ------
ls -l compress.l
cat >copystat.c <<'------ EOF ------'
/*  File   : copymode.c
    Author : Richard A. O'Keefe & Joseph M. Orost
    Updated: 28 April 1984

    copystat FromFile ToFile1 ... ToFilen

    copies the mode (access permission) bits, the dates, and the owner (if
    possible)  from FromFile to ToFile1 ... ToFilen.  The exit code is
    0) all went well
    1) there were too few parameters
    2) stat(2) wasn't happy with FromFile
    3) chmod(2) wasn't happy with one or more of the ToFiles.
    All the ToFiles which can be changed will be changed, even
    if a file earlier in the list cannot be.  E.g. suppose
	copystat from a		# would succeed
	copystat from b		# would fail
	copystat from c		# would succeed
    then
	copystat from a b c	# will change a and c and will fail.

    This program is for use with filters.  A filter cannot make its
    output have the same protection as its input, because it has no
    access to the files as such, only to their contents.  Given a
    filter, to make something which preserves permissions, define
    Filter =
	a=$1
	b=$2
	shift;shift
	filter <$a >$b $*
	copystat $a $b
*/

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

main(argc, argv)
    int argc;
    register char **argv;
    {
        struct stat statbuf;
	int mode;
	int errs = 0;

	if (argc < 3) {
	    fprintf(stderr, "Usage: copystat From To...\n");
	    exit(1);
	}
	if (stat(argv[1], &statbuf)) {
	    perror(argv[1]);
	    exit(2);
	}
	mode = statbuf.st_mode & 07777;
	argv++;
	while (*++argv) {
	    if (chmod(*argv, mode)) {
		perror(*argv);
		errs++;
	    }
	    chown(*argv, statbuf.st_uid, statbuf.st_gid);
	    utime(*argv, &statbuf.st_atime);
	}
	exit(errs ? 3 : 0);
    }
------ EOF ------
ls -l copystat.c
cat >copystat.l <<'------ EOF ------'
.TH COPYSTAT 1 local
.SH NAME
copystat \- copy stat info from file
.SH SYNOPSIS
.B copystat
FromFile ToFile ...
.SH DESCRIPTION
The mode, owner/group, along with the accessed and updated times of
each named
.I ToFile
is changed
to be exactly the same as that of the
.I FromFile.
.PP
The point of this command is that when you transform a file using
one or more filters, you would very often like the transformed
file to have the same access permissions, owner/group, and dates as the 
original.  But as
the filters never know what files they are working on (and indeed,
if you use pipes, there is no place to carry this information
from one stage to the next), it is not possible for the filters to
do this, and they create the transformed file with the same default
that output redirection would use.  With this command you can copy
the permissions, owner/group, and dates of the original file to the 
transformed file.
.PP
Only the owner of a file (or the super-user) may change its mode.
Only the super-user may change owner/group.
.SH EXAMPLES
.IP
crypt key <cypher >clear; copystat cypher clear
.PP
This is the case that copystat was created for.  The decrypted
file "clear" gets the same permissions as the encrypted file "crypt".
In addition, the ownership and the dates are set as in the original file,
causing it not no be touched.
.IP
split -250 bigfile piece
.br
copystat bigfile piece*
.PP
Splits a file into pieces and ensures that the pieces have the
same permissions, owner, and dates as the original file.  (Note that these must
be separate lines for filename expansion to work.)
.SH "SEE ALSO"
chmod(1),
ls(1),
chmod(2),
chown(2),
stat(2),
umask(2),
utime(2)
.SH BUGS
This should be another option ("-f FromFile") in chmod(1).
------ EOF ------
ls -l copystat.l
cat >uncompress <<'------ EOF ------'
compress -d $*
------ EOF ------
chmod +x uncompress
ls -l uncompress
cat >uncompress.l <<'------ EOF ------'
.so manl/compress.l
------ EOF ------
ls -l uncompress.l