[net.sources] SQU-PORT2.SHAR

bill@persci.UUCP (05/09/85)

Well, this time I should have it right! This utility has been tested on a 
number of systems, and works! It is the same as SQU-PORT which was
apparently posted last year, but with some syntactic modifications to allow
it to be compiled on more systems. Judging from the number of requests I
received for sq/usq which I posted earlier, a lot of you missed the previous
posting of SQU-PORT. For this reason, and to hopefully head off any further
use of last week's posting of sq/usq, I am posting SQU-PORT2.

Bill Swan 	{ihnp4|decvax|allegra|...}!uw-beaver!tikal!persci!bill
----------------------------------------------------------------------------
#!/bin/sh
# shar:	Shell Archiver
#	Run the following text with /bin/sh to create:
#	README
#	sq.1
#	makesq
#	makeusq
#	sq.c
#	sq.h
#	sqcom.h
#	sqdebug.c
#	sqio.c
#	tr1.c
#	tr2.c
#	usq.c
#	usq.h
#	utr.c
sed 's/^X//' << 'SHAR_EOF' > README
XIncluded in this distribution of 'sq' and 'usq' are:
X
X	1) sq.1			- man page
X	2) sqcom.h		- header file for sq and usq compilations
X	3) sq.h			- header file for sq compilations
X	4) usq.h		- header file for usq compilations
X	5) sq.c, tr1.c, tr2.c,
X	   sqio.c, sqdebug.c	- sources for sq
X	6) usq.c, utr.c		- sources for usq
X	7) makesq, makeusq	- 'make' files for sq and usq
X	8) README		-  this file
X
XWith the exception of providing 'make' files for Unix, Xenix, etc. systems,
XI have not included compilation instructions for other compilers.  The
Xonly thing you need to know is which modules are linked with which programs.
XTo link 'sq', compile:
X
X	sq.c
X	tr1.c
X	tr2.c
X	sqio.c
X
Xand then link the four object files with your standard libraries to produce
Xthe binary for 'sq'.  If you need to debug 'sq', compile sqdebug.c, set the
XDEBUG #define in tr2.c, recompile tr2.c, and relink the above four files
Xplus sqdebug.
X
XTo create 'usq', compile:
X
X	usq.c
X	utr.c
X
Xand then link the two object files with your standard libraries to produce
Xthe binary for 'usq'.
X
XThere are two system-dependent #define's in sq.c which may need to be
Xmodified.  They are at the beginning of the source (after the large block
Xof comments), and are FNM_LEN and UNIX.
X
XFNM_LEN should be set to the maximum length of a file name on the target
Xsystem.
X
XUNIX should be defined for Unix or a derivative, and undefined (i.e.,
Xcommented out) for CP/M, MP/M, MS-DOS, PC-DOS, or any system that
Xhas separate file "names" and "types" (delimited by a period).
X
XI have compiled and tested the code with the following compilers on
Xthe following systems:
X
Xsystem "C" compiler		Xenix (Altos 586)
Xsystem "C" compiler		Unix BSD 4.2 (Vax 780)
Xsystem "C" compiler		Zeus (Zilog Z8000)
XComputer Innovations C86	PC-DOS (MS-DOS) (IBM PC, etc.)
XComputer Innovations C86	CP/M-86 (Altos 586)
XDeSmet C88			CP/M-86 (Altos 586)
XAztec C II			CP/M-80 (CompuPro)
X
XIf some of the code looks funny (read: slow), remember that it has to be
Xportable accross all the above systems, and I get tired of too many
X#define's.  If you successfully port it to some system that requires
Xminor (but backwards compatible) changes, let me know, and I'll update
Xthem into my sources, and make sure they still work on the above systems.
X
XOh, and also, for those of you who argue that compact and uncompact should
Xbe used under Unix, consider the following:
X
X1)  These programs are compatible with the popular 'SQ' and 'USQ' programs
Xfound on many bulletin board systems.  Files squeezed by other programs
Xwill always be unsqueezable by any version of 'usq'.
X
X2)  You have the sources yourself.  PLEASE don't make changes which affect
Xthe internal format of the squeezed file.  This causes great portability
Xand compatibility headaches.
X
X3)  'sq' and 'usq' are faster than compact/uncompact, and since they use
Xan adaptive Huffman coding, rather than a fixed one, they compress files
Xas well or better.
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > sq.1
X.TH SQ 1 Local
X.SH NAME
Xsq, usq \- squeeze (compress) files
X.SH SYNOPSYS
X.B sq
X[
X.I file1
X] [
X.I file2
X] ...
X.br
X.B usq
X[
X.I \-count
X]
X[
X.I \-fcount
X]
X[
X.I file1
X] [
X.I file2
X] ...
X.SH DESCRIPTION
X.I sq
Xcompresses one or more files, using a Huffman coding scheme.
X.I usq 
Xdecompresses or displays one or more files squeezed by
X.I sq.
XThe algorithm is identical to the one used by the
Xpopular public domain 'SQ' programs, originally authored by Richard
XGreenlaw.
X.PP
Xsq will squeeze each file passed on the argument line.  It will
Xappend `.SQ' to the original file name to create the output file
Xname.  If no file names are given, sq will prompt for file names
Xfrom the standard input.  A message is printed for each file,
Xtracing each pass of the compression process.
X.PP
X.I usq
Xwill unsqueeze or display the files requested on the command
Xline.  If no files are requested,  the file names are input from
Xthe standard input.  If no option is given, the file is unsqueezed
Xto its original name.
X.PP
XIf the \-count option is used, count lines are displayed
Xfrom the start of the file, with all unprintable characters
Xexcept CR, LF, TAB, and FF converted to periods.  The output is
Xsent to the standard output.  If the \-fcount option is used,
Xthe file is displayed with a formfeed appended to the preview
Xof each file.
X.SH AUTHOR
XRichard Greenlaw (original), Theo Pozzy (ported versions)
X.SH BUGS
XThe file naming convention is not a one-to-one mapping, so multiple
Xfiles may be squeezed to the same output file name.
XAlso, the output file name for usq cannot be overridden.
X.SH "SEE ALSO"
Xcompact(1)
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > makesq
XCFLAGS=-O
X
Xsq:	sq.o tr2.o tr1.o sqdebug.o sqio.o
X	cc $(CFLAGS) -o sq sq.o tr2.o tr1.o sqdebug.o sqio.o
Xsq.o tr2.o tr1.o sqdebug.o sqio.o: sq.h sqcom.h
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > makeusq
XCFLAGS=-O
Xusq:	usq.o utr.o
X	cc $(CFLAGS) -o usq usq.o utr.o
Xusq.o utr.o: usq.h sqcom.h
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > sq.c
X/* This program compresses a file without loosing information.
X * The "usq" program is required to unsqueeze the file
X * before it can be used.
X *
X * Typical compression rates are between 30 and 50 percent for text files.
X *
X * Squeezing a really big file takes a few minutes.
X *
X * Useage:
X *	sq [file1] [file2] ... [filen]
X *
X * where file1 through filen are the names of the files to be squeezed.
X * The file type (under CP/M or MS-DOS) is changed to ".SQ"; under UN*X,
X * ".SQ" is appended to the file name. The original file name is stored
X * in the squeezed file.
X *
X * If no file name is given on the command line you will be
X * prompted for commands (one at a time). An empty command
X * terminates the program.
X *
X * The transformations compress strings of identical bytes and
X * then encode each resulting byte value and EOF as bit strings
X * having lengths in inverse proportion to their frequency of
X * occurrance in the intermediate input stream. The latter uses
X * the Huffman algorithm. Decoding information is included in
X * the squeezed file, so squeezing short files or files with
X * uniformly distributed byte values will actually increase size.
X */
X
X/* CHANGE HISTORY:
X * 1.3	Close files properly in case of error exit.
X * 1.4	Break up long introductory lines.
X * 1.4	Send introduction only to console.
X * 1.4	Send errors only to console.
X * 1.5  Fix BUG that caused a rare few squeezed files
X *	to be incorrect and fail the USQ crc check.
X *	The problem was that some 17 bit codes were
X *	generated but are not supported by other code.
X *	THIS IS A MAJOR CHANGE affecting TR2.C and SQ.H and
X *	requires recompilation of all files which are part
X *	of SQ. Two basic changes were made: tree depth is now
X *	used as a tie breaker when weights are equal. This
X *	makes the tree shallower. Although that may always be
X *	sufficient, an error trap was added to cause rescaling
X *	of the counts if any code > 16 bits long is generated.
X * 1.5	Add debugging displays option '-'.
X * 1.6  Fixed to work correctly under MP/M II.  Also shortened
X *      signon message.
X * 2.0	New version for use with CI-C86 compiler (CP/M-86 and MS-DOS)
X * 2.1  Converted for use in MLINK
X * 2.2  Converted for use with optimizing CI-C86 compiler (MS-DOS)
X * 3.0  Generalized for UN*X use, changed output file naming convention
X */
X
X/* ejecteject */
X
X/*
X * The following define MUST be set to the maximum length of a file name
X * on the system "sq" is being compiled for.  If not, "sq" will not be
X * able to check for whether the output file name it creates is valid
X * or not.
X */
X
X#define FNM_LEN 14
X#define UNIX				/* comment out for CP/M, MS-DOS versions */
X#define SQMAIN
X
X#define VERSION "3.1   12/19/84"
X
X#include <stdio.h>
X#include "sqcom.h"
X#include "sq.h"
X#define FALSE 0
X
Xmain(argc, argv)
Xint argc;
Xchar *argv[];
X{
X	int i,c;
X	char inparg[128];	/* parameter from input */
X
X	debug = FALSE;
X	printf("File squeezer version %s (original author: R. Greenlaw)\n\n", VERSION);
X
X	/* Process the parameters in order */
X	for(i = 1; i < argc; ++i)
X		obey(argv[i]);
X
X	if(argc < 2) {
X		printf("Enter file names, one line at a time, or type <RETURN> to quit.");
X		do {
X			printf("\n*");
X			for(i = 0; i < 16; ++i) {
X				if((c = getchar()) == EOF)
X					c = '\n';	/* fake empty (exit) command */
X				if((inparg[i] = c) == '\n') {
X					inparg[i] = '\0';
X					break;
X				}
X			}
X			if(inparg[0] != '\0')
X				obey(inparg);
X		} while(inparg[0] != '\0');
X	}
X}
X
X/* ejecteject */
X
Xobey(p)
Xchar *p;
X{
X	char *q;
X	char outfile[128];	/* output file spec. */
X
X	if(*p == '-') {
X		/* toggle debug option */
X		debug = !debug;
X		return;
X	}
X
X	/* Check for ambiguous (wild-card) name */
X	for(q = p; *q != '\0'; ++q)
X		if(*q == '*' || *q == '?') {
X			printf("\nAmbiguous name %s ignored", p);
X			return;
X	}
X	/* First build output file name */
X	strcpy(outfile, p);		/* copy input name to output */
X
X	/* Find and change output file suffix */
X
X	if (strlen(outfile) + 3 > FNM_LEN) {	/* check for long file name */
X		q = outfile + FNM_LEN - 3;
X		*q = '\0';		/* make room for suffix */
X	}
X	else {
X		q = outfile + strlen(outfile);
X#ifndef UNIX
X		for(; --q >= outfile;)
X			if (*q == '.') {
X				*q = '\0';	/* delete file type */
X				break;
X			}
X#else
X		--q;
X#endif
X	}
X
X	strcat(outfile, ".SQ");
X
X	squeeze(p, outfile);
X}
X
X/* ejecteject */
X
Xsqueeze(infile, outfile)
Xchar *infile, *outfile;
X{
X	int i, c,c2;
X	FILE *inbuff, *outbuff;		/* file buffers */
X
X	printf("%s -> %s: ", infile, outfile);
X
X	if(!(inbuff=fopen(infile, "rb"))) {
X		printf("Can't open %s for input pass 1\n", infile);
X		return;
X	}
X	if(!(outbuff=fopen(outfile, "wb"))) {
X		printf("Can't create %s\n", outfile);
X		fclose(inbuff);
X		return;
X	}
X
X	/* First pass - get properties of file */
X	crc = 0;	/* initialize checksum */
X	printf("analyzing, ");
X	init_ncr();
X	init_huff(inbuff);   
X	fclose(inbuff);
X
X	/* Write output file header with decoding info */
X	wrt_head(outbuff, infile);
X
X	/* Second pass - encode the file */
X	printf("squeezing,");
X	if(!(inbuff=fopen(infile, "rb"))) {
X		printf("Can't open %s for input pass 2\n", infile);
X		goto closeout;
X	}
X	init_ncr();	/* For second pass */
X
X	/* Translate the input file into the output file */
X	while((c = gethuff(inbuff)) != EOF)
X		putce(c, outbuff);
X	oflush(outbuff);
X	printf(" done.\n");
Xcloseall:
X	fclose(inbuff);
Xcloseout:
X	fclose(outbuff);
X}
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > sq.h
X/* Definitions and external declarations */
X
XEXTERN char	debug;	/* Boolean flag */
X
X/* *** Stuff for first translation module *** */
X
XEXTERN int likect;	/*count of consecutive identical chars */
XEXTERN int lastchar, newchar;
XEXTERN char state;
X
X/* states */
X
X#define NOHIST	0 	/*don't consider previous input*/
X#define SENTCHAR 1 	/*lastchar set, no lookahead yet */
X#define SENDNEWC 2 	/*newchar set, previous sequence done */
X#define SENDCNT 3 	/*newchar set, DLE sent, send count next */
X
X/* *** Stuff for second translation module *** */
X
X#define NOCHILD -1	/* indicates end of path through tree */
X#define NUMNODES (NUMVALS + NUMVALS - 1)	/* nbr of nodes */
X
X#define MAXCOUNT 0xFFFF	/* biggest unsigned integer */
X
X/* The following array of structures are the nodes of the
X * binary trees. The first NUMVALS nodes becomethe leaves of the
X * final tree and represent the values of the data bytes being
X * encoded and the special endfile, SPEOF.
X * The remaining nodes become the internal nodes of the final tree.
X */
X
XEXTERN struct	nd {
X	unsigned int weight;	/* number of appearances */
X	char tdepth;		/* length on longest path in tre*/
X	int lchild, rchild;	/* indexes to next level */
X} node[NUMNODES];
X
XEXTERN int dctreehd;	/*index to head node of final tree */
X
X/* This is the encoding table:
X * The bit strings have first bit in  low bit.
X * Note that counts were scaled so code fits unsigned integer
X */
X
XEXTERN char codelen[NUMVALS];		/* number of bits in code */
XEXTERN unsigned int code[NUMVALS];	/* code itself, right adjusted */
XEXTERN unsigned int tcode;		/* temporary code value */
X
X
X/* Variables used by encoding process */
X
XEXTERN int curin;		/* Value currently being encoded */
XEXTERN char cbitsrem;		/* Number of code string bits remaining */
XEXTERN unsigned int ccode;	/* Current code shifted so next code bit is at right */
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > sqcom.h
X#ifdef SQMAIN
X#define EXTERN
X#else
X#define EXTERN extern
X#endif
X
X/* Definitions and external declarations */
X
X#define RECOGNIZE 0xFF76	/* unlikely pattern */
X
X/* *** Stuff for first translation module *** */
X
X#define DLE 0x90
X
XEXTERN unsigned int crc;	/* error check code */
X
X/* *** Stuff for second translation module *** */
X
X#define SPEOF 256	/* special endfile token */
X#define NUMVALS 257	/* 256 data values plus SPEOF*/
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > sqdebug.c
X/* Debugging aids for SQ and related modules */
X
X#include <stdio.h>
X#include "sqcom.h"
X#include "sq.h"
X
Xpcounts()
X{
X	int i;
X
X	if(debug) {
X		printf("\nCounts after 1st algorithm and maybe scaling");
X		for(i = 0; i < NUMVALS; ++i) {
X			if(i%8 == 0)
X				printf("\n%4x  ", i);
X			printf("%5u  ", node[i].weight);
X		}
X	printf("\n\n");
X	}
X}
X
Xphuff()
X{
X	int i;
X
X	if(debug) {
X		printf("\nEncoding tree - root=%3d\n", dctreehd);
X		for(i = 0; i < NUMNODES; ++i)
X			if(node[i].weight != 0)
X				printf("%3d  w=%5u d=%3d l=%3d r=%3d\n", i, node[i].weight, node[i].tdepth, node[i].lchild, node[i].rchild);
X
X		printf("\nHuffman codes\n");
X		for(i = 0; i < NUMVALS; ++i) {
X			if(codelen[i] > 0)
X				printf("%3d  %4x l=%2d c=%4x\n", i, i, codelen[i], code[i]);
X		}
X	}
X}
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > sqio.c
X#include <stdio.h>
X#include "sqcom.h"
X#include "sq.h"
X#define ERROR -1
X
X/* Get next byte from file and update checksum */
X
Xint
Xgetc_crc(ib)
XFILE *ib;
X{
X	int c;
X
X	c = getc(ib);
X	if (c != EOF)
X		crc += c;		/* checksum */
X	return c;
X}
X
X/* Output functions with error reporting */
X
Xstatic char obuf[128];
Xstatic int oblen = 0;
X
Xputce(c,  iob)
Xint c;
XFILE *iob;
X{
X	obuf[oblen++] = c;
X	if (oblen >= sizeof(obuf)) oflush(iob);
X}
X
Xputwe(w,  iob)
Xint w;
XFILE *iob;
X{
X	obuf[oblen++] = w;
X	if (oblen >= sizeof(obuf)) oflush(iob);
X	obuf[oblen++] = w >> 8;
X	if (oblen >= sizeof(obuf)) oflush(iob);
X}
X
Xoflush(iob)				/* flush output buffer */
XFILE *iob;
X{
X	if (oblen && !fwrite(obuf, oblen, 1, iob)) {
X		printf("Error writing output file\n");
X		exit(1);
X	}
X	oblen = 0;
X}
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > tr1.c
X#include <stdio.h>
X#include "sqcom.h"
X#include "sq.h"
X#define ERROR -1
X#define TRUE 1
X#define FALSE 0
X
X/* First translation - encoding of repeated characters
X * The code is byte for byte pass through except that
X * DLE is encoded as DLE, zero and repeated byte values
X * are encoded as value, DLE, count for count >= 3.
X */
X
Xinit_ncr()	/*initialize getcnr() */
X{
X	state = NOHIST;
X}
X
Xint
Xgetcnr(iob)
XFILE *iob;
X{
X	switch(state) {
X	case NOHIST:
X		/* No relevant history */
X		state = SENTCHAR;
X		return lastchar = getc_crc(iob);   
X	case SENTCHAR:
X		/* Lastchar is set, need lookahead */
X		switch(lastchar) {
X		case DLE:
X			state = NOHIST;
X			return 0;	/* indicates DLE was the data */
X		case EOF:
X			return EOF;
X		default:
X			for(likect = 1; (newchar = getc_crc(iob)) == lastchar && likect < 255; ++likect)
X				;
X			switch(likect) {
X			case 1:
X				return lastchar = newchar;
X			case 2:
X				/* just pass through */
X				state = SENDNEWC;
X				return lastchar;
X			default:
X				state = SENDCNT;
X				return DLE;
X			}
X		}
X	case SENDNEWC:
X		/* Previous sequence complete, newchar set */
X		state = SENTCHAR;
X		return lastchar = newchar;
X	case SENDCNT:
X		/* Sent DLE for repeat sequence, send count */
X		state = SENDNEWC;
X		return likect;
X	default:
X		puts("Bug - bad state\n");
X		exit(1);
X	}
X}
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > tr2.c
X/* #define DEBUG */
X#include <stdio.h>
X#include "sqcom.h"
X#include "sq.h"
X#define ERROR -1
X#define TRUE 1
X#define FALSE 0
X
X/******** Second translation - bytes to variable length bit strings *********/
X
X
X/* This translation uses the Huffman algorithm to develop a
X * binary tree representing the decoding information for
X * a variable length bit string code for each input value.
X * Each string's length is in inverse proportion to its
X * frequency of appearance in the incoming data stream.
X * The encoding table is derived from the decoding table.
X *
X * The range of valid values into the Huffman algorithm are
X * the values of a byte stored in an integer plus the special
X * endfile value chosen to be an adjacent value. Overall, 0-SPEOF.
X *
X * The "node" array of structures contains the nodes of the
X * binary tree. The first NUMVALS nodes are the leaves of the
X * tree and represent the values of the data bytes being
X * encoded and the special endfile, SPEOF.
X * The remaining nodes become the internal nodes of the tree.
X *
X * In the original design it was believed that
X * a Huffman code would fit in the same number of
X * bits that will hold the sum of all the counts.
X * That was disproven by a user's file and was a rare but
X * infamous bug. This version attempts to choose among equally
X * weighted subtrees according to their maximum depths to avoid
X * unnecessarily long codes. In case that is not sufficient
X * to guarantee codes <= 16 bits long, we initially scale
X * the counts so the total fits in an unsigned integer, but
X * if codes longer than 16 bits are generated the counts are
X * rescaled to a lower ceiling and code generation is retried.
X */
X
X/* Initialize the Huffman translation. This requires reading
X * the input file through any preceding translation functions
X * to get the frequency distribution of the various values.
X */
X
Xinit_huff(ib)          
XFILE *ib;
X{
X	int c, i;
X	int btlist[NUMVALS];	/* list of intermediate binary trees */
X	int listlen;		/* length of btlist */
X	unsigned int *wp;	/* simplifies weight counting */
X	unsigned int ceiling;	/* limit for scaling */
X
X	/* Initialize tree nodes to no weight, no children */
X	init_tree();
X
X	/* Build frequency info in tree */
X	do {
X		c = getcnr(ib);        
X		if(c == EOF)
X			c = SPEOF;
X		if(*(wp = &node[c].weight) !=  MAXCOUNT)
X			++(*wp);
X	} while(c != SPEOF);
X#ifdef DEBUG
X	pcounts();	/* debugging aid */
X#endif	
X	ceiling = MAXCOUNT;
X
X	do {	/* Keep trying to scale and encode */
X		if(ceiling != MAXCOUNT)
X			printf("*** rescaling ***, ");
X		scale(ceiling);
X		ceiling /= 2;	/* in case we rescale */
X#ifdef DEBUG
X		pcounts();	/* debugging aid */
X#endif
X
X		/* Build list of single node binary trees having
X		 * leaves for the input values with non-zero counts
X		 */
X		for(i = listlen = 0; i < NUMVALS; ++i)
X			if(node[i].weight != 0) {
X				node[i].tdepth = 0;
X				btlist[listlen++] = i;
X			}
X
X		/* Arrange list of trees into a heap with the entry
X		 * indexing the node with the least weight at the top.
X		 */
X		heap(btlist, listlen);
X
X		/* Convert the list of trees to a single decoding tree */
X		bld_tree(btlist, listlen);
X
X		/* Initialize the encoding table */
X		init_enc();
X
X		/* Try to build encoding table.
X		 * Fail if any code is > 16 bits long.
X		 */
X	} while(buildenc(0, dctreehd) == ERROR);
X#ifdef DEBUG
X	phuff();	/* debugging aid */
X#endif
X	/* Initialize encoding variables */
X	cbitsrem = 0;	/*force initial read */
X	curin = 0;	/*anything but endfile*/
X}
X/*  */
X/* The count of number of occurrances of each input value
X * have already been prevented from exceeding MAXCOUNT.
X * Now we must scale them so that their sum doesn't exceed
X * ceiling and yet no non-zero count can become zero.
X * This scaling prevents errors in the weights of the
X * interior nodes of the Huffman tree and also ensures that
X * the codes will fit in an unsigned integer. Rescaling is
X * used if necessary to limit the code length.
X */
X
Xscale(ceil)
Xunsigned int ceil;	/* upper limit on total weight */
X{
X	int c, ovflw, divisor, i;
X	unsigned int w, sum;
X	char increased;		/* flag */
X
X	do {
X		for(i = sum = ovflw = 0; i < NUMVALS; ++i) {
X			if(node[i].weight > (ceil - sum))
X				++ovflw;
X			sum += node[i].weight;
X		}
X
X		divisor = ovflw + 1;
X
X		/* Ensure no non-zero values are lost */
X		increased = FALSE;
X		for(i = 0; i < NUMVALS; ++i) {
X			w = node[i].weight;
X			if (w < divisor && w > 0) {
X				/* Don't fail to provide a code if it's used at all */
X				node[i].weight = divisor;
X				increased = TRUE;
X			}
X		}
X	} while(increased);
X
X	/* Scaling factor choosen, now scale */
X	if(divisor > 1)
X		for(i = 0; i < NUMVALS; ++i)
X			node[i].weight /= divisor;
X}
X/*  */
X/* heap() and adjust() maintain a list of binary trees as a
X * heap with the top indexing the binary tree on the list
X * which has the least weight or, in case of equal weights,
X * least depth in its longest path. The depth part is not
X * strictly necessary, but tends to avoid long codes which
X * might provoke rescaling.
X */
X
Xheap(list, length)
Xint list[], length;
X{
X	int i;
X
X	for(i = (length - 2) / 2; i >= 0; --i)
X		adjust(list, i, length - 1);
X}
X
X/* Make a heap from a heap with a new top */
X
Xadjust(list, top, bottom)
Xint list[], top, bottom;
X{
X	int k, temp;
X	char cmptrees();
X
X	k = 2 * top + 1;	/* left child of top */
X	temp = list[top];	/* remember root node of top tree */
X	if( k <= bottom) {
X		if( k < bottom && cmptrees(list[k], list[k + 1]))
X			++k;
X
X		/* k indexes "smaller" child (in heap of trees) of top */
X		/* now make top index "smaller" of old top and smallest child */
X		if(cmptrees(temp, list[k])) {
X			list[top] = list[k];
X			list[k] = temp;
X			/* Make the changed list a heap */
X			adjust(list, k, bottom); /*recursive*/
X		}
X	}
X}
X
X/* Compare two trees, if a > b return true, else return false
X * note comparison rules in previous comments.
X */
X
Xchar	/* Boolean */
Xcmptrees(a, b)
Xint a, b;	/* root nodes of trees */
X{
X	if(node[a].weight > node[b].weight)
X		return TRUE;
X	if(node[a].weight == node[b].weight)
X		if(node[a].tdepth > node[b].tdepth)
X			return TRUE;
X	return FALSE;
X}
X/*  */
X/* HUFFMAN ALGORITHM: develops the single element trees
X * into a single binary tree by forming subtrees rooted in
X * interior nodes having weights equal to the sum of weights of all
X * their descendents and having depth counts indicating the
X * depth of their longest paths.
X *
X * When all trees have been formed into a single tree satisfying
X * the heap property (on weight, with depth as a tie breaker)
X * then the binary code assigned to a leaf (value to be encoded)
X * is then the series of left (0) and right (1)
X * paths leading from the root to the leaf.
X * Note that trees are removed from the heaped list by
X * moving the last element over the top element and
X * reheaping the shorter list.
X */
X
Xbld_tree(list, len)
Xint list[];
Xint len;
X{
X	int freenode;		/* next free node in tree */
X	int lch, rch;		/* temporaries for left, right children */
X	struct nd *frnp;	/* free node pointer */
X	int i;
X	char maxchar();
X
X	/* Initialize index to next available (non-leaf) node.
X	 * Lower numbered nodes correspond to leaves (data values).
X	 */
X	freenode = NUMVALS;
X
X	while(len > 1) {
X		/* Take from list two btrees with least weight
X		 * and build an interior node pointing to them.
X		 * This forms a new tree.
X		 */
X		lch = list[0];	/* This one will be left child */
X
X		/* delete top (least) tree from the list of trees */
X		list[0] = list[--len];
X		adjust(list, 0, len - 1);
X
X		/* Take new top (least) tree. Reuse list slot later */
X		rch = list[0];	/* This one will be right child */
X
X		/* Form new tree from the two least trees using
X		 * a free node as root. Put the new tree in the list.
X		 */
X		frnp = &node[freenode];	/* address of next free node */
X		list[0] = freenode++;	/* put at top for now */
X		frnp->lchild = lch;
X		frnp->rchild = rch;
X		frnp->weight = node[lch].weight + node[rch].weight;
X		frnp->tdepth = 1 + maxchar(node[lch].tdepth, node[rch].tdepth);
X		/* reheap list  to get least tree at top*/
X		adjust(list, 0, len - 1);
X	}
X	dctreehd = list[0];	/*head of final tree */
X}
X/*  */
Xchar
Xmaxchar(a, b)
Xchar a, b;
X{
X	return a > b ? a : b;
X}
X/* Initialize all nodes to single element binary trees
X * with zero weight and depth.
X */
X
Xinit_tree()
X{
X	int i;
X
X	for(i = 0; i < NUMNODES; ++i) {
X		node[i].weight = 0;
X		node[i].tdepth = 0;
X		node[i].lchild = NOCHILD;
X		node[i].rchild = NOCHILD;
X	}
X}
X
Xinit_enc()
X{
X	int i;
X
X	/* Initialize encoding table */
X	for(i = 0; i < NUMVALS; ++i) {
X		codelen[i] = 0;
X	}
X}
X/*  */
X/* Recursive routine to walk the indicated subtree and level
X * and maintain the current path code in bstree. When a leaf
X * is found the entire code string and length are put into
X * the encoding table entry for the leaf's data value.
X *
X * Returns ERROR if codes are too long.
X */
X
Xint		/* returns ERROR or NULL */
Xbuildenc(level, root)
Xint level;/* level of tree being examined, from zero */
Xint root; /* root of subtree is also data value if leaf */
X{
X	int l, r;
X
X	l = node[root].lchild;
X	r = node[root].rchild;
X#ifdef DEBUG
X	if (debug) printf("level=%d,root=%d,l=%d,r=%d,tcode=%04x\n",level,root,l,r,tcode);
X#endif
X	if( l == NOCHILD && r == NOCHILD) {
X		/* Leaf. Previous path determines bit string
X		 * code of length level (bits 0 to level - 1).
X		 * Ensures unused code bits are zero.
X		 */
X		codelen[root] = level;
X		code[root] = tcode & ((unsigned int)(~0) >> ((sizeof(int) * 8) - level));
X#ifdef DEBUG
X		if (debug) printf("  codelen[%d]=%d,code[%d]=%02x\n",root,codelen[root],root,code[root]);
X#endif
X		return (level > 16) ? ERROR : NULL;
X	} else {
X		if( l != NOCHILD) {
X			/* Clear path bit and continue deeper */
X			tcode &= ~(1 << level);
X			/* NOTE RECURSION */
X			if(buildenc(level + 1, l) == ERROR)
X				return ERROR;
X		}
X		if(r != NOCHILD) {
X			/* Set path bit and continue deeper */
X			tcode |= 1 << level;
X			/* NOTE RECURSION */
X			if(buildenc(level + 1, r) == ERROR)
X				return ERROR;
X		}
X	}
X	return NULL;	/* if we got here we're ok so far */
X}
X/*  */
X/* Write out the header of the compressed file */
X
Xwrt_head(ob, infile)
XFILE *ob;
Xchar *infile;	/* input file name (w/ or w/o drive) */
X{
X	int i, k, l, r;
X	int numnodes;		/* nbr of nodes in simplified tree */
X
X	putwe(RECOGNIZE, ob);	/* identifies as compressed */
X	putwe(crc, ob);		/* unsigned sum of original data */
X
X	/* Record the original file name w/o drive */
X	if(*(infile + 1) == ':')
X		infile += 2;	/* skip drive */
X
X	do {
X		putce(*infile, ob);
X	} while(*(infile++) != '\0');
X
X
X	/* Write out a simplified decoding tree. Only the interior
X	 * nodes are written. When a child is a leaf index
X	 * (representing a data value) it is recoded as
X	 * -(index + 1) to distinguish it from interior indexes
X	 * which are recoded as positive indexes in the new tree.
X	 * Note that this tree will be empty for an empty file.
X	 */
X
X	numnodes = dctreehd < NUMVALS ? 0 : dctreehd - (NUMVALS -1);
X	putwe(numnodes, ob);
X
X	for(k = 0, i = dctreehd; k < numnodes; ++k, --i) {
X		l = node[i].lchild;
X		r = node[i].rchild;
X		l = l < NUMVALS ? -(l + 1) : dctreehd - l;
X		r = r < NUMVALS ? -(r + 1) : dctreehd - r;
X		putwe(l, ob);	/* left child */
X		putwe(r, ob);	/* right child */
X	}
X}
X/*  */
X/* Get an encoded byte or EOF. Reads from specified stream AS NEEDED.
X *
X * There are two unsynchronized bit-byte relationships here.
X * The input stream bytes are converted to bit strings of
X * various lengths via the static variables named c...
X * These bit strings are concatenated without padding to
X * become the stream of encoded result bytes, which this
X * function returns one at a time. The EOF (end of file) is
X * converted to SPEOF for convenience and encoded like any
X * other input value. True EOF is returned after that.
X *
X * The original gethuff() called a seperate function,
X * getbit(), but that more readable version was too slow.
X */
X
Xint				/*  Returns byte values except for EOF */
Xgethuff(ib)
XFILE *ib;
X{
X	unsigned int rbyte;	/* Result byte value */
X	char need, take;	/* numbers of bits */
X
X	rbyte = 0;
X	need = 8;		/* build one byte per call */
X
X	/* Loop to build a byte of encoded data
X	 * Initialization forces read the first time
X	 */
X
Xloop:
X	if(cbitsrem >= need) {
X		/* Current code fullfills our needs */
X		if(need == 0)
X			return rbyte & 0xff;
X		/* Take what we need */
X 		rbyte |= ccode << (8 - need);
X		/* And leave the rest */
X		ccode >>= need;
X		cbitsrem -= need;
X		return rbyte & 0xff;
X	}
X
X	/* We need more than current code */
X	if(cbitsrem > 0) {
X		/* Take what there is */
X		rbyte |= ccode << (8 - need);
X		need -= cbitsrem;
X	}
X	/* No more bits in current code string */
X	if(curin == SPEOF) {
X		/* The end of file token has been encoded. If
X		 * result byte has data return it and do EOF next time
X		 */
X		cbitsrem = 0;
X
X		return (need == 8) ? EOF : rbyte & 0xff;
X	}
X
X	/* Get an input byte */
X	if((curin = getcnr(ib)) == EOF)
X		curin = SPEOF;	/* convenient for encoding */
X
X	/* Get the new byte's code */
X	ccode = code[curin];
X	cbitsrem = codelen[curin];
X
X	goto loop;
X}
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > usq.c
X/* Program to unsqueeze files formed by sq.com
X *
X * Useage:
X *
X *	usq [-count] [-fcount] [file1] [file2] ... [filen]
X *
X * where file1 through filen represent one or more files to be compressed,
X * and the following options may be specified:
X *
X *	-count		Previewing feature: redirects output
X * 			files to standard output with parity stripped
X *			and unprintables except CR, LF, TAB and  FF
X *			converted to periods. Limits each file
X *			to first count lines.
X *			Defaults to console, but see below how
X *			to capture all in one file for further
X *			processing, such as by PIP.
X *			Count defaults to a very high value.
X *			No CRC check is performed when previewing.
X *			Use drive: to cancel this.
X *
X *	-fcount		Same as -count except formfeed
X *			appended to preview of each file.
X *			Example: -f10.
X *
X * If no such items are given on the command line you will be
X * prompted for commands (one at a time). An empty command
X * terminates the program.
X *
X * The unsqueezed file name is recorded in the squeezed file.
X * 
X */
X/* CHANGE HISTORY:
X * 1.3	Close inbuff to avoid exceeding maximum number of
X *	open files. Includes rearranging error exits.
X * 1.4	Add -count option to allow quick inspection of files.
X * 1.5  Break up long lines of introductory text
X * 1.5  -count no longer appends formfeed to preview of each file.
X *	-fcount (-f10, -F10) does append formfeed.
X * 1.6  Modified to work correctly under MP/M II (DIO.C change) and
X *      signon message shortened.
X * 2.0	Modified to work with CI-C86 compiler (CP/M-86 and MS-DOS)
X * 2.1  Modified for use in MLINK
X * 2.2  Modified for use with optimizing CI-C86 compiler (MS-DOS)
X * 3.0  Generalized for use under UNIX
X */
X
X/* ejecteject */
X
X#define SQMAIN
X
X#include <stdio.h>
X#include "sqcom.h"
X#include "usq.h"
X#define VERSION "3.1   12/19/84"
X
X/* This must follow all include files */
Xunsigned int dispcnt;	/* How much of each file to preview */
Xchar	ffflag;		/* should formfeed separate preview from different files */
X
Xmain(argc, argv)
Xint argc;
Xchar *argv[];
X{
X	int i,c;
X	char inparg[16];	/* parameter from input */
X
X	dispcnt = 0;	/* Not in preview mode */
X
X	printf("File unsqueezer version %s (original author: R. Greenlaw)\n\n", VERSION);
X
X	/* Process the parameters in order */
X	for(i = 1; i < argc; ++i)
X		obey(argv[i]);
X
X	if(argc < 2) {
X		printf("Enter file names, one line at a time, or type <RETURN> to quit.");
X		do {
X			printf("\n*");
X			for(i = 0; i < 16; ++i) {
X				if((c = getchar()) == EOF)
X					c = '\n';	/* force empty (exit) command */
X				if((inparg[i] = c) == '\n') {
X					inparg[i] = '\0';
X					break;
X				}
X			}
X			if(inparg[0] != '\0')
X				obey(inparg);
X		} while(inparg[0] != '\0');
X	}
X}
X
X/* ejecteject */
X
Xobey(p)
Xchar *p;
X{
X	char *q, cc;
X
X	if(*p == '-') {
X		if(ffflag = ((*(p+1) == 'F') || (*(p+1) == 'f')))
X			++p;
X		/* Set number of lines of each file to view */
X		dispcnt = 65535;	/* default */
X		if(*(p+1))
X			if((dispcnt = atoi(p + 1)) == 0)
X				printf("\nBAD COUNT %s", p + 1);
X		return;
X	}	
X
X	/* Check for ambiguous (wild-card) name */
X	for(q = p; *q != '\0'; ++q)
X		if(*q == '*' || *q == '?') {
X			printf("\nCan't accept ambiguous name %s", p);
X			return;
X		}
X
X	unsqueeze(p);
X}
X
X/* ejecteject */
X
Xunsqueeze(infile)
Xchar *infile;
X{
X	FILE *inbuff, *outbuff;	/* file buffers */
X	int i, c;
X	char cc;
X
X	char *p;
X	unsigned int filecrc;	/* checksum */
X	int numnodes;		/* size of decoding tree */
X	char outfile[128];	/* output file name */
X	unsigned int linect;	/* count of number of lines previewed */
X	char obuf[128];		/* output buffer */
X	int oblen;		/* length of output buffer */
X	static char errmsg[] = "ERROR - write failure in %s\n";
X
X	if(!(inbuff=fopen(infile, "rb"))) {
X		printf("Can't open %s\n", infile);
X		return;
X	}
X	/* Initialization */
X	linect = 0;
X	crc = 0;
X	init_cr();
X	init_huff();
X
X	/* Process header */
X	if(getx16(inbuff) != RECOGNIZE) {
X		printf("%s is not a squeezed file\n", infile);
X		goto closein;
X	}
X
X	filecrc = getw16(inbuff);
X
X	/* Get original file name */
X	p = outfile;			/* send it to array */
X	do {
X		*p = getc(inbuff);
X	} while(*p++ != '\0');
X
X	printf("%s -> %s: ", infile, outfile);
X
X
X	numnodes = getw16(inbuff);
X
X	if(numnodes < 0 || numnodes >= NUMVALS) {
X		printf("%s has invalid decode tree size\n", infile);
X		goto closein;
X	}
X
X	/* Initialize for possible empty tree (SPEOF only) */
X	dnode[0].children[0] = -(SPEOF + 1);
X	dnode[0].children[1] = -(SPEOF + 1);
X
X	/* Get decoding tree from file */
X	for(i = 0; i < numnodes; ++i) {
X		dnode[i].children[0] = getw16(inbuff);
X		dnode[i].children[1] = getw16(inbuff);
X	}
X
X	if(dispcnt) {
X		/* Use standard output for previewing */
X		putchar('\n');
X		while(((c = getcr(inbuff)) != EOF) && (linect < dispcnt)) {
X			cc = 0x7f & c;	/* strip parity */
X			if((cc < ' ') || (cc > '~'))
X				/* Unprintable */
X				switch(cc) {
X				case '\r':	/* return */
X					/* newline will generate CR-LF */
X					goto next;
X				case '\n':	/* newline */
X					++linect;
X				case '\f':	/* formfeed */
X				case '\t':	/* tab */
X					break;
X				default:
X					cc = '.';
X				}
X			putchar(cc);
X		next: ;
X		}
X		if(ffflag)
X			putchar('\f');	/* formfeed */
X	} else {
X		/* Create output file */
X		if(!(outbuff=fopen(outfile, "wb"))) {
X			printf("Can't create %s\n", outfile);
X			goto closeall;
X		}
X		printf("unsqueezing,");
X		/* Get translated output bytes and write file */
X		oblen = 0;
X		while((c = getcr(inbuff)) != EOF) {
X			crc += c;
X			obuf[oblen++] = c;
X			if (oblen >= sizeof(obuf)) {
X				if(!fwrite(obuf, sizeof(obuf), 1, outbuff)) {
X					printf(errmsg, outfile);
X					goto closeall;
X				}
X				oblen = 0;
X			}
X		}
X		if (oblen && !fwrite(obuf, oblen, 1, outbuff)) {
X			printf(errmsg, outfile);
X			goto closeall;
X		}
X
X		if((filecrc && 0xFFFF) != (crc && 0xFFFF))
X			printf("ERROR - checksum error in %s\n", outfile);
X		else	printf(" done.\n");
X
X	closeall:
X		fclose(outbuff);
X	}
X
Xclosein:
X	fclose(inbuff);
X}
X
Xgetw16(iob)			/* get 16-bit word from file */
XFILE *iob;
X{
Xint temp;
X
Xtemp = getc(iob);		/* get low order byte */
Xtemp |= getc(iob) << 8;
Xif (temp & 0x8000) temp |= (~0) << 15;	/* propogate sign for big ints */
Xreturn temp;
X
X}
X
X
Xgetx16(iob)			/* get 16-bit (unsigned) word from file */
XFILE *iob;
X{
Xint temp;
X
Xtemp = getc(iob);		/* get low order byte */
Xreturn temp | (getc(iob) << 8);
X
X}
X
X
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > usq.h
X#define LARGE 30000
X
X/* Decoding tree */
XEXTERN struct {
X	int children[2];	/* left, right */
X} dnode[NUMVALS - 1];
X
XEXTERN int bpos;	/* last bit position read */
XEXTERN int curin;	/* last byte value read */
X
X/* Variables associated with repetition decoding */
XEXTERN int repct;	/*Number of times to retirn value*/
XEXTERN int value;	/*current byte value or EOF */
SHAR_EOF
sed 's/^X//' << 'SHAR_EOF' > utr.c
X#include <stdio.h>
X#include "sqcom.h"
X#include "usq.h"
X#define ERROR -1
X
X/* initialize decoding functions */
X
Xinit_cr()
X{
X	repct = 0;
X}
X
Xinit_huff()
X{
X	bpos = 99;	/* force initial read */
X}
X
X/* Get bytes with decoding - this decodes repetition,
X * calls getuhuff to decode file stream into byte
X * level code with only repetition encoding.
X *
X * The code is simple passing through of bytes except
X * that DLE is encoded as DLE-zero and other values
X * repeated more than twice are encoded as value-DLE-count.
X */
X
Xint
Xgetcr(ib)
XFILE *ib;
X{
X	int c;
X
X	if(repct > 0) {
X		/* Expanding a repeated char */
X		--repct;
X		return value;
X	} else {
X		/* Nothing unusual */
X		if((c = getuhuff(ib)) != DLE) {
X			/* It's not the special delimiter */
X			value = c;
X			if(value == EOF)
X				repct = LARGE;
X			return value;
X		} else {
X			/* Special token */
X			if((repct = getuhuff(ib)) == 0)
X				/* DLE, zero represents DLE */
X				return DLE;
X			else {
X				/* Begin expanding repetition */
X				repct -= 2;	/* 2nd time */
X				return value;
X			}
X		}
X	}
X}
X
X/* ejecteject */
X
X/* Decode file stream into a byte level code with only
X * repetition encoding remaining.
X */
X
Xint
Xgetuhuff(ib)
XFILE *ib;
X{
X	int i;
X	int bitval;
X
X	/* Follow bit stream in tree to a leaf*/
X	i = 0;	/* Start at root of tree */
X	do {
X		if(++bpos > 7) {
X			if((curin = getc(ib)) == ERROR)
X				return ERROR;
X			bpos = 0;
X			/* move a level deeper in tree */
X			i = dnode[i].children[1 & curin];
X		} else
X			i = dnode[i].children[1 & (curin >>= 1)];
X	} while(i >= 0);
X
X	/* Decode fake node index to original data value */
X	i = -(i + 1);
X	/* Decode special endfile token to normal EOF */
X	i = (i == SPEOF) ? EOF : i;
X	return i;
X}
SHAR_EOF
exit


-- 
Bill Swan 	{ihnp4|decvax|allegra|...}!uw-beaver!tikal!persci!bill