[comp.graphics] LZW Summary

bangrazi@ctron.com (Anthony Bangrazi) (06/12/91)

Well, folks, thanks for all the responses. I found out a lot of people were
also interested in LZW decompression algorithms. Here's what I've got for you:

Sam Leffler has a really great program, xtiff, that displays a TIFF image in an
X window. It is really good at reading all kinds of TIFF images. You can FTP it
from:
	    okeeffe.berkeley.edu
	    retrieve: pub/v2.4.tar.Z
	    decompress v2.4.tar.Z
	    tar xvf v2.4.tar

There are images available as well. The source code is included. I found it
somewhat difficult to follow, but it is VERY complete. Thanks, Sam!

Dave Hertweck (hertweck@ucunix.san.uc.EDU) sent alng the following code:

/*----------------------------------------------------------------------*/
/* Copyright (c) 1987                                                   */
/* by CompuServe Inc., Columbus, Ohio.  All Rights Reserved             */
/*----------------------------------------------------------------------*/

/*
 * ABSTRACT:
 *      The compression algorithm builds a string translation table that maps
 *      substrings from the input string into fixed-length codes.  These codes
 *      are used by the expansion algorithm to rebuild the compressor's table
 *      and reconstruct the original data stream.  In it's simplest form, the
 *      algorithm can be stated as:
 *
 *              "if <w>k is in the table, then <w> is in the table"
 *
 *      <w> is a code which represents a string in the table.  When a new
 *      character k is read in, the table is searched for <w>k.  If this
 *      combination is found, <w> is set to the code for that combination
 *      and the next character is read in.  Otherwise, this combination is
 *      added to the table, the code <w> is written to the output stream and
 *      <w> is set to k.
 *
 *      The expansion algorithm builds an identical table by parsing each
 *      received code into a prefix string and suffix character.  The suffix
 *      character is pushed onto the stack and the prefix string translated
 *      again until it is a single character.  This completes the expansion.
 *      The expanded code is then output by popping the stack and a new entry
 *      is made in the table.
 *
 *      The algorithm used here has one additional feature.  The output codes
 *      are variable length.  They start at a specified number of bits.  Once
 *      the number of codes exceeds the current code size, the number of bits
 *      in the code is incremented.  When the table is completely full, a
 *      clear code is transmitted for the expander and the table is reset.
 *      This program uses a maximum code size of 12 bits for a total of 4096
 *      codes.
 *
 *      The expander realizes that the code size is changing when it's table
 *      size reaches the maximum for the current code size.  At this point,
 *      the code size in increased.  Remember that the expander's table is
 *      identical to the compressor's table at any point in the original data
 *      stream.
 *
 *      The compressed data stream is structured as follows:
 *              first byte denoting the minimum code size
 *              one or more counted byte strings. The first byte contains the
 *              length of the string. A null string denotes "end of data"
 *
 *      This format permits a compressed data stream to be embedded within a
 *      non-compressed context.
 *
 * AUTHOR: Steve Wilhite
 *
 * REVISION HISTORY:
 *
 */

#include <setjmp.h>
#include <alloc.h>                      /* MSC 4.0 */
#include "gif.h"

#define GetMem          malloc
#define FreeMem         free

#define far_null_ptr    0L
#define null_ptr        0
#define largest_code    4095
#define table_size      5003

jmp_buf recover;

struct code_entry
    {
    short prior_code;
    short code_id;
    unsigned char added_char;
    };

static short code_size;
static short clear_code;
static short eof_code;
static short min_code;
static short bit_offset;
static short byte_offset, bits_left;
static short max_code;
static short free_code;
static short prefix_code;
static short suffix_char;
static short hx, d;
static short (*get_byte)(),(*put_byte)();

static unsigned char code_buffer[256+3];
static struct code_entry far *code_table;

static void init_table(min_code_size)
    short min_code_size;
    {
    short i;

    code_size = min_code_size + 1;
    clear_code = 1 << min_code_size;
    eof_code = clear_code + 1;
    free_code = clear_code + 2;
    max_code = 1 << code_size;

    for (i = 0; i < table_size; i++)
        code_table[i].code_id = 0;
    }


static void flush(n)
    short n;
    {
    short i, status;

    status = (*put_byte)(n);

    if (status != 0)
        longjmp(recover, status);

    for (i = 0; i < n; i++)
        {
        status = (*put_byte)(code_buffer[i]);

        if (status != 0)
            longjmp(recover, status);
        }
    }


static void write_code(code)
    short code;
    {
    long temp;

    byte_offset = bit_offset >> 3;
    bits_left = bit_offset & 7;

    if (byte_offset >= 254)
        {
        flush(byte_offset);
        code_buffer[0] = code_buffer[byte_offset];
        bit_offset = bits_left;
        byte_offset = 0;
        }

    if (bits_left > 0)
        {
        temp = ((long) code << bits_left) | code_buffer[byte_offset];
        code_buffer[byte_offset] = temp;
        code_buffer[byte_offset + 1] = temp >> 8;
        code_buffer[byte_offset + 2] = temp >> 16;
        }
    else
        {
        code_buffer[byte_offset] = code;
        code_buffer[byte_offset + 1] = code >> 8;
        }

    bit_offset += code_size;
    }


short Compress_Data(min_code_size, get_byte_routine, put_byte_routine)
/*
 * Function:
 *      Compress a stream of data bytes using the LZW algorithm.
 *
 * Inputs:
 *      min_code_size
 *              the field size of an input value.  Should be in the range from
 *              1 to 9.
 *
 *      get_byte_routine
 *              address of the caller's "get_byte" routine:
 *
 *              status = get_byte();
 *
 *              where "status" is
 *                      0 .. 255        the byte just read
 *                      -1              logical end-of-file
 *                      < -3            error codes
 *
 *      put_byte_routine
 *              address the the caller's "put_byte" routine:
 *
 *              status = put_byte(value)
 *
 *              where
 *                      value   the byte to write
 *                      status  0 = OK, else < -3 means some error code
 *
 * Returns:
 *       0      normal completion
 *      -1      (not used)
 *      -2      insufficient dynamic memory
 *      -3      bad "min_code_size"
 *      < -3    error status from either the get_byte or put_byte routine
 */
    short (*get_byte_routine)();
    short (*put_byte_routine)();
    {
    short status;

    get_byte = get_byte_routine;
    put_byte = put_byte_routine;

    if (min_code_size < 2 || min_code_size > 9)
        if (min_code_size == 1)
            min_code_size = 2;
        else
            return -3;

    code_table = (struct code_entry far *)
        GetMem(sizeof(struct code_entry)*table_size);

    if (code_table == null_ptr)
        return -2;

    status = setjmp(recover);

    if (status != 0)
        {
        FreeMem((char *) code_table);
        return status;
        }

    (*put_byte)(min_code_size);         /* record the minimum code size */
    bit_offset = 0;
    init_table(min_code_size);
    write_code(clear_code);
    suffix_char = (*get_byte)();

    if (suffix_char >= 0)
        {
        prefix_code = suffix_char;

        while ((suffix_char = (*get_byte)()) >= 0)
            {
            hx = (prefix_code ^ suffix_char << 5) % table_size;
            d = 1;

            for (;;)
                {
                if (code_table[hx].code_id == 0)
                    {
                    write_code(prefix_code);

                    d = free_code;

                    if (free_code <= largest_code)
                        {
                        code_table[hx].prior_code = prefix_code;
                        code_table[hx].added_char = suffix_char;
                        code_table[hx].code_id = free_code;
                        free_code++;
                        }

                    if (d == max_code)
                        if (code_size < 12)
                            {
                            code_size++;
                            max_code <<= 1;
                            }
                        else
                            {
                            write_code(clear_code);
                            init_table(min_code_size);
                            }

                    prefix_code = suffix_char;
                    break;
                    }

                if (code_table[hx].prior_code == prefix_code &&
                    code_table[hx].added_char == suffix_char)
                    {
                    prefix_code = code_table[hx].code_id;
                    break;
                    }

                hx += d;
                d += 2;
                if (hx >= table_size)
                    hx -= table_size;
                }
            }

        if (suffix_char != -1)
            longjmp(recover, suffix_char);

        write_code(prefix_code);
        }
    else if (suffix_char != -1)
        longjmp(recover, suffix_char);

    write_code(eof_code);

    /* Make sure the code buffer is flushed */

    if (bit_offset > 0)
        flush((bit_offset + 7)/8);

    flush(0);                           /* end-of-data */
    FreeMem((char *) code_table);
    return 0;
   }


/* ********************************************************************
   file 2   expand.c
*/
/*----------------------------------------------------------------------*/
/* Copyright (c) 1987                                                   */
/* by CompuServe Inc., Columbus, Ohio.  All Rights Reserved             */
/*----------------------------------------------------------------------*/

/*
 * ABSTRACT:
 *      Please see COMPRESS.C
 *
 * AUTHOR: Steve Wilhite
 *
 * REVISION HISTORY:
 *
 */

#include <malloc.h>
#include <setjmp.h>

#define null_ptr        0
#define largest_code    4095

jmp_buf recover;

struct code_entry
    {
    short prefix;                       /* prefix code */
    char suffix;                        /* suffix character */
    char stack;
    };

static short code_size;
static short clear_code;
static short eof_code;
static short first_free;
static short bit_offset;
static short byte_offset, bits_left;
static short max_code;
static short free_code;
static short old_code;
static short input_code;
static short code;
static short suffix_char;
static short final_char;
static short bytes_unread;
static unsigned char code_buffer[64];
static struct code_entry *code_table;
static short (*get_byte)();
static short (*put_byte)();

static short mask[12] =
    {0x001, 0x003, 0x007, 0x00F,
     0x01F, 0x03F, 0x07F, 0x0FF,
     0x1FF, 0x3FF, 0x7FF, 0xFFF};


static init_table(min_code_size)
    short min_code_size;
    {
    code_size = min_code_size + 1;
    clear_code = 1 << min_code_size;
    eof_code = clear_code + 1;
    first_free = clear_code + 2;
    free_code = first_free;
    max_code = 1 << code_size;
    }

static read_code()
    {
    short bytes_to_move, i, ch;
    long temp;

    byte_offset = bit_offset >> 3;
    bits_left = bit_offset & 7;

    if (byte_offset >= 61)
        {
        bytes_to_move = 64 - byte_offset;

        for (i = 0; i < bytes_to_move; i++)
            code_buffer[i] = code_buffer[byte_offset + i];

        while (i < 64)
            {
            if (bytes_unread == 0)
                {
                /* Get the length of the next record. A zero-length record
                 * denotes "end of data".
                 */

                bytes_unread = (*get_byte)();

                if (bytes_unread < 1)
                    if (bytes_unread == 0)      /* end of data */
                        break;
                    else
                        longjmp(recover, bytes_unread);
                }

            ch = (*get_byte)();

            if (ch < 0)
                longjmp(recover, ch);

            code_buffer[i++] = ch;
            bytes_unread--;
            }

        bit_offset = bits_left;
        byte_offset = 0;
        }

    bit_offset += code_size;
    temp = (long) code_buffer[byte_offset]
        | (long) code_buffer[byte_offset + 1] << 8
        | (long) code_buffer[byte_offset + 2] << 16;

    if (bits_left > 0)
        temp >>= bits_left;

    return temp & mask[code_size - 1];
    }

short Expand_Data(get_byte_routine, put_byte_routine)
/*
 * Function:
 *      Decompress a LZW compressed data stream.
 *
 * Inputs:
 *      get_byte_routine - address of the caller's "get_byte" routine.
 *      put_byte_routine - address of the caller's "put_byte" routine.
 *
 * Returns:
 *      0       OK
 *      -1      expected end-of-file
 *      -2      cannot allocate memory
 *      -3      bad "min_code_size"
 *      < -3    error status from the get_byte or put_byte routine
 */
    short (*get_byte_routine)();
    short (*put_byte_routine)();
    {
    short i;
    short sp;                           /* stack ptr */
    short status;
    short min_code_size;

    get_byte = get_byte_routine;
    put_byte = put_byte_routine;

    code_table = (struct code_entry *)
        malloc(sizeof(struct code_entry)*(largest_code + 1));

    if (code_table == null_ptr)
        return -2;

    status = setjmp(recover);

    if (status != 0)
        {
        free((char *) code_table);
        return status;
        }

    /* Get the minimum code size (2 to 9) */

    min_code_size = (*get_byte)();

    if (min_code_size < 0)
        longjmp(recover, min_code_size);
    else if (min_code_size < 2 || min_code_size > 9)
        longjmp(recover, -3);

    init_table(min_code_size);
    sp = 0;
    bit_offset = 64*8;                  /* force "read_code" to start a new */
    bytes_unread = 0;                   /* record */

    while ((code = read_code()) != eof_code)
        {
        if (code == clear_code)
            {
            init_table(min_code_size);
            code = read_code();
            old_code = code;
            suffix_char = code;
            final_char = code;
            if ((status = (*put_byte)(suffix_char)) != 0)
                longjmp(recover, status);
            }
        else
            {
            input_code = code;

            if (code >= free_code)
                {
                code = old_code;
                code_table[sp++].stack = final_char;
                }

            while (code >= first_free)
                {
                code_table[sp++].stack = code_table[code].suffix;
                code = code_table[code].prefix;
                }

            final_char = code;
            suffix_char = code;
            code_table[sp++].stack = final_char;

            while (sp > 0)
                if ((status = (*put_byte)(code_table[--sp].stack)) != 0)
                    longjmp(recover, status);

            code_table[free_code].suffix = suffix_char;
            code_table[free_code].prefix = old_code;
            free_code++;
            old_code = input_code;

            if (free_code >= max_code)
                if (code_size < 12)
                    {
                    code_size++;
                    max_code <<= 1;
                    }
            }
        }

    free((char *) code_table);
    return 0;
    }

*****************************************************************************

Bill Heelan <wheelan@quiche.cs.mcgill.ca> sent along the following note:

> Hi,
>    I have a program I wrote to display GIFs, if you want the source I can make
>    it ftp'able.  Another alternative is to use the code from 'xgif'.

*****************************************************************************

I hope this helps out. Thanks to all the people that helped put this together.

Anthony Bangrazi
bangrazi@ctron.com
Cabletron Systems, Inc.
Rochester, NH

samkho@athena.mit.edu (Samuel P Kho) (06/13/91)

In article <1642@balrog.ctron.com> bangrazi@ctron.com writes:
>Sam Leffler has a really great program, xtiff, that displays a TIFF image in an
>X window. It is really good at reading all kinds of TIFF images. You can FTP it
>from:
>	    okeeffe.berkeley.edu
>	    retrieve: pub/v2.4.tar.Z
>	    decompress v2.4.tar.Z
>	    tar xvf v2.4.tar
>
>There are images available as well. The source code is included. I found it

i tried ftping the stuff.  there were only two files left in /pub, namely,
tiff.tar.Z and tiffpics.tar.Z.  however, one is 53 bytes long, the other
57 bytes.  i did ftp in binary mode.  something seems to be wrong.  anyone
know where v2.4.tar.Z are now?

thanx in advance.

	Samuel P. Kho
	samkho@athena.mit.edu