[comp.compression] mailable binary

ariel@prime.com (Rob Ullmann) (03/27/91)

The following is the present draft of a one-pass compression and
encoding of an arbitrary (octet-aligned) binary object, to make
it mailable. It is a combination of LZJU90, xxencode, and a CRC
check.

Please direct comments to both co-authors.

Rob Ullmann



Network Working Group                                R. Jung, R. Ullmann
Request for Comments: DRAFT                         Prime Computer, Inc.
                                                            January 1991






              LZJU90: Compressed Encoding for Binary Mail

1. Status of this Memo

This memo describes an encoding [1] for a binary object to be sent in an
Internet mail message. The encoding provides both compression and
representation in a text format that will successfully survive
transmission through the many different mailers and gateways that
comprise the Internet and connected mail networks.

Distribution of this memo is unlimited.

2. Introduction

The encoding first compresses the binary object, using a modified LZ77
algorithm, called LZJU90. It then encodes each 6 bits of the output of
the compression as a text character, using a character set chosen to
survive any translations between codes, such as ASCII to EBCDIC.

The 64 six-bit strings 000000 through 111111 are represented by the
characters "+", "-", "0" to "9", "A" to "Z", and "a" to "z".

The output text begins with a line identifying the encoding. This is for
visual reference only, the Encoding: field in the header identifies the
section to the user program. It also names the object that was encoded,
usually by a file name. The format of this line is:

    * LZJU90 <name>

where <name> is optional. For example:

    * LZJU90 vmunix

This is followed by the compressed and encoded data, broken into lines
where convenient. It is recommended that lines be broken every 78
characters, to survive mailers than restrict line length. The decoder
must accept lines with 1 to 1000 characters on each line.

After this, there is one final line that gives the number of bytes in
the original data and a CRC of the original data. This should match the
byte count and CRC found during decompression. This line has the format:

    * <count> <CRC>



Jung, Ullmann                                                   [Page 1]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


where <count> is a decimal number, and CRC is 8 hexadecimal digits.  For
example:

    * 4128076 5AC2D50E

The count used in the Encoding: field in the message header is the total
number of lines, including the start and end lines that begin with *.

A complete example is given in section 6.

3. Specification of the LZJU90 compression

This data compression specification uses the Lempel-Ziv-Storer-Szymanski
model of mixing pointers and literal characters.

The data compression is defined by the decoding algorithm. Any encoder
that emits symbols which cause the decoder to produce the original input
is defined to be valid.

There are many possible strategies for the maximal-string matching that
the encoder does, section 5 gives the code for one such algorithm.
Regardless of which algorithm is used, and what tradeoffs are made
between compression ratio and execution speed or space, the result can
always be decoded by the simple decoder.

The compressed data consists of a mixture of unencoded literal
characters and copy pointers which point to an earlier occurrence of the
string to be encoded.

Compressed data contains two types of codewords:

LITERAL              pass the literal directly to the uncompressed
                     output

COPY length, offset  go back offset characters in the output and copy
                     length characters forward to the current position.

To distinguish between codewords, the copy length is used.  A copy
length of zero indicates that the following codeword is a literal
codeword.  A copy length greater than zero indicates that the following
codeword is a copy codeword.

To improve copy length encoding, a threshold value of 2 has been
subtracted from the original copy length for copy codewords, because the
minimum copy length is 3 in this compression scheme.

The maximum offset value is set at 32255.  Larger offsets offer
extremely low improvements in compression (less than 1 percent,
typically).

No special encoding is done on the LITERAL characters.  However, unary

Jung, Ullmann                                                   [Page 2]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


encoding is used for the copy length and copy offset values to improve
compression.  A start-step-stop unary code is used.

A (start, step, stop) unary code of the integers is defined as follows:
The Nth codeword has N ones followed by a zero followed by a field of
size START + (N * STEP).  If the field width is equal to STOP then the
preceding zero can be omitted.  The integers are laid out sequentially
through these codewords.  For example, (0, 1, 4) would look like:

    Codeword      Range

    0             0
    10x           1-2
    110xx         3-6
    1110xxx       7-14
    1111xxxx     15-30


Below are the actual values used for copy length and copy offset:

The copy length is encoded with a (0, 1, 7) code leading to a maximum
copy length of 256 by including the THRESHOLD value of 2.

    Codeword         Range

    0                 0
    10x               3-4
    110xx             5-8
    1110xxx           9-16
    11110xxxx        17-32
    111110xxxxx      33-64
    1111110xxxxxx    65-128
    1111111xxxxxxx  129-256



The copy offset is encoded with a (9, 1, 14) code leading to a maximum
copy offset of 32255.  Offset 0 is reserved as an end of compressed data
flag.

    Codeword               Range

    0xxxxxxxxx               0-511
    10xxxxxxxxxx           512-1535
    110xxxxxxxxxxx        1536-3583
    1110xxxxxxxxxxxx      3485-7679
    11110xxxxxxxxxxxxx    7680-15871
    11111xxxxxxxxxxxxxx  15872-32255




Jung, Ullmann                                                   [Page 3]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


The 0 has been chosen to signal the start of the field for ease of
encoding.

The stop values are useful in the encoding to prevent out of range
values for the lengths and offsets, as well as shortening some codes by
one bit.

The worst case compression using this scheme is a 1/8 increase in size
of the encoded data. (One zero bit followed by 8 character bits). After
the character encoding, the worst case ratio is 3/2 to the original
data.

The minimum copy length of 3 has been chosen because the worst case copy
length and offset is 3 bits (3) and 19 bits (32255) for a total of 22
bits to encode a 3 character string (24 bits).





































Jung, Ullmann                                                   [Page 4]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


4. The Decoder

As mentioned previously, the compression is defined by the decoder.  Any
encoder that produced output that is correctly decoded is by definition
correct.

The following is an implementation of the decoder, written more for
clarity and as much portability as possible, rather than for maximum
speed.

When optimized for a specific environment, it will run significantly
faster.

    /* LZJU 90 Decoding program */

    #include <stdio.h>

    typedef unsigned char uchar;
    typedef unsigned int  uint;

    #define N          32255
    #define THRESHOLD      3

    #define STRTP          9
    #define STEPP          1
    #define STOPP         14
    #define STRTL          0
    #define STEPL          1
    #define STOPL          7

    static FILE *in;
    static FILE *out;

    static int   getbuf;
    static int   getlen;
    static long  in_count;
    static long  out_count;
    static long  crc;
    static long  crctable[256];
    static uchar xxcodes[] =
    "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\
    abcdefghijklmnopqrstuvwxyz";
    static uchar ddcodes[256];

    static uchar text[N];







Jung, Ullmann                                                   [Page 5]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


    #define CRCPOLY         0xEDB88320
    #define CRC_MASK        0xFFFFFFFF
    #define UPDATE_CRC(crc, c)  \
            crc = crctable[((uchar)(crc) ^ (uchar)(c)) & 0xFF] \
                  ^ (crc >> 8)
    #define START_RECD      "* LZJU90"

    void MakeCrctable()     /* Initialize CRC-32 table */
    {
    uint i, j;
    long r;
        for (i = 0; i <= 255; i++) {
            r = i;
            for (j = 8; j > 0; j--) {
                if (r & 1)
                    r = (r >> 1) ^ CRCPOLY;
                else
                    r >>= 1;
                }
            crctable[i] = r;
            }
    }

    int GetXX()             /* Get xxcode and translate */
    {
    int c;
        do {
            if ((c = fgetc(in)) == EOF)
                c = 0;
            } while (c == '\n');
        in_count++;
        return ddcodes[c];
    }

    int GetBit()            /* Get one bit from input buffer */
    {
    int c;
        while (getlen <= 0) {
            c = GetXX();
            getbuf |= c << (10-getlen);
            getlen += 6;
            }
        c = (getbuf & 0x8000) != 0;
        getbuf <<= 1;
        getbuf &= 0xFFFF;
        getlen--;
        return(c);
    }




Jung, Ullmann                                                   [Page 6]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


    int GetBits(len)        /* Get len bits */
    int len;
    {
    int c;
        while (getlen <= 10) {
            c = GetXX();
            getbuf |= c << (10-getlen);
            getlen += 6;
            }
        if (getlen < len) {
            c = (uint)getbuf >> (16-len);
            getbuf = GetXX();
            c |= getbuf >> (6+getlen-len);
            getbuf <<= (10+len-getlen);
            getbuf &= 0xFFFF;
            getlen -= len - 6;
            }
        else {
            c = (uint)getbuf >> (16-len);
            getbuf <<= len;
            getbuf &= 0xFFFF;
            getlen -= len;
            }
        return(c);
    }

    int DecodePosition()    /* Decode offset position pointer */
    {
    int c;
    int width;
    int plus;
    int pwr;
        plus = 0;
        pwr = 1 << STRTP;
        for (width = STRTP; width < STOPP; width += STEPP) {
            c = GetBit();
            if (c == 0)
                break;
            plus += pwr;
            pwr <<= 1;
            }
        if (width != 0)
            c = GetBits(width);
        c += plus;
        return(c);
    }






Jung, Ullmann                                                   [Page 7]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


    int DecodeLength()      /* Decode code length */
    {
    int c;
    int width;
    int plus;
    int pwr;
        plus = 0;
        pwr = 1 << STRTL;
        for (width = STRTL; width < STOPL; width += STEPL) {
            c = GetBit();
            if (c == 0)
                break;
            plus += pwr;
            pwr <<= 1;
            }
        if (width != 0)
            c = GetBits(width);
        c += plus;
    return(c);
    }

    void InitCodes()        /* Initialize decode table */
    {
    int i;
        for (i = 0; i < 256; i++) ddcodes[i] = 0;
        for (i = 0; i < 64; i++) ddcodes[xxcodes[i]] = i;
    return;
    }

    main(ac, av)            /* main program */
    int ac;
    char **av;
    {
    int r;
    int j, k;
    int c;
    int pos;
    char buf[80];
    char name[3];
    long num, bytes;

        if (ac < 3) {
            fprintf(stderr, "usage: judecode in out\n");
            exit(1);
            }

        in = fopen(av[1], "r");
        if (!in){
            fprintf(stderr, "Can't open %s\n", av[1]);
            exit(1);
            }

Jung, Ullmann                                                   [Page 8]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


        out = fopen(av[2], "w");
        if (!out) {
            fprintf(stderr, "Can't open %s\n", av[2]);
            fclose(in);
            exit(1);
            }

        while (1) {
            if (fgets(buf, sizeof(buf), in) == NULL) {
                fprintf(stderr, "Unexpected EOF\n");
                exit(1);
                }
            if (strncmp(buf, START_RECD, strlen(START_RECD)) == 0)
                break;
            }

        in_count = 0;
        out_count = 0;
        getbuf = 0;
        getlen = 0;

        InitCodes();
        MakeCrctable();

        crc = CRC_MASK;
        r = 0;

        while (feof(in) == 0) {
            c = DecodeLength();
            if (c == 0) {
                c = GetBits(8);
                UPDATE_CRC(crc, c);
                out_count++;
                text[r] = c;
                fputc(c, out);
                if (++r >= N)
                    r = 0;
                }

            else {
                pos = DecodePosition();
                if (pos == 0)
                    break;
                pos--;
                j = c + THRESHOLD - 1;
                pos = r - pos - 1;
                if (pos < 0)
                    pos += N;




Jung, Ullmann                                                   [Page 9]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


                for (k = 0; k < j; k++) {
                    c = text[pos];
                    text[r] = c;
                    UPDATE_CRC(crc, c);
                    out_count++;
                    fputc(c, out);
                    if (++r >= N)
                        r = 0;
                    if (++pos >= N)
                        pos = 0;
                    }
                }
            }

        fgetc(in); /* skip newline */

        if (fscanf(in, "* %ld %lX", &bytes, &num) != 2) {
            fprintf(stderr, "CRC record not found\n");
            exit(1);
            }

        else if (crc != num) {
            fprintf(stderr,
                 "CRC error, expected %lX, found %lX\n",
                 crc, num);
            exit(1);
            }

        else if (bytes != out_count) {
            fprintf(stderr,
                 "File size error, expected %lu, found %lu\n",
                 bytes, out_count);
            exit(1);
            }

        else
            fprintf(stderr,
                 "File decoded to %lu bytes correctly\n",
                 out_count);

        fclose(in);
        fclose(out);
    return;
    }








Jung, Ullmann                                                  [Page 10]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


5. An example of an Encoder

Many algorithms are possible for the encoder, with different tradeoffs
between speed, size, and complexity. The following is a simple example
program which is fairly efficient; more sophisticated implementations
will run much faster, and in some cases produce somewhat better
compression.

This example also shows that the encoder need not use the entire window
available. Not using the full window costs a small amount of
compression, but can greatly increase the speed of some algorithms.

    /* LZJU 90 Encoding program */

    #include <stdio.h>

    typedef unsigned char uchar;
    typedef unsigned int  uint;

    #define N           8192    /* Size of window buffer */
    #define F            256    /* Size of look-ahead buffer */
    #define THRESHOLD      3
    #define NIL            N    /* End  of tree's node  */

    #define STRTP          9
    #define STEPP          1
    #define STOPP         14

    #define STRTL          0
    #define STEPL          1
    #define STOPL          7

    #define CHARSLINE     78

    static FILE *in;
    static FILE *out;

    static int   putlen;
    static int   putbuf;
    static int   char_ct;
    static long  in_count;
    static long  out_count;
    static long  crc;
    static long  crctable[256];
    static uchar xxcodes[] =
    "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\
    abcdefghijklmnopqrstuvwxyz";





Jung, Ullmann                                                  [Page 11]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


    static uchar text[N + F - 1];
    static int   match_position;
    static int   match_length;
    static int   lson[N + 1];
    static int   rson[N + 257];
    static int   dad[N + 1];

    #define CRCPOLY         0xEDB88320
    #define CRC_MASK        0xFFFFFFFF
    #define UPDATE_CRC(crc, c)  \
      crc = crctable[((uchar)(crc) ^ (uchar)(c)) & 0xFF] \
      ^ (crc >> 8)

    void MakeCrctable()     /* Initialize CRC-32 table */
    {
    uint i, j;
    long r;
        for (i = 0; i <= 255; i++) {
            r = i;
            for (j = 8; j > 0; j--) {
                if (r & 1)
                    r = (r >> 1) ^ CRCPOLY;
                else
                    r >>= 1;
            }
            crctable[i] = r;
        }
    }

    void PutXX(c)           /* Translate and put xxcode */
    int c;
    {
        c = xxcodes[c & 0x3F];
        if (++char_ct > CHARSLINE) {
            char_ct = 1;
            fputc('\n', out);
        }
        fputc(c, out);
        out_count++;
    }












Jung, Ullmann                                                  [Page 12]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


    void PutBits(c, len)    /* Put rightmost "len" bits of "c" */
    int c, len;
    {
        c <<= 16 - len;
        c &= 0xFFFF;
        putbuf |= (uint) c >> putlen;
        c <<= 16 - putlen;
        c &= 0xFFFF;
        putlen += len;
        while (putlen >= 6) {
            PutXX(putbuf >> 10);
            putlen -= 6;
            putbuf <<= 6;
            putbuf &= 0xFFFF;
            putbuf |= (uint) c >> 10;
            c = 0;
            }
    }

    void EncodePosition(ch) /* Encode offset position pointer */
    int ch;
    {
    int width;
    int prefix;
    int pwr;
        pwr = 1 << STRTP;
        for (width = STRTP; ch >= pwr; width += STEPP, pwr <<= 1)
            ch -= pwr;
        if ((prefix = width - STRTP) != 0)
            PutBits(0xffff, prefix);
        if (width < STOPP)
            width++;
        else if (width > STOPP)
            abort();
        PutBits(ch, width);
    }
















Jung, Ullmann                                                  [Page 13]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


    void EncodeLength(ch)   /* Encode code length */
    int ch;
    {
    int width;
    int prefix;
    int pwr;
        pwr = 1 << STRTL;
        for (width = STRTL; ch >= pwr; width += STEPL, pwr <<= 1)
            ch -= pwr;
        if ((prefix = width - STRTL) != 0)
            PutBits(0xffff, prefix);
        if (width < STOPL)
            width++;
        else if (width > STOPL)
            abort();
        PutBits(ch, width);
    }

    void InitTree()
    {
    int i;
        for (i = N + 1; i <= N + 256; i++)
            rson[i] = NIL;
        for (i = 0; i < N; i++)
            dad[i] = NIL;
    }


























Jung, Ullmann                                                  [Page 14]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


    /* Insert string of length F, text[r..r+F-1], into one of
    the trees (text[r]'th tree) and return the longest-match
    position and length via the global variables match_position
    and match_length. If match_length = F, then remove the
    old node in favor of the new one, because the old one will
    be deleted sooner. Note r plays double role, as tree node
    and position in buffer.
    */
    void InsertNode(r)
    int r;
    {
    int i, cmp, p, c;
    uchar *key, *keyp, *txtp;
        cmp = 1;
        key = &text[r];
        p = N + 1 + key[0];
        rson[r] = lson[r] = NIL;
        match_length = 0;
        for ( ; ; ) {
            if (cmp >= 0) {
                if (rson[p] != NIL)
                    p = rson[p];
                else {
                    rson[p] = r;
                    dad[r] = p;
                    return;
                    }
                }
            else {
                if (lson[p] != NIL)
                    p = lson[p];
                else {
                    lson[p] = r;
                    dad[r] = p;
                    return;
                    }
                }

            txtp = &text[p];
            keyp = key;
            if ((cmp = *++keyp - *++txtp) != 0)
                continue;
            for (i = 2; i < F; i++)
                if ((cmp = *++keyp - *++txtp) != 0)
                    break;







Jung, Ullmann                                                  [Page 15]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


            if (i > match_length) {
                match_position = ((r - p) & (N - 1));
                if ((match_length = i) >= F)
                    break;
                }
            else if (i == match_length) {
                if ((c = ((r - p) & (N - 1))) < match_position)
                    match_position = c;
                }
            }

        dad[r] = dad[p];
        lson[r] = lson[p];
        rson[r] = rson[p];
        dad[lson[p]] = r;
        dad[rson[p]] = r;
        if (rson[dad[p]] == p)
            rson[dad[p]] = r;
        else
            lson[dad[p]] = r;
        dad[p] = NIL;
    }






























Jung, Ullmann                                                  [Page 16]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


    void DeleteNode(p)      /* Delete node p from tree */
    int p;
    {
    int q;
        if (dad[p] == NIL)
            return;
        if (rson[p] == NIL)
            q = lson[p];
        else if (lson[p] == NIL)
            q = rson[p];
        else {
            q = lson[p];
            if (rson[q] != NIL) {
                do {
                    q = rson[q];
                    } while (rson[q] != NIL);
                rson[dad[q]] = lson[q];
                dad[lson[q]] = dad[q];
                lson[q] = lson[p];
                dad[lson[p]] = q;
                }
            rson[q] = rson[p];
            dad[rson[p]] = q;
            }
        dad[q] = dad[p];
        if (rson[dad[p]] == p)
            rson[dad[p]] = q;
        else
            lson[dad[p]] = q;
        dad[p] = NIL;
    }

    main(ac, av)            /* main program */
    int ac;
    char **av;
    {
    int r, s, i, c;
    int last_match_length;
    int len;

        if (ac < 3) {
            fprintf(stderr, "usage: juencode in out\n");
            exit(1);
            }

        in = fopen(av[1], "r");
        if (!in) {
            fprintf(stderr, "Can't open %s\n", av[1]);
            exit(1);
            }


Jung, Ullmann                                                  [Page 17]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


        out = fopen(av[2], "w");
        if (!out) {
            fprintf(stderr, "Can't open %s\n", av[2]);
            fclose(in);
            exit(1);
            }

        char_ct = 0;
        in_count = 0;
        out_count = 0;
        putbuf = 0;
        putlen = 0;

        MakeCrctable();
        crc = CRC_MASK;

        fprintf(out, "* LZJU90 %s\n", av[1]);

        InitTree();
        r = 0;
        s = 0;

        /* Fill lookahead buffer */

        for (len = 0; len < F && (c = fgetc(in)) != EOF; len++) {
            UPDATE_CRC(crc, c);
            in_count++;
            text[s++] = c;
            }

        while (len > 0) {
            InsertNode(r);
            if (match_length > len)
                match_length = len;
            if (match_length < THRESHOLD) {
                EncodeLength(0);
                PutBits(text[r], 8);
                match_length = 1;
                }
            else {
                EncodeLength(match_length - THRESHOLD + 1);
                EncodePosition(match_position);
                }









Jung, Ullmann                                                  [Page 18]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


            last_match_length = match_length;
            for (i = 0; i < last_match_length &&
                        (c = fgetc(in)) != EOF; i++) {
                UPDATE_CRC(crc, c);
                in_count++;
                DeleteNode(s);
                text[s] = c;
                if (s < F - 1)
                    text[s + N] = c;
                s = (s + 1) & (N - 1);
                }
            while (i++ < last_match_length) {
                DeleteNode(s);
                s = (s + 1) & (N - 1);
                len--;
                }
            r = (r + last_match_length) & (N - 1);
            }

        /* end compression indicator */
        EncodeLength(1);
        EncodePosition(0);
        PutBits(0, 7);

        fprintf(out, "\n* %lu %08lX\n", in_count, crc);
        fprintf(stderr, "Encoded %lu bytes to %lu symbols\n",
                in_count, out_count);

        fclose(in);
        fclose(out);
    }

6. LZJU90 example

The following is an example of an LZJU90 compressed object. Using this
as source for the program in section 4 will reveal what it is.

    * LZJU90 example
    8-mBtWA7WBVZ3dEBtnCNdU2WkE4owW+l4kkaApW+o4Ir0k33Ao4IE4kk
    bYtk1XY618NnCQl+OHQ61d+J8FZBVVCVdClZ2-LUI0v+I4EraItasHbG
    VVg7c8tdk2lCBtr3U86FZANVCdnAcUCNcAcbCMUCdicx0+u4wEETHcRM
    7tZ2-6Btr268-Eh3cUAlmBth2-IUo3As42laIE2Ao4Yq4G-cHHT-wCEU
    6tjBtnAci-I++
    * 190 081E2601

References

[1]   David Robinson, Robert L. Ullmann.
      Encoding Header Field for Internet Messages.
      RFC 1154, Prime Computer, April, 1990.


Jung, Ullmann                                                  [Page 19]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991


Author's Address

   Robert Jung
   2606 Village Road West
   Norwood, MA 02062
   USA

   Phone: +1 617 769 5999
   Email: robjung@world.std.com

   Robert Ullmann 10-30
   Prime Computer, Inc.
   500 Old Connecticut Path
   Framingham, MA 01701
   USA

   Phone: +1 508 620 2800 x1736
   Email: Ariel@Relay.Prime.COM


































Jung, Ullmann                                                  [Page 20]

madler@nntp-server.caltech.edu (Mark Adler) (03/27/91)

In response to Rob Ullmann's LZJU90 RFC draft which suggests a
combined compression, xxencode, and crc checked format:

In my humble opinion, the compression utility and the "expansion"
utility used to make the binary result of the compression mailable
should be separate utilities.  The reason is that compression is
a rapidly advancing field, and better methods will continue to
arise and supplant old ones, whereas making a binary mailable is
well understood, and very nearly optimal base 85 or base 95
(the former handles EBCDIC, the latter being ASCII only) methods
can be used.  xxencode, by the way, is not a nearly optimal
method.

Mark Adler
madler@pooh.caltech.edu

brad@looking.on.ca (Brad Templeton) (03/27/91)

I also agree that this makes little sense, unless you are running on
DOS and the software tools approach is more difficult.

Keep the compressor and the asciifyer independent.  You want to be
able to switch compressors at will, or not compress at all.  You also
might want to switch ascii convertors at will.

ABE was designed to do just about everything you want in an ASCII
encoder.  It can take inut from a pipe, such as your compression program.
Why write two programs when "compressor <args> | abe <args>" will do
it as a one-line shell script.

Among ABE's features by the way are:

	Choice of ABE1 (94 characters) or ABE2 (86 characters) or
	UUENCODE encoding.

	Automatic blocking, with each block directed to a pipe
	(like "|mail user") if desired.

	Files are often smaller, and compress well.

	Most printable characters map to themselves, so strings in
		binaries are readable right in the encoding.

	All lines are indexed, so sort(1) can repair any random
		scrambling of lines or files. (This can be turned off.)

	Extraneous lines (news headers, comments, signatures etc.) are
		ignored, even in the middle of encodings.

	A PD tiny decoder is available to include with files for first
		time users.

	Files can be split up automatically into equal sized blocks.

	Blocks can contain redundant information so that the decoder
		can handle blocks in any order, even with reposted duplicates
		and extraneous articles.

	Files with blank regions can be constructed from multi-part encodings
		with damaged blocks.

	Multiple files can be placed in one encoding.

	The decoder is extremely general and configurable, and supports many
	features not currently found in the encoder, but which other encoder
	writers might fight useful.

For example, if a multi part abe encoding was sent to you in random order,
you can often just say "dabe <mailbox>" and get the files out.

Abe was posted to comp.sources.misc some time ago, it's also on UUNET in
the 'clarinet' ftp directory, I think.  It's free.


BTW ABE uses 86 characters in the ABE2 encoding format, which is
EBCDIC proof.   The characters to avoid are:

	! ` [ \ ] ^ { | } ~

Plus spaces, newlines, DEL, and all control characters, of course.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

madler@nntp-server.caltech.edu (Mark Adler) (03/28/91)

Brad Templeton writes:

>> BTW ABE uses 86 characters in the ABE2 encoding format, which is
>> EBCDIC proof.   The characters to avoid are:
>> 
>>         ! ` [ \ ] ^ { | } ~

But this only leaves 84 characters.  Perhaps you didn't mean to
include the braces ('{' and '}')--I believe those are EBCDIC-safe.
That would give you 86.

Mark Adler
madler@pooh.caltech.edu