[comp.sys.amiga.graphics] New IFF compression technique

drtiller@uokmax.ecn.uoknor.edu (Donald Richard Tillery Jr) (02/17/91)

I was playing around with IFF ILBM files and began toying with the idea of
improving a bit on the standard ByteRun1 compression technique by including
a primitive sort of vertical compression.
 
Before I describe it, I better ask if anyone can send me a list of the
registered compression schemes.  I know 0x00 = None, and 0x01 = ByteRun1.
I've also heard that all the other values up to 9 are used as well.  I
even heard that 0x02 was some type of Huffman encoding (like LZH).
 
I achieve anywhere from 8 to 20% improvement over ByteRun1 compression with
this technique and the decompression is from 20 to 50% slower (extreme cases
can get up to 80%+ improvement and the file is NEVER larger than ByteRun1).
 
Anyway, for anyone interested, here's how it works:
 
(It turns out, purely unintentionally, that the decompression code for this
new scheme will also correctly decode a properly and efficiently encoded normal
ByteRun1 compressed image...)
 
I call the new technique (is anything really "new", I'm just adding to another)
ByteRun1Plus and the key byte is 0x2B ('+').
 
(For those unfamiliar with ByteRun1 I'll go into excruciating detail on this
so the rest of you bear with me).
 
If you assume that there are X horizontal pixels, Rows vertical pixels, and
the image is Depth bitplanes deep, then BytesPerRow = INT( X / 8 ).  There
are BytesPerRow x Rows x Depth bytes in the entire image.
 
ILBM encoding starts at the top of the image in bitplane 0 and writes the
first BytesPerRow bytes of each plane (encoding if necessary).  Then the
next row is done in a similar manner until the entire image is written to
the storage device. 

ByteRun1 (and ByteRun1Plus) compression scheme encodes each BytesPerRow bytes
in the following manner:  If there is a run of _3_ or more bytes in this row
of the plane, then this run is encoded as a byte with the value of the number
of duplicate bytes subtracted from 257 followed by the byte's value.  A limit 
of 128 run bytes is set to keep the value in the 129-254 range for that byte,
thus distinguishing it from the encoding of a length of unique data.  This data
is encoded by a value representing the number of unique bytes minus 1 (also
limited to a string of 128 to limit its value to 0-127) followed by that number
of data bytes with those values.   This represents a horizontal compression.
 
This leaves out 128 and 255.  Normally 255 is used for a run of length 2.  A
properly efficient encoder will never use this however, since is results in
an additional byte being needed to return to the unique data string following
the run.  It is more efficient in this case to ignore this 2 byte run and 
encode it as if it were unique data.  The 128 was denoted a NOP (no operation)
and of course no efficient encoder would throw in useless bytes when trying
to compress something.
 
I have thus found two byte values to steel for my own use in a primitive
vertical compression technique. 
 
The ByteRun1Plus scheme uses the 255 value to denote a repeat of the entire
previous BytesPerRow bytes (row) of the same bitplane(!).  The 128 value
represents a flag indicating that there were sufficient repeats from the
previous line to warrant this vertical compression (as determined at compress
time by the brute force method of compressing the row in both ways and
and comparing their respective lengths).  The following BytesPerRow / 8 
(rounded UP!) bytes are a bit encoded set of flags which determine which
bytes are duplicates of the one BytesPerRow before it.  A 1 indicates that
the corresponding byte is a duplicate and a 0 indicates that it is not.
Whenever a 0 is encountered the next byte of those immediately following the
last flag byte is taken as the delta byte with the new value of the byte.
 
For example(values in decimal unless preceeded by $ [hex] or % [binary]):
 
 235 100 165 207  92  34  38  42
 235  98 165 207 242  34 223  42
 
would encode as:  128 %10110101 98 242 223
 
 
That's it.  Any comments or suggestions are welcome.  And anyone who knows
how I can register this please let me know.  
 
If anyone cares, I'll make the translater and viewing programs available when
they are debugged (trans is done, but the viewer will be a while...school you
know :-)
 
Rick Tillery (drtiller@uokmax.ecn.uoknor.edu)

ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar) (02/18/91)

In article <1991Feb17.015223.25784@uokmax.ecn.uoknor.edu> drtiller@uokmax.ecn.uoknor.edu (Donald Richard Tillery Jr) writes:
>I've also heard that all the other values up to 9 are used as well.  I
>even heard that 0x02 was some type of Huffman encoding (like LZH).

No offense, but if you're going ti play about with compression, you'd
better study a it bit more.  Huffman and LZW are entirely different
systems.  Huffman maps a fixed-length character set into a variable
length character set on the provision that the more commonly used
characters in the source are the smaller ones in the destination, and
LZW builds a dictionary of recently used character streams which can be
referenced to reduce redundancy.  BTW, LZW is under a bit of a legal
cloud, owing to a patent reportly obtained upon it.  I have not heard
the outcome of the case.

[ Description of the system deleted ]

>That's it.  Any comments or suggestions are welcome.  And anyone who knows
>how I can register this please let me know.  

One of the areas where IFF will get into real trouble, and this is
starting to happen now, is if standardised formats end up with an
enormous number of variants that require continual modification of the
reader program.  One such danger is if ILBM starts to have a large
number of different compression systems assigned to it.

Now I would be the first to agree that the current compression system in
IFF is inefficient and poorly thought out, but I would strongly resist
any attempt to improve it without a careful consideration of the options
available (ie. huffman, lzw, jpeg etc.) to choose a *SMALL* subset of
efficient and suitable compression systems.  I just don't think that
the system described above has enough advantages to warrant its official
registration.

--
Ian Farquhar                      Phone : + 61 2 805-9400
Office of Computing Services      Fax   : + 61 2 805-7433
Macquarie University  NSW  2109   Also  : + 61 2 805-7420
Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au