[comp.compression] *New* *Improved* LZRW1

ross@spam.ua.oz.au (Ross Williams) (06/25/91)

RELEASE OF THE LZRW1-A ALGORITHM
================================
Author : Ross Williams.
Date   : 25-Jun-1991.

Abstract
--------
This file announces  the release of my LZRW1-A  algorithm. The LZRW1-A
algorithm is a direct descendant  of the LZRW1 algorithm, bettering it
a little  in both  speed and  compression. The  only reason  for using
LZRW1 now  that LZRW1-A exists  is backwards compatibility  (LZRW1 and
LZRW1-A  cannot  decompress  each   other's  compressed  files).  This
document specifies the difference between  LZRW1 and LZRW1-A and gives
details  of the  LZRW1-A  algorithm's implementation  in  C and  68000
assembly language.

The LZRW1 Algorithm
-------------------
In April 1991, I published my LZRW1 algorithm by presenting it at the
data compression conference,

   "An Extremely Fast Ziv-Lempel Data Compression Algorithm", IEEE
   Data Compression Conference, Snowbird, Utah, 8-11 April 1991.

and by advertising it on the internet:

   FTP Archive Access:
   Machine   : sirius.itd.adelaide.edu.au   [IP=129.127.40.3]
   Directory : ~pub/misc
   Files     : lzrw1.tex   - My conference paper describing the algorithm.
               lzrw1.c     - An implementation in C.
               lzrw1.68000 - An implementation in 68000 assembly language.

The LZRW1 algorithm  gives not-good compression but  is fast, requires
only 16K of memory and is patent free.

Suggested Improvements to LZRW1
-------------------------------
After publication of  LZRW1, at least the following  people noted that
the 4-bit copy length range field was not being used to the full.

   Dean Long    dlong@oak.ucsc.edu
   Kurt Haenen  ghgaea8@cc1.kuleuven.ac.be

(I received  some other  suggested improvements  to the  algorithm but
most of  them were  ideas that I  had tried  and discarded  during the
algorithm's  development  (e.g.  deep hash  tables,  different  coding
schemes).

The following table  shows that LZRW1 was wasting two  elements of its
four bit length space. A simple revision changed the range from [3,16]
to [3,18].

Field Value  :  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
--------------------------------------------------------------
LZRW1        :  X  X  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Revision     :  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

The  missing values  were  intentionally designed  into the  algorithm
because  of  misguided  concerns  for time  efficiency.  The  original
version  of  the  algorithm  was  in 68000  machine  code  and  I  was
optimizing like  crazy. At some  stage I  decided that I  could save a
subtraction instruction in the  compressor and an addition instruction
in the decompressor  by shifting the range by only  one (the decrement
and jump  instruction on the  68000 stops at  -1, not 0).  This design
decision was  reflected in  the C version  with which  I wanted  to be
compatible (so as not to confuse the meaning of the name "LZRW1").

Things changed when Dean Long not  only pointed out the deficiency (of
which I  was aware), but how  removing it could actually  speed up the
code. When I had a look, I saw that he was right and that I had missed
an optimization. I  then decided to create a  new algorithm containing
the extended  length range  and the optimizations.  The result  is the
LZRW1-A  algorithm which  I have  implemented in  C and  68000 machine
code.

LZRW1-A Implementations
-----------------------
Implementations of LZRW1-A in C and  in 68000 assembly language can be
found in the following archive (files  will appear within a day or two
of 25-Jun-1991):

   FTP Archive Access:
   Machine   : sirius.itd.adelaide.edu.au   [IP=129.127.40.3]
   Directory : ~pub/misc
   Files     : lzrw1-a.txt    - Release notes on the file.
               lzrw1-a.c      - An implementation in C.
               lzrw1-a.68000  - An implementation in 68000 assembly language.
               lzrw_headers.h - Useful header files.

68000 Assembly Language Version of LZRW1-A
------------------------------------------
The  68000 code  for  LZRW1-A was  obtained  by  making the  following
changes to the 68000 code for LZRW1:

Compressor:
1. The value of GROUPRAW was changed from 256 to 288.
2. Two extra COMPARE_BYTE macro calls were inserted in the list of calls.
3. Two extra DO_COPY macro calls were inserted in the list of calls.
4. A constant of two was subtracted from the second parameter of all calls
   to DO_COPY.

Decompressor:
5. The code:

          COPYLOOP:
             move.b (A_T1)+,  (A_DST)+   ;Perform the copy operation.
             dbra   D_COPLEN, COPYLOOP

was replaced by the more unrolled version:

          move.b (A_T1)+,  (A_DST)+   ;Copy first  byte of copy item.
          move.b (A_T1)+,  (A_DST)+   ;Copy second byte of copy item.
          COPYLOOP:
             move.b (A_T1)+,  (A_DST)+   ;Perform the copy operation.
             dbra   D_COPLEN, COPYLOOP

This last modification was Dean Long's  idea, and made it efficient to
use the shifted range. Thanks  Dean --- your modification improved the
decompression speed by about 10%.

6. Some minor cosmetic changes such as updating comments were made.

When compared  to the  68000 version  of LZRW1,  the 68000  version of
LZRW1-A gives  about 0.5%  absolute better compression,  compresses at
the same speed, and decompresses 10% faster. The following table gives
the results  of applying the 68000  code to the corpus  WITH A CONTEXT
BLOCK  SIZE  OF  16K.  These results  correspond  to  Figure-8  of  my
conference paper on LZRW1. I  did not bother with instruction counting
this time.

--<Start of Table>--

+--------------------+---------------+---------+
| File       K  /16K | CmpLen   %Rem |   K/s   |
+--------------------+---------------+---------+
| bib      109     7 |  67600   60.8 | 43  162 |
| book1    751    47 | 534086   69.5 | 40  148 |
| book2    597    38 | 368621   60.3 | 45  157 |
| geo      100     7 |  87618   85.6 | 31  146 |
| news     368    24 | 236432   62.7 | 41  164 |
| obj1      21     2 |  13240   61.6 | 41  173 |
| obj2     241    16 | 127885   51.8 | 50  171 |
| paper1    52     4 |  31520   59.3 | 45  159 |
| paper2    80     6 |  51130   62.2 | 44  153 |
| pic      501    32 | 126828   24.7 | 89  202 |
| progc     39     3 |  21932   55.4 | 48  164 |
| progl     70     5 |  31474   43.9 | 58  173 |
| progp     48     4 |  21536   43.6 | 59  173 |
| trans     91     6 |  44158   47.1 | 54  176 |
+--------------------+---------------+---------+
| Average  219    14 |          56.3 | 49  166 |
+--------------------+---------------+---------+

+--------------------+---------------+---------+
| zeros     78     5 |   9508   11.9 |144  219 |
| noise     78     5 |  80030  100.0 | 23 1178 |
+--------------------+---------------+---------+

This table  gives the performance  of a hand-optimized  68000 assembly
language implementation  of LZRW1-A  running on  an 8MHz,  24-bit bus,
Macintosh-SE computer.  The input files  were divided into  16K blocks
which  were  compressed  independently.  The  %Rem  column  gives  the
compression as  a percentage  remaining.. The  K/Sec column  gives the
compression and  decompression speeds in kilobytes  (1024) per second.
Decompression  speeds  are  given relative  to  OUTPUT  (uncompressed)
bytes. The  speeds are for execution  only and do not  include IO. All
results were rounded to the given accuracy.

--<End of Table>--

The  following table  gives the  performance of  the 68000  version of
LZRW1-A  when applied  to the  corpus using  context block  lengths of
256K. The longer blocks improves speed and compression a little.

Details: The  run is  on a Mac-SE  (8MHz 68000  with 24-bit  bus). The
average values  are the average of  exactly the values printed  in the
column above.  The times are for  compression only and do  not include
any IO.

BENCHMARKERS: THIS TABLE HOLDS THE "OFFICIAL" RESULTS FOR LZRW1-A.
+----------------------------------------------------------------+
| DATA COMPRESSION TEST                                          |
| =====================                                          |
| Time of run     : Tue 25-Jun-1991 08:21PM                      |
| Timing accuracy : One part in 100                              |
| Context length  : 262144 bytes (= 256.0000K)                   |
| Test suite      : Calgary Corpus Suite                         |
| Files in suite  : 14                                           |
| Algorithm       : LZRW1-A (68000 assembly language)            |
+----------------------------------------------------------------+
| File Name   Length  CxB  ComLen  %Remn  Bits  Com K/s  Dec K/s |
| ----------  ------  ---  ------  -----  ----  -------  ------- |
| rpus:Bib.D  111261    1   65801   59.1  4.73    44.35   162.17 |
| us:Book1.D  768771    3  522797   68.0  5.44    40.58   147.21 |
| us:Book2.D  610856    3  359715   58.9  4.71    45.74   156.85 |
| rpus:Geo.D  102400    1   86432   84.4  6.75    31.91   144.93 |
| pus:News.D  377109    2  230298   61.1  4.89    42.57   163.84 |
| pus:Obj1.D   21504    1   13189   61.3  4.91    41.14   173.39 |
| pus:Obj2.D  246814    1  124666   50.5  4.04    50.61   171.89 |
| s:Paper1.D   53161    1   30626   57.6  4.61    46.35   158.72 |
| s:Paper2.D   82199    1   50048   60.9  4.87    44.84   152.90 |
| rpus:Pic.D  513216    2  125718   24.5  1.96    89.43   202.04 |
| us:Progc.D   39611    1   21494   54.3  4.34    48.52   164.22 |
| us:Progl.D   71646    1   30634   42.8  3.42    59.97   173.47 |
| us:Progp.D   49379    1   20935   42.4  3.39    60.49   173.60 |
| us:Trans.D   93695    1   42484   45.3  3.63    55.36   177.09 |
+----------------------------------------------------------------+
| Average     224401    1  123202   55.1  4.41    50.13   165.88 |
+----------------------------------------------------------------+

C Version of LZRW1-A
--------------------
The C version of LZRW1-A was  obtained by making the following changes
to the C version of LZRW1:

Compressor:
1. The overrun specification in the header comments was changed from 256
   (=16x16) to 288 (=16x18).
2. The value of ITEMMAX was changed from 16 to 18.
3. The unrolled matching loop was unrolled a bit more. Two more "PS ||"
   were appended to the list of them.
4. The term (len-1) was replaced by (len-3) in the line "*p_dst...".
Decompressor:
5. "len=1+.." was changed to "len=3+.."

At this  stage I had  a working version  of LZRW1-A whose  code looked
very like the code for LZRW1. However, I continued and:

6. Performed extensive optimizations on  both the compressor and
   decompressor. The same code structure is present, but a lot has
   changed.

The  C version  runs about  two to  three times  more slowly  than the
assembler version.  The results  are given  below. The  output lengths
vary by a  few bytes from the  256K run of the 68000  version; this is
almost certainly a result of the non-determinism of the algorithm.

Details: The  run is  on a Mac-SE  (8MHz 68000  with 24-bit  bus). The
average values  are the average of  exactly the values printed  in the
column above.  The times are for  compression only and do  not include
any IO. The compiler used was THINK C V4.0.

+----------------------------------------------------------------+
| DATA COMPRESSION TEST                                          |
| =====================                                          |
| Time of run     : Tue 25-Jun-1991 08:59PM                      |
| Timing accuracy : One part in 100                              |
| Context length  : 262144 bytes (= 256.0000K)                   |
| Test suite      : Calgary Corpus Suite                         |
| Files in suite  : 14                                           |
| Algorithm       : LZRW1-A (C version)                          |
+----------------------------------------------------------------+
| File Name   Length  CxB  ComLen  %Remn  Bits  Com K/s  Dec K/s |
| ----------  ------  ---  ------  -----  ----  -------  ------- |
| rpus:Bib.D  111261    1   65813   59.2  4.73    14.95    67.67 |
| us:Book1.D  768771    3  522800   68.0  5.44    13.68    62.96 |
| us:Book2.D  610856    3  359715   58.9  4.71    15.40    66.59 |
| rpus:Geo.D  102400    1   86432   84.4  6.75    10.81    59.55 |
| pus:News.D  377109    2  230298   61.1  4.89    14.33    67.52 |
| pus:Obj1.D   21504    1   13189   61.3  4.91    13.70    69.36 |
| pus:Obj2.D  246814    1  124666   50.5  4.04    17.07    71.36 |
| s:Paper1.D   53161    1   30626   57.6  4.61    15.65    67.09 |
| s:Paper2.D   82199    1   50048   60.9  4.87    15.12    65.34 |
| rpus:Pic.D  513216    2  125718   24.5  1.96    30.01    84.19 |
| us:Progc.D   39611    1   21494   54.3  4.34    16.29    69.28 |
| us:Progl.D   71646    1   30634   42.8  3.42    20.18    73.43 |
| us:Progp.D   49379    1   20944   42.4  3.39    20.23    73.56 |
| us:Trans.D   93695    1   42501   45.4  3.63    18.61    73.53 |
+----------------------------------------------------------------+
| Average     224401    1  123205   55.1  4.41    16.86    69.39 |
+----------------------------------------------------------------+

Nomenclature
------------
I am  keen to  retain a  sharp definitions  for the  algorithms that I
produce.

The  term  "LZRW1" should  be  used  only  to  refer to  the  original
algorithm published in my data compression conference paper. This term
does  not refer  to  a generic  family of  algorithms  but rather  the
specific algorithm described in the paper.

The term "LZRNW1" I have used accidentally in the past in referring to
"LZRW1". It should never be used but  if you see it used, it means the
same as "LZRW1".

The term "LZRW1-A" refers to  the particular algorithm described above
which is based on the "LZRW1" algorithm.

--<End of Report on LZRW1-A>--

sean@ms.uky.edu (Sean Casey) (06/26/91)

What is the patent status of said algorithm?

(He said, holding his breath)
-- 
** Sean Casey  <sean@s.ms.uky.edu>
** Recent subject line in comp.sys.handhelds:  Printing BIG GROBS