[comp.compression] Some compression results

Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) (03/26/91)

Hi all,
    I've been building up a collection of statistics for various commercial
archiving programs run on the data from the book "Text Compression" by Tim
Bell, John Cleary, and Ian Witten (an excellent book on data compression BTW).
Anyway, in case you ever wanted to compare the results for the schemes in the
book with those for generally-used archiving utilities, here are the results.
The PC archivers were run on a 386/25, the Mac ones on a Mac SE/30.  Figures
are:  output data size; runtime in seconds (or minutes); compression ratio in
bits/byte (i.e. 2.85 means 2.85 bits output for every 8 bits input):
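
For anyone wanting to check a figure or add their own, here is a minimal
sketch of the bits/byte calculation in Python.  The function name is just an
illustration, not taken from any of the archivers:

```python
def bits_per_byte(input_size, output_size):
    """Compressed bits emitted per input byte (8.0 means no compression)."""
    return 8 * output_size / input_size

# Bib under PKZIP 1.10: 111,261 bytes in, 41,354 bytes out.
# The table lists this entry as 2.97.
print(f"{bits_per_byte(111_261, 41_354):.2f}")
```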

                     PKZIP 1.10          LHARC 1.13c          LHARC 2.10
---------------+--------------------+--------------------+--------------------+
Bib  : 111,261 |  41,354 03.0s 2.97 |  46,501 04.8s 3.34 |  40,740 04.8s 2.93 |
Book1: 768,771 | 350,560 29.0s 3.65 | 369,512 30.5s 3.85 | 339,074 35.1s 3.53 |
Book2: 610,856 | 232,589 19.1s 3.04 | 252,561 24.3s 3.31 | 228,447 26.6s 2.99 |
Geo  : 102,400 |  76,172 09.9s 5.95 |  70,945 05.0s 5.53 |  68,274 04.3s 5.36 |
News : 377,109 | 157,326 10.9s 3.34 | 166,062 16.7s 3.52 | 155,084 16.1s 3.29 |
Obj1 :  21,504 |  10,546 01.5s 3.92 |  10,748 01.4s 4.00 |  10,310 00.9s 3.84 |
Obj2 : 246,814 |  90,130 07.7s 2.92 |  90,848 11.8s 2.94 |  84,981 10.0s 2.75 |
Paper1: 53,161 |  20,041 01.7s 3.01 |  21,749 02.3s 3.27 |  19,676 02.3s 2.96 |
Paper2: 82,199 |  32,867 02.8s 3.20 |  35,278 03.4s 3.43 |  32,096 03.7s 3.12 |
Pic  : 513,216 |  63,805 46.7s 0.99 |  61,394 1m22s 0.96 |  52,221 17.8s 0.81 |
Progc:  39,611 |  14,161 01.2s 2.86 |  15,400 01.9s 3.11 |  13,941 01.7s 2.82 |
Progl:  71,646 |  17,255 01.9s 1.93 |  18,759 04.1s 2.09 |  16,914 03.0s 1.89 |
Progp:  49,379 |  11,877 ??.5s 1.92 |  12,792 02.5s 2.07 |  11,507 02.0s 1.86 |
Trans:  93,695 |  23,135 02.2s 1.98 |  28,094 04.4s 2.40 |  22,578 03.8s 1.93 |
---------------+--------------------+--------------------+--------------------+
 14 Files        Avg:          2.98   Avg:          3.13   Avg:          2.86
                 Book2 sp.=32.0 K/s   Book2 sp.=25.1 K/s   Book2 sp.=23.0 K/s
                 Size:    1,141,818   Size:    1,200,643   Size:    1,095,843
                 
                      PAK 1.00             PAK 2.51             PKARC 3.61     
---------------+--------------------+--------------------+--------------------+
Bib  : 111,261 |  47,757 03.1s 3.43 |  42,382 04.2s 3.05 |  49,192 00.8s 3.54 |
Book1: 768,771 | 346,495 23.0s 3.61 | 347,567 34.4s 3.62 | 370,512 05.3s 3.86 |
Book2: 610,856 | 263,671 17.7s 3.45 | 234,934 24.0s 3.08 | 299,490 04.2s 3.92 |
Geo  : 102,400 |  72,246 04.5s 5.64 |  68,853 05.2s 5.38 |  79,433 01.0s 6.21 |
News : 377,109 | 189,775 11.8s 4.03 | 158,456 13.6s 3.36 | 214,213 02.8s 4.54 |
Obj1 :  21,504 |  12,769 00.7s 4.72 |  11,236 01.2s 4.14 |  13,500 00.3s 5.02 |
Obj2 : 246,814 | 123,189 07.9s 3.99 |  89,530 10.2s 2.90 | 132,337 01.8s 4.29 |
Paper1: 53,161 |  24,838 01.5s 3.74 |  20,405 02.2s 3.04 |  25,814 00.4s 3.88 |
Paper2: 82,199 |  35,839 02.3s 3.49 |  33,162 03.6s 3.23 |  38,303 00.7s 3.73 |
Pic  : 513,216 |  59,841 05.7s 0.93 |  67,195 15.1s 1.05 |  64,938 02.4s 1.00 |
Progc:  39,611 |  18,751 01.2s 3.79 |  14,630 01.7s 2.95 |  20,041 00.4s 4.05 |
Progl:  71,646 |  26,746 01.8s 2.99 |  18,083 02.8s 2.02 |  28,050 00.6s 3.13 |
Progp:  49,379 |  18,972 01.3s 3.07 |  12,437 02.1s 2.01 |  20,197 00.4s 3.27 |
Trans:  93,695 |  38,091 02.5s 3.25 |  24,113 03.1s 2.06 |  42,286 00.7s 3.61 |
---------------+--------------------+--------------------+--------------------+
 14 Files        Avg:          3.52   Avg:          2.99   Avg:          3.86 
                 Book2 sp.=34.4 K/s   Book2 sp.=25.5 K/s   Book2sp.=146.1 K/s 
                 Size:    1,278,800   Size:    1,142,983   Size:    1,398,306

                        BRENT              COMPRESS
---------------+--------------------+--------------------+
Bib  : 111,261 |  59,711 21.0s 4.29 |  46,258 01.6s 3.33 |
Book1: 768,771 | 466,187 2m58s 4.85 | 335,033 11.8s 3.87 |
Book2: 610,856 | 318,979 1m58s 4.18 | 256,378 09.6s 3.36 |
Geo  : 102,400 |  78,485 15.6s 6.13 |  77,777 01.9s 6.08 |
News : 377,109 | 208,270 1m08s 4.42 | 185,241 06.0s 3.93 |
Obj1 :  21,504 |  11,654 08.6s 4.34 |  14,048 00.5s 5.23 |
Obj2 : 246,814 | 103,120 33.5s 3.34 | 130,574 04.5s 4.23 |
Paper1: 53,161 |  27,081 09.6s 4.08 |  25,077 00.9s 3.77 |
Paper2: 82,199 |  44,393 16.7s 4.32 |  36,161 01.3s 3.52 |
Pic  : 513,216 | Crash-no free(0.90)|  62,215 05.7s 0.97 |
Progc:  39,611 |  18,967 06.4s 3.83 |  19,143 00.7s 3.87 |
Progl:  71,646 |  24,106 09.8s 2.69 |  27,148 01.1s 3.03 |
Progp:  49,379 |  16,214 06.5s 2.63 |  19,209 00.8s 3.11 |
Trans:  93,695 |  36,028 14.4s 3.08 |  38,240 01.4s 3.27 |
---------------+--------------------+--------------------+
 14 Files        Avg:          3.79   Avg:          3.68
                 Book2 sp.= 5.2 K/s   Book2 sp.=63.9 K/s
                 Size:    1,470,945   Size:    1,272,502

                   StuffIt 1.5.1      Compactor 1.21    Disk Doubler 3.0A
---------------+------------------+--------------------+------------------+
Bib  : 111,261 |  42,329 11s 3.04 |  46,948    7s 3.38 |  46,612 14s 3.35 |
Book1: 768,771 | 357,518 42s 3.72 | 352,581 1m31s 3.67 | 332,140 41s 3.46 |
Book2: 610,856 | 281,485 32s 3.67 | 237,187 1m06s 3.11 | 250,843 27s 3.29 |
Geo  : 102,400 |  73,010 17s 5.70 |  69,891   15s 5.46 |  77,861  6s 6.08 |
News : 377,109 | 202,725 22s 5.38 | 158,627   37s 3.37 | 182,205 23s 3.87 |
Obj1 :  21,504 |  14,179  3s 5.27 |  10,789    4s 4.01 |  14,132  2s 5.26 |
Obj2 : 246,814 | 138,871 16s 4.50 |  88,485   30s 2.87 | 128,965 23s 4.18 |
Paper1: 53,161 |  25,208  6s 3.79 |  20,419    7s 3.07 |  25,161  3s 3.77 |
Paper2: 82,199 |  37,327  5s 3.63 |  33,431    9s 3.25 |  36,245  4s 3.53 |
Pic  : 513,216 |  63,408 14s 0.99 |  53,404   22s 0.83 |  62,299 14s 0.97 |
Progc:  39,611 |  19,274  3s 3.89 |  14,438    5s 2.92 |  19,227  5s 3.88 |
Progl:  71,646 |  27,247  6s 3.04 |  17,840    7s 1.99 |  27,232  4s 3.04 |
Progp:  49,379 |  19,340  3s 3.13 |  12,110    5s 1.96 |  19,293  7s 3.13 |
Trans:  93,695 |  39,736  6s 3.39 |  23,832    9s 2.03 |  38,324  6s 3.27 |
---------------+------------------+--------------------+------------------+
 14 Files        Avg:        3.80   Avg:          2.99   Avg:        3.65  
                 Book2sp.=19.1 K/s  Book2 sp.= 9.3 K/s   Book2sp.=22.6 K/s
                 Size:   1,341,663  Size:    1,139,982   Size:  1,261,828
                 
                      ZOO 2.01             Arithmetic
---------------+--------------------+--------------------+
Bib  : 111,261 |  55,049 02.1s 3.96 |  72,920 0m05s 5.24 |
Book1: 768,771 | 391,581 15.8s 4.07 | 437,633 0m33s 4.55 |
Book2: 610,856 | 302,143 11.8s 3.96 | 365,389 0m27s 4.78 |
Geo  : 102,400 |  79,466 02.9s 6.21 |  72,521 0m16s 5.67 |
News : 377,109 | 217,194 08.4s 4.61 | 244,852 0m18s 5.19 |
Obj1 :  21,504 |  13,688 00.6s 5.09 |  16,087 0m02s 5.98 |
Obj2 : 246,814 | 132,837 05.2s 4.31 | 187,544 0m16s 6.08 |
Paper1: 53,161 |  26,789 01.2s 4.03 |  33,196 0m03s 5.00 |
Paper2: 82,199 |  39,905 01.7s 3.88 |  47,641 0m04s 4.64 |
Pic  : 513,216 |  64,952 05.9s 1.01 |  75,332 0m14s 1.17 |
Progc:  39,611 |  20,058 00.9s 4.05 |  25,982 0m02s 5.25 |
Progl:  71,646 |  29,045 01.3s 3.24 |  42,716 0m03s 4.77 |
Progp:  49,379 |  20,231 00.9s 3.28 |  30,385 0m02s 4.92 |
Trans:  93,695 |  43,511 01.8s 3.68 |  64,433 0m05s 5.50 |
---------------+--------------------+--------------------+
 14 Files        Avg:          3.96   Avg:          4.91
                 Book2 sp.=51.7 K/s   Book2 sp.=22.6 K/s
                 Size:    1,436,449   Size:    1,716,631

                      ARJ 0.15a         ARJ 0.20, 1.00    
---------------+--------------------+--------------------+
Bib  : 111,261 |  40,503 0m11s 2.91 |  40,741 0m10s 2.93 |
Book1: 768,771 | 337,632 1m18s 3.51 | 339,078 1m16s 3.53 |
Book2: 610,856 | 227,566 1m00s 2.98 | 228,444 0m58s 2.99 |
Geo  : 102,400 |  68,276 0m09s 5.33 |  68,574 0m09s 5.36 |
News : 377,109 | 154,421 0m36s 3.28 | 155,086 0m35s 3.29 |
Obj1 :  21,504 |  10,309 0m03s 3.84 |  10,311 0m02s 3.84 |
Obj2 : 246,814 |  85,008 0m23s 2.76 |  84,982 0m22s 2.75 |
Paper1: 53,161 |  19,613 0m05s 2.95 |  19,676 0m05s 2.96 |
Paper2: 82,199 |  31,988 0m08s 3.11 |  32,097 0m08s 3.12 |
Pic  : 513,216 |  52,008 1m40s 0.81 |  52,221 1m39s 0.81 |
Progc:  39,611 |  13,876 0m04s 2.80 |  13,942 0m04s 2.82 |
Progl:  71,646 |  16,840 0m07s 1.88 |  16,915 0m07s 1.89 |
Progp:  49,379 |  11,490 0m05s 1.86 |  11,507 0m05s 1.86 |
Trans:  93,695 |  22,483 0m09s 1.92 |  22,578 0m09s 1.93 |
---------------+--------------------+--------------------+
 14 Files        Avg:          2.85   Avg:          2.86
                 Book2 sp.=10.2 K/s   Book2 sp.=10.5 K/s
                 Size:    1,092,013   Size:    1,096,152
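
The summary rows can be recomputed from the per-file figures.  The sketch
below does this for the "ARJ 0.20, 1.00" column, on the assumption (which the
numbers bear out) that "Size" is the sum of the output sizes and "Avg" is the
plain, unweighted mean of the 14 per-file bits/byte values.  The variable
names are illustrative only:

```python
# (input size, output size) per file, copied from the "ARJ 0.20, 1.00" column.
ARJ_020 = {
    "Bib":    (111_261,  40_741), "Book1":  (768_771, 339_078),
    "Book2":  (610_856, 228_444), "Geo":    (102_400,  68_574),
    "News":   (377_109, 155_086), "Obj1":   ( 21_504,  10_311),
    "Obj2":   (246_814,  84_982), "Paper1": ( 53_161,  19_676),
    "Paper2": ( 82_199,  32_097), "Pic":    (513_216,  52_221),
    "Progc":  ( 39_611,  13_942), "Progl":  ( 71_646,  16_915),
    "Progp":  ( 49_379,  11_507), "Trans":  ( 93_695,  22_578),
}

total_in = sum(size_in for size_in, _ in ARJ_020.values())
total_out = sum(size_out for _, size_out in ARJ_020.values())

# Unweighted mean: every file counts equally, whatever its size.
per_file = [8 * size_out / size_in for size_in, size_out in ARJ_020.values()]
mean_bpb = sum(per_file) / len(per_file)

# Size-weighted figure, for comparison: total bits out per total byte in.
weighted_bpb = 8 * total_out / total_in

print(f"Size: {total_out:,}")
print(f"Avg (unweighted): {mean_bpb:.2f}")
print(f"Weighted by file size: {weighted_bpb:.2f}")
```

The weighted figure comes out lower than the unweighted one here because the
large files count for more in it.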

Comments:
---------

  The three archivers given in lower-case (StuffIt, Compactor and Disk Doubler)
are the Mac archivers.  I used the speed results for Book2 for no logically
plausible reason.  'BRENT' is an implementation of R.P. Brent's compression
scheme, kindly provided by Robert Jung (author of the ARJ
archiver).  'Arithmetic' is the arithmetic compressor given in "Arithmetic
Coding for Data Compression", Communications of the ACM June 1987, p.520.
'Compress' is COMP430D, an implementation of 16-bit compress for the PC.  All
archivers were run with the default compression mode.
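
The "Book2 sp." figures can be reproduced from the Book2 input size and the
listed runtime.  This is a rough sketch, assuming "K" means 1000 bytes (the
table values agree with that reading) and that runtimes look like "19.1s",
"11s" or "1m58s".  The helper names are made up for the example:

```python
import re

BOOK2_SIZE = 610_856  # bytes, from the tables


def parse_runtime(text):
    """Convert a runtime such as '19.1s', '11s' or '1m58s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised runtime: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def speed_kps(input_bytes, runtime):
    """Input bytes processed per second, in units of 1000 bytes."""
    return input_bytes / parse_runtime(runtime) / 1000


for name, runtime in [("PKZIP 1.10", "19.1s"),
                      ("BRENT", "1m58s"),
                      ("Compactor 1.21", "1m06s")]:
    print(f"{name:15s} {speed_kps(BOOK2_SIZE, runtime):5.1f} K/s")
```

The tables list 32.0, 5.2 and 9.3 K/s for these three.  A few other entries
differ slightly in the last digit, presumably because the runtimes shown are
themselves rounded.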

  The results themselves are interesting:  In particular, the arithmetic
compressor is only marginally better than an adaptive Huffman compressor (4.91
vs 4.99 bits/byte) for the order-0 model - it's only with higher-order models
that arithmetic compressors attain their outstanding performance.  Also, it
appears that there are now several archivers which outperform the long-time
standard PKZIP (Lharc 2.10 and ARJ), and most of them outperform the LZ-based
compressors cited in "Text Compression", though none of them come close to the
performance attained by pure arithmetic compressors.
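
To see why an order-0 model limits what any coder can do, it helps to look at
the order-0 entropy of the data, i.e. the entropy of the byte histogram with
no context taken into account.  The sketch below is not the code of either
compressor mentioned above, just an illustration.  With a fixed order-0 model
an arithmetic coder gets close to this figure and a Huffman coder cannot
average less than it; adaptive versions also pay something for learning the
statistics as they go:

```python
import math
from collections import Counter


def order0_entropy(data):
    """Entropy of the byte histogram, in bits per input byte."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())


sample = b"abracadabra abracadabra"
print(f"{order0_entropy(sample):.2f} bits/byte")
```

To try it on one of the corpus files, read the file in binary mode and pass
the bytes to order0_entropy().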

Disclaimer:
-----------

  These results were obtained by running the archivers under exactly identical
conditions (temperature, pressure, air humidity etc :-) on identical data.
I've posted these results merely to waste net.bandwidth.....er, to provide
information.  I have no affiliation with any of the authors of the programs, nor
do I have an axe to grind with any of the authors, etc etc waffle waffle.  Make
what you will of these results.....

Finally:
--------

  If anyone has any compressors/archivers they've been working on, I'd be
interested in running them on the compression corpus so I can add the results 
to this list.  The corpus itself is about 3.5M in size and is FTP'able as:

  fsa.cpsc.ucalgary.ca:/pub/text.compression.corpus/ [136.159.2.1]

Anyway, if you want to send me MSDOS executables (or if you trust me with 
source code :-) I'd be interested in checking them out....

Peter.

 Peter_Gutmann@kcbbs.gen.nz || peter@nacjack.gen.nz || pgut1@cs.aukuni.ac.nz
                     (In order of decreasing reliability)
Warning!
  Something large, scaly, and with fangs a foot long lives between <yoursite>
and <mysite>.  Every now and then it kills and eats messages.  If you don't
receive a reply within a week, try resending...