xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (04/10/91)
It pays off a lot when trying to prove the efficiency of your favorite
compression algorithm to carefully select the data you want to use to
show it off to best effect. Said another way, nothing succeeds like a
rigged demo.
I was developing a program for rec.games.programmer, and needed a debug
file for a little logic problem I knew was there, but couldn't quite
visualize.
[The problem was to create a maze to represent a town of the
sort used in, e.g., the commercial Bard's Tale game series.
(As a sidelight, the program eventually succeeded to a nice
degree, and examples are posted to r.g.p.) The debug
output was the (21x43 character) maze image, repeated once
for each step of the drawing algorithm.]
In the current example, that is 123 blocks of text, 947 bytes long each
(counting linefeeds and one blank line), differing in about seven
character positions from one block to the next.
Watching the file crawl by at 1200 baud was aging me too fast, so I
decided to pack it up, download it home, and view it at 38,000 baud
on my local hardware. I got a very pleasant surprise when I chose
to do this with lharc.
This is severely regular data, and various compression algorithms take
advantage of that regularity with varying degrees of success:
 bytes   file name         algorithm
114106   townmaze.out      original data
 32422   townmaze.out.aed  CACM 6/87 Arithmetic Data Compression code
 24582   townmaze.out.S    Recently mentioned splay tree code
 15143   townmaze.out.zoo  zoo 2.01
 14363   townmaze.out.Z    standard BSD compress -b16
  4801   townmaze.out.lzh  lharc 1.02A as posted to alt.sources late
                           last year by me
That last is less than 5% of the original data volume, and yes, it is
real; I unpacked the original file from it twice, to restore the
copies consumed by splay and compress, which do not normally leave an
original lying about.
It is thus not all that awful an idea to consider reverting to the
ARC program's approach of trying eight or more compression methods,
each with its own favorite data types, and saving only the best
result in the archive, if time and CPU cycles are cheap and
storage/transmission bandwidth is dear.
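A sketch of that pick-the-best idea, using three standard-library
codecs in place of ARC's methods (in a real archiver the winning
method's tag would be stored in the entry header so the extractor
knows how to undo it):

```python
import bz2
import lzma
import zlib

def best_compression(data: bytes) -> tuple[str, bytes]:
    """Try several compressors on the same input and keep only the
    smallest result, falling back to storing the data uncompressed
    if no method actually shrinks it."""
    candidates = {
        "stored":  data,
        "deflate": zlib.compress(data, 9),
        "bzip2":   bz2.compress(data, 9),
        "lzma":    lzma.compress(data),
    }
    method = min(candidates, key=lambda name: len(candidates[name]))
    return method, candidates[method]

method, packed = best_compression(b"xyzzy" * 10_000)
print(method, len(packed))
```

The cost is one full compression pass per method per file, which is
exactly the time-for-space trade named above: cheap cycles buying
dear bandwidth.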
Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>