xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (04/10/91)
It pays off handsomely, when trying to prove the efficiency of your favorite compression algorithm, to carefully select the data you use to show it off to best effect. Said another way, nothing succeeds like a rigged demo.

I was developing a program for rec.games.programmer, and needed a debug file for a little logic problem I knew was there, but couldn't quite visualize. [The problem was to create a maze to represent a town of the sort used in, e.g., the commercial Bard's Tale game series. (As a side light, the program eventually succeeded to some nice degree, and examples are posted to r.g.p.) The debug output was the (21x43 character) maze image, repeated once for each step of the drawing algorithm.] In the current example, that is 123 blocks of text, 947 bytes long each (counting linefeeds and one blank line), differing in about seven character positions from one block to the next.

Watching the file crawl by at 1200 baud was aging me too fast, so I decided to pack it up, download it home, and view it at 38,000 baud on my local hardware. I got a very pleasant surprise when I chose to do this with lharc. This is severely regular data, and various compression algorithms take advantage of that regularity with varying degrees of success:

 bytes   file name         algorithm
114106   townmaze.out      original data
 32422   townmaze.out.aed  CACM 6/87 Arithmetic Data Compression code
 24582   townmaze.out.S    recently mentioned splay tree code
 15143   townmaze.out.zoo  zoo 2.01
 14363   townmaze.out.Z    standard BSD compress -b16
  4801   townmaze.out.lzh  lharc 1.02A, as posted to alt.sources late last year by me

That last is less than 5% of the original data volume, and yes, it is real; I unpacked the original file from it twice to restore the files consumed by splay and compress, which do not normally leave an original copy lying about.
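The effect is easy to reproduce. Here is a minimal sketch, using Python's standard zlib rather than lharc, that builds synthetic data shaped like the debug file above: many blocks of about the same size, each differing from its predecessor in only a handful of positions. (The block layout and mutation pattern are illustrative inventions, not the actual maze output.)

```python
import zlib

# Build synthetic data resembling the debug file: 123 blocks of 947 bytes
# each, where consecutive blocks differ in only about seven positions.
# (Sizes match the post; the content is made up for illustration.)
block = bytearray(b"#" * 946 + b"\n")
blocks = []
for step in range(123):
    for j in range(7):
        block[(step * 7 + j) % 900] = ord(".")  # mutate ~7 cells per step
    blocks.append(bytes(block))
data = b"".join(blocks)

packed = zlib.compress(data, 9)
print(f"original: {len(data)} bytes, packed: {len(packed)} bytes "
      f"({100 * len(packed) / len(data):.1f}%)")
```

Because each block is a near-copy of material still inside the compressor's sliding window, an LZ77-family coder spends only a few bytes per block, which is why dictionary methods run away from the arithmetic and splay-tree coders on this kind of input.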
It is thus not all that awful an idea to consider reverting to the ARC program's approach of trying eight or more compression methods, each with its own favorite data types, and saving only the best result in the archive, if time and cpu cycles are cheap and storage/transmission bandwidth is dear.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
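The ARC-style "try them all, keep the smallest" idea sketched above can be shown in a few lines. This is a hedged illustration using whatever coders the Python standard library happens to offer (zlib, bzip2, lzma), not ARC's actual eight methods; a real archiver would also record which method won so the entry can be unpacked later.

```python
import bz2
import lzma
import zlib

def best_compress(data: bytes) -> tuple[str, bytes]:
    """Compress with several methods and keep only the smallest result,
    in the spirit of ARC's multi-method archiving."""
    candidates = {
        "store": data,                      # fall back to no compression
        "zlib":  zlib.compress(data, 9),
        "bzip2": bz2.compress(data, 9),
        "lzma":  lzma.compress(data),
    }
    winner = min(candidates, key=lambda name: len(candidates[name]))
    return winner, candidates[winner]

sample = b"abracadabra " * 400
winner, packed = best_compress(sample)
print(f"{winner} wins: {len(sample)} -> {len(packed)} bytes")
```

The "store" entry guarantees the archive never grows on incompressible input, which is the same safety valve ARC's stored/packed/squeezed/crunched method selection provided.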