csc@watmath.UUCP (Computer Sci Club) (11/06/86)
I (with some others) am in the process of building a printable character
archiver, called "earthpig" (it's an aarchiver, :-)). The following is an
example of the compression its printable character encoding algorithm gets.
The "test" file is /bin/vi. I have shown the compression ratios for the
various interesting files.

A file ending in "pig" is an earthpig encoded file.
A file ending in "uue" is a uuencoded file.
A file ending in "Z" is a "compress"ed file.

    test          131338
    test.Z         70103   (0.533 of test)
    test.pig      143087   (1.090 of test)
    test.uue      180979   (1.378 of test)
    test.pig.Z     73175
    test.uue.Z     94691
    test.Z.pig     96698   (0.736 of test)
    test.Z.uue     96611   (0.735 of test)

For compressed data, it does about as well as uuencode. For uncompressed
data it's quite a lot better. Small binaries (under 20K) shrink slightly.
C programs and text files shrink slightly or stay about the same.

Earthpig uses only the characters

    +-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ @.,;:=?*"'/!()_%&

This character set should make it through almost anything unchanged. The
algorithm only uses table lookup: no bit masking or shifting.

When we finish the archiver we will post various versions of it to the net.

What it does:

- can generate correction requests from errors
- can generate patches from correction requests
- CRC checking on two levels
- supports OS-independent hierarchical file names
- high immunity to format changes and noise characters (space/control)
- close to 1:1 encoding on uncompressed data

The basic goal of earthpig is to provide a single tool that will allow the
transfer of arbitrary data and software around the network while providing
a very high level of confidence that the data arrived correctly.

Tracy Tims
mail to ihnp4!watmath!unit36!tracy
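
The post does not describe earthpig's encoding itself, so the following is only a hypothetical sketch of what "table lookup only" and "immunity to noise characters" could look like in general. It is not the earthpig algorithm. The names ALPHABET, ENCODE, DECODE, encode and decode are made up for illustration, and it assumes the space in the character list above is a separator rather than a code character (spaces are listed as noise).

```python
# Hypothetical sketch, NOT earthpig's actual algorithm (the post does not give it).
# Assumption: the space in the posted character list is not a code character.
ALPHABET = (
    "+-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "@.,;:=?*\"'/!()_%&"
)

# Encoding table: byte value -> two-character code.
# 71 * 71 possible pairs is more than enough for 256 byte values.
# The tables are built once with division; encoding and decoding
# afterwards use lookups only, with no bit masking or shifting.
ENCODE = [
    ALPHABET[b // len(ALPHABET)] + ALPHABET[b % len(ALPHABET)]
    for b in range(256)
]

# Decoding table: two-character code -> byte value.
DECODE = {pair: b for b, pair in enumerate(ENCODE)}
VALID = set(ALPHABET)


def encode(data: bytes) -> str:
    """Map every byte to its two-character code by table lookup."""
    return "".join(ENCODE[b] for b in data)


def decode(text: str) -> bytes:
    """Skip noise characters, then map each code pair back to a byte."""
    # Anything outside the alphabet (spaces, newlines, control
    # characters) is treated as noise and dropped before lookup.
    clean = [c for c in text if c in VALID]
    if len(clean) % 2:
        raise ValueError("odd number of code characters")
    return bytes(
        DECODE[clean[i] + clean[i + 1]] for i in range(0, len(clean), 2)
    )
```

This toy scheme doubles the size of its input, far from the 1.090 figure in the table above, so the real encoding is evidently much more compact. The sketch only shows that reflowed lines or inserted whitespace need not damage the data if the decoder ignores every character outside the code alphabet.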