davidsen@steinmetz.ge.com (Wm. E. Davidsen Jr) (04/13/89)
I have often noted the poor performance of compress on uuencoded
files when sending news. I therefore took an arc file, converted it to
text via uuencode, and applied compress and lzhuf to it. I repeated the
test using the more efficient btoa routine.

 Size  Modify time       File name
 7028   6 Apr 89 13:38   lzhsrc10.arc   The original archive file
 9715  13 Apr 89 11:43   arcuu          arc->uuencode
 7502  13 Apr 89 11:44   arcuu.L        arc->uuencode->lzhuf
 9523  13 Apr 89 11:44   arcuu.Z        arc->uuencode->compress
 8956  13 Apr 89 11:45   arcb2a         arc->btoa
 7472  13 Apr 89 11:46   arcb2a.L       arc->btoa->lzhuf
10007  13 Apr 89 11:46   arcb2a.Z       arc->btoa->compress

Conclusions:
 1) lzhuf performs better than compress on uuencoded files
 2) lzhuf performs far better than compress on btoa'd files
 3) since most sites compress news before sending, uuencode is better
    than the more efficient btoa, in that it doesn't break compress as
    badly.
 4) if lzhuf becomes widely used for news compression, btoa becomes
    better for representing binaries, but only a little.
-- 
bill davidsen (wedu@crd.GE.COM)
{uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
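The sizes in the listing are easier to compare as expansion ratios relative to the 7028-byte archive. A minimal sketch in Python, using only the numbers reported in the post; the `sizes` dictionary and the `ratio` helper are illustrative names, not part of the original test.

```python
# Sizes in bytes, copied from the listing in the post above.
ORIGINAL = 7028  # lzhsrc10.arc, the original archive

sizes = {
    "arcuu": 9715,      # arc->uuencode
    "arcuu.L": 7502,    # arc->uuencode->lzhuf
    "arcuu.Z": 9523,    # arc->uuencode->compress
    "arcb2a": 8956,     # arc->btoa
    "arcb2a.L": 7472,   # arc->btoa->lzhuf
    "arcb2a.Z": 10007,  # arc->btoa->compress
}


def ratio(name: str) -> float:
    """Size of the named file relative to the original archive."""
    return sizes[name] / ORIGINAL


if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{name:10s} {size:6d}  {ratio(name):.3f}x")
```

Read this way, uuencode expands the archive to about 1.38 times its size and btoa to about 1.27 times. compress then shrinks the uuencoded text only slightly and leaves the btoa'd text larger than its input (10007 bytes against 8956), while lzhuf brings both back to within about 7% of the original.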
wcs@cbnewsh.ATT.COM (william.clare.stewart) (04/18/89)
In article <13603@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
> I have often noted the poor performance of compress on uuencoded
> files when sending news. I therefore took an arc file, converted it to
> text via uuencode, and applied compress and lzhuf to it. I repeated the
> test using the more efficient btoa routine.

I hate to say it, but you're misunderstanding what's happening here,
which means your numbers are probably all bogus. Uuencode hardly
bothers compress at all - the problem is arc.

Compress, and other LZW-based programs, compress things by taking
advantage of redundancy in character sequences. uuencode doesn't affect
this redundancy much - it mostly lengthens the sequences, making it
take a bit longer to get up to speed. Arc, on the other hand,
*compresses* the file - it has several compression algorithms available
to it, including "don't", but I think it commonly uses LZW. So
compressing an arc file doesn't gain much, because arc has already used
up most of the redundancy that compress wants to use. lzhuf apparently
uses a different algorithm so it has to do better.

What you need to use as your input file is the stuff that went *into*
the arc - after all, what you're trying to accomplish is using lzhuf or
compress to replace arc. You also need to do statistical sampling on a
variety of inputs - large & small (startup vs long-run effects),
text/exe/source (each will have different amounts & kinds of
redundancy), and input that has already been hacked through other
processors, such as uuencode/btoa, compress, huffman, lzhuf!
(How do doubly-lzhuf'd files behave?)

> Conclusions:
> 1) lzhuf performs better than compress on uuencoded files

You can't tell because your data was arced first.

> 2) lzhuf performs far better than compress on btoa'd files

Potentially interesting.

> 3) since most sites compress news before sending, uuencode is better
> than the more efficient btoa, in that it doesn't break compress as
> badly.

The reason this happens is that btoa scrambles the data a bit more -
it's working on groups of 4 bytes instead of 3 - and this reduces the
double-LZW effect. To draw any conclusions, you need a lot more data
points though - try taking all the files in comp.binaries.* and see
what happens.

> 4) if lzhuf becomes widely used for news compression, btoa becomes
> better for representing binaries, but only a little.

At least for now, lzhuf implementations are too slow and CPU-intensive
to use for news compression, though it has more potential for
compressing the original code. One thing about LZW - uncompressing is
much faster than compressing. How do the speeds compare for lzhuf?
-- 
# Bill Stewart, AT&T Bell Labs 2G218 Holmdel NJ 201-949-0705 ho95c.att.com!wcs
# "If it weren't for us, American troops would be invading exotic places like
# Lebanon and Grenada, and the Air Force would do stuff like bombing Libya"
# Abbie Hoffman, R.I.P
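The experiment suggested in this reply (start from the uncompressed input, then compare encodings and a second compression pass) can be sketched with Python's standard library. Everything here is a stand-in and an assumption, not the tools used in the thread: `zlib` (DEFLATE) substitutes for both arc and compress and is neither LZW nor lzhuf, `binascii.b2a_uu` gives a uuencode-style body without the begin/end lines (3 bytes become 4 characters), and `base64.a85encode` is an Ascii85 encoder that only resembles btoa (4 bytes become 5 characters). The helper names and the sample text are hypothetical.

```python
import base64
import binascii
import zlib


def uuencode_body(data: bytes) -> bytes:
    """uuencode-style lines, 45 input bytes each, without begin/end framing."""
    return b"".join(
        binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)
    )


def btoa_like(data: bytes) -> bytes:
    """Ascii85 text wrapped at 76 columns; a stand-in for btoa, not btoa itself."""
    return base64.a85encode(data, wrapcol=76)


def measure(raw: bytes) -> dict:
    """Return byte counts for each stage of the pipeline."""
    packed = zlib.compress(raw, 9)  # stand-in for the arc step
    results = {"raw": len(raw), "packed": len(packed)}
    for label, encode in (("uu", uuencode_body), ("a85", btoa_like)):
        text = encode(packed)
        results[label] = len(text)
        # Second compression pass over already-compressed, encoded data.
        results[label + "+z"] = len(zlib.compress(text, 9))
    # Control: encode the uncompressed input, then compress once.
    results["raw->uu+z"] = len(zlib.compress(uuencode_body(raw), 9))
    return results


if __name__ == "__main__":
    sample = b"".join(
        b"line %d: the quick brown fox jumps over the lazy dog\n" % i
        for i in range(2000)
    )
    for key, value in measure(sample).items():
        print(f"{key:10s} {value}")
```

Running `measure` over a varied set of real files (text, executables, source, large and small) rather than one synthetic sample is what the reply asks for; a single input says little either way.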