[news.misc] Expansion after compression for MS-DOS arc files

cmaag@csd4.milw.wisc.edu (Christopher N Maag) (11/02/87)

In article <2218@mcdchg.UUCP> heiby@mcdchg.UUCP (Ron Heiby) writes:
[...]
>I took the uuencoded .ARC file and ran it through "compress" on my
>UNIX system.  The resulting file was 153,259 bytes long.  Then, I
>took the originally typed in document (the product of running the
>file through uudecode and ARC) and ran *it* through "compress".
>The resulting file was 96,373 bytes long.  So, for those news links
>where the administrators care about costs, we went through all kinds
>of extra work to COST THEM AN EXTRA 55K BYTES!
>
>Come on, people.  If it's ASCII text, like a document or source code,
>POST IT IN CLEAR TEXT!
>
>Thank you very much.
>-- 
>Ron Heiby, heiby@mcdchg.UUCP	Moderator: comp.newprod & comp.unix

I have seen people complain (with good reason) about this problem
before.  I would like to suggest that "whoever is in charge" add a
line or two to the newuser announcements that would explain the problem
described above.  For instance: 

"If you submit a file to one of the newsgroups and you wish to uuencode
 the file, _do not_ perform any type of file compression to this
 file before uuencoding it.  This means don't arc the file, (insert
 other popular compression schemes for other computer systems here).
 If you do compress the file, it will actually get _larger_ when it is
 sent than it was originally.  This costs us all money."

Is this group the right place to suggest something like this?  If not,
please direct me to the correct one.

Chris.
=======================================================================
   Path: uwmcsd1!csd4.milw.wisc.edu!cmaag
   From: cmaag@csd4.milw.wisc.edu 
 bitnet: cmaag%csd4.milw.wisc.edu@wiscvm.bitnet
{seismo|nike|ucbvax|harvard|rutgers!ihnp4}!uwvax!uwmcsd1!uwmcsd4!cmaag 
=======================================================================

heiby@mcdchg.UUCP (Ron Heiby) (11/03/87)

Christopher N Maag (cmaag@csd4.milw.wisc.edu.UUCP) writes:
> "If you submit a file to one of the newsgroups and you wish to uuencode
>  the file, _do not_ perform any type of file compression to this
>  file before uuencoding it.  This means don't arc the file, (insert
>  other popular compression schemes for other computer systems here).
>  If you do compress the file, it will actually get _larger_ when it is
>  sent than it was originally.  This costs us all money."

I think Chris is going further than I suggested.  I have no evidence that
the problem is compressing before uuencoding, and I suspect that it has
little to do with it.  I was talking about the difference between sending
clear text and sending compressed/uuencoded text.  I think it would be
interesting to check on what Chris is suggesting and get some numbers on
the difference between sending uuencoded binary files vs uuencoded compressed
binary files.  I suspect that a uuencoded compressed binary file would
actually be smaller, but the further impact of the news software's compress
on the resulting files is unknown.
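
One way to get the numbers asked for above is a small script along these
lines. It is only a sketch under stated assumptions: zlib stands in for both
the poster's compressor and the news software's compress (neither ARC nor
compress(1) is in the Python standard library), the sample "binary" is a
made-up byte pattern, and the names uuencode_body and posting_sizes are
hypothetical. Real files and real LZW compressors may give different results.

```python
import binascii
import zlib


def uuencode_body(data: bytes) -> bytes:
    """Return the uuencode body lines for data (no begin/end lines).

    binascii.b2a_uu accepts at most 45 bytes per call, which is also the
    conventional uuencode line size.
    """
    lines = [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]
    return b"".join(lines)


def posting_sizes(data: bytes) -> dict:
    """Return byte counts for each way of posting data.

    ASSUMPTION: zlib is a stand-in for ARC/compress and for the
    compression applied by the news software in transit.
    """
    uu_plain = uuencode_body(data)
    uu_packed = uuencode_body(zlib.compress(data))
    return {
        "original": len(data),
        "uuencoded": len(uu_plain),
        "compressed_then_uuencoded": len(uu_packed),
        "uuencoded_in_transit": len(zlib.compress(uu_plain)),
        "compressed_then_uuencoded_in_transit": len(zlib.compress(uu_packed)),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a binary file: a repetitive byte pattern.
    sample = bytes(range(256)) * 200
    for label, size in posting_sizes(sample).items():
        print(f"{label:40s} {size:8d}")
```

To try it on a real file, replace the sample with the bytes read from that
file and compare the two "in_transit" rows, which are the ones that matter
on links that compress news batches.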
-- 
Ron Heiby, heiby@mcdchg.UUCP	Moderator: comp.newprod & comp.unix
"I know engineers.  They love to change things."  McCoy

usenet@delrio.cc.umich.edu (Usenet News) (11/12/87)

In article <2255@mcdchg.UUCP> heiby@mcdchg.UUCP (Ron Heiby) writes:
%Christopher N Maag (cmaag@csd4.milw.wisc.edu.UUCP) writes:
%> "If you submit a file to one of the newsgroups and you wish to uuencode
%>  the file, _do not_ perform any type of file compression to this
%>  file before uuencoding it.  This means don't arc the file, (insert
%>  other popular compression schemes for other computer systems here).
%>  If you do compress the file, it will actually get _larger_ when it is
%>  sent than it was originally.  This costs us all money."
%
%I think Chris is going further than I suggested.  I have no evidence that
%the problem is compressing before uuencoding, and I suspect that it has
%little to do with it.  I was talking about the difference between sending
%clear text and sending compressed/uuencoded text.  I think it would be
%interesting to check on what Chris is suggesting and get some numbers on
%the difference between sending uuencoded binary files vs uuencoded compressed
%binary files.  I suspect that a uuencoded compressed binary file would
%actually be smaller, but the further impact of the news software's compress
%on the resulting files is unknown.
%-- 
%Ron Heiby, heiby@mcdchg.UUCP	Moderator: comp.newprod & comp.unix
%"I know engineers.  They love to change things."  McCoy

In fact the results *are* known. This comes up every couple of months, and
the plain fact is that running a compression algorithm a second time over
already-compressed data *WILL* generate a larger file, generally about 30%
larger. This happens with both ARC files and files compressed by compress
(4.0). They don't use identical algorithms, but both use modified Lempel-Ziv
encoding schemes, and both react the same way to being 'run over
themselves.' Note that this applies to the binary data itself. If you
uuencode a compressed file, you will probably win in the long run: figure
about 40% compression and 25% expansion, and *then* on some sites you'll get
more compression during actual transit. Since the Lempel-Ziv scheme works so
well on strings of printable text, it might in fact be the optimal solution
to post binaries as uuencoded compressed data.
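
As a rough check on the arithmetic above: 40% compression leaves 0.60 of the
original, and 25% expansion on top of that gives 0.60 x 1.25 = 0.75 of the
original size. For comparison, uuencode writes 4 characters for every 3
bytes, plus a length character and a newline on each 45-byte line, so its
own expansion works out to 62/45, or roughly 38%; with that figure the
result would be nearer 0.83. The sketch below works through both cases and
also checks that a second compression pass does not shrink high-entropy
data. The assumptions are loud ones: zlib is swapped in for compress/ARC,
pseudo-random bytes stand in for compressor output, and all function names
are hypothetical. It does not try to reproduce the 30% figure, which belongs
to the LZW-based tools discussed here.

```python
import random
import zlib


def relative_size(saving: float, expansion: float) -> float:
    """Size relative to the original after compressing, then encoding.

    saving    -- fraction removed by compression (0.40 for "40%")
    expansion -- fraction added by the encoding (0.25 for "25%")
    """
    return (1.0 - saving) * (1.0 + expansion)


def uuencoded_length(n: int) -> int:
    """Bytes in the uuencode body for n input bytes (no begin/end lines).

    Each full 45-byte line becomes 62 bytes: 1 length character,
    60 encoded characters, and 1 newline.
    """
    full, rest = divmod(n, 45)
    last = 2 + 4 * ((rest + 2) // 3) if rest else 0
    return full * 62 + last


def second_pass_growth(already_compressed: bytes) -> int:
    """Bytes gained (positive) or lost by compressing the data again.

    ASSUMPTION: zlib stands in for compress/ARC.
    """
    return len(zlib.compress(already_compressed)) - len(already_compressed)


if __name__ == "__main__":
    print("poster's figures:", round(relative_size(0.40, 0.25), 4))
    uu_expansion = uuencoded_length(45) / 45 - 1.0
    print("with 62/45 uuencode overhead:",
          round(relative_size(0.40, uu_expansion), 4))

    # Pseudo-random bytes as a stand-in for compressor output.
    rng = random.Random(0)
    noise = bytes(rng.getrandbits(8) for _ in range(20000))
    print("growth on second pass:", second_pass_growth(noise), "bytes")
```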