randy@umn-cs.cs.umn.edu (Randy Orrison) (05/04/88)
This whole conversation has me puzzled.  Witness:

pkarc:    If compressing a file (by any method) does not result in a
          saving of space, it is stored verbatim in the archive.

zoo:      Putting a zoo archive into another zoo archive results in
          "0%" compression and the file getting only marginally larger
          (index information?).

compress: Compressing a compressed file (e.g. a zoo archive) results
          only in wasting some time.  A .Z file is not created.  (My
          test resulted in "-29.30% compression, file not compressed".)

Summary: all these methods of compression DO NOT COMPRESS if the result
would be larger than the original.  There is NO HARM in compressing a
zoo archive containing a pkarc of a binary file (except for the added
index information at each step).

The ONLY problem I can see in doing this is if the news software that
does compression isn't smart enough to check if the result is larger
than the original.  If it isn't smart enough, it should be.  There's no
excuse for not checking that simple condition.

	-randy
--
Randy Orrison, Control Data, Arden Hills, MN
randy@ux.acss.umn.edu		(Anyone got a Unix I can borrow?)
{ihnp4, seismo!rutgers, sun}!umn-cs!randy
The best book on programming for the layman is "Alice in Wonderland";
but that's because it's the best book on anything for the layman.
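[The compress behaviour described above is easy to verify by hand on a
file that won't shrink.  This is only a sketch: it assumes a
4.3BSD-style compress, "something.zoo" is a placeholder name, and the
exact message wording and exit codes vary between versions.]

	compress -v something.zoo
	# reports a negative compression ratio (or "no compression")
	# and leaves the file alone; no something.zoo.Z is created
	echo $?
	# many versions exit 2 when the file would not have shrunk,
	# 1 on a real error, and 0 on success -- see the local manual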
jerry@oliveb.olivetti.com (Jerry Aguirre) (05/06/88)
In article <5198@umn-cs.cs.umn.edu> randy@umn-cs.UUCP (Randy Orrison) writes:
>The ONLY problem I can see in doing this is if the news software that does
>compression isn't smart enough to check if the result is larger than the
>original.  If it isn't smart enough, it should be.  There's no excuse for
>not checking that simple condition.

Yes, there is an excuse.  The "news software" in question is
"sendbatch", which is just a shell script.  (No, that is not the
excuse.)  It winds up doing a pipe of:

	batch batch_file | compress | uux bla!rnews

Now compress is perfectly capable of telling you whether its output was
compressed or not.  But in this case, by the time it tells you, the
input is already gone!

Doing this would require writing the batch and compress output to
files.  Then you could decide what to send based on whether compress
helped or not.  If you are willing to put up with the extra overhead of
creating temp files, it is trivial to modify the script to compress
only when effective.  I guess that someone who was paying lots of money
for the transmission line would find this worthwhile.

Actually you could avoid the extra overhead.  You have batch run the
compress and check the exit status.  Batch already checkpoints itself,
so it could re-create the batch if the compress failed.  Compress
writes the output to the same file system as /usr/spool/uucp and
executes "uux" with the "-l" option, so it makes a hard link to the
file instead of copying it.  You can then delete the original link to
the temp file.  All this starts to get a little non-portable.....
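[For concreteness, a rough sketch of the temp-file variant described
above.  The file names are placeholders, "bla" is the remote site from
the pipeline shown, and a real sendbatch would also have to mark the
batch as compressed for the remote rnews, which is omitted here just as
it is in that pipeline.]

	batch batch_file > /tmp/b$$		# write the batch to a temp file
	compress < /tmp/b$$ > /tmp/b$$.Z	# compress into a second temp file
	if [ `wc -c < /tmp/b$$.Z` -lt `wc -c < /tmp/b$$` ]
	then
		uux - bla!rnews < /tmp/b$$.Z	# compression helped; send the .Z
	else
		uux - bla!rnews < /tmp/b$$	# it didn't; send the plain batch
	fi
	rm -f /tmp/b$$ /tmp/b$$.Z

[The exit-status scheme in the last paragraph avoids both the size
comparison and the extra copy, at the price of the portability problems
mentioned.]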
kent@happym.UUCP (Kent Forschmiedt) (05/07/88)
In article <5198@umn-cs.cs.umn.edu> randy@umn-cs.UUCP (Randy Orrison) writes:
[ says that compress and its cousins know better than to mess with a
file if it won't get smaller. Suggests that news software won't
compress, or shouldn't, if stuff is already compressed, so what's
the problem? ]
News articles are usually collected into batches before
transmission.
Some sites eat long articles, so there is a practical limit of
around 50 or 60k on article size.
So, the problem is:
Since many sites transmit batches that are several times that size,
even the longest article will generally get collected into a batch
file along with other articles.  When that batch is compressed before
being sent to another site, an article which was already compressed
(by, say, 40%) and then uuencoded will still shrink enough to satisfy
compress, but the result will not be as small as if the big article
had only been uuencoded.  This is true whether the original is random
binary or 6-bit text.  Compressing twice makes it worse.
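[One way to see the effect described above is to build two small
batches by hand and compress both.  This is only an illustration;
"bigfile" and "articles" are placeholder names, and the numbers depend
entirely on the input.]

	compress < bigfile | uuencode bigfile > big1.uue  # posted compressed+uuencoded
	uuencode bigfile < bigfile > big2.uue             # posted uuencoded only
	cat articles big1.uue > batch1
	cat articles big2.uue > batch2
	compress -v batch1 batch2
	# both batches shrink enough to satisfy compress; comparing
	# batch1.Z and batch2.Z shows what the extra compression step
	# costs (or saves) across the link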
--
Kent Forschmiedt -- kent@happym.UUCP, tikal!camco!happym!kent
Happy Man Corporation 206-282-9598
linhart@topaz.rutgers.edu (Mike Threepoint) (05/13/88)
randy@umn-cs.UUCP (Randy Orrison) writes:
-=> This whole conversation has me puzzled. Witness:
-=> pkarc: If compressing a file (by any method) does not result in
-=> a saving of space, it is stored verbatim in the archive.
-=> zoo: putting a zoo archive into another zoo archive results in
-=> "0%" compression and the file getting only marginally larger
-=> (index information?)
-=> compress: compressing a compressed file (e.g. a zoo archive)
-=> results only in wasting some time. A .Z file is not created.
-=> (my test resulted in "-29.30% compression, file not compressed")
If compress can't make any given file smaller, it won't. Really short
files also fall into this category. PKARC and zoo won't either, and
just store it verbatim with an index header containing file name,
date/time, CRC, and other vitals. So they would both store other
archives uncompressed under most circumstances (sometimes PKARC
squeaks out a 2% Huffman on other ARC files).
Nested ARC's occur often in large systems like BBS software, usually
with a batch file to reproduce the directory structure and extract the
enclosed ARC's into the appropriate directories. I wouldn't expect
anyone to do this in zoo, since the directory structure can be stored
much more simply. (zoo is a much better choice for these systems, but
it's not the standard and PKARC compresses better.)
-=> Summary: all these methods of compression DO NOT COMPRESS if the result
-=> would be larger than the original. There is NO HARM in compressing a zoo
-=> archive containing a pkarc of a binary file (except for the added index
-=> information at each step).
Exactly.  (But see the parenthetical about PKARC above.)
-=> The ONLY problem I can see in doing this is if the news software that does
-=> compression isn't smart enough to check if the result is larger than the
-=> original. If it isn't smart enough, it should be. There's no excuse for
-=> not checking that simple condition.
As I understand it, compress is generally used, and compress checks
unless -f is specified. Along those lines, though, I've seen PKARC
crunch a small file at 0% savings. That's ridiculous. Just storing
it uncompressed would obviously be more efficient (and faster to
extract).
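[For example, with a typical compress the check and the override look
like this; "already.zoo" stands for any file that will not shrink, and
the message wording differs between versions.]

	compress -v already.zoo		# reports a negative ratio (or "no
					# compression") and leaves the file
					# alone
	compress -f already.zoo		# forced: writes already.zoo.Z even
					# though it is larger than the original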
--
"...billions and billions..." | Mike Threepoint (D-ro 3)
-- not Carl Sagan | linhart@topaz.rutgers.edu
"...hundreds if not thousands..." | FidoNet 1:107/513
-- Pnews | AT&T +1 (201)878-0937
hyc@math.lsa.umich.edu (Howard Chu) (05/25/88)
Just thought I'd play around a bit and see what all this meant...  The
following summarizes a few minutes of messing around with uuencode,
compress, and compact on a Sun 3/260.  While I'm only testing a single
file, I'm sure it makes a pretty convincing worst case test...

For reference, compress uses a 16-bit Lempel-Ziv-Welch compression
scheme, and compact uses an optimized Huffman Squeeze algorithm (which
doesn't store the decoding tree in the compacted file).  This can
almost be directly related to the ARC program, with the exception that
ARC performs run-length encoding on input data before feeding it to
any of the other compression algorithms.  (PKARC doesn't do this, by
the way.)

	582556 May 24 15:28 vmunix            plain binary file
	472774 May 24 15:36 vmunix.C          compacted (Huffman squeeze)
	661987 May 24 15:46 vmunix.C.uue      compacted, then uuencoded
	571778 May 24 15:46 vmunix.C.uue.Z    compacted, uuencoded, compressed
	365675 May 24 15:28 vmunix.Z          compressed (16 bit)
	358631 May 24 15:39 vmunix.Z.C        compressed, then compacted
	502186 May 24 16:01 vmunix.Z.C.uue    compressed, compacted, uuencoded
	449395 May 24 16:01 vmunix.Z.C.uue.Z  compressed, compacted, uuencoded, compressed
	512047 May 24 15:30 vmunix.Z.uue      uuencoded after compression
	445229 May 24 15:30 vmunix.Z.uue.Z    compressed, uuencoded, compressed again
	815678 May 24 15:28 vmunix.uue        uuencoded, no compression
	462100 May 24 15:31 vmunix.uue.Z      compressed after uuencoding
	460239 May 24 15:43 vmunix.uue.Z.C    uuencoded, compressed, then compacted

A few things worth noting:

 - While the results aren't always dramatic (and they certainly aren't,
   in this case), a Huffman squeeze will always reduce the size of a
   file already compressed by some form of Lempel-Ziv compression.
 - Compressing, then uuencoding, is obviously better than just
   uuencoding.
 - Since Lempel-Ziv compression typically yields 50% compression, and
   uuencoding gives about 33% expansion, the result will still be
   smaller than the original file.
 - If your news software also tries to perform compression, it's still
   a good idea to compress, then uuencode.  Compare:
	445229 May 24 15:30 vmunix.Z.uue.Z
	462100 May 24 15:31 vmunix.uue.Z
 - There is no vmunix.Z.Z or vmunix.C.Z in the above list.  Immediately
   recompressing a compressed file is always a bad idea.

Your mileage will vary....
--
Howard Chu
University of Michigan
Computing Center / College of LS&A
Unix Project / Information Systems
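[The key comparison in the table, compress-then-uuencode versus
uuencode-then-compress with a second compression on top standing in
for the news batcher, is easy to reproduce.  A rough sketch, using any
large binary in place of vmunix:]

	compress < vmunix | uuencode vmunix > vmunix.Z.uue  # compress first, then uuencode
	uuencode vmunix < vmunix > vmunix.uue               # uuencode only
	compress vmunix.Z.uue vmunix.uue                    # the batch-level compression
	ls -l vmunix.Z.uue.Z vmunix.uue.Z                   # in the test above the first
	                                                    # came out about 17K smaller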