rmpinchback@watmum.waterloo.edu (Reid M. Pinchback) (11/05/88)
I've found a nasty bug in GSARC.  It's the sort of bug you come up against
accidentally when converting archives from an old archiver (PKARC) to take
advantage of a (supposedly better) new archiver.  Luckily I was working on a
copy of the old archive.  Actually, I've found something else, but the second
observation is merely disappointing, not damaging.

Nasty bug found as follows:

- de-arc an archive.  Let's say it was the archive TEST.ARC
- tell GSARC to make an archive of the same name, i.e.: GSARC m TEST.ARC *.*
- notice that (accidentally or purposely) the same archive name is to be
  used, and the old archive was neither renamed nor deleted.
- GSARC now does NOT attempt to make an archive.  It just deletes all the
  files in the current directory, including the old archive!  ACK!
- Solution: be VERY careful when converting archives, and only use the GSARC
  conversion option, i.e.: GSARC c TEST.ARC

The second item is pretty simple.  When using either SEA's ARC or Katz's
PKARC, both will use the most effective compression method for each file
being compressed.  GSARC has this nice new Crushing method.  In fact, GSARC
thinks it is SO nice, it is the ONLY method it will use to compress a file,
unless you tell it specifically to create either an ARC- or PKARC-compatible
archive.  As a result, I often end up with updated archives that are 20-25%
LARGER than they were with ARC or PKARC.  To avoid this you would have to
add files to an archive one at a time, experimenting with three different
compression options to see which yielded better results.  ICK!

Oh well, so much for a nice new archive tool.  Time to go play with Zoo. :-)

Reid M. Pinchback
CS/C&O Undergraduate
University of Waterloo
spolsky-joel@CS.YALE.EDU (Joel Spolsky) (11/06/88)
In article <6627@watcgl.waterloo.edu> rmpinchback@watmum.waterloo.edu (Reid M. Pinchback) writes:
>As a result, I often end up with updated
>archives that are 20-25% LARGER than they were with ARC or PKARC.
>
> Reid M. Pinchback

Compressing what kind of files?

+----------------+---------------------------------------------------+
| Joel Spolsky   | bitnet: spolsky@yalecs    uucp: ...!yale!spolsky  |
|                | arpa: spolsky@yale.edu    voicenet: 203-436-1483  |
+----------------+---------------------------------------------------+
#include <disclaimer.h>
rmpinchback@watmum.waterloo.edu (Reid M. Pinchback) (11/07/88)
In article <42285@yale-celray.yale.UUCP> spolsky-joel@CS.YALE.EDU (Joel Spolsky) writes:
>In article <6627@watcgl.waterloo.edu> rmpinchback@watmum.waterloo.edu (Reid M. Pinchback) writes:
>>As a result, I often end up with updated
>>archives that are 20-25% LARGER than they were with ARC or PKARC.
>>
>> Reid M. Pinchback
>
>Compressing what kind of files?

First: all kinds of files are compressed the same way.

Second: the kinds of files where this seems to be a problem appear to be
text files, but I'm not yet sure exactly what kind of content causes the
lousy compression.  I first noticed it when trying to archive a uuencoded
archive (please don't ask why, it's a long story). :-)  Since then, I've
noticed it cropping up often when re-arcing some of the various archives
I've had lying around on my hard disk for eons, these archives being
primarily mixed text and executable binaries.

Reid M. Pinchback
miken@wybbs.UUCP (Michael Neuhaus) (11/11/88)
In article <6627@watcgl.waterloo.edu>, rmpinchback@watmum.waterloo.edu (Reid M. Pinchback) writes:
> I've found a nasty bug in GSARC.  It's the sort of bug you come up against
> accidentally when converting archives from an old archiver (PKARC) to ...

I'd like to make you aware that GSARC is now PAK, to avoid infringement of
SEA's trademark on the letters ARC.

Thanks for finding the bug in the Move command, which occurs when moving all
of the files in a directory to an existing archive in the same directory.
You are, however, incorrect as to the nature of the bug.  You state:

> GSARC does NOT attempt to make an archive.  It just deletes all the files
> in the current directory.

PAK (formerly GSARC) does update the archive, but when it then follows this
by deleting the files, it doesn't check that the newly-updated archive is
one of those files!  Future releases will correct this, of course, but in
the meantime avoid this situation by keeping the destination archive in
another directory when Moving entire directories.  This bug won't show up if
you are Moving files to a newly created archive, but it's better to be safe.
It's too bad this didn't come out in the six months of beta-test.

> archives are 20-25% LARGER than they were with ARC or PKARC.

I'd seriously like to see your data, since everything we've seen has usually
been the reverse.  It is true that on certain rare text files between 1K and
5K in final size, Crushing is marginally larger than PKARC's Crunching, but
never by more than 1%.  In a sample of 20 text files in this size range,
only 3 exhibited this characteristic:

Name          Original size    PAK size    PKARC size    Difference
EVAL2.DOC          2560           1567        1551          1.02%
SUBMIT.DOC         8704           4714        4709          0.11%
WRITE.DOC          5600           3186        3180          0.19%

Incidentally, the final sizes were 34177 for the PAK archive and 34422 for
the PKARC archive, for a net savings of 245 bytes.  Not much, but excellent
considering that PAK has the most difficulty with files in this range.
On all other test archives, PAK averaged about 10% smaller than PKARC, and
15% smaller than ARC.

Michael Neuhaus
NoGate Consulting
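[A minimal sketch of the missing check Neuhaus describes.  This is
illustrative Python, not PAK's actual internals; the function name and
interface are invented for the example.  The fix is simply to exclude the
destination archive from the list of source files deleted after a Move.]

```python
import os

def files_to_delete(archive_path, matched_files):
    """After updating the archive, return the source files that a Move
    may safely delete: everything that matched the wildcard EXCEPT the
    archive itself.  Omitting this filter reproduces the reported bug,
    where 'GSARC m TEST.ARC *.*' deletes TEST.ARC along with the rest.
    """
    archive = os.path.abspath(archive_path)
    return [f for f in matched_files if os.path.abspath(f) != archive]
```

With this filter, moving *.* into TEST.ARC in the same directory leaves
TEST.ARC on disk; without it, the freshly updated archive is deleted too.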
wtm@neoucom.UUCP (Bill Mayhew) (11/16/88)
I don't know what GSARC uses for its coding scheme.  If it uses Huffman
coding to compress ASCII files, the output can be bigger than the input on a
short file.

Huffman coding uses a variable number of bits to encode characters based
upon the frequency with which characters appear.  Characters that appear
frequently are encoded with short bit patterns, while infrequently
encountered characters get longer ones.  A table is prepended to such a file
so that the decoding algorithm knows which is which.

On a short file where all characters appear with about the same frequency,
Huffman coding is inefficient.  You are also penalized by the fact that the
lookup table takes some space.  Arcing a uuencoded file of a few K in length
would possibly present such a situation.

Unix compress, for instance, uses huffman coding.

--Bill
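[A small Python sketch of the effect Bill describes -- not any archiver's
actual code.  It builds Huffman code lengths with a heap and charges a
crude 2-bytes-per-symbol table overhead (real formats store the table more
compactly).  On a short file where every byte is equally frequent, the
payload barely shrinks and the table pushes the total past the original;
on skewed text, the code wins easily.]

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return {byte: code length in bits} for a Huffman code over data."""
    freq = Counter(data)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # heap entries: (frequency, tiebreak, {byte: depth so far})
    heap = [(f, i, {b: 0}) for i, (b, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {b: d + 1 for b, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def compressed_bits(data):
    """Payload bits plus a crude prepended-table cost of 16 bits/symbol."""
    lengths = huffman_code_lengths(data)
    freq = Counter(data)
    table_bits = 16 * len(lengths)
    return table_bits + sum(freq[b] * lengths[b] for b in freq)
```

For 64 distinct bytes appearing once each (roughly the uuencode situation),
every code is 6 bits, but the table makes the output larger than the 512-bit
input; for text dominated by a few characters, the total comes out well
under the original size.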
spolsky-joel@CS.YALE.EDU (Joel Spolsky) (11/17/88)
In article <1413@neoucom.UUCP> wtm@neoucom.UUCP (Bill Mayhew) writes:
|
| I don't know what GSARC uses for its coding scheme.  If it uses
| huffman coding to compress ASCII files, the output can be bigger
| than the input on a short file.

Don't be absurd.  Huffman encoding went out with pet rocks :-)

| Huffman coding uses a variable
| number of bits to encode characters based upon the frequency with
| which characters appear.  Characters that appear frequently are
| encoded with short bit patterns, while infrequently encountered
| characters are longer.  A table is prepended to such a file so that
| the decoding algorithm knows which is which.

OK, you get an A+ in CS100 :-)

| On a short file where all characters appear with about the same
| frequency, huffman coding is inefficient.  You are also penalized
| by the fact that the lookup table takes some space.

Well, I guess that's why nobody uses Huffman encoding :-)

| Unix compress, for instance, uses huffman coding.

False.  Unix compress uses Ziv-Lempel-Welch encoding, which achieves much
higher compression rates than (snicker) Huffman encoding.  ARC also uses
ZLW.  GSARC uses a modified form of ZLW with variable-length codes, which
makes GSARC perform better on very short files.

+----------------+----------------------------------------------------------+
| Joel Spolsky   | bitnet: spolsky@yalecs.bitnet    uucp: ...!yale!spolsky  |
|                | internet: spolsky@cs.yale.edu    voicenet: 203-436-1483  |
+----------------+----------------------------------------------------------+
#include <disclaimer.h>
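[For comparison, a minimal textbook LZW compressor in Python -- a sketch of
the general technique Joel names, not compress's, ARC's, or PAK's actual
code.  The dictionary starts with all 256 single bytes and learns longer
phrases as it goes, so repetitive data collapses quickly and no frequency
table needs to be stored.  Real implementations additionally bit-pack the
codes (the "variable-length codes" Joel mentions: the code width grows as
the dictionary fills); this sketch just emits integer codes.]

```python
def lzw_compress(data):
    """Classic LZW: emit one code per longest already-known phrase,
    adding each newly seen phrase to the dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    phrase = b""
    out = []
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            phrase = candidate          # keep extending the match
        else:
            out.append(dictionary[phrase])
            dictionary[candidate] = next_code  # learn the new phrase
            next_code += 1
            phrase = bytes([byte])
    if phrase:
        out.append(dictionary[phrase])
    return out
```

On 100 bytes of "abab...", the output is only a handful of codes because
ever-longer runs match dictionary entries, which is why ZLW-family coders
beat plain Huffman on most real files.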