kim@amdahl.uts.amdahl.com (Kim DeVaughn) (02/03/88)
In a recent article, whose number I couldn't care less about, Mike Shawaluk
writes:

> Just a note in regards to the differences between ARC and ZOO compression,
> as well as the new PKAX ARC extractor for the Amiga... First of all, the
> reason that ZOO does better on some files, while it does MUCH poorer on
> others (i.e., no compression at all!) is because ZOO only uses Ziv-Lempel
> 13-bit compression, while the ARC programs will dynamically choose between
> 8 different compression algorithms. Well, I should be fair, ZOO *does* have
> two alternatives; 13 bit "crunch", or nothing at all! For some reason, IFF
> pictures & sound files seem to ARC more efficiently if they're Squeezed
> (Huffman compression), which is why ARC wins this one over ZOO.

I don't really want to perpetuate a "holy war" over which is better (ARC or
ZOO), but having seen several postings on which archiver produces the
smallest files, I decided to perform a (limited) test. As we all know,
"there are lies, damn lies, and benchmarks", and I'm sure that one can come
up with any number of file-sets that will produce different results than
these, but for what it's worth ...

For my 1st test, I picked a medium-to-large set of files that seemed to me
to be representative of a "typical" file-set ... the recently posted
MRBackup program (source, binary, and docs [with duplicated files removed]).
The size of the 42 native files, which included an Announcement file and an
ExecuteMe (since ARC can't handle filenames longer than 12 chars), was
257,059 bytes.

ARC (v0.23) produced a .arc file of 148,101 bytes, for a total reduction in
size of 42.4%.

ZOO (v1.71) produced a .zoo file of 144,984 bytes, for a total reduction in
size of 43.6%.

Since there is no PKARC available for the Amiga (yet), I can't say what it
would do in this example.

So, in *THIS* example, we have ZOO beating ARC by 3,117 bytes, or 1.2%.
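[ An aside, for the curious: the quoted point about ZOO's single 13-bit
Ziv-Lempel "crunch" method can be illustrated with a toy LZW compressor.
This is just a sketch of the general technique -- NOT ZOO's or ARC's actual
code, and all the names are mine -- where the only "13-bit" part is the cap
of 2**13 = 8192 entries on the string table. ]

```python
def lzw_compress(data, max_bits=13):
    """Toy LZW compressor with a fixed code-width cap.

    Illustrative only -- not ZOO's or ARC's actual implementation.
    Codes are limited to 2**max_bits values, so the string table
    simply stops growing once all 8192 codes are assigned.
    """
    max_codes = 1 << max_bits              # 8192 codes at 13 bits
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:                      # iterating bytes yields ints
        wc = w + bytes([byte])
        if wc in table:
            w = wc                         # keep extending the match
        else:
            out.append(table[w])           # emit code for longest match
            if next_code < max_codes:      # table full? stop learning
                table[wc] = next_code
                next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

# Repetitive data compresses well: 200 input bytes become far fewer codes.
codes = lzw_compress(b"ab" * 100)
print(len(codes))
```

A real cruncher also packs the 13-bit codes into a bit stream (and some LZW
variants reset the table when it fills); both are omitted here for brevity.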
For the 2nd test, I used the S370 Emulator package, recently posted to the
comp.binaries.ibm.pc group (in PKARC format). This file-set has 3 PClone
binaries (27K-44K), a 35K document, and a bunch of small files (0-6K).
Coincidentally, there were 42 files in this package also, and the size of
the native files was 236,395 bytes. The files were extracted by the PKAX
extractor for the Amiga.

ARC (v0.23) produced a .arc file of 143,806 bytes, for a total reduction in
size of 39.2%.

ZOO (v1.71) produced a .zoo file of 137,067 bytes, for a total reduction in
size of 42.0%.

PKARC (v??) produced a .arc file of 133,463 bytes, for a total reduction in
size of 43.5%.

Again, in *THIS* example, ZOO beat out ARC; this time by 2.9%. As expected,
PKARC did the best, beating ARC by 4.4%, and ZOO by 1.5%.

Are these results "typical"? I dunno ... they sure aren't "special" in any
way, and were arbitrarily selected. They may be slightly atypical of some
Amiga file-sets, though, in that there aren't any IFF files, etc. In any
case, these limited experiments certainly don't support the hypothesis that
ZOO doesn't compress as well as ARC! In fact, ZOO averages about 2% *better*
than ARC, which translates into turning an 880K floppy into an 898K floppy.

On the other hand, since the results are quite close, one should consider
other factors as well: speed, size of executable, ease of use, extra
features, robustness, and availability.

I didn't keep a record of the timings on the 1st experiment (with the
MRBackup files), but I did for the 2nd one (the S370 emulator files). For
these tests, *everything* was in vd0: (love my ASDG 8MI board with 6 Megs),
and my 2000 has a 68010 in it ... your mileage may vary.
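[ For anyone who wants to re-check the arithmetic: all the reduction figures
from both tests fall out of one small helper. The sizes are taken straight
from the text above; the function name is mine. ]

```python
def reduction(original, archived):
    """Percent size reduction, as quoted throughout this posting."""
    return 100.0 * (original - archived) / original

# Test 1: MRBackup file-set, 42 files, 257,059 bytes native
print(round(reduction(257059, 148101), 1))   # ARC  v0.23 -> 42.4
print(round(reduction(257059, 144984), 1))   # ZOO  v1.71 -> 43.6

# Test 2: S370 emulator file-set, 42 files, 236,395 bytes native
print(round(reduction(236395, 143806), 1))   # ARC   -> 39.2
print(round(reduction(236395, 137067), 1))   # ZOO   -> 42.0
print(round(reduction(236395, 133463), 1))   # PKARC -> 43.5

# ZOO's test-1 margin over ARC: 3,117 bytes of the native total
print(round(100.0 * 3117 / 257059, 1))       # -> 1.2
```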
All times are in seconds (+/- 0.5 sec):

Function   ARC    ZOO   PKAX   Notes
--------   ----   ---   ----   ------------------------------------
add         230   268    --    create archive (no PKARC available)
list       10/7     9     *    10 for v option, 7 for l option
test        101    57    37    PKAX time was same on both ARC'd and
                               PKARC'd archive
extract     127    87    48    PKAX time was same on both ARC'd and
                               PKARC'd archive

* - PKAX consistently Guru'd the machine at the point where it should have
    listed a 0-byte-length file (but vd0: hung in there ... thanks, Perry).

So, ARC wins the creation test over ZOO by 14.2%, while ZOO beats ARC by
31.5% in extraction, and by 43.6% in testing. PKAX is obviously the
speediest of the three, but since it is an incomplete implementation (like
"unarc", etc.), a direct comparison isn't really fair. Hopefully, PKARC for
the Amiga will retain the speed of PKAX.

[ BTW ... Mike Shawaluk, if you are reading this, would you please pass the
"listing a 0-length file" bug I ran into along to Phil Katz? Thanks! ]

Program size? For the versions I use, I get:

---- 50328 Apr 15  1987 arc
---- 18652 Dec  8 00:00 pkax
---- 35668 Dec 28 15:04 zoo

ZOO wins this one too (over ARC), being 29.1% smaller. Again, PKAX is only a
partial implementation, so a direct comparison is pretty meaningless (but I
like the trend).

Ease of use? Well, this is a pretty subjective thing, so you'll have to
decide the winner here. Let me just point out that the command syntax for
all three is just about the same:

arc  x archfile
zoo  x archfile
pkax -x archfile

all do the same thing on an appropriate "archfile". This particular command
is the one most people use most often, especially those just starting out.
So, I don't see why some sysops feel having more than one format of archive
file online will confuse anyone (if the file ends in .zoo, use the zoo
program; if it ends in .arc, use arc/pkarc).
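[ A note on how I figured the percentages above: each margin is measured
against the slower time (or the larger executable) -- that's my convention,
made explicit here with a throwaway helper whose name is mine. ]

```python
def margin(slower, faster):
    """Percent saved, measured against the slower time / larger size."""
    return 100.0 * (slower - faster) / slower

print(round(margin(268, 230), 1))      # add:     ARC over ZOO -> 14.2
print(round(margin(127,  87), 1))      # extract: ZOO over ARC -> 31.5
print(round(margin(101,  57), 1))      # test:    ZOO over ARC -> 43.6
print(round(margin(50328, 35668), 1))  # exe size, ZOO vs ARC  -> 29.1
```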
It's true that ZOO's syntax *looks* more complicated than it really is,
since it has several more options, but the basic ones are the same as ARC's;
also, ZOO has the more mnemonic "novice syntax" (like "-add" or "-extract")
if you want it.

A gripe here on PKAX ... I think it's unfortunate that Phil Katz chose to
change the traditional meaning of "l" or "-l" from "list" to one that blasts
his shareware plea across the tube, but I digress ...

Extra features? Again, somewhat subjective, but it is a *fact* that of the
three, only ZOO will handle filenames longer than 12 characters without
having to use some auxiliary file-renaming/executeme kludge ... now, today.
To my mind, explaining to a novice how to get the file names back to what
they ought to be (since there are a lot of different schemes used to fit
things into ARC) is harder than explaining ZOO's syntax (see above). Also,
ZOO archives/rebuilds a directory tree if you ask it to ... now, today. And
though I don't use them, I think ZOO supports "filenotes"; or, if not
filenotes per se, it does allow you to associate a comment with a file
inside the .zoo archive.

The winner of the "features criterion" is overwhelmingly clear today: ZOO.
If, however, Phil Katz adds long-name and subdirectory support to PKARC for
the Amiga, *and can keep it backward compatible*, it would make this
category much closer to a tie. The same goes for the ARC program.

Robustness? Neither ARC (v0.23) nor ZOO (v1.71) has ever outright crashed or
munged a floppy on me (before I got expansion ram) that I can recall. It
does seem that my 1000 system is prone to crash after having used ARC, if
memory got tight (but not exhausted) during the unarc'ing process. This is
after having gone on to do things totally unrelated to arc'ing. I suspect
that ARC is not handling memory fragmentation in exactly the right way in
all circumstances.
I was never able to pin this down exactly, but nothing similar has happened
on the 2000, with all that ram. ARC seems to use a lot more ram to do its
job than ZOO does (perhaps that's why it is faster at creating an archive
file?). I don't have any numbers, but the little memory gauges in the
menubar clock tell the tale. ARC also creates some pretty good-sized temp
file(s) whilst doing its compression analysis. I don't know how ZOO does its
job (temp files, etc.).

Until Wayne Davison posted his fix to VT100's over-zealous autochopper [yes,
Tony *did* put a variant of it in vt100 v2.8], it was a rare ARC file that I
could download and unarc using Xmodem without getting an error on the last
file in the .arc ... usually the last byte of the last file would get
chopped off (not too hard to fix up, but a pain to do, nonetheless). This
problem has never occurred with a .zoo file, from which I conclude that the
.zoo file format is more immune to damage from the existing tools we have,
and therefore more "robust".

As an aside, with the new VT100 autochop code, a small percentage of .arc
files still give an error message immediately *after* testing/extracting/
listing the last file in the .arc, but it (the last file) gets extracted
without any damage. Actually, ARC blasts out 20 error messages (from the
"test" function) when this does happen. Maybe Wayne will improve the
autochopper one more notch?

Finally, I've noticed that when I ARC a group of files that includes some
.arc files (as a substitute for subdirectory handling), those embedded .arc
files get squeezed a little bit smaller by ARC. I'm left wondering why
(since further compression was possible with the existing algorithms in ARC)
these .arc files weren't fully compressed in the first place? Am I missing
something here ...?

The only real problem I ever had with ZOO was that it didn't correctly
restore the date for files inside a subdirectory (fixed in v1.71).
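[ Back on the Xmodem point above: the "last byte chopped" symptom is
consistent with how Xmodem pads files. A toy sketch (my own names, not
vt100's actual code) shows why a naive autochopper can eat a real trailing
byte. ]

```python
BLOCK = 128   # Xmodem sends fixed 128-byte data blocks
PAD = 0x1A    # conventional Ctrl-Z padding byte

def xmodem_pad(data):
    """Pad a file out to a whole number of Xmodem blocks."""
    short = (-len(data)) % BLOCK
    return data + bytes([PAD]) * short

def naive_autochop(received):
    """Strip ALL trailing pad bytes -- the over-zealous approach.

    If the file's real last byte happens to be 0x1A, it gets chopped
    too; a smarter autochopper needs length information (or a cap on
    how much it will strip) to avoid eating real data.
    """
    return received.rstrip(bytes([PAD]))

original = b"archive data ending in" + bytes([PAD])
received = xmodem_pad(original)
assert len(received) % BLOCK == 0
chopped = naive_autochop(received)
print(len(original), len(chopped))   # the real trailing 0x1A byte is lost
```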
I'd guess that PKARC will be limited to the same file *format* that ARC
uses, and thus won't be any more robust than ARC on that score. It is
(presumably) a lot of new code, and should therefore be fairly "clean",
though there will undoubtedly be some bugs in early releases (I know of one
in PKAX already :-)). I give the nod to ZOO over ARC on this one, too.

Lastly (if anyone is still with me), availability. ARC, of course, can be
found *everywhere*. And on many kinds of systems ... Amiga, MS-DOS, UNIX(R)
SysV and BSD, ULTRIX, probably the ST, you name it (except for a Mac,
so-far-as-I-know)!

ZOO I first found on the Lattice BBS, whilst looking for fixes to 3.03 (or
was it 3.02?). Then I saw a newer version on GEnie. Now I see it on most
BBS's I log on to. And the latest version (v1.71) was just recently posted
in the binaries. And it is on the Fish Disks, and the FAUG disks (just about
everywhere except Compu$erve ... dunno about BIX). It too is on many
machines (though not as many as ARC): Amiga, MS-DOS, UNIX(R) (not sure about
both SysV and BSD, though). Dunno about others.

PKARC? For the Amiga, RSN (hopefully). It's on MS-DOS machines, and UNIX(R)
(though I have yet to snag a copy that works on SysV). Dunno about any other
systems (Mike?), but it can be found on all the PCish BBS's and commercial
services I've seen.

> he [Phil Katz] is currently preparing to port the other
> half of his PC offering to the Amiga, depending on the fate of PKAX as
> regards to "shareware" receipts (actually, in his case, it's more like
> User Supported Software than "shareware", since he had to pay someone to
> complete the Amiga port, and wants to recoup his investment like any other
> businessperson).

The $25/$50 that he is asking seems a little on the high side to me,
especially for a partial product (only PKAX at present), and without a firm
commitment to handle long filenames and subdirectories in a near-term future
release.
Might I suggest to him (through you) that $10/$20 or so is more appropriate
for the existing level of product (IMHO, of course)?

So, what's the bottom line? Well ... as our competitor's commercials during
football games say, "You make the call." **

For *myself*, I am bloody well tired of going through a bunch of contortions
in order to deal with filenames longer than 12 characters [ ARC is like
SysV, where ZOO is more like BSD :-) ]. And I'm nearly as tired of having to
do a lot of extra work to archive subdirectories. Especially when my
experiments indicate no clear advantage in "staying with the standard".

So, where appropriate, I'll be making postings/submissions using ZOO (you
*did* save your copy from comp.binaries.amiga, didn't you?). And, since ARC
is still "The Standard" (like it or not), where appropriate, I'll continue
to make postings using ARC. But I will NOT use the renaming/executeme
kludges anymore. When PKARC shows up, I will re-evaluate ... but only if it
supports long names and subdirectories, or offers a *significant* reduction
in compression size or execution time (gotta keep pushing on compression
algorithm technology, don't ya know).

Now a question. There have been some recent postings by Bryan Ford and
others, proposing an IFF File Archive Format (and presumably archiving
programs that would follow that standard). I haven't followed the discussion
too closely, but I am curious about the rationale. Why is it that we need
another archival format? What will it buy us that we don't already have?
This isn't a flame, just a request for information, as the benefit(s) are
not obvious to me.

Whew ... this started out to be a simple comparison of compression
efficiency! Somewhere along the way, my VERBOSE_MODE got #define'd! Those of
you who stayed with me to the bitter end deserve a cookie, or two, so ...
earlier today, I sent off the latest offering from the AmigaDOS Replacement
Project (ARP) to the moderators.
A ZOO'd version of ARP v1.04 should be coming to a tube near you soon. Also
mailed them the floppy-gulping audio hack "muncho". ARC'd.

> Well, that's enough for now from me.
>
>  - Mike Shawaluk

Me too!

/kim

** ["You make the call." is probably copyrighted by IBM; IBM *is* a
    trademark of International Business Machines, Inc.]

--
UUCP:  kim@amdahl.amdahl.com
  or:  {sun,decwrl,hplabs,pyramid,ihnp4,uunet,oliveb,cbosgd,ames}!amdahl!kim
DDD:   408-746-8462
USPS:  Amdahl Corp.  M/S 249,  1250 E. Arques Av,  Sunnyvale, CA 94086
CIS:   76535,25