[comp.sys.amiga] ARC vs. ZOO vs. PKARC - an analysis (long)

kim@amdahl.uts.amdahl.com (Kim DeVaughn) (02/03/88)

In a recent article, whose number I could care less about, Mike Shawaluk writes:
> Just a note in regards to the differences between ARC and ZOO compression,
> as well as the new PKAX ARC extractor for the Amiga...  First of all, the reason
> that ZOO does better on some files, while it does MUCH poorer on others (i.e.,
> no compression at all!) is because ZOO only uses Ziv-Lempel 13-bit compression,
> while the ARC programs will dynamically choose between 8 different compression
> algorithms.  Well, I should be fair, ZOO *does* have two alternatives; 13 bit
> "crunch", or nothing at all!  For some reason, IFF pictures & sound files seem
> to ARC more efficiently if they're Squeezed (Huffman compression), which is why
> ARC wins this one over ZOO.

I don't really want to perpetuate a "holy war" on which is better (ARC or ZOO),
but having seen several postings on which archiver produces the smallest files,
I decided to perform a (limited) test.

As we all know, "there are lies, damn lies, and benchmarks", and I'm sure that
one can come up with any number of file-sets that will produce different results
than these, but for what it's worth ...


For my 1st test, I picked a medium-to-large set of files that seemed to me to
be representative of a "typical" file-set ... the recently posted MRBackup
program (source, binary, and docs [with duplicated files removed]).

The 42 native files, which included an Announcement file and an ExecuteMe
(since ARC can't handle filenames longer than 12 chars), totaled
257,059 bytes.

ARC (v0.23) produced a .arc file of 148,101 bytes, for a total reduction in
size of 42.4%.

ZOO (v1.71) produced a .zoo file of 144,984 bytes, for a total reduction in
size of 43.6%.

Since there is no PKARC available for the Amiga (yet), I can't say what it
would do in this example.

So, in *THIS* example, we have ZOO beating ARC by 3,117 bytes, or 1.2%.



For the 2nd test, I used the S370 Emulator package, recently posted to the
comp.binaries.ibm.pc group (in PKARC format).  This file-set has 3 PClone
binaries (27K-44K), a 35K document, and a bunch of small files (0-6K).

Coincidentally, there were 42 files in this package also, and the size of
the native files was 236,395 bytes.  The files were extracted by the PKAX
extractor for the Amiga.

ARC (v0.23) produced a .arc file of 143,806 bytes, for a total reduction in
size of 39.2%.

ZOO (v1.71) produced a .zoo file of 137,067 bytes, for a total reduction in
size of 42.0%.

PKARC (v??) produced a .arc file of 133,463 bytes, for a total reduction in
size of 43.5%.

Again, in *THIS* example, ZOO beat out ARC; this time by 2.9%.  As expected,
PKARC did the best, beating ARC by 4.4%, and ZOO by 1.5%.
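
For anyone who wants to check the arithmetic, here is a minimal Python sketch
that recomputes the reduction figures from the byte counts quoted in the two
tests.  The function and variable names are illustrative only and come from
none of the archivers.  Note that the "beats by" margins are differences in
percentage points between two reduction figures.

```python
# Native and archived sizes in bytes, copied from the two tests above.
TESTS = {
    "MRBackup": {"native": 257_059, "ARC": 148_101, "ZOO": 144_984},
    "S370 Emulator": {
        "native": 236_395, "ARC": 143_806, "ZOO": 137_067, "PKARC": 133_463,
    },
}


def reduction_pct(native, archived):
    """Percentage by which the archive is smaller than the native files."""
    return 100.0 * (native - archived) / native


def report(tests=TESTS):
    """Return {test name: {archiver: reduction in percent, to 0.1}}."""
    return {
        name: {
            archiver: round(reduction_pct(sizes["native"], size), 1)
            for archiver, size in sizes.items()
            if archiver != "native"
        }
        for name, sizes in tests.items()
    }


def margin_points(native, loser_size, winner_size):
    """Winner's lead over the loser, in percentage points, to 0.1."""
    lead = reduction_pct(native, winner_size) - reduction_pct(native, loser_size)
    return round(lead, 1)


if __name__ == "__main__":
    for name, row in report().items():
        print(name, row)
    print("ZOO over ARC, S370 set:", margin_points(236_395, 143_806, 137_067))
```

The margins are computed from the unrounded reductions, which is why ZOO's
lead in the second test comes out as 2.9 and not 42.0 - 39.2 = 2.8.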



Are these results "typical"?  I dunno ... they sure aren't "special" in any
way, and were arbitrarily selected.  They may be slightly atypical of some
Amiga file sets, in that there aren't any IFF files, etc. though.

In any case, these limited experiments certainly don't support the hypothesis
that ZOO doesn't compress as well as ARC!  In fact, ZOO averages about 2%
*better* than ARC, which translates into turning an 880K floppy into an 898K
floppy.
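
The floppy figure is a back-of-the-envelope estimate, and can be sketched as
follows.  This assumes the simple reading used above: average the two margins
(1.2 and 2.9), round to about 2%, and treat that as extra room on an 880K
disk.  The helper names are made up for the illustration.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def effective_capacity_kb(disk_kb, advantage_pct):
    """Rough estimate: count the compression advantage as extra disk space."""
    return disk_kb * (1 + advantage_pct / 100.0)


if __name__ == "__main__":
    zoo_margins = [1.2, 2.9]          # ZOO's lead over ARC in the two tests
    print("average margin:", average(zoo_margins))
    print("effective size:", round(effective_capacity_kb(880, 2)), "K")
```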

On the other hand, since the results are quite close, one should consider
other factors as well, like speed, size of executable, ease of use, extra
features, robustness, and availability.


I didn't keep a record of the timings on the 1st experiment (with the MRBackup
files), but I did for the 2nd one (the S370 emulator files).  For these tests,
*everything* was in vd0: (love my ASDG 8MI board with 6 Megs), and my 2000 has
a 68010 in it ... your mileage may vary.


All times are in seconds (+/- 0.5 sec):

Function     ARC     ZOO     PKAX                    Notes
--------     ---     ---     ----       ------------------------------------
add          230     268      --   - create archive  (no PKARC available)

list        10/7      9       *    - 10 for v option, 7 for l option

test         101      57      37   - PKAX time was same on both ARC'd and
                                      PKARC'd archive
extract      127      87      48   - PKAX time was same on both ARC'd and
                                      PKARC'd archive

  * - PKAX consistently Guru'd the machine at the point where it should have
      listed a 0 byte length file (but vd0: hung in there ... thanks, Perry).


So, ARC wins the creation test over ZOO by 14.2%, while ZOO beats ARC by 31.5%
in extraction, and by 43.6% in testing.  PKAX is obviously the speediest of
the three, but since it is an incomplete implementation (like "unarc", etc.),
a direct comparison isn't really fair.  Hopefully, PKARC for the Amiga will
retain the speed of PKAX.
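
The speed margins can be rechecked the same way.  The sketch below takes the
times from the table and expresses each win as the time saved relative to the
slower program; that is an assumption about how the percentages were derived,
but it does reproduce them.  The names are illustrative.

```python
# Elapsed times in seconds for the S370 file-set, from the table above.
TIMES = {
    "add":     {"ARC": 230, "ZOO": 268},
    "test":    {"ARC": 101, "ZOO": 57, "PKAX": 37},
    "extract": {"ARC": 127, "ZOO": 87, "PKAX": 48},
}


def faster_by_pct(slower, faster):
    """Time saved by the faster program, as a percent of the slower time."""
    return 100.0 * (slower - faster) / slower


def margin(function, winner, loser, times=TIMES):
    """Winner's speed advantage for one function, rounded to 0.1 percent."""
    row = times[function]
    return round(faster_by_pct(row[loser], row[winner]), 1)


if __name__ == "__main__":
    print("add:     ARC over ZOO by", margin("add", "ARC", "ZOO"), "%")
    print("extract: ZOO over ARC by", margin("extract", "ZOO", "ARC"), "%")
    print("test:    ZOO over ARC by", margin("test", "ZOO", "ARC"), "%")
```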

[ BTW ... Mike Shawaluk, if you are reading this, would you please pass the
  "listing a 0-length file" bug I ran into, along to Phil Katz?  Thanks!  ]



Program size?  For the versions I use, I get:

----   50328  Apr 15 1987    arc
----   18652  Dec  8 00:00   pkax
----   35668  Dec 28 15:04   zoo

ZOO wins this one too (over ARC), being 29.1% smaller.  Again, PKAX is only a
partial implementation, so a direct comparison is pretty meaningless (but I
like the trend).



Ease of use?  Well, this is a pretty subjective thing, so you'll have to
decide the winner here.  Let me just point out that the command syntax for
all three is just about the same:

        arc  x archfile
        zoo  x archfile
       pkax -x archfile

all do the same thing on an appropriate "archfile".  This particular command
is the one most people use the most, especially those just starting out.  So,
I don't see why some sysops feel having more than one format of archive file
online will confuse anyone (if the file ends in .zoo, use the zoo program; if
it ends in .arc, use arc/pkarc).
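
The "pick the program by file extension" rule is simple enough to write down.
This is only a sketch in Python: the command lines are the ones shown above,
while the table and function name are invented for the example, and nothing is
actually executed.

```python
import os

# Extension -> extraction command prefix, as given in the post.
EXTRACTORS = {
    ".arc": ["arc", "x"],
    ".zoo": ["zoo", "x"],
}


def extract_command(archive):
    """Return the command (as a list of words) that extracts this archive."""
    extension = os.path.splitext(archive)[1].lower()
    if extension not in EXTRACTORS:
        raise ValueError("don't know how to unpack " + archive)
    return EXTRACTORS[extension] + [archive]


if __name__ == "__main__":
    for name in ("mrbackup.zoo", "s370.arc"):   # hypothetical file names
        print(" ".join(extract_command(name)))
```

For a .arc file one could just as well map to "pkax -x", since PKAX reads
both ARC'd and PKARC'd archives.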

It's true that ZOO's syntax *looks* more complicated than it really is, since
it has several more options, but the basic ones are the same as ARC; also
ZOO has the more mnemonic "novice syntax" (like "-add" or "-extract") if you
want.

A gripe here on PKAX ... I think it's unfortunate that Phil Katz chose to
change the traditional meaning of "l" or "-l" from "list" to one that blasts
his shareware plea across the tube, but I digress ...



Extra features?  Again, somewhat subjective, but it is a *fact* that of the
three, only ZOO will handle filenames longer than 12 characters without having
to use some auxiliary file renaming/executeme kludge ... now, today.  To my
mind, explaining to a novice how to get the file names back to what they ought
to be is a harder thing to do (since there are a lot of different schemes used
to fit things into ARC), than explaining ZOO's syntax (see above).
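
To make the renaming kludge concrete, here is one hypothetical scheme sketched
in Python.  It is not any particular poster's method: the placeholder names
("file001.ren" and so on) and the helper functions are assumptions made up for
the illustration, and the restore lines assume a "rename <from> <to>" command
such as the AmigaDOS one.

```python
ARC_NAME_LIMIT = 12  # filename limit stated in the post


def needs_rename(names, limit=ARC_NAME_LIMIT):
    """Return the names that are too long to store in an ARC archive."""
    return [name for name in names if len(name) > limit]


def rename_plan(names, limit=ARC_NAME_LIMIT):
    """Map each over-long name to a short placeholder (hypothetical scheme)."""
    return {
        name: f"file{index:03d}.ren"
        for index, name in enumerate(needs_rename(names, limit), start=1)
    }


def restore_script(plan):
    """Lines for an ExecuteMe-style script that undoes the renames."""
    return [f"rename {short} {original}" for original, short in plan.items()]


if __name__ == "__main__":
    # "Announcement" and "ExecuteMe" fit; the third name is a made-up example.
    files = ["Announcement", "ExecuteMe", "a_very_long_name.c"]
    for line in restore_script(rename_plan(files)):
        print(line)
```

A real script would also have to quote names containing spaces and make sure
the placeholders don't collide with files already in the set, which is part of
why the recipient ends up needing an explanation.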

Also, ZOO archives/rebuilds a directory tree if you ask it to ... now, today.

And though I don't use them, I think ZOO supports "filenotes"; or, if not
filenotes per se, it does allow you to associate a comment with a file inside
the .zoo archive.

The winner of the "features criterion" is overwhelmingly clear today.  ZOO.
If, however, Phil Katz adds long-name and subdirectory support to PKARC for
the Amiga, *and can keep it backward compatible*, it would make it much closer
to a tie in this category.  Same thing goes for the ARC program.



Robustness?  Neither ARC (v0.23) nor ZOO (v1.71) has ever outright crashed
or munged a floppy on me (before I got expansion ram) that I can recall.  It
does seem that my 1000 system is prone to crash after having used ARC, if
memory got tight (but not exhausted) during the unarc'ing process.  This is
after having gone to do things totally unrelated to arc'ing.  I suspect that
ARC is not handling memory fragmentation in exactly the right way in all
circumstances.  I was never able to pin this down exactly, but nothing similar
has happened on the 2000, with all that ram.

ARC seems to use a lot more ram to do its job than ZOO does (perhaps that's
why it is faster creating an archive file?).  I don't have any numbers, but
the little memory gauges in the menubar clock tell the tale.  ARC also
creates some pretty good sized temp file(s) whilst doing its compression
analysis.  I don't know how ZOO does its job (temp files, etc).

Until Wayne Davison posted his fix to VT100's over-zealous autochopper [yes,
Tony *did* put a variant of it in vt100 v2.8], it was a rare ARC file that I
could download and unarc using Xmodem without getting an error on the last
file in the .arc ... usually the last byte of the last file would get chopped
off (not too hard to fixup, but a pain to do, nonetheless).  This problem has
never occurred with a .zoo file, from which I conclude that the ZOO file's
format is more immune to damage from the existing tools we have, and therefore
more "robust".

As an aside, with the new VT100 autochop code, a small percentage of .arc files
still give an error message immediately *after* testing/extracting/listing
the last file in the .arc, but it (the last file) gets extracted without any
damage.  Actually, ARC blasts out 20 error messages (from the "test" function)
when this does happen.  Maybe Wayne will improve the autochopper one more notch?

Finally, I've noticed that when I ARC a group of files that include some .arc
files (as a substitute for subdirectory handling), those embedded .arc files
get squeezed a little bit smaller by ARC.  I'm left wondering why (since further
compression was possible with the existing algorithms in ARC), these .arc files
weren't fully compressed in the first place?  Am I missing something here ...?

The only real problem I ever had with ZOO, was that it didn't correctly restore
the date for files inside a subdirectory (fixed in v1.71).

I'd guess that PKARC will be limited to the same file *format* that ARC uses,
thus won't be any more robust than is ARC on that score.  It is (presumably)
a lot of new code, and should therefore be fairly "clean", though there will
undoubtedly be some bugs in early releases (I know of one in PKAX, already :-)).

I give the nod to ZOO over ARC on this one, too.



Lastly (if anyone is still with me), availability.  ARC, of course, can be found
*everywhere*.  And on many kinds of systems ... Amiga, MS-DOS, UNIX(R) SysV and
BSD, ULTRIX, probably the ST, you name it (except for a Mac, so-far-as-I-know)!

ZOO I first found on the Lattice BBS, whilst looking for fixes to 3.03 (or was
it 3.02 ?).  Then I saw a newer version on GEnie.  Now I see it on most BBS's
I log on to.  And the latest version was just recently posted in the binaries
(v1.71).  And it is on the Fish Disks, and the FAUG disks (just about everywhere
except Compu$erve ... dunno about BIX).

It too, is on many machines (though not as many as ARC).  Amiga, MS-DOS, UNIX(R)
(not sure for both SysV and BSD, though).  Dunno about others.

PKARC?  For the Amiga, RSN (hopefully).  It's on MS-DOS machines, and UNIX(R)
(though I have yet to snag a copy that works on SysV).  Dunno about any other
systems (Mike?), but it can be found on all the PCish BBS's and commercial
services I've seen.

> he [Phil Katz] is currently preparing to port the other
> half of his PC offering to the Amiga, depending on the fate of PKAX as regards
> to "shareware" receipts (actually, in his case, it's more like User Supported
> Software than "shareware", since he had to pay someone to complete the Amiga
> port, and wants to recoup his investment like any other businessperson).

The $25/$50 that he is asking seems a little on the high side to me, especially
for a partial product (only PKAX at present), and without a firm commitment to
handle long filenames and subdirectories in a near-term future release.  Might I
suggest to him (through you) that $10/$20 or so, is more appropriate for the
existing level of product (IMHO, of course)?



So, what's the bottom line?  Well ... as our competitor's commercials during
football games say, "You make the call." **

For *myself*, I am bloody well tired of going through a bunch of contortions in
order to deal with filenames longer than 12 characters [ ARC is like SysV, where
ZOO is more like BSD :-) ].  And I'm nearly as tired of having to do a lot of
extra work to archive subdirectories.  Especially when my experiments indicate
no clear advantage in "staying with the standard".

So, where appropriate, I'll be making postings/submissions using ZOO (you *did*
save your copy from comp.binaries.amiga, didn't you?)  And, since ARC is still
"The Standard" (like it or not), where appropriate, I'll continue to make
postings using ARC.  But, I will NOT use the renaming/executeme kludges anymore.

When PKARC shows up, I will re-evaluate ... but only if it supports long names
and subdirectories, or offers a *significant* reduction in compression size
or execution time (gotta keep pushing on compression algorithm technology, don't
ya know).



Now a question.  There have been some recent postings by Bryan Ford and others,
proposing an IFF File Archive Format (and presumably archiving programs that
would follow that standard).  I haven't followed the discussion too closely, but
I am curious about the rationale.  Why is it that we need another archival
format?  What will it buy us that we don't already have?  This isn't a flame,
just a request for information, as the benefit(s) are not obvious to me.



Whew ... this started out to be a simple comparison of compression efficiency!
Somewhere along the way, my VERBOSE_MODE got #define'd!  Those of you who stayed
with me to the bitter end deserve a cookie, or two, so ...

Earlier today, I sent off the latest offering from the AmigaDOS Replacement
Project (ARP) to the moderators.  A ZOO'd version of ARP v1.04 should be coming
to a tube near you soon.

Also mailed them the floppy gulping audio hack "muncho".  ARC'd.


> Well, that's enough for now from me.
>
>   - Mike Shawaluk

Me too!

/kim


** ["You make the call." is probably copyrighted by IBM; IBM *is* a trademark of
    International Business Machines, Inc.]

-- 
UUCP:  kim@amdahl.amdahl.com
  or:  {sun,decwrl,hplabs,pyramid,ihnp4,uunet,oliveb,cbosgd,ames}!amdahl!kim
DDD:   408-746-8462
USPS:  Amdahl Corp.  M/S 249,  1250 E. Arques Av,  Sunnyvale, CA 94086
CIS:   76535,25