[comp.sys.ibm.pc] PDtar/Compress vs. ZOO & ARC

iverson@cory.Berkeley.EDU (Tim Iverson) (12/02/87)

> ARC (and derivatives) has been around for quite some time, and has developed
>    into the MS-DOS de-facto standard for archiving and info-exchange.

PKARC is fast enough to use (unlike SEA ARC), but (as you mention below) it
doesn't handle subtrees.  It also doesn't allow recovery of bad archives.

[paraphrasing]
> ZOO stores subtrees,
>   two disadvantages: (1) not widely accepted (2) needs external command
>   to create structure (i.e. under Unix, it's the find command).  MS-DOS
>   has no find - you need to specify the structure manually.

ZOO also allows recovery of bad files in a munged archive, and has been
ported just about everywhere.

> PDTAR offers a number of significant advantages over both ZOO and ARC:
>  - It is the de-facto standard in the Unix world. Info-exchange with
>    Unix machines is much easier with TAR.

It is only mostly a standard.  I hear cpio/compress are used as well.

>  - It creates the structure it needs
>  - It is fast; on the small sample that I did -- faster than ARC or ZOO

Yes, but not faster than PKARC on MS-DOS.

>  - It can compress.

No.  It does an exec of compress, and so needs a compress command on
the machine it runs on.  The compress format is not specified by PDTAR,
and thus is machine dependent.  The obvious intent is that compress 4.0
be used, but it is not required.  Also, to use 16-bit table entries
compress 4.0 must hold a table of 2^16 = 65536 strings, which with the
hashing overhead comes to several hundred Kbytes, perhaps too much
memory to require as a standard.
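
For the curious, the mechanism looks roughly like this.  This is only
an illustration of the technique, not PDTAR's actual code, and the
archive name is made up:

/*
 * Illustration only: how a tar-like program can "do an exec of
 * compress", i.e. delegate compression to an external command
 * through a pipe instead of compressing in-process.
 */
#include <stdio.h>

int main(void)
{
    /* popen() forks a shell running compress; whichever compress
       happens to be on the PATH determines the output format,
       which is why the result is machine dependent */
    FILE *out = popen("compress > archive.tar.Z", "w");
    if (out == NULL) {
        perror("popen");
        return 1;
    }

    /* a real tar writes 512-byte header and data blocks here;
       one zero-padded block stands in for them */
    char block[512] = "tar header and file data would go here";
    fwrite(block, 1, sizeof block, out);

    /* pclose() waits for compress; nonzero status means the
       compress command failed or was not found */
    if (pclose(out) != 0)
        fprintf(stderr, "compress failed or not found\n");
    return 0;
}

The tar program never touches the compressed data itself, which is
exactly why the format ends up depending on the machine.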

>  - The resulting archive is small; on the sample above, 
>    smaller than the .arc or .zoo files

This is only true in one special case: when the files contain mostly
(say 85% by size) similar information (e.g. C source or English text).
Compress is then able to build a better table, as well as use only a
single table.  If the archive contained half source and half executable,
the resulting file would be significantly larger than the ARC or ZOO
archive due to the inefficient table.  I tested this with the Moria
source and executable, and the compressed tar file was 20% larger than
the ZOO file and 25% larger than the ARC file.
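
If you want to reproduce the effect, something like the following will
do it.  The filenames are placeholders (any source file and executable
will do), and it assumes a compress command on the PATH:

/*
 * A quick test of the claim: compress a text file and a binary
 * separately, then concatenated (as a tar stream would be), and
 * compare the totals.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

static long size_of(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long) st.st_size : -1L;
}

int main(void)
{
    /* -c writes to stdout so the originals are left alone */
    system("compress -c moria.c   > text.Z");
    system("compress -c moria.exe > bin.Z");
    system("cat moria.c moria.exe | compress -c > both.Z");

    printf("separate: %ld bytes\n", size_of("text.Z") + size_of("bin.Z"));
    printf("combined: %ld bytes\n", size_of("both.Z"));

    /* the combined stream usually loses: the table tuned to C
       source is poisoned by the executable's byte patterns */
    return 0;
}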

This points out a way to get a rather large improvement in both ARC and
ZOO archives (Rahul Dhesi, are you out there?): namely, that similar
files could share tables.  There is a gain in compaction, but there is
also a loss: if a similar file is added to an existing archive, all of
the other similar files have to be repacked ("similar" might mean files
with the same extension).  This could be implemented as an option like
-O, for optimizing the entire archive.
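
To make the idea concrete, the grouping half of such an option could be
as simple as sorting the member list by extension before packing.  This
is only my sketch of the suggestion, not a feature of any real archiver:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *ext_of(const char *name)
{
    const char *dot = strrchr(name, '.');
    return dot ? dot + 1 : "";
}

/* compare on extension first, then name, so .c files pack
   together, .exe files pack together, and so on */
static int by_extension(const void *a, const void *b)
{
    const char *x = *(const char *const *) a;
    const char *y = *(const char *const *) b;
    int c = strcmp(ext_of(x), ext_of(y));
    return c ? c : strcmp(x, y);
}

int main(void)
{
    const char *members[] = { "moria.exe", "main.c", "readme.txt",
                              "monster.c", "save.exe" };
    size_t n = sizeof members / sizeof members[0];
    size_t i;

    qsort(members, n, sizeof members[0], by_extension);
    for (i = 0; i < n; i++)       /* pack order: one table per group */
        printf("%s\n", members[i]);
    return 0;
}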

Another failing of PDTAR is that it is very slow at listing the contents
of a compressed tar file, and I'm not sure how good it is at adding and
deleting files in a compressed tar file (can it even do that?).  Also,
there are absolutely no consistency checks, much less any provision for
reconstructing a damaged archive.
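
The slowness is inherent: every header is buried in the compressed
stream, so even skipping past a member's data means decompressing it.
A bare-bones lister shows why (archive.tar.Z is a placeholder, and
zcat, which comes with compress, must be on the PATH):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *in = popen("zcat archive.tar.Z", "r");
    unsigned char hdr[512];

    if (in == NULL) {
        perror("popen");
        return 1;
    }
    while (fread(hdr, 1, sizeof hdr, in) == sizeof hdr) {
        long size, blocks, i;

        if (hdr[0] == '\0')           /* empty block: end of archive */
            break;
        /* the member name is at offset 0, the size (in octal) at 124 */
        size = strtol((char *) hdr + 124, NULL, 8);
        printf("%.100s  %ld bytes\n", (char *) hdr, size);

        /* skip the data blocks; skipping still decompresses them,
           and that is where all the time goes */
        blocks = (size + 511) / 512;
        for (i = 0; i < blocks; i++)
            if (fread(hdr, 1, sizeof hdr, in) != sizeof hdr)
                break;
    }
    pclose(in);
    return 0;
}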

This all sounds like PDTAR isn't a very good program.  Wrong; it seems
to be very good.  It's just that this isn't a particularly good
application for it.  If you want to create tar files, it works fine, but
it was never intended to provide all of the extra features that ARC and
ZOO both do.

Currently, I use ZOO for large collections of files and whole subtrees,
and PKARC for small collections of files, since it's faster.  If ZOO's
speed were better, I would use it for everything.  Since I don't need
to make tar files, I will rarely use PDTAR, if ever.


- Tim Iverson
  iverson@cory.Berkeley.EDU
  ucbvax!cory!iverson