[comp.binaries.ibm.pc.d] which archiver/compressor/encoder/decoder to use?

root@cca.ucsf.edu (Systems Staff) (03/01/90)

Even before we consider the detailed capabilities of any candidate
programs for these purposes, we should screen out any that

   A1. Use proprietary algorithms or code
       Such algorithms are subject to the control of parties
       not primarily concerned with the purposes of this group.
       They may be changed at any time by their proprietors
       (e.g. several incompatible versions of the ARC family exist).
       In addition, such algorithms may lead to legal problems
       as we all know.

   A2. Are not available free and in source form.
       Source is essential for the resolution of compatibility
       problems as well as for porting to new OS's and machine
       architectures.

   A3. Are dependent on a particular OS or CPU architecture
       There is just no question of the need to run on Unix and
       MSDOS systems and I believe the need to run on VMS and
       (at least) the popular 68k systems has also been demonstrated.

Once a program set (which may contain one or more programs) meets
these requirements, it is reasonable to examine its technical merit.
Let's start with a checklist of such items as

   B1. The program set should be able to produce files which can move
       transparently through all the major networks. It should also
       be able to produce files which take advantage of transparent
       connections and media.

       For example, files to be stored locally (e.g. for backups) or
       transmitted via a binary-capable protocol need not have the
       transparency restrictions mentioned below. The data integrity
       checking should be included in this format.

       The program set should include a capability for converting
       between these two forms without otherwise transforming the
       data.

   B2. The transparency form should be pure ASCII text files that can
       move transparently through internetworked connections that
       (at the least) require translations between ASCII and EBCDIC,
       even though the base data may be arbitrary collections of bytes.

       This format should be insensitive to such common transformations
       as trailing blank deletion, tab to space (or the inverse)
       conversions, varying newline (<CR>, <NL>, <CR><NL>) conventions
       etc.

       No extraneous material (e.g. rounding to an even number of bytes)
       should be added to the file by the complete process unless this
       is a part of the standard of the destination OS.

       The capability of splitting a file into pieces to meet
       limitations on message size should be included. Standardized
       starting and ending labels with part numbers and any other
       desired identification allows automatic sequencing of parts,
       skipping of extraneous headers and trailers etc.
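
       As a rough illustration of B2 (not a proposal for an actual
       format), the sketch below encodes bytes as lines of uppercase
       hex, which contains no blanks or tabs and uses only characters
       that should survive ASCII/EBCDIC translation. The label syntax
       and the names encode_text, decode_text, split_parts and
       join_parts are all hypothetical, and hex is chosen for
       simplicity rather than density.

```python
import binascii
import re

LINE_BYTES = 30  # hypothetical choice: 60 hex characters per text line

# Hypothetical label syntax; \s+ tolerates tab/space conversions in labels.
BEGIN_RE = re.compile(r"^---\s+BEGIN\s+(\S+)\s+PART\s+(\d+)/(\d+)\s+---$")
END_RE = re.compile(r"^---\s+END\s+(\S+)\s+PART\s+(\d+)/(\d+)\s+---$")


def encode_text(data):
    """Encode arbitrary bytes as lines of uppercase hex digits."""
    lines = []
    for i in range(0, len(data), LINE_BYTES):
        chunk = data[i:i + LINE_BYTES]
        lines.append(binascii.hexlify(chunk).decode("ascii").upper())
    return "\n".join(lines) + "\n"


def decode_text(text):
    """Decode, ignoring blanks, tabs and whatever newline convention is used."""
    return binascii.unhexlify("".join(text.split()))


def split_parts(text, lines_per_part, name):
    """Cut encoded text into labelled parts small enough to mail."""
    lines = text.splitlines()
    chunks = [lines[i:i + lines_per_part]
              for i in range(0, len(lines), lines_per_part)] or [[]]
    total = len(chunks)
    parts = []
    for number, chunk in enumerate(chunks, 1):
        label = "%s PART %d/%d" % (name, number, total)
        parts.append("--- BEGIN %s ---\n%s\n--- END %s ---\n"
                     % (label, "\n".join(chunk), label))
    return parts


def join_parts(messages, name):
    """Reassemble parts in order, skipping anything outside the labels."""
    bodies = {}
    total = None
    for message in messages:
        current = None
        for raw in message.splitlines():
            line = raw.strip()
            begin = BEGIN_RE.match(line)
            if begin and begin.group(1) == name:
                current = int(begin.group(2))
                total = int(begin.group(3))
                bodies[current] = []
            elif END_RE.match(line):
                current = None
            elif current is not None:
                bodies[current].append(line)
    if total is None or sorted(bodies) != list(range(1, total + 1)):
        raise ValueError("missing parts")
    return "\n".join("\n".join(bodies[n]) for n in range(1, total + 1)) + "\n"
```

       Parts may arrive in any order, wrapped in mail headers and
       signatures, with trailing blanks added and newlines rewritten;
       decode_text(join_parts(messages, name)) should still return
       the original bytes. Detecting damage is left to the checks
       discussed under B3.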

   B3. Both formats should contain adequate checking information.

       Each format, binary and transparency alike, should carry
       enough checking to assure the recipient that the data has not
       been damaged in transit.

       Each format should be designed to localize damage, i.e. only
       the actually damaged area should be unrecoverable and the
       area of damage should be easily determined.

       At least one version of uudecode includes a sample character
       set in its header which includes all the characters required
       for transparent data transmission. This provides a simple
       check on transparency at the character level and a method
       for recovery of some translation errors.

       An overall file check should include a CRC and a length check.
       Partial extraction should be supported for data outside the area
       where the damage occurred.
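
       One way to picture the combination of an overall check with
       damage localization is a table holding the total length, a
       whole-file CRC, and one CRC per fixed-size block. This is only
       a sketch: the block size, the table layout and the function
       names are my assumptions, and zlib's CRC-32 merely stands in
       for whatever CRC a real format would pick.

```python
import zlib

BLOCK = 1024  # hypothetical block size used to localize damage


def make_check_table(data, block=BLOCK):
    """Record the length, an overall CRC, and one CRC per block."""
    block_crcs = [zlib.crc32(data[i:i + block])
                  for i in range(0, len(data), block)]
    return {"length": len(data), "crc": zlib.crc32(data),
            "block": block, "block_crcs": block_crcs}


def find_damage(data, table):
    """Return (ok, damaged) where damaged lists (start, end) byte ranges."""
    block = table["block"]
    damaged = []
    for number, expected in enumerate(table["block_crcs"]):
        start = number * block
        end = min(start + block, table["length"])
        if zlib.crc32(data[start:start + block]) != expected:
            damaged.append((start, end))
    ok = len(data) == table["length"] and zlib.crc32(data) == table["crc"]
    return ok, damaged
```

       Everything outside the reported ranges can then be extracted
       normally, which is the partial extraction asked for above.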

   B4. Data compression should be at a level competitive with other
       available programs both in speed and degree of compression.

       Unfortunately, the nature of the situation is that these are
       conflicting requirements with disagreement on their priority.

       If there are available methods whose balance is sufficiently
       different to justify it, two (or more?) methods may be worth
       inclusion with selection by user option at compression time.

       No compression should be a supported option.
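
       A sketch of what selection by user option might look like,
       with "store" as the no-compression case. zlib at two levels is
       used purely as a stand-in for a fast method and a tight one;
       the method names and the one-line method tag are hypothetical.

```python
import zlib

# name -> (compress, decompress); "store" leaves the data untouched.
METHODS = {
    "store": (lambda data: data, lambda data: data),
    "fast": (lambda data: zlib.compress(data, 1), zlib.decompress),
    "best": (lambda data: zlib.compress(data, 9), zlib.decompress),
}


def pack(data, method="best"):
    """Prefix the payload with the method name so the reader can dispatch."""
    compress, _ = METHODS[method]
    return method.encode("ascii") + b"\n" + compress(data)


def unpack(blob):
    """Read the method name from the first line and undo that method."""
    name, _, payload = blob.partition(b"\n")
    _, decompress = METHODS[name.decode("ascii")]
    return decompress(payload)
```

       Because the method is recorded with the data, the choice only
       has to be made at compression time.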

   B5. The ability to handle file system subtrees automatically is
       required as is selective retrieval of elements from such a
       subtree.
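
       A minimal sketch of B5, assuming member names are stored
       relative to the root of the subtree with "/" as separator. The
       in-memory dictionary and the names collect_subtree and select
       are only for illustration.

```python
import os


def collect_subtree(root):
    """Walk a directory tree and map relative member names to contents."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for filename in sorted(filenames):
            full = os.path.join(dirpath, filename)
            relative = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as handle:
                entries[relative] = handle.read()
    return entries


def select(entries, prefix):
    """Pick one member, or every member below a directory prefix."""
    prefix = prefix.rstrip("/")
    return {name: data for name, data in entries.items()
            if name == prefix or name.startswith(prefix + "/")}
```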

   B6. File characteristics (file name, permissions, etc?) as well
       as file contents should be specified in the archive.
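
       To make B6 concrete, here is a sketch of a per-file header
       written as plain "key=value" lines. The choice of fields
       (name, size, permission bits, modification time) and the
       header syntax are my assumptions; a real format would have to
       decide what to do on systems lacking some of them.

```python
import os
import stat


def describe(path, stored_name):
    """Gather the characteristics to be recorded alongside the contents."""
    info = os.stat(path)
    return {"name": stored_name,
            "size": info.st_size,
            "mode": stat.S_IMODE(info.st_mode),  # permission bits only
            "mtime": int(info.st_mtime)}


def format_header(meta):
    """Write the characteristics as text lines, mode in octal."""
    return ("name={name}\nsize={size}\nmode={mode:o}\nmtime={mtime}\n"
            .format(**meta))


def parse_header(text):
    """Read a header produced by format_header back into a dictionary."""
    fields = dict(line.split("=", 1) for line in text.splitlines() if line)
    return {"name": fields["name"],
            "size": int(fields["size"]),
            "mode": int(fields["mode"], 8),
            "mtime": int(fields["mtime"])}
```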

   B7. Multi-volume archives should be allowed. Individual files
       should be able to span volumes. Archives should be able to
       take the form of either volumes or files.

   B8. It should be possible to perform updates (addition, deletion,
       replacement) of individual files in an archive and generate
       a new archive without performing a complete extraction of the
       existing archive. This has substantial implications for the
       compression schemes used.
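
       The implication I have in mind can be sketched as follows: if
       each member is compressed on its own, an update can copy the
       untouched members' compressed bytes verbatim and only compress
       what is new. The dictionary model of an archive and the names
       build and update are hypothetical, and zlib is again just a
       stand-in compressor.

```python
import zlib


def build(files):
    """Compress every member independently of the others."""
    return {name: zlib.compress(data) for name, data in files.items()}


def update(archive, add=None, delete=()):
    """Produce a new archive; untouched members are reused as-is."""
    new = {name: blob for name, blob in archive.items()
           if name not in delete}
    for name, data in (add or {}).items():
        new[name] = zlib.compress(data)  # addition or replacement
    return new
```

       A scheme that compresses the whole archive as one stream
       would not allow this, since removing one member changes the
       encoding of everything after it.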

   B9. Absolute path names (if allowed at all) should be suppressible
       and suppressed by default, i.e. you would have to explicitly
       exercise an option to allow them at de-archiving (extraction)
       time.
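
       A sketch of the default I am suggesting for B9: member names
       are made relative at extraction unless the user explicitly
       asks otherwise. The function name and the MS-DOS drive-letter
       handling are assumptions, and the rejection of ".." components
       is an extra precaution of the sketch, not part of the
       requirement above.

```python
def safe_member_path(name, allow_absolute=False):
    """Return the path to extract to, relative unless explicitly allowed."""
    name = name.replace("\\", "/")
    has_drive = len(name) > 1 and name[1] == ":"
    absolute = name.startswith("/") or has_drive
    if absolute and allow_absolute:
        return name  # the user opted in to absolute names
    if has_drive:
        name = name[2:]  # drop an MS-DOS style drive prefix
    parts = [part for part in name.split("/") if part not in ("", ".")]
    if ".." in parts:
        raise ValueError("refusing path that climbs out of the target")
    return "/".join(parts)
```

       With the default, "/etc/passwd" would be extracted as
       "etc/passwd" below the current directory.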

I don't claim this list is complete and am interested in seeing what
additions or changes are proposed.

I have intentionally tried not to pin down the elements of this
"program set" to a particular module structure.

I have intentionally omitted "self-extracting" archives from this
list because of past controversy. I think that they are both useful
and a potential security hazard.

I have also omitted the user interface issues from this list because
they would complicate things too much at this point and because many
of the points to be made about them are going to be system dependent.

 Thos Sumner       Internet: thos@cca.ucsf.edu
 (The I.G.)        UUCP: ...ucbvax!ucsfcgl!cca.ucsf!thos
                   BITNET:  thos@ucsfcca

 U.S. Mail:  Thos Sumner, Computer Center, Rm U-76, UCSF
             San Francisco, CA 94143-0704 USA

I hear nothing in life is certain but death and taxes -- and they're
working on death.

#include <disclaimer.std>