root@cca.ucsf.edu (Systems Staff) (03/01/90)
Even before we consider the detailed capabilities of any candidate
programs for these purposes, we should screen out any that:
A1. Use proprietary algorithms or code.
Such algorithms are subject to the control of parties
not primarily concerned with the purposes of this group.
They may be changed at any time by their proprietors
(e.g. several incompatible versions of the ARC family exist).
In addition, such algorithms may lead to legal problems
as we all know.
A2. Are not available free and in source form.
Source is essential for the resolution of compatibility
problems as well as for porting to new OS's and machine
architectures.
A3. Are dependent on a particular OS or CPU architecture.
There is no question of the need to run on Unix and MS-DOS
systems, and I believe the need to run on VMS and (at least)
the popular 68k systems has also been demonstrated.
Once these requirements are met by a program set (which may
contain one or more programs), it is reasonable to examine
its technical merit. Let's start with a checklist of such items as:
B1. The program set should be able to produce files which can move
transparently through all the major networks. It should also
be able to produce files which take advantage of transparent
connections and media.
For example, files to be stored locally (e.g. for backups) or
transmitted via a binary-capable protocol need not have the
transparency restrictions mentioned below. Data integrity
checking should still be included in this binary format.
The program set should include a capability for converting
between these two forms without otherwise transforming the
data.
B2. The transparency form should be pure ASCII text files that can
move transparently through internetworked connections that
(at the least) require translations between ASCII and EBCDIC,
even though the base data may be arbitrary collections of bytes.
This format should be insensitive to such common transformations
as trailing blank deletion, tab to space (or the inverse)
conversions, varying newline (<CR>, <NL>, <CR><NL>) conventions
etc.
No extraneous material (e.g. rounding to an even number of bytes)
should be added to the file by the complete process unless this
is a part of the standard of the destination OS.
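As a rough illustration of such a transparency form, here is a minimal Python sketch. It assumes the 64-character alphabet used by base64 (letters, digits, '+', '/' and '=' for padding), which is my choice for the example and not something this list prescribes. The names to_transparent, from_transparent and LINE_WIDTH are hypothetical. Because the encoded lines contain no blanks or tabs, the decoder can simply discard all white space, whatever the newline convention was.

```python
import base64

LINE_WIDTH = 60  # assumed line length, short enough for most mailers


def to_transparent(data: bytes) -> str:
    """Encode arbitrary bytes as short lines of letters, digits, '+', '/', '='."""
    text = base64.b64encode(data).decode("ascii")
    lines = [text[i:i + LINE_WIDTH] for i in range(0, len(text), LINE_WIDTH)]
    return "\n".join(lines) + "\n"


def from_transparent(text: str) -> bytes:
    """Decode, ignoring blanks, tabs and any newline convention."""
    # split() with no argument breaks on runs of space, tab, CR and LF,
    # so trailing blanks and <CR><NL> line endings vanish here.
    compact = "".join(text.split())
    return base64.b64decode(compact, validate=True)
```

The decoded output has exactly the length of the input, so no rounding or padding bytes reach the destination file.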
The capability of splitting a file into pieces to meet
limitations on message size should be included. Standardized
starting and ending labels with part numbers and any other
desired identification allows automatic sequencing of parts,
skipping of extraneous headers and trailers etc.
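The splitting and automatic sequencing described above might look something like the following Python sketch. The label wording ("---BEGIN PART n OF m---") and the names split_parts and join_parts are invented for the example; any standardized label would serve.

```python
import re

# Hypothetical label format; white space inside a label is matched loosely
# so that blank/tab conversions in transit do not hide it.
BEGIN = re.compile(r"^---BEGIN\s+PART\s+(\d+)\s+OF\s+(\d+)---\s*$")
END = re.compile(r"^---END\s+PART\s+(\d+)\s+OF\s+(\d+)---\s*$")


def split_parts(lines, max_lines):
    """Split a list of text lines into labelled parts of at most max_lines."""
    chunks = [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)] or [[]]
    total = len(chunks)
    return [[f"---BEGIN PART {n} OF {total}---", *chunk,
             f"---END PART {n} OF {total}---"]
            for n, chunk in enumerate(chunks, start=1)]


def join_parts(messages):
    """Reassemble the body from messages received in any order.

    Lines before a BEGIN label (mail headers) and after an END label
    (signatures, trailers) are skipped.
    """
    found, totals = {}, set()
    for message in messages:
        number, body = None, []      # number is None while outside a part
        for line in message:
            if number is None:
                begin = BEGIN.match(line)
                if begin:
                    number = int(begin.group(1))
                    totals.add(int(begin.group(2)))
                    body = []
                continue
            end = END.match(line)
            if end and int(end.group(1)) == number:
                found[number] = body
                number = None
            else:
                body.append(line)
    if len(totals) != 1:
        raise ValueError("parts disagree about the total count")
    total = totals.pop()
    missing = [n for n in range(1, total + 1) if n not in found]
    if missing:
        raise ValueError(f"missing parts: {missing}")
    return [line for n in range(1, total + 1) for line in found[n]]
```

A missing part is reported by number, so the recipient knows which piece to ask for again.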
B3. Both formats should contain adequate checking information.
Each format, binary and transparency alike, should contain
enough checking to assure the recipient that the data has
not been damaged in transit.
Each format should be designed to localize damage, i.e. only
the actually damaged area should be unrecoverable and the
area of damage should be easily determined.
At least one version of uudecode includes a sample character
set in its header which includes all the characters required
for transparent data transmission. This provides a simple
check on transparency at the character level and a method
for recovery of some translation errors.
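To show how such a sample line can both detect and undo a character translation, here is a small Python sketch. The reference alphabet and the name repair_table are assumptions for the example and are not taken from any particular uudecode. Recovery only works when the translation was one-to-one; if two characters were collapsed into one, the sketch reports the damage instead.

```python
# Assumed reference alphabet: every character the encoded body may use.
REFERENCE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="


def repair_table(received_sample: str):
    """Return a str.translate table undoing a one-to-one character translation.

    Raises ValueError when the sample is damaged beyond repair: wrong length,
    or two reference characters arriving as the same character.
    """
    if len(received_sample) != len(REFERENCE):
        raise ValueError("sample line has the wrong length")
    mapping = {}
    for got, want in zip(received_sample, REFERENCE):
        # setdefault records the first meaning seen for each received character
        if mapping.setdefault(got, want) != want:
            raise ValueError(f"{got!r} stands for more than one character")
    return str.maketrans(mapping)
```

The recipient would apply the table to each body line with line.translate(table) before decoding.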
An overall file check should include a CRC and a length check.
Partial extraction should be supported for data outside the area
where the damage occurred.
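One way to combine the overall check with localization of damage is to keep a CRC per block alongside the whole-file CRC and length. The Python sketch below assumes CRC-32 as computed by zlib and a 4096-byte block; both are choices made for the example, and make_checks and damaged_ranges are hypothetical names.

```python
import zlib

BLOCK_SIZE = 4096  # assumed; smaller blocks localize damage more tightly


def make_checks(data: bytes):
    """Return (length, whole-file CRC, list of per-block CRCs)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return len(data), zlib.crc32(data), [zlib.crc32(b) for b in blocks]


def damaged_ranges(data: bytes, checks):
    """Return the (start, end) byte ranges whose block CRC does not match."""
    length, file_crc, block_crcs = checks
    bad = []
    for index, expected in enumerate(block_crcs):
        start = index * BLOCK_SIZE
        if zlib.crc32(data[start:start + BLOCK_SIZE]) != expected:
            bad.append((start, min(start + BLOCK_SIZE, length)))
    if not bad and (len(data) != length or zlib.crc32(data) != file_crc):
        bad.append((0, length))  # damage the block checks could not place
    return bad
```

Everything outside the reported ranges can be extracted with confidence, which is the partial extraction asked for above.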
B4. Data compression should be at a level competitive with other
available programs both in speed and degree of compression.
Unfortunately, the nature of the situation is that these are
conflicting requirements with disagreement on their priority.
If there are available methods whose balance is sufficiently
different to justify it, two (or more?) methods may be worth
inclusion with selection by user option at compression time.
No compression should be a supported option.
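Selection among methods could be as simple as a method code stored ahead of the data. In this Python sketch zlib at two levels stands in for "fast" and "tight" methods; that substitution, the one-byte code and the names pack and unpack are all assumptions for illustration, not a proposal for the actual algorithms.

```python
import zlib

# Hypothetical method codes; the one-byte tag layout is an assumption.
STORED, FAST, TIGHT = 0, 1, 2


def pack(data: bytes, method: int = TIGHT) -> bytes:
    """Prefix the (possibly compressed) data with a one-byte method code."""
    if method == STORED:
        body = data                      # no compression at all
    elif method == FAST:
        body = zlib.compress(data, 1)    # favors speed
    elif method == TIGHT:
        body = zlib.compress(data, 9)    # favors size
    else:
        raise ValueError(f"unknown method {method}")
    return bytes([method]) + body


def unpack(blob: bytes) -> bytes:
    """Read the method code and undo whatever pack() did."""
    method, body = blob[0], blob[1:]
    if method == STORED:
        return body
    if method in (FAST, TIGHT):
        return zlib.decompress(body)
    raise ValueError(f"unknown method {method}")
```

The user picks the method at compression time; the extractor needs no option because the code travels with the data.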
B5. The ability to handle file system subtrees automatically is
required as is selective retrieval of elements from such a
subtree.
B6. File characteristics (file name, permissions, etc.) as well
as file contents should be specified in the archive.
B7. Multi-volume archives should be allowed. Individual files
should be able to span volumes. Archives should be able to
take the form of either volumes or files.
B8. It should be possible to perform updates (addition, deletion,
replacement) of individual files in an archive and generate
a new archive without performing a complete extraction of the
existing archive. This has substantial implications for the
compression schemes used.
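The implication for compression is that each file must be compressed on its own, so that an untouched member can be copied as opaque bytes. A minimal Python sketch, modelling the archive as a mapping from name to compressed bytes (a simplification; a real archive would be a byte stream), with zlib standing in for whatever method is chosen and the names new_member and update being hypothetical:

```python
import zlib


def new_member(data: bytes) -> bytes:
    """Compress one file on its own, sharing no state with other members."""
    return zlib.compress(data)


def update(archive, replace=None, delete=()):
    """Build a new archive (name -> compressed bytes) from an existing one.

    Untouched members are carried over as opaque compressed bytes; only the
    files named in `replace` are compressed, and nothing is decompressed.
    """
    result = {name: blob for name, blob in archive.items() if name not in delete}
    for name, data in (replace or {}).items():
        result[name] = new_member(data)
    return result
```

A scheme that carries compression state from one file into the next would not allow this; every member after the changed one would have to be recompressed.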
B9. Absolute path names (if allowed at all) should be suppressible
and suppressed by default, i.e. you would have to explicitly
exercise an option to allow them at de-archiving (extraction)
time.
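The default-suppression rule in B9 can be sketched as follows in Python. The example assumes Unix-style names with '/' separators; safe_member_path and its allow_absolute option are hypothetical names. It addresses only absolute names, as in the item above.

```python
import posixpath


def safe_member_path(name: str, allow_absolute: bool = False) -> str:
    """Return the path to extract to; relative unless allow_absolute is set."""
    normalized = posixpath.normpath(name)
    if posixpath.isabs(normalized) and not allow_absolute:
        # Drop the leading slash(es) so the file lands under the
        # current directory instead of the file system root.
        normalized = normalized.lstrip("/") or "."
    return normalized
```

With the default, "/etc/passwd" in an archive is extracted as "etc/passwd" below the current directory; the absolute name is honored only when the user asks for it explicitly.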
I don't claim this list is complete and am interested in seeing what
additions or changes are proposed.
I have intentionally tried not to pin down the elements of this
"program set" to a particular module structure.
I have intentionally omitted "self-extracting" archives from this
list because of past controversy. I think that they are both useful
and a potential security hazard.
I have also omitted user interface issues from this list, both
because they would complicate things too much at this point and
because many of the points to be made there will be system dependent.
Thos Sumner Internet: thos@cca.ucsf.edu
(The I.G.) UUCP: ...ucbvax!ucsfcgl!cca.ucsf!thos
BITNET: thos@ucsfcca
U.S. Mail: Thos Sumner, Computer Center, Rm U-76, UCSF
San Francisco, CA 94143-0704 USA
I hear nothing in life is certain but death and taxes -- and they're
working on death.
#include <disclaimer.std>