root@cca.ucsf.edu (Systems Staff) (03/01/90)
Even before we consider the detailed capabilities of any candidate
programs for these purposes we should screen out any that

A1. Use proprietary algorithms or code.

    Such algorithms are subject to the control of parties not primarily
    concerned with the purposes of this group.  They may be changed at
    any time by their proprietors (e.g. several incompatible versions of
    the ARC family exist).  In addition, such algorithms may lead to
    legal problems as we all know.

A2. Are not available free and in source form.

    Source is essential for the resolution of compatibility problems as
    well as for porting to new OS's and machine architectures.

A3. Are dependent on a particular OS or CPU architecture.

    There is just no question of the need to run on Unix and MSDOS
    systems and I believe the need to run on VMS and (at least) the
    popular 68k systems has also been demonstrated.

Given that these requirements are met by a program set (which may
contain one or more programs) then it is reasonable to examine its
technical merit.  Let's start with a check list of such items as

B1. The program set should be able to produce files which can move
    transparently through all the major networks.

    It should also be able to produce files which take advantage of
    transparent connections and media.  For example, files to be stored
    locally (e.g. for backups) or transmitted via a binary-capable
    protocol need not have the transparency restrictions mentioned
    below.  The data integrity checking should be included in this
    format.

    The program set should include a capability for converting between
    these two forms without otherwise transforming the data.

B2. The transparency form should be pure ASCII text files that can move
    transparently through internetworked connections that (at the
    least) require translations between ASCII and EBCDIC, even though
    the base data may be arbitrary collections of bytes.
    This format should be insensitive to such common transformations as
    trailing blank deletion, tab to space (or the inverse) conversions,
    varying newline (<CR>, <NL>, <CR><NL>) conventions, etc.

    No extraneous material (e.g. rounding to an even number of bytes)
    should be added to the file by the complete process unless this is
    a part of the standard of the destination OS.

    The capability of splitting a file into pieces to meet limitations
    on message size should be included.  Standardized starting and
    ending labels with part numbers and any other desired identification
    allow automatic sequencing of parts, skipping of extraneous headers
    and trailers, etc.

B3. Both formats should contain adequate checking information.

    Each format should contain adequate checking to assure the
    recipient that the data has not been damaged in transit.  This
    applies to both the binary and transparency formats.

    Each format should be designed to localize damage, i.e. only the
    actually damaged area should be unrecoverable and the area of
    damage should be easily determined.

    At least one version of uudecode includes a sample character set in
    its header which includes all the characters required for
    transparent data transmission.  This provides a simple check on
    transparency at the character level and a method for recovery of
    some translation errors.

    An overall file check should include a CRC and a length check.

    Partial extraction should be supported for data outside the area
    where the damage occurred.

B4. Data compression should be at a level competitive with other
    available programs both in speed and degree of compression.

    Unfortunately, the nature of the situation is that these are
    conflicting requirements with disagreement on their priority.  If
    there are available methods whose balance is sufficiently different
    to justify it, two (or more?) methods may be worth inclusion with
    selection by user option at compression time.

    No compression should be a supported option.

B5.
    The ability to handle file system subtrees automatically is
    required, as is selective retrieval of elements from such a subtree.

B6. File characteristics (file name, permissions, etc?) as well as file
    contents should be specified in the archive.

B7. Multi-volume archives should be allowed.

    Individual files should be able to span volumes.  Archives should
    be able to take the form of either volumes or files.

B8. It should be possible to perform updates (addition, deletion,
    replacement) of individual files in an archive and generate a new
    archive without performing a complete extraction of the existing
    archive.

    This has substantial implications for the compression schemes used.

B9. Absolute path names (if allowed at all) should be suppressible and
    suppressed by default, i.e. you would have to explicitly exercise
    an option to allow them at de-archiving (extraction) time.

I don't claim this list is complete and am interested in seeing what
additions or changes are proposed.

I have intentionally tried not to pin down the elements of this
"program set" to a particular module structure.

I have intentionally omitted "self-extracting" archives from this list
because of past controversy.  I think that they are both useful and a
potential security hazard.

I have also omitted the user interface issues from this list because
that would complicate things too much at this point and because a lot
of the points to be made on that are going to be system dependent.

Thos Sumner       Internet: thos@cca.ucsf.edu
(The I.G.)        UUCP:     ...ucbvax!ucsfcgl!cca.ucsf!thos
                  BITNET:   thos@ucsfcca

U.S. Mail:  Thos Sumner, Computer Center, Rm U-76, UCSF
            San Francisco, CA 94143-0704 USA

I hear nothing in life is certain but death and taxes -- and they're
working on death.

#include <disclaimer.std>