kdb@chinet.chi.il.us (Karl Botts) (02/05/90)
This file describes a program that, so far as I know, does not exist. I would rather not write it at all if something even remotely equivalent does exist; I would rather write it as a shell script if the components to make this possible can be found. If I have to write it from scratch in C, the probability that I will ever get around to it is not high. Nevertheless, it seems to me that something like this is badly needed. I readily concede that doing backups on floppy disks is fundamentally insane; unfortunately, circumstances all too often compel it. The existing tools are poorly suited to the job.

---

NAME

fdar - write archives on floppy disks

SYNOPSIS

To be determined.

DESCRIPTION

fdar writes cpio-readable archives tailored to the requirements of raw floppy disk volumes. It attempts to reconcile two conflicting aims:

1. To minimize the amount of data lost when some of the floppies on which an archive is stored are lost or damaged.

2. To maximize the efficiency of storage.

The first aim is generally given precedence. In addition, fdar permits appending records to existing volumes.

Names of files to be archived are read from standard input; archives are _not_ written to standard output (as with cpio), but rather to a specified device or file (as with tar). (But, as with tar, standard output may be specified explicitly.) Several of the features of fdar require read as well as write access on the output device or file; fdar may be used without these features, but in this case most of its advantages over cpio or tar are lost.

In the service of the first aim the following steps are taken:

1a. No record in an archive may cross a volume boundary. To achieve this, files larger than a floppy disk volume are split into multiple records. The names of the fragments are generated in a deterministic manner. Each archive record has its own cpio -c style header.
All this is aimed at maximizing the data recoverable from a partially damaged floppy by reading it into a hard disk file with dd, and patching it up by various methods -- by hand if all else fails. Optionally, the maximum record size may be made smaller than a volume.

1b. The records corresponding to a file are stored on the minimum possible number of volumes; the files belonging to a directory are stored on the minimum number of volumes. This rule is applied recursively.

1c. The sequence of the volumes in an archive is logical only, and is not enforced when the archive is read (each volume is a separate archive so far as cpio is concerned). In particular, the floppy disk sequencing mechanism provided by many implementations of cpio is not used. Thus, if some of the volumes in an archive are lost, only the files on those volumes are lost. The sequencing is not necessary because of rule 1a. Even if an archive contains multiple versions of the same file, eschewing the -u option to cpio will ensure that the version with the latest timestamp emerges. This also applies to split files, due to the "deterministic manner" clause of rule 1a.

1d. Additional optional steps may be specified; in particular, each header and/or file record may be forced to begin on a block, track or cylinder boundary. This further increases the possibility of recovering files from a damaged floppy volume. If this is done, the padding is arranged in such a way that cpio will throw it away when it is read.

In the service of the second aim, these steps are taken:

2a. The contents (but _never_ the header) of an archive record may optionally be compressed. This will be done using compress; because this may make archiving many small files extremely slow due to process spawning overhead, it may optionally be done internally by fdar using code borrowed from compress.
In either case the names of the archived files will be changed by the usual convention of appending ".Z" (like compress, fdar will decline to compress a file whose name already ends with ".Z"). To read such an archive, it must first be read and then the files uncompressed in place; compress and cpio cannot usefully be combined into a pipeline for this purpose. The procedure can still, of course, be encapsulated in a shell script.

2b. Within the limits imposed by rule 1b, the order in which filenames are submitted to fdar and the ordering of files within directories may be arbitrarily changed to minimize the space wasted because of rule 1a. Since this kind of bin-packing is a computationally intractable problem, the packing may not be perfect, but it will be pretty good. Doing a good job of this will also require buffer space on a hard disk which could approach the size of the archive being created. fdar will be smart enough to do the best it can if it doesn't have enough buffer space (but it _must_ have sufficient space available to buffer at least one volume).

Although fdar is primarily intended for raw floppy disks, it can write to any device or to a file. Specifically, it can be used with tapes, although the buffer space required to effect rule 2b on a multiple-tape sized archive will frequently be prohibitive.

fdar is normally interactive by nature, because it relies on a human to insert floppy disks in a drive. It takes advantage of this fact to recover from certain errors (but this can be disabled when desired). For instance, if fdar is presented with a disk volume which has not been formatted it will stop, release its lock on the output device, and permit a shell escape so the human can correct the problem. It will handle write errors in a similar manner; it will always have the contents of the current volume buffered, so that it can back up and restart the volume without restarting the whole archive.
If interactive error handling is disabled, fdar will generally have to abort in these circumstances, as other archivers do. Naming conflicts stemming from the appending of ".Z" to long filenames or the splitting of files may also be handled interactively; in any event, fdar guarantees never to lose any filename information without explicit and immediate permission from a human.

Appending to volumes will be done by the simple expedient of reading from the beginning of the volume to find the end. If fdar finds that there is not enough space to perform the append, it will prompt for a new volume in the usual way.

fdar will need to know the sizes of the volumes available to write to before it commences the bin-packing described in rule 2b. It can get this information from the Volume Home Block of the first volume it is given, and in that case will assume that the rest of the volumes are to be the same size (it will check, and protest if they are not). This behavior may be overridden and the volume sizes specified on the command line, or read from a configuration file. An environment variable may be used to specify a default configuration file.

If cpio is used to read an fdar archive which contains files which were split to support rule 1a, the files will remain split after they are read. A separate little utility is provided to put them back together; this can also be done with cat, but the little utility knows about the fragment naming scheme and so requires less human intervention.

It would be possible to make fdar also read archives, in which case it could know how to put split files back together by itself, as well as uncompress them. However, I tend to favor the idea that fdar should operate write-only. This point may be debatable.

fdar can optionally write a complete record of all the work it does to stderr or a specified file, including filenames before splitting or compressing, which files are put in which volumes, and so forth.
However, this is for user convenience only; retrieving the files from an archive _never_ depends on this information.

BUGS

To be determined.
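
Rules 1a and 2b can be illustrated with a short sketch. It is not part of the proposal above, only a minimal Python illustration under stated assumptions: the ".NNN" fragment suffix is a hypothetical naming scheme (the post only requires that fragment names be deterministic), header and padding overhead are ignored, the grouping constraint of rule 1b is left out, and first-fit decreasing is one common bin-packing heuristic standing in for whatever fdar would actually use. The function names are made up for the example.

```python
from typing import List, Tuple

# A record is (record name, payload size in bytes).
Record = Tuple[str, int]


def split_file(name: str, size: int, max_record: int) -> List[Record]:
    """Rule 1a sketch: split one file into records of at most max_record bytes.

    The ".NNN" suffix is a hypothetical, deterministic fragment naming scheme.
    """
    if max_record <= 0:
        raise ValueError("max_record must be positive")
    if size <= max_record:
        return [(name, size)]  # small files are stored as a single record
    records: List[Record] = []
    offset = 0
    index = 0
    while offset < size:
        chunk = min(max_record, size - offset)
        records.append((f"{name}.{index:03d}", chunk))
        offset += chunk
        index += 1
    return records


def pack_first_fit_decreasing(records: List[Record],
                              volume_size: int) -> List[List[Record]]:
    """Rule 2b sketch: assign records to volumes with first-fit decreasing.

    Header overhead and the rule 1b grouping constraint are ignored here.
    """
    volumes: List[List[Record]] = []
    free: List[int] = []  # remaining bytes on each volume
    # Largest records first; ties broken by name so the result is repeatable.
    for rec in sorted(records, key=lambda r: (-r[1], r[0])):
        size = rec[1]
        if size > volume_size:
            raise ValueError(f"record {rec[0]!r} does not fit on a volume")
        for i, room in enumerate(free):
            if size <= room:
                volumes[i].append(rec)
                free[i] -= size
                break
        else:
            # No existing volume has room, so start a new one.
            volumes.append([rec])
            free.append(volume_size - size)
    return volumes


if __name__ == "__main__":
    files = [("a", 700), ("b", 500), ("c", 300), ("big", 2500)]
    volume_size = 1000
    records = [r for name, size in files
               for r in split_file(name, size, volume_size)]
    for number, volume in enumerate(pack_first_fit_decreasing(records, volume_size), 1):
        print(number, volume)
```

With the sample sizes in the demo, the 2500-byte file is split into three fragments and the six resulting records fit on four 1000-byte volumes. A real implementation would also have to charge each record for its header and any boundary padding, and respect rule 1b when reordering.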
dir@koala.UUCP (Dan Rosenblatt) (02/07/90)
I have taken afio, a public-domain super-set of cpio from someone at Lachman Associates, which has error-recovery & multiple volume handling & lots of other stuff, and added compression (compress 4.0). We are using it to distribute our next product release, and it's impressive. We cut our diskette count from 8 to 4.

Dan Rosenblatt
Sigma Design, Inc.
jfh@rpp386.cactus.org (John F. Haugh II) (02/12/90)
In article <641@koala.UUCP> dir@koala.UUCP (Dan Rosenblatt) writes:
>I have taken afio, a public-domain super-set of cpio from someone at
>Lachman Associates, which has error-recovery & multiple volume handling &
>lots of other stuff, and added compression (compress 4.0). We are using
>it to distribute our next product release, and its impressive. We cut
>our diskette count from 8 to 4.

How about you post the source code to this thing? I currently use an Irwin 110 for tape backups of my machine and could really use some help cutting down the number of tapes it uses.

--
John F. Haugh II
UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832
Domain: jfh@rpp386.cactus.org