[alt.sources] floppy archives

kdb@chinet.chi.il.us (Karl Botts) (02/05/90)

This file describes a program that, so far as I know, does not exist.
I would rather not write it at all if something even remotely equivalent
does exist; I would rather write it as a shell script if the components
to make this possible can be found.  If I have to write it from scratch in
C, the probability that I will ever get around to it is not high.

Nevertheless, it seems to me that something like this is badly needed. 
I readily concede that doing backups on floppy disks is fundamentally
insane; unfortunately, circumstances all too often compel it.  The
existing tools are poorly suited to the job. 
---


NAME

fdar - write archives on floppy disks


SYNOPSIS

To be determined.


DESCRIPTION

fdar writes cpio-readable archives tailored to the requirements of raw
floppy disk volumes.  It attempts to reconcile two conflicting aims:

1. To minimize the amount of data lost when some of the floppies on
which an archive is stored are lost or damaged,

2. To maximize the efficiency of storage.

The first aim is generally given precedence.  In addition, fdar permits
appending records to existing volumes.  Names of files to be archived
are read from standard input; archives are _not_ written to standard
output (as with cpio), but rather to a specified device or file (as with
tar.) (But, as with tar, standard output may be specified explicitly.)
Several of the features of fdar require read as well as write access on
the output device or file; fdar may be used without these features, but
in this case most of its advantages over cpio or tar are lost. 


In the service of the first aim the following steps are taken:

1a.  No record in an archive may cross a volume boundary.  To achieve
this, files larger than a floppy disk volume are split into multiple
records.  The names of the fragments are generated in a deterministic
manner.  Each archive record has its own cpio -c style header.  All this
is aimed at maximizing the data recoverable from a partially damaged
floppy by reading it into a hard disk file with dd, and patching it up
by various methods -- by hand if all else fails.  Optionally, the
maximum record size may be made smaller than a volume. 

1b.  The records corresponding to a file are stored on the minimum
possible number of volumes; the files belonging to a directory are
stored on the minimum number of volumes.  This rule is applied
recursively. 

1c.  The sequence of the volumes in an archive is logical only, and is
not enforced when the archive is read (each volume is a separate archive
so far as cpio is concerned.) In particular, the floppy disk sequencing
mechanism provided by many implementations of cpio is not used.  Thus,
if some of the volumes in an archive are lost only the files on those
volumes are lost.  The sequencing is not necessary because of rule 1a. 
Even if an archive contains multiple versions of the same file,
eschewing the -u option to cpio will ensure that the version with the
latest timestamp emerges.  This also applies to split files, due to the
"deterministic manner" clause of rule 1a. 

1d.  Additional optional steps may be specified; in particular, each
header and/or file record may be forced to begin on a block, track or
cylinder boundary.  This additionally increases the possibility of
recovering files from a damaged floppy volume.  If this is done, the
padding is arranged in such a way that cpio will throw it away when it
is read. 
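
Rules 1a and 1d can be sketched roughly as follows.  This is only an
illustration, not a design: the ".NNN" fragment suffix is a hypothetical
naming scheme (the text above requires only that names be generated
deterministically), max_record stands for whatever record payload limit
is in force, and header overhead is ignored.

```python
def fragment_plan(name, size, max_record):
    """Return [(fragment_name, offset, length), ...] for one file.

    Files that fit in one record keep their name.  Larger files are cut
    into pieces named name.000, name.001, ... (hypothetical scheme).
    """
    if max_record <= 0:
        raise ValueError("max_record must be positive")
    if size <= max_record:
        return [(name, 0, size)]
    plan = []
    offset = 0
    index = 0
    while offset < size:
        length = min(max_record, size - offset)
        plan.append(("%s.%03d" % (name, index), offset, length))
        offset += length
        index += 1
    return plan


def pad_to_boundary(offset, boundary):
    """Bytes of padding needed so the next record starts on a multiple
    of boundary (a block, track or cylinder size in bytes)."""
    if boundary <= 0:
        raise ValueError("boundary must be positive")
    return (-offset) % boundary
```

With a 1000-byte record limit, a 2500-byte file yields three fragments of
1000, 1000 and 500 bytes; a record ending at byte 1000 needs 24 bytes of
padding to reach the next 512-byte block.  How the padding is disguised
so that cpio discards it is left open here, as it is in the text above.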


In the service of the second aim, these steps are taken:

2a.  The contents (but _never_ the header) of an archive record may
optionally be compressed.  This will be done using compress; because
this may make archiving many small files extremely slow due to process
spawning overhead, it may optionally be done internally by fdar using
code borrowed from compress.  In either case the names of the archived
files will be changed by the usual convention of appending ".Z" (like
compress, fdar will decline to compress a file whose name already ends
with ".Z").  To read such an archive, it must first be read and then the
files uncompressed in place; compress and cpio cannot usefully be combined
into a pipeline for this purpose.  The procedure can still, of course,
be encapsulated in a shell script. 

2b.  Within the limits imposed by rule 1b, the order in which filenames
are submitted to fdar and the ordering of files within directories may
be arbitrarily changed to minimize the space wasted because of rule 1a.
Since optimal bin-packing of this kind is computationally intractable,
the packing may not be perfect, but it will be pretty good.  Doing a
good job of this will also require buffer space on a hard disk which
could approach the size of the archive being created.  fdar will be smart
enough to do the best it can if it doesn't have enough buffer space (but
it _must_ have sufficient space available to buffer at least one volume.)
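
One plausible heuristic for the packing in rule 2b is first-fit
decreasing, sketched below.  This is a stand-in chosen for illustration,
not necessarily what fdar would use; it ignores the directory grouping
of rule 1b and the per-record header overhead, and the function name is
made up.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack records into volumes of the given capacity.

    sizes maps record name -> size in bytes.  Returns a list of volumes,
    each a list of record names.  Records are placed largest first, each
    into the first volume that still has room.
    """
    volumes = []  # each entry is [free_bytes, [names]]
    ordered = sorted(sizes.items(), key=lambda item: (-item[1], item[0]))
    for name, size in ordered:
        if size > capacity:
            # Rule 1a would have split such a file before packing.
            raise ValueError("%s does not fit on one volume" % name)
        for volume in volumes:
            if volume[0] >= size:
                volume[0] -= size
                volume[1].append(name)
                break
        else:
            volumes.append([capacity - size, [name]])
    return [names for _free, names in volumes]
```

Because the whole list of sizes is sorted before anything is placed,
this variant needs every record to be known (and buffered somewhere)
before the first volume is written, which matches the buffer-space
concern described above.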


Although fdar is primarily intended for raw floppy disks, it can write
to any device or to a file.  Specifically, it can be used with tapes,
although the buffer space required to effect rule 2b on a multiple-tape
sized archive will frequently be prohibitive. 

fdar is normally interactive by nature, because it relies on a human to
insert floppy disks in a drive.  It takes advantage of this fact to
recover from certain errors (but this can be disabled when desired.)
For instance, if fdar is presented with a disk volume which has not been
formatted it will stop, release its lock on the output device, and
permit a shell escape so the human can correct the problem.  It will handle
write errors in a similar manner; it will always have the contents of the
current volume buffered, so that it can back up and restart the volume
without restarting the whole archive.  If interactive error handling is
disabled fdar will generally have to abort in these circumstances, as other
archivers do.

Naming conflicts stemming from the appending of ".Z" to long filenames
or the splitting of files may also be handled interactively; in any
event, fdar guarantees never to lose any filename information without
explicit and immediate permission from a human. 
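
The ".Z" naming rule and the long-filename conflict can be sketched as
below.  The 14-character limit is an assumption (the classic System V
filesystem limit); the function name and the needs_human flag are
invented for illustration.

```python
def archived_name(name, max_component=14):
    """Return (name_in_archive, needs_human) for a file to be compressed.

    A name already ending in ".Z" is stored unchanged and uncompressed.
    Otherwise ".Z" is appended; needs_human is True when the last path
    component would then exceed max_component characters, so the name
    must not be shortened without asking.
    """
    if name.endswith(".Z"):
        return name, False
    new_name = name + ".Z"
    last_component = new_name.rsplit("/", 1)[-1]
    return new_name, len(last_component) > max_component
```

For instance, "src/main.c" becomes "src/main.c.Z" with no conflict, while
"averylongname.c" would become a 17-character name and is flagged for
interactive handling.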

Appending to volumes will be done by the simple expedient of reading
from the beginning of the volume to find the end.  If fdar finds that
there is not enough space to perform the append, it will prompt for a
new volume in the usual way. 
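
Finding the end of a volume might look like the sketch below.  It
assumes that the "cpio -c style header" is the 76-byte octal ASCII
header with magic 070707 and no padding between records, and that the
archive ends with a record named TRAILER!!!; the function names are
hypothetical.

```python
import io

MAGIC = b"070707"
HEADER_LEN = 76            # 6 magic + 7*6 + 11 mtime + 6 namesize + 11 filesize
TRAILER = b"TRAILER!!!"


def make_record(name, data=b""):
    """Build one record with all metadata fields zeroed (for the demo)."""
    encoded = name.encode("ascii") + b"\0"
    header = (MAGIC + b"000000" * 7 + b"%011o" % 0
              + b"%06o" % len(encoded) + b"%011o" % len(data))
    return header + encoded + data


def find_append_offset(stream):
    """Walk the headers from byte 0 and return the offset of the trailer
    record, which is where newly appended records would start."""
    offset = 0
    while True:
        stream.seek(offset)
        header = stream.read(HEADER_LEN)
        if len(header) < HEADER_LEN or header[:6] != MAGIC:
            raise ValueError("no cpio header at offset %d" % offset)
        namesize = int(header[59:65], 8)   # includes the trailing NUL
        filesize = int(header[65:76], 8)
        name = stream.read(namesize).rstrip(b"\0")
        if name == TRAILER:
            return offset
        offset += HEADER_LEN + namesize + filesize


archive = make_record("a", b"hello") + make_record("TRAILER!!!")
print(find_append_offset(io.BytesIO(archive)))
```

The demo prints 83: one 76-byte header, a 2-byte name ("a" plus NUL) and
5 bytes of data precede the trailer.  The free space on the volume is
then the volume size minus that offset, less room for a new trailer.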

fdar will need to know the sizes of the volumes available to write to
before it commences the bin-packing described in rule 2b.  It can get
this information from the Volume Home Block of the first volume it is
given, and in that case will assume that the rest of the volumes are to
be the same size (it will check, and protest if they are not.) This
behavior may be overridden and the volume sizes specified on the command
line, or read from a configuration file.  An environment variable may be
used to specify a default configuration file.

If cpio is used to read an fdar archive which contains files which were
split to support rule 1a, the files will remain split after they are
read.  A separate little utility is provided to put them back together;
this can also be done with cat, but the little utility knows about the
fragment naming scheme and so requires less human intervention.  It
would be possible to make fdar also read archives, in which case it
could know how to put split files back together by itself, as well as
uncompress them.  However, I tend to favor the idea that fdar should
operate write-only.  This point may be debatable. 
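
The "separate little utility" could be as small as the sketch below.  It
assumes a hypothetical fragment naming scheme in which pieces are named
name.000, name.001, and so on; the real scheme is not yet decided, and
both function names are invented.

```python
import re
import shutil
from collections import defaultdict

# Hypothetical scheme: original name, a dot, then a three-digit index.
FRAGMENT = re.compile(r"^(?P<base>.+)\.(?P<index>\d{3})$")


def group_fragments(names):
    """Map each original filename to its fragment names, in order.

    Raises ValueError if the indices found do not run 0, 1, 2, ...
    without a gap.  A missing final fragment cannot be detected this way.
    """
    groups = defaultdict(list)
    for name in names:
        match = FRAGMENT.match(name)
        if match:
            groups[match.group("base")].append((int(match.group("index")), name))
    result = {}
    for base, parts in groups.items():
        parts.sort()
        if [index for index, _ in parts] != list(range(len(parts))):
            raise ValueError("missing fragment for %s" % base)
        result[base] = [name for _, name in parts]
    return result


def join_fragments(base, fragment_paths):
    """Concatenate the fragments, in the order given, into base."""
    with open(base, "wb") as out:
        for path in fragment_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
```

This does the same job as cat applied to the fragments in order; the
only thing it adds is knowledge of the naming scheme, which is the point
made above.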

fdar can optionally write a complete record of all the work it does to
stderr or a specified file, including filenames before splitting or
compressing, which files are put in which volumes, and so forth.  However,
this is for user convenience only; retrieving the files from an archive
_never_ depends on this information.


BUGS

To be determined.

dir@koala.UUCP (Dan Rosenblatt) (02/07/90)

I have taken afio, a public-domain super-set of cpio from someone at
Lachman Associates, which has error-recovery & multiple volume handling &
lots of other stuff, and added compression (compress 4.0).  We are using
it to distribute our next product release, and it's impressive.  We cut
our diskette count from 8 to 4.

Dan Rosenblatt
Sigma Design, Inc.

jfh@rpp386.cactus.org (John F. Haugh II) (02/12/90)

In article <641@koala.UUCP> dir@koala.UUCP (Dan Rosenblatt) writes:
>I have taken afio, a public-domain super-set of cpio from someone at
>Lachman Associates, which has error-recovery & multiple volume handling &
>lots of other stuff, and added compression (compress 4.0).  We are using
>it to distribute our next product release, and it's impressive.  We cut
>our diskette count from 8 to 4.

How about you post the source code to this thing?  I currently use an
Irwin 110 for tape backups of my machine and could really use some help
cutting down the number of tapes it uses.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                           Domain: jfh@rpp386.cactus.org