[comp.sys.sun] Incremental Backups

frazier@cs.ucla.edu (Greg Frazier) (04/24/91)

At UCLA we are currently using a program called "backup" to do incremental
backups to a hard disk.  Unfortunately, the source and destination disks both
have to be on the same machine.  Does anybody have an incremental backup program
that will work across NFS-mounted disks?  Yes, I am aware that one can
hack up a script using rdump and rrestore, which is what I will do if I
cannot find a good program.

Thanks for any responses!

Greg Frazier	frazier@CS.UCLA.EDU	!{ucbvax,rutgers}!ucla-cs!frazier

mrl@uunet.uu.net (Mark R. Ludwig) (04/29/91)

In article <2573@brchh104.bnr.ca>, frazier@cs (Greg Frazier) writes:
>At UCLA we are currently using a program called "backup" to do incremental
>backups to a hard disk.  Unfortunately, the source and destination disks both
>have to be on the same machine.  Does anybody have an incremental backup program
>that will work across NFS-mounted disks?  Yes, I am aware that one can
>hack up a script using rdump and rrestore, which is what I will do if I
>cannot find a good program.

We have no budget to buy *any* software, so I have had to build my own
wheel.  We have a multi-vendor network and all the nodes are backed up to
an 8mm drive on a Sun-4.  While we do only full backups (instead of
incremental), I believe my experiences are relevant enough to recommend
you do *not* try to do backups across NFS.  There are several reasons, but
the most important are these: the file protections across NFS do not
guarantee that you can access all files as "root" on a client node (yes, I
know, *some* implementations of NFS allow this, and *many* do not); the
network traffic to read a full file partition via NFS is tremendous; and
rdump/rrestore use a horribly inefficient ACK/NACK protocol to manipulate
the tape drive.

Instead, I suggest you run your favorite backup program on the NFS server,
send the output to stdout, and pipe the result into a remote shell on the
node with the tape drive, using ``dd'' to appropriately block the output
to the tape drive for both efficient space utilization and good tape
streaming performance.  (My experience with the 8mm drive using a block
size of 100b (51200 bytes) suggests that the ``dd'' writing to the tape
drive will consume a fair amount of CPU to do its bidding.)  This approach
results in less network traffic and better throughput for the backup
operation, because the ``rsh'' is basically uni-directional compared with
all the back-and-forth packets required for the equivalent NFS access to
the filesystem.
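A sketch of such a pipeline is below; the hostname ``tapehost'', the raw
disk device /dev/rsd0g, and the no-rewind tape device /dev/nrst0 are all
hypothetical stand-ins for whatever your site uses.

```shell
# Run dump(8) locally on the NFS server, writing the dump to stdout,
# and pipe it through rsh into dd on the tape host.  "obs=100b" makes
# dd reblock the stream into 100 512-byte blocks (51200 bytes) per
# tape record, which helps keep the 8mm drive streaming.
dump 0f - /dev/rsd0g | rsh tapehost dd of=/dev/nrst0 obs=100b
```

Using the no-rewind device (nrst0 rather than rst0) lets you append
several filesystem dumps to a single tape in one session.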

For those experts reading: Yes, I have the scripts Do the Right Thing by
unmounting all the filesystems, pruning them (with ``fsck''), and only
dumping them if the pruner is happy; no, I don't do ``root'' or ``usr''
this way, because, to the best of my knowledge, it is impossible to
automate hands-off backup for those.

INET: mrl@uai.com       UUCP: uunet!uaisun4!mrl       PSTN: +1 213 822 4422
USPS: 7740 West Manchester Boulevard, Suite 208, Playa del Rey, CA  90293
WANT: Succinct, insightful statement to occupy this space.  Inquire within.

bet@zachary.mc.duke.edu (Bennett Todd -- gaj) (06/05/91)

Well, I guess I've got a different approach to solving this problem.
We've got a couple of Exabyte 8200 drives on one of the (diskless)
workstations. Every night I stuff in two tapes. We are backing up about
16G of user data, from all over the network. We have Suns, Stellar
GS1000s, microvaxes, SGI Irises, and I forget what-all else.

I find, for each machine, a command that will allow me to do a full or
an incremental of a single filesystem, emitting the results to stdout.
On the Iris I made some trivial scripts with find(1) and cpio(1).

I have a master database that contains one record for each filesystem.
Each record contains the hostname, partition name, filesystem name (for
comments), size (in megabytes), a command to take a full dump, and a
command to take an incremental dump. I generate the size by adding the
used and avail columns of df(1) output. I run this database through a perl program
that generates a series of databases -- at this point it generates 15 of
them. Each database describes a single tape. The fulls are currently
taking 7 tapes. For each tape of the full dump the perl script generates
an incremental database describing an incremental dump of every
filesystem that isn't in that full. So that's 14 databases. The 15th is
a complete incremental.
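For illustration, a record in such a database might look like the line
below; the colon-separated layout and the size_mb helper are my guesses
at the shape, not the actual format.

```shell
# One hypothetical record: host, partition, name, size in MB, full
# dump command, incremental dump command, colon-separated:
#   iris1:/dev/usr:/usr:420:rsh iris1 /usr/local/lib/fulldump:rsh iris1 /usr/local/lib/incrdump

# Size in megabytes as used+avail from df(1); -Pk forces one line per
# filesystem in 1024-byte blocks, so dividing by 1024 yields MB.
size_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int(($3 + $4) / 1024) }'
}
```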

I have a script that will read one of these databases and write a dump
tape, dumping over the network via rsh piped into dd. I am doing a
two-week rotation. Every night of the first week, and Monday and Tuesday
of the second week, I write two tapes -- one of the full tapes and the
complementary incremental. Wednesday through Friday of the second week I
do incrementals.
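The tape-writing loop could be sketched as below, assuming one
colon-separated record per filesystem with the six fields described
above; the file name tape1.db, the field order, and the device names
are placeholders of my own.

```shell
#!/bin/sh
# Sketch: read one per-tape database and append each filesystem's
# full dump to the no-rewind tape device via rsh piped into dd.
# Assumed field order: host:partition:name:size:full_cmd:incr_cmd
while IFS=: read host part name size full incr; do
    echo "dumping $name ($part) from $host"
    rsh "$host" "$full" | dd of=/dev/nrst0 obs=100b
done < tape1.db
```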

Everything is backed up fully twice a month, everything is backed up one
way or another every night, and I never have to sit around shuffling
tapes. We are using Fuji P6-120MP tapes, which we can get for <$5 ea. As
easy as they are to store, and as cheap as they are, we never recycle
them -- we keep them all forever.

As for the issue of dumping live filesystems, I ignore it. I've never
heard anyone claim that doing dumps this way can corrupt the
*filesystem*, just the dump. As far as I know, changing files won't hurt
anything (just that the individual files may not be correctly dumped),
and the only thing that can corrupt the entire dump is a subdirectory
being deleted and its inode recycled as a file later on.

Given a choice between taking dumps oftener (and having a small but
non-zero probability of an error in the dump), and slightly increasing
the reliability but making it *much* more obstructive to take the dump,
I know which way I'll go.

-Bennett
bet@orion.mc.duke.edu