frazier@cs.ucla.edu (Greg Frazier) (04/24/91)
At UCLA we are currently using a program called "backup" to do incremental backups to a hard disk. Unfortunately, the from and to disks both have to be on the same machine. Does anybody have an incremental backup program that will work across NFS-mounted disks? Yes, I am aware that one can hack up a script using rdump and rrestore, which is what I will do if I cannot find a good program. Thanks for any responses!

Greg Frazier
frazier@CS.UCLA.EDU
!{ucbvax,rutgers}!ucla-cs!frazier
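[For reference, a minimal sketch of the sort of rdump wrapper Greg mentions hacking up, assuming a BSD-style rdump and a tape drive on a remote host; "tapehost" and the device names below are placeholders, not anyone's actual configuration:]

    #!/bin/sh
    # Hypothetical rdump wrapper: level-5 incremental of one filesystem
    # to a tape drive on a remote host.  Hostname and devices are
    # placeholders.
    FS=/dev/rsd0g                # raw device of the filesystem to dump
    TAPE=tapehost:/dev/nrst0     # remote no-rewind tape device

    # -u updates /etc/dumpdates so the next incremental knows what changed.
    rdump 5uf $TAPE $FS

[The recovery half of such a script would call rrestore against the same remote device.]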
mrl@uunet.uu.net (Mark R. Ludwig) (04/29/91)
In article <2573@brchh104.bnr.ca>, frazier@cs (Greg Frazier) writes:
>At UCLA we are currently using a program called "backup" to do incremental
>backups to a hard disk. Unfortunately, the from and to disks both have to
>be on the same machine. Does anybody have an incremental backup program
>that will work across NFS-mounted disks? Yes, I am aware that one can
>hack up a script using rdump and rrestore, which is what I will do if I
>cannot find a good program.

We have no budget to buy *any* software, so I have had to build my own wheel. We have a multi-vendor network, and all the nodes are backed up to an 8mm drive on a Sun-4. While we do only full backups (instead of incrementals), I believe my experiences are relevant enough to recommend that you do *not* try to do backups across NFS. There are several reasons, but the most important are these: the file protections across NFS do not guarantee that you can access all files as "root" on a client node (yes, I know, *some* implementations of NFS allow this, and *many* do not); the network traffic to read a full file partition via NFS is tremendous; and rdump/rrestore use a horribly inefficient ACK/NACK protocol to talk to the remote tape drive.

Instead, I suggest you run your favorite backup program on the NFS server, send the output to stdout, and pipe the result into a remote shell on the node with the tape drive, using ``dd'' to block the output appropriately for both efficient space utilization and good tape streaming performance. (My experience with the 8mm drive and a block size of 100b (51200 bytes) suggests that the ``dd'' writing to the tape drive will consume a fair amount of CPU to do its bidding.) This approach results in less network traffic and better throughput for the backup operation, because the ``rsh'' stream is basically uni-directional, compared with all the back-and-forth packets required for the equivalent NFS access to the filesystem.

For the experts reading: yes, my scripts Do the Right Thing by unmounting all the filesystems, checking them (with ``fsck''), and only dumping them if the check is happy; no, I don't do ``root'' or ``usr'' this way, because, to the best of my knowledge, it is impossible to automate hands-off backup for those.

INET: mrl@uai.com
UUCP: uunet!uaisun4!mrl
PSTN: +1 213 822 4422
USPS: 7740 West Manchester Boulevard, Suite 208, Playa del Rey, CA 90293
WANT: Succinct, insightful statement to occupy this space.  Inquire within.
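[A minimal sketch of the dump-to-stdout pipeline Mark describes, assuming a BSD-style dump run on the NFS server and an 8mm drive on another host; the hostname and device names are placeholders:]

    #!/bin/sh
    # Run on the NFS server: dump the filesystem to stdout and pipe it
    # through rsh into dd on the machine with the tape drive.
    # "tapehost" and the device names are placeholders.
    FS=/dev/rsd0g            # raw device of the filesystem to dump
    TAPEHOST=tapehost        # Sun-4 with the 8mm drive
    TAPE=/dev/nrst0          # no-rewind tape device on that machine

    # Level 0 dump to stdout ("f -"); reblock to 100b (51200 bytes) on
    # the remote side so the drive streams rather than start-stopping.
    dump 0f - $FS | rsh $TAPEHOST dd of=$TAPE obs=100b

[Because the rsh carries one mostly one-way byte stream, this keeps both the chatty NFS traffic and the rdump ACK/NACK exchanges off the wire, which is the point made above.]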
bet@zachary.mc.duke.edu (Bennett Todd -- gaj) (06/05/91)
Well, I guess I've got a different approach to solving this problem. We've got a couple of Exabyte 8200 drives on one of the (diskless) workstations. Every night I stuff in two tapes. We are backing up about 16G of user data from all over the network. We have Suns, Stellar GS1000s, MicroVAXes, SGI Irises, and I forget what-all else. For each machine I find a command that will do a full or an incremental dump of a single filesystem, emitting the results to stdout. On the Iris I made some trivial scripts with find(1) and cpio(1).

I have a master database that contains one record for each filesystem. Each record contains the hostname, partition name, filesystem name (for comments), size (in megabytes), the command to take a full dump, and the command to take an incremental dump. The size I generate by taking the output of df(1) and adding used+avail.

I run this database through a perl program that generates a series of databases -- at this point it generates 15 of them. Each database describes a single tape. The fulls currently take 7 tapes. For each tape of the full dump, the perl script generates an incremental database describing an incremental dump of every filesystem that isn't in that full. So that's 14 databases. The 15th is a complete incremental. I have a script that will read one of these databases and write a dump tape, dumping over the network via rsh piped into dd.

I am doing a two-week rotation. Every night of the first week, and Monday and Tuesday of the second week, I write two tapes -- one of the full tapes and the complementary incremental. Wednesday through Friday of the second week I do incrementals only. Everything is backed up fully twice a month, everything is backed up one way or another every night, and I never have to sit around shuffling tapes. We are using Fuji P6-120MP tapes, which we can get for <$5 each. As easy as they are to store, and as cheap as they are, we never recycle them -- we keep them all forever.

As for the issue of dumping live filesystems, I ignore it. I've never heard anyone claim that doing dumps this way can corrupt the *filesystem*, just the dump. As far as I know, changing files won't hurt anything (the individual files just may not be dumped correctly), and the only thing that can corrupt the entire dump is a subdirectory being deleted and its inode recycled as a file later on. Given a choice between taking dumps more often (and having a small but non-zero probability of an error in the dump), and slightly increasing the reliability but making it *much* more obstructive to take the dump, I know which way I'll go.

-Bennett
bet@orion.mc.duke.edu
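[A minimal sketch of the kind of per-tape database and dump loop Bennett describes; the colon-separated record layout, field names, and device below are guesses for illustration, not his actual format:]

    #!/bin/sh
    # Hypothetical tape writer: reads a per-tape database of records
    #   host:partition:comment:size_mb:full_cmd:incr_cmd
    # and runs each full-dump command on its host, piping the output
    # back through rsh into dd on this (tape-bearing) machine.
    # Record layout, device, and blocking are assumptions.
    DB=$1                    # database describing one tape
    TAPE=/dev/nrst0          # no-rewind Exabyte device on this machine

    while IFS=: read host part comment size full incr
    do
        echo "dumping $host:$part ($comment, ${size}MB)"
        # -n keeps rsh from swallowing the rest of the database on stdin;
        # 100b blocking follows the suggestion earlier in the thread.
        rsh -n $host "$full" | dd of=$TAPE obs=100b
    done < $DB

[An incremental tape would be the same loop running $incr instead of $full; which filesystems land on which tape is decided up front by the perl generator, not by this script.]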