lkc@hpirs.HP (Lee Casuto) (12/12/86)
A few weeks ago I posted a request for information about a paper on the backup of live filesystems. I also received lots of mail from folks interested in this very topic. Rather than responding to each person directly, I am going to post the definitive word on the subject. Thanks to everyone for expressing an interest. Special thanks to Kirk McKusick for taking the time to respond personally.

Lee Casuto
Mail: ...ucbvax!hpda!lkc
Phone: 408-447-6686

---------------------------------------------
From: blia!lkc@hpirs.hp.ucsf.edu (Lee Casuto)
Newsgroups: comp.unix.wizards
Subject: Backup of a live file system?
Date: 25 Nov 86 17:44:46 GMT

There is a rumor around here that Mr. M. K. McKusick has written a paper on the backup of a live file system. It would certainly be appreciated if anyone could respond to me with the title of this elusive article (if it *really* exists). Thanks in advance for any cooperation.

Lee Casuto
Mail: ...ucbvax!hpda!lkc
---------------------------------------------

I hate to disappoint you, but no such paper exists. In fact I have gone out of my way over the years to make it as clear as possible that `live' dumps are NOT always going to work!  The problem is that they usually do work, particularly if they are being used to extract individual files. But full incremental restores are likely to bomb out, and of course these are the ones that are most critical.

	Kirk McKusick
wcs@ho95e.UUCP (#Bill.Stewart) (12/17/86)
In article <4760002@hpirs.HP> lkc@hpirs.HP (Lee Casuto) writes:
>A few weeks ago I posted a request for information about a paper on
>the backup of live filesystems. I also received lots of mail from
>.....
>>I hate to disappoint you, but no such paper exists. In fact I have gone
>>out of my way over the years to make it as clear as possible that `live'
>>dumps are NOT always going to work!  The problem is that they usually do
>>work, particularly if they are being used to extract individual files.
>>But full incremental restores are likely to bomb out, and of course these
>>are the ones that are most critical.
>>	Kirk McKusick

There are two basic approaches to backups: programs that use the file system (e.g. tar, cpio), and programs that scrounge directly off the disk (dd, dump, volcopy, finc).

File-system based programs can work on live systems as long as the individual files are not changing. They are slow but flexible, and do incremental dumps well. Unfortunately, they can't tell when a given file has been *removed*, and can get horribly confused if you play games with links or modification times between dumps.

Disk-based backup programs are normally much faster, but are unsafe on live file systems; if nothing's being written at 3 AM you may luck out. Disk-based *restore* programs are another story; you should expect terrible corruption if you use one on a live disk. Suppose someone is already using inode 443 when you try to restore it?
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
mangler@cit-vax.Caltech.Edu (System Mangler) (12/20/86)
In article <1226@ho95e.UUCP>, wcs@ho95e.UUCP (#Bill.Stewart) writes:
> File-system based programs can work on live systems as long as the
> individual files are not changing.  They are slow but flexible, and
> do incremental dumps well.
>
> Disk-based backup programs are normally much faster, but are unsafe
> on live file systems;

I claim that both types are unsafe, for the SAME reasons. In both cases, a file's inode is read (either by read, or by stat), and based on that information the rest of the file is read. Reading the inode is an atomic operation, because the inode is completely contained in one disk sector, so the inode will always be internally consistent. However, after the inode is read, the information that it points to may be freed by a creat(), and scribbled upon, before the backup program reads it. The program will either get garbage, or EOF, but in either case it has to write SOMETHING on the tape now that it has committed itself by writing out a header saying that the next st_size bytes are the contents of the file.

That's one kind of corruption, and probably not that bad. It doesn't matter that you got garbage; the file was being zapped anyway, and will appear on the next backup tape. The important thing is to not bomb on it.

Another is when the file is removed/renamed between the time that it's selected for backup and the time it actually gets read. This is simple to handle; just skip that file.

The insidious case, though, is when subdirectories get moved out of a directory that hasn't been backed up yet, and into one that has already been done or was being skipped. That subtree won't be restored at all, and won't be on a subsequent incremental tape either, because the files didn't change. Filesystem-based backup programs won't even know that they missed something; disk-based programs will at least have a way to know that something happened, because they will come up with all these orphaned inodes.
Presumably, these should get linked into lost+found. (I haven't looked to see what *actually* happens.)

Dump has the additional advantage that all the directories are read very early, so the window of vulnerability is smaller. Sure, I've gotten bad dumps. In large part I think this happened because the system mangler before me changed dump to wait for a tape mount between pass II and pass III, and at that time tape mounts often took hours, creating a very large window of vulnerability.

> Disk-based backup programs are normally much faster,

Making it feasible to keep one's backups more up-to-date.

Don Speck	speck@vlsi.caltech.edu	{seismo,rutgers,ames}!cit-vax!speck
dave@onfcanim.UUCP (12/22/86)
Yet another thing that can go wrong is this: dump reads the I-list, and decides which files to write out. By the time it begins dumping a particular inode, that file has been re-written. The file is a large file; it has indirect blocks that have been released and re-allocated like the rest of the blocks in the file. Due to other filesystem activity, different blocks got allocated for the indirect blocks this time.

When dump goes to read the indirect blocks (based on the old, obsolete inode) it gets a block full of ASCII text or machine code or whatever, instead of disk block numbers. When it interprets the data as block numbers, it gets read errors trying to read ridiculous block numbers. Someone seeing all those read errors is likely to abort the dump, if dump doesn't decide to give up on its own.
henry@utzoo.UUCP (Henry Spencer) (12/23/86)
Another wart of dump programs that go through the filesystem is that the access time of files becomes largely useless, since the dump program ends up updating it on every backup.
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
fnf@mcdsun.UUCP (Fred Fish) (12/24/86)
In article <1392@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (System Mangler) writes:
[ deleted stuff ]
>the backup program reads it.  The program will either get garbage,
>or EOF, but in either case it has to write SOMETHING on the tape now
>that it has committed itself by writing out a header saying that the
>next st_size bytes are the contents of the file.

I had to deal with this when I wrote "bru" (Backup and Restore Utility). My basic strategy was that the archive would always contain exactly the number of bytes recorded in the archived file's header block. If the file actually shrank or grew, it was padded or truncated appropriately, with a warning message. This allows bru to always depend on the size recorded in the file header to seek to the next file header (rather than loop reading blocks) if it isn't interested in the current file and the archive device supports seeking. A big win on table-of-contents runs...

>The insidious case, though, is when subdirectories get moved out of
>a directory that hasn't been backed up yet, and into one that has
>already been done or was being skipped.  That subtree won't be restored
>at all, and won't be on a subsequent incremental tape either, because
>the files didn't change.

Yes, any sort of movement or restructuring of the file tree can confuse per-file type backups. My feeling is that maintenance of the tree structure is NOT the domain of the backup utility, but should be done with a separate utility that keeps track of changes in the tree. The UniSoft vchk utility is close to this, but is oriented at keeping two systems in sync, not keeping track of changes on a single system.

-Fred
-- 
===========================================================================
Fred Fish  Motorola Computer Division, 3013 S 52nd St, Tempe, Az 85282 USA
{seismo!noao!mcdsun,hplabs!well}!fnf    (602) 438-5976
===========================================================================
fnf@mcdsun.UUCP (Fred Fish) (12/24/86)
In article <7446@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Another wart of dump programs that go through the filesystem is that the
>access time of files becomes largely useless, since the dump program ends
>up updating it on every backup.

Maybe I'm missing something, but why not just use utime(2) to reset the st_atime and st_mtime fields? That's what bru does. Of course, st_ctime is not resettable, and any changes to st_atime or st_mtime made by another process while the file is being read out for backup are lost.

Backup programs that diddle with the raw filesystem while it's active give me the creeps...

-Fred
-- 
===========================================================================
Fred Fish  Motorola Computer Division, 3013 S 52nd St, Tempe, Az 85282 USA
{seismo!noao!mcdsun,hplabs!well}!fnf    (602) 438-5976
===========================================================================
davy@pur-ee.UUCP (Dave Curry) (12/25/86)
We do both partial and full dumps of live file systems on a regular basis, and have had no troubles. The tricks:

	1. Nice yourself down as far as you can.  Like -20.

	2. Modify dump (most of the mods are in dumptraverse.c) to
	   skip any inode whose mtime or ctime is greater than
	   spcl.c_date (time of the dump).

The idea here is that you dump all files which have not changed since the dump started. If the file changes during the dump, it will not be looked at, and thus the problems of removed or changed files (removed files are the worst) go away. You MUST make these mods to dump to get away with this sort of thing; we found out real fast in testing that dump (actually restore) tends to get real upset if files go away when it thinks they should be there.

The most we've seen happen doing things this way is that when restoring from a full dump you see a few "resync restore" messages. But we have never had a bad dump (non-restorable) in the 14 months or so that we've been doing this.

NOTE: I'm not necessarily recommending this practice. If we had our way we'd do dumps in single-user mode. But shutting down 20 machines every morning for 30-minute partials and on weekends for 2- and 3-hour fulls is not practical.

If you want the diffs (for 4.3BSD dump), send me mail... if I get enough requests I'll post them.

--Dave Curry
Purdue University Engineering Computer Network
henry@utzoo.UUCP (Henry Spencer) (12/31/86)
> >Another wart of dump programs that go through the filesystem is that the
> >access time of files becomes largely useless, since the dump program ends
> >up updating it on every backup.
>
> Maybe I'm missing something, but why not just use utime(2) to reset the
> st_atime and st_mtime fields, that's what bru does...

Then you lose st_ctime, which is infrequently used by humans, but is the field that *backups* ought to be based on, since it captures things like permission changes that don't alter st_mtime. There are other things that st_ctime is useful for as well, albeit unusual ones.
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
mangler@cit-vax.Caltech.Edu (System Mangler) (01/03/87)
In article <5108@pur-ee.UUCP>, davy@pur-ee.UUCP (Dave Curry) writes:
> 1.  Nice yourself down as far as you can.  Like -20.

A couple weeks ago a friend was converting a VMS BACKUP tape for me, and as we stood around while the TU80 slowly turned, I wisecracked about the TU80 on the nearby 4.3 BSD machine, which was turning just as slowly. He retorted that he had reverse-nice'd /etc/dump to -20, users be damned. Later, I found that the reverse-nice (mean?) was what made dump run so slowly.

4.3 BSD dump uses several processes, to overlap disk and tape I/O. Since it is I/O-bound, `usrpri' stays pretty close to PUSER + 2*nice. If nice is more negative than -6, this will be a better priority than the flock wakeup priority, so the scheduler favors the current process instead of waking up the next tape writer. If nice is more negative than -14, the current process gets priority over disk I/O completions too. I think the I/O wakeup priorities are much too close to PUSER; PZERO ought to be changed to about 10.

> 2.  Modify dump (most of the mods are in dumptraverse.c) to
>     skip any inode whose mtime or ctime is greater than
>     spcl.c_date (time of the dump).

This means that a full dump of the root filesystem will be missing /dev/console and /dev/rmt8. Not my idea of a useful backup...

The modifications affect only pass IV. But that wasn't where dump was weak. An rm -r during passes I/II/III of stock 4.2 BSD dump will make restore dump core. 4.3 BSD /etc/dump skips deleted files and directories, allowing restore to get *much* further before dumping core, and it seems to have no problem with deletions during pass IV. The only thing I have to add to the 4.3 BSD sanity checks would be a warning message in dirdump(), since a deleted directory is still quite likely to make the dump useless.

Don Speck	speck@vlsi.caltech.edu	{seismo,rutgers,ames}!cit-vax!speck