[comp.unix.wizards] Backup of a live filesystem revisited

lkc@hpirs.HP (Lee Casuto) (12/12/86)

A few weeks ago I posted a request for information about a paper on
the backup of live filesystems. I also received lots of mail from
folks interested in this very topic. Rather than responding to each
person directly, I am going to post the definitive word on the
subject. Thanks to everyone for expressing an interest. Special
thanks to Kirk McKusick for taking the time to respond personally.

Lee Casuto
Mail: ...ucbvax!hpda!lkc
Phone: 408-447-6686
---------------------------------------------
	From: blia!lkc@hpirs.hp.ucsf.edu (Lee Casuto)
	Newsgroups: comp.unix.wizards
	Subject: Backup of a live file system?
	Date: 25 Nov 86 17:44:46 GMT

	There is a rumor around here that Mr. M. K. McKusick has written a
	paper on the backup of a live file system. It would certainly be
	appreciated if anyone could respond to me with the title of this
	elusive article (if it *really* exists). Thanks in advance for any
	cooperation.

	Lee Casuto
	Mail: ...ucbvax!hpda!lkc

I hate to disappoint you, but no such paper exists. In fact I have gone
out of my way over the years to make it as clear as possible that `live'
dumps are NOT always going to work! The problem is that they usually do
work, particularly if they are being used to extract individual files.
But full incremental restores are likely to bomb out, and of course these
are the ones that are most critical.

	Kirk McKusick

wcs@ho95e.UUCP (#Bill.Stewart) (12/17/86)

In article <4760002@hpirs.HP> lkc@hpirs.HP (Lee Casuto) writes:
>A few weeks ago I posted a request for information about a paper on
>the backup of live filesystems. I also received lots of mail from
>.....
>>I hate to disappoint you, but no such paper exists. In fact I have gone
>>out of my way over the years to make it as clear as possible that `live'
>>dumps are NOT always going to work! The problem is that they usually do
>>work, particularly if they are being used to extract individual files.
>>But full incremental restores are likely to bomb out, and of course these
>>are the ones that are most critical.
>>	Kirk McKusick

There are two basic approaches to backups:
	programs that use the file system (e.g. tar, cpio), and
	programs that scrounge directly off the disk (dd, dump, volcopy, finc)

File-system based programs can work on live systems as long as the individual
files are not changing.  They are slow but flexible, and do incremental dumps
well.  Unfortunately, they can't tell when a given file has been *removed*, and
can get horribly confused if you play games with links or modification times
between dumps.

Disk-based backup programs are normally much faster, but are unsafe on live
file systems; if nothing's being written at 3AM you may luck out.
Disk-based *restore* programs are another story; you should expect terrible
corruption if you use one on a live disk.  Suppose someone is already using
inode 443 when you try to restore it?
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

mangler@cit-vax.Caltech.Edu (System Mangler) (12/20/86)

In article <1226@ho95e.UUCP>, wcs@ho95e.UUCP (#Bill.Stewart) writes:
> File-system based programs can work on live systems as long as the individual
> files are not changing.  They are slow but flexible, and do incremental dumps
> well.
>
> Disk-based backup programs are normally much faster, but are unsafe on live
> file systems;

I claim that both types are unsafe, for the SAME reasons.

In both cases, a file's inode is read (either by read, or by stat),
and based on that information the rest of the file is read.  Reading
the inode is an atomic operation, because the inode is completely
contained in one disk sector, so the inode will always be internally
consistent.  However, after the inode is read, the information that
it points to may be freed by a creat(), and scribbled upon, before
the backup program reads it.  The program will either get garbage,
or EOF, but in either case it has to write SOMETHING on the tape now
that it has committed itself by writing out a header saying that the
next st_size bytes are the contents of the file.
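
To make that commitment concrete, here is a minimal sketch of a file-by-file
archiver's inner loop (hypothetical code, not dump or any real program); the
gap between the stat() and the read() loop is where a live filesystem can
pull the rug out, and the padding shows why something has to go on the tape
regardless:

    /*
     * Minimal sketch of a file-by-file archiver (hypothetical, not dump
     * or any real program).  The gap between stat() and the read() loop
     * is where a creat() or unlink() on a live filesystem can sneak in;
     * once the header is out, exactly st_size bytes must follow it.
     */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    static int archive_file(const char *path, FILE *tape)
    {
        struct stat st;
        char buf[8192];
        off_t done = 0;
        int fd;

        if (stat(path, &st) < 0)
            return -1;              /* gone before we even looked: skip it */

        /* Commit point: the header promises st_size bytes of data. */
        fprintf(tape, "FILE %s %ld\n", path, (long) st.st_size);

        fd = open(path, O_RDONLY);  /* may fail if the file was removed */
        while (done < st.st_size) {
            size_t want = sizeof buf;
            ssize_t n = -1;

            if ((off_t) want > st.st_size - done)
                want = (size_t) (st.st_size - done);
            if (fd >= 0)
                n = read(fd, buf, want);  /* may be garbage, or EOF early */
            if (n <= 0) {
                memset(buf, 0, want);     /* pad with zeros to keep the promise */
                n = (ssize_t) want;
            }
            fwrite(buf, 1, (size_t) n, tape);
            done += n;
        }
        if (fd >= 0)
            close(fd);              /* anything the file grew by is ignored */
        return 0;
    }

    int main(int argc, char **argv)
    {
        int i;

        for (i = 1; i < argc; i++)
            archive_file(argv[i], stdout);
        return 0;
    }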

That's one kind of corruption, and probably not that bad.  It doesn't
matter that you got garbage; the file was being zapped anyway, and
will appear on the next backup tape.  The important thing is not to
bomb on it.

Another is when the file is removed/renamed between the time that it's
selected for backup and the time it actually gets read.  This is simple
to handle; just skip that file.

The insidious case, though, is when subdirectories get moved out of
a directory that hasn't been backed up yet, and into one that has
already been done or was being skipped.  That subtree won't be restored
at all, and won't be on a subsequent incremental tape either, because
the files didn't change.

Filesystem-based backup programs won't even know that they missed
something; disk-based programs will at least have a way to know
that something happened, because they will come up with all these
orphaned inodes.  Presumably, these should get linked into lost+found.
(I haven't looked to see what *actually* happens).

Dump has the additional advantage that all the directories are read
very early, so the window of vulnerability is smaller.

Sure, I've gotten bad dumps.  In large part I think this happened
because the system mangler before me changed dump to wait for a tape
mount between pass II and pass III, and at that time tape mounts
often took hours - creating a very large window of vulnerability.

> Disk-based backup programs are normally much faster,

Making it feasible to keep one's backups more up-to-date.

Don Speck   speck@vlsi.caltech.edu  {seismo,rutgers,ames}!cit-vax!speck

dave@onfcanim.UUCP (12/22/86)

Yet another thing that can go wrong is this: dump reads the I-list and
decides which files to write out.  By the time it begins dumping a
particular inode, that file has been re-written.  Suppose it is a large
file, so it has indirect blocks, which were released and re-allocated
along with the rest of the blocks in the file.  Due to other filesystem
activity, different blocks got allocated for the indirect blocks this time.

When dump goes to read the indirect blocks (based on the old, obsolete
inode) it gets a block full of ASCII text or machine code or whatever
instead of disk block numbers.  When it interprets that data as block
numbers, it gets read errors trying to read ridiculous block numbers.
Someone seeing all those read errors is likely to abort the dump, if
dump doesn't decide to give up on its own.
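
To see just how ridiculous those block numbers get, here is a toy
illustration (not dump's code; it just reinterprets text the way a stale
indirect block full of 4-byte pointers would be read):

    /*
     * Toy illustration, not dump's code: reinterpret a block that now
     * holds ASCII text as an array of 4-byte disk block numbers, the
     * way dump would read a stale indirect block.
     */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void)
    {
        char block[512];
        int32_t blkno[512 / sizeof (int32_t)];
        int i;

        /* Pretend the freed indirect block was re-used for somebody's text. */
        memset(block, 0, sizeof block);
        strcpy(block, "When dump goes to read the indirect blocks ...");

        memcpy(blkno, block, sizeof blkno);
        for (i = 0; i < 4; i++)
            printf("\"block number\" %d = %ld\n", i, (long) blkno[i]);

        /* The values come out in the hundreds of millions or more --
         * far past the end of any disk, hence the read errors. */
        return 0;
    }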

henry@utzoo.UUCP (Henry Spencer) (12/23/86)

Another wart of dump programs that go through the filesystem is that the
access time of files becomes largely useless, since the dump program ends
up updating it on every backup.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

fnf@mcdsun.UUCP (Fred Fish) (12/24/86)

In article <1392@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (System Mangler) writes:

[ deleted stuff ]

>the backup program reads it.  The program will either get garbage,
>or EOF, but in either case it has to write SOMETHING on the tape now
>that it has committed itself by writing out a header saying that the
>next st_size bytes are the contents of the file.

I had to deal with this when I wrote "bru" (Backup and Restore Utility).
My basic strategy was that the archive would always contain exactly the
number of bytes recorded in the archived file's header block.  If the
file actually shrank or grew, it was padded or truncated appropriately,
with a warning message.  This lets bru always depend on the size
recorded in the file header to seek to the next file header (rather
than loop reading blocks) when it isn't interested in the current file
and the archive device supports seeking.  A big win when doing a table
of contents...
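
The payoff looks roughly like this (a sketch with a made-up header layout,
NOT bru's real format); on a non-seeking device the only option is to read
and discard the data instead:

    /*
     * Sketch of the seek-ahead trick, with a made-up header layout (NOT
     * bru's real format): because the size in the header is guaranteed
     * to match the data that follows, a table-of-contents pass can
     * lseek() past each file instead of reading it.
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    struct hdr {
        char name[100];
        long size;                  /* exactly this many data bytes follow */
    };

    static void table_of_contents(int fd)
    {
        struct hdr h;

        while (read(fd, &h, sizeof h) == (ssize_t) sizeof h) {
            printf("%-30s %ld bytes\n", h.name, h.size);
            /* Skip the data; only possible if the archive device can seek. */
            if (lseek(fd, (off_t) h.size, SEEK_CUR) == (off_t) -1)
                break;
        }
    }

    int main(int argc, char **argv)
    {
        if (argc > 1) {
            int fd = open(argv[1], O_RDONLY);
            if (fd >= 0) {
                table_of_contents(fd);
                close(fd);
            }
        }
        return 0;
    }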

>The insidious case, though, is when subdirectories get moved out of
>a directory that hasn't been backed up yet, and into one that has
>already been done or was being skipped.  That subtree won't be restored
>at all, and won't be on a subsequent incremental tape either, because
>the files didn't change.

Yes, any sort of movement or restructuring of the file tree can confuse
per-file backups.  My feeling is that maintaining the tree structure is
NOT the domain of the backup utility; it should be done with a separate
utility that keeps track of changes in the tree.  The Unisoft vchk
utility is close to this, but it is oriented toward keeping two systems
in sync, not toward tracking changes on a single system.

-Fred

-- 
===========================================================================
Fred Fish  Motorola Computer Division, 3013 S 52nd St, Tempe, Az 85282  USA
{seismo!noao!mcdsun,hplabs!well}!fnf    (602) 438-5976
===========================================================================

fnf@mcdsun.UUCP (Fred Fish) (12/24/86)

In article <7446@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Another wart of dump programs that go through the filesystem is that the
>access time of files becomes largely useless, since the dump program ends
>up updating it on every backup.

Maybe I'm missing something, but why not just use utime(2) to reset the
st_atime and st_mtime fields?  That's what bru does.  Of course, st_ctime
is not resettable, and any changes to st_atime or st_mtime made by another
process while the file is being read out for backup are lost.  Backup
programs that diddle with the raw filesystem while it's active
give me the creeps...
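
For what it's worth, the save-and-restore dance is only a few lines (a
sketch, not bru's actual source):

    /*
     * Sketch of resetting the access and modification times after
     * reading a file for backup (not bru's actual source).  Note that
     * utime() itself updates st_ctime, which is why ctime "is not
     * resettable".
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <utime.h>
    #include <sys/stat.h>

    static int backup_one(const char *path)
    {
        struct stat st;
        struct utimbuf ut;
        char buf[8192];
        ssize_t n;
        int fd;

        if (stat(path, &st) < 0 || (fd = open(path, O_RDONLY)) < 0)
            return -1;

        while ((n = read(fd, buf, sizeof buf)) > 0)
            ;                       /* ... write the n bytes to the archive ... */
        close(fd);

        /* Put atime and mtime back the way we found them. */
        ut.actime = st.st_atime;
        ut.modtime = st.st_mtime;
        if (utime(path, &ut) < 0)
            perror(path);
        return 0;
    }

    int main(int argc, char **argv)
    {
        int i;

        for (i = 1; i < argc; i++)
            backup_one(argv[i]);
        return 0;
    }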

-Fred


-- 
===========================================================================
Fred Fish  Motorola Computer Division, 3013 S 52nd St, Tempe, Az 85282  USA
{seismo!noao!mcdsun,hplabs!well}!fnf    (602) 438-5976
===========================================================================

davy@pur-ee.UUCP (Dave Curry) (12/25/86)

We do both partial and full dumps of live file systems on a regular
basis, and have had no troubles.  The tricks:

	1. Nice yourself down as far as you can.  Like -20.

	2. Modify dump (most of the mods are in dumptraverse.c) to
	   skip any inode whose mtime or ctime is greater than
	   spcl.c_date (time of the dump).

The idea here is that you dump all files which have not changed since
the dump started.  If the file changes during the dump, it will not be
looked at, and thus the problems of removed or changed files (removed
files are the worst) go away.  You MUST make these mods to dump to get
away with this sort of thing; we found out real fast in testing that
dump (actually restore) tends to get real upset if files go away when
it thinks they should be there.
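
The heart of the change is a one-line test.  Here is a sketch of the idea
written against stat(2) field names (not Dave's actual diffs, which go in
dump's inode-scanning code and look at the on-disk inode and spcl.c_date):

    /*
     * Sketch of the "skip anything newer than the dump's start time"
     * test, written against stat(2) fields for readability; the real
     * change goes in dump's inode scan (dumptraverse.c) and uses the
     * on-disk inode and spcl.c_date instead.
     */
    #include <stdio.h>
    #include <time.h>
    #include <sys/stat.h>

    static time_t dump_start;       /* recorded when the dump begins */

    /* Should this file go on the tape?  'since' is the previous dump's date. */
    static int wanted(const struct stat *st, time_t since)
    {
        /* Touched after the dump began?  Leave it for the next dump. */
        if (st->st_mtime > dump_start || st->st_ctime > dump_start)
            return 0;
        /* Otherwise the usual incremental test applies. */
        return st->st_mtime >= since || st->st_ctime >= since;
    }

    int main(int argc, char **argv)
    {
        struct stat st;
        int i;

        dump_start = time((time_t *) 0);
        for (i = 1; i < argc; i++)
            if (stat(argv[i], &st) == 0)
                printf("%s: %s\n", argv[i],
                       wanted(&st, (time_t) 0) ? "dump" : "skip");
        return 0;
    }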

The most we've seen happen doing things this way is that when restoring
from a full dump you see a few "resync restore" messages.  But we have
never had a bad dump (non-restorable) in the 14 months or so that we've
been doing this.

NOTE: I'm not necessarily recommending this practice.  If we had our way
      we'd do dumps in single-user mode.  But shutting down 20 machines
      every morning for 30-minute partials and on weekends for 2- and
      3-hour fulls is not practical.

If you want the diffs (for 4.3BSD dump), send me mail... if I get enough
requests I'll post them.

--Dave Curry
Purdue University
Engineering Computer Network

henry@utzoo.UUCP (Henry Spencer) (12/31/86)

> >Another wart of dump programs that go through the filesystem is that the
> >access time of files becomes largely useless, since the dump program ends
> >up updating it on every backup.
> 
> Maybe I'm missing something, but why not just use utime(2) to reset the
> st_atime and st_mtime fields?  That's what bru does...

Then you lose st_ctime, which is infrequently used by humans but is the
field that *backups* ought to be based on, since it captures things like
permission changes that don't alter st_mtime.  There are other things
st_ctime is useful for as well, though they are unusual.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

mangler@cit-vax.Caltech.Edu (System Mangler) (01/03/87)

In article <5108@pur-ee.UUCP>, davy@pur-ee.UUCP (Dave Curry) writes:
>	1. Nice yourself down as far as you can.  Like -20.

    A couple weeks ago a friend was converting a VMS BACKUP tape for me,
and as we stood around while the TU80 slowly turned, I wisecracked about
the TU80 on the nearby 4.3 BSD machine, which was turning just as slowly.
He retorted that he had reverse-nice'd /etc/dump to -20, users be damned.
Later, I found that the reverse-nice (mean?) was what made dump run so
slowly.

4.3 BSD dump uses several processes, to overlap disk and tape I/O.
Since it is I/O-bound, `usrpri' stays pretty close to PUSER+2*nice.
If nice is more negative than -6, this will be a better priority
than the flock wakeup priority, so the scheduler favors the current
process instead of waking up the next tape writer.  If nice is more
negative than -14, the current process gets priority over disk I/O
completions too.
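
To put rough numbers on that, here is the back-of-the-envelope arithmetic.
PUSER = 50 and PRIBIO = 20 are the stock 4.3 BSD values from <sys/param.h>
(check your own kernel), and the p_cpu/4 term in usrpri is ignored since it
stays near zero for an I/O-bound process:

    /*
     * Back-of-the-envelope priority arithmetic.  PUSER = 50 and
     * PRIBIO = 20 are the stock 4.3 BSD values from <sys/param.h>;
     * check your own kernel.  The p_cpu/4 term in usrpri is ignored
     * here since it stays near zero for an I/O-bound process.
     */
    #include <stdio.h>

    #define PUSER   50      /* base user-mode priority */
    #define PRIBIO  20      /* priority of disk I/O completion wakeups */

    int main(void)
    {
        int nice;

        for (nice = 0; nice >= -20; nice -= 2)
            printf("nice %3d -> usrpri ~ %2d%s\n", nice, PUSER + 2 * nice,
                   PUSER + 2 * nice <= PRIBIO
                       ? "   (ahead of disk I/O wakeups)" : "");
        return 0;
    }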

I think the I/O wakeup priorities are much too close to PUSER; PZERO
ought to be changed to about 10.

>	2. Modify dump (most of the mods are in dumptraverse.c) to
>	   skip any inode whose mtime or ctime is greater than
>	   spcl.c_date (time of the dump).

This means that a full dump of the root filesystem will be missing
/dev/console and /dev/rmt8: the inode times on busy device files change
while the dump is running, so those inodes always get skipped.  Not my
idea of a useful backup...

The modifications affect only pass IV.  But that wasn't where dump
was weak.  An rm -r during passes I/II/III of stock 4.2 BSD dump
will make restore dump core.  4.3 BSD /etc/dump skips deleted files
and directories, allowing restore to get *much* further before
dumping core, and it seems to have no problem with deletions
during pass IV.

The only thing I have to add to the 4.3 BSD sanity checks would
be a warning message in dirdump(), since a deleted directory is
still quite likely to make the dump useless.

Don Speck   speck@vlsi.caltech.edu  {seismo,rutgers,ames}!cit-vax!speck