[comp.unix.admin] Why idle backups??

john@achilles.ua.oz (John Warburton) (10/22/90)

From article <547@fciva.FRANKLIN.COM>, by dag@fciva.FRANKLIN.COM (Daniel A. Graifer):
> I have written a sequence of scripts we use here to get the system into an
> assured idle state for unattended backup.  These depend on init state 4 being

[ rather a good algorithm for backups deleted]

Forgive my ignorance, but I have looked through TFM, and it says that you should
not do a dump on an active file system. That's OK, but WHY??? I can't see any
documentation as to what would happen if you did do a dump on a live file
system.

My reason for asking is that we currently do our backups at midnight each
night WITHOUT shutting down the system. Is this likely to cause problems with
restoring files??

Any pointers will be appreciated,
thanks

John
--
John Warburton                          Phone   : +61 8 228 5583
Department of Computer Science          Telex   : UNIVAD AA89141
University of Adelaide                  Fax     : +61 8 223 1206
GPO Box 498 Adelaide SA 5001            ACSnet  : john@cs.ua.oz
AUSTRALIA                               Internet: john@cs.adelaide.edu.au

tif@doorstop.austin.ibm.com (Paul Chamberlain) (10/22/90)

In article <1642@sirius.ucs.adelaide.edu.au> john@achilles.ua.oz (John Warburton) writes:
>... we currently do our backups at midnight each
>night WITHOUT shutting down the system. Is this likely to cause problems with
>restoring files??

It's not exactly the same but I tend to see restoring this tape as
being as risky as having hit reset at the time of the backup.

Paul Chamberlain | I do NOT represent IBM.     tif@doorstop, sc30661 at ausvm6
512/838-7008     | ...!cs.utexas.edu!ibmaus!auschs!doorstop.austin.ibm.com!tif

srm@Unify.Com (Steve Maraglia) (10/23/90)

In article <3955@awdprime.UUCP> tif@doorstop.austin.ibm.com (Paul Chamberlain) writes:
>In article <1642@sirius.ucs.adelaide.edu.au> john@achilles.ua.oz (John Warburton) writes:
>>... we currently do our backups at midnight each
>>night WITHOUT shutting down the system. Is this likely to cause problems with
>>restoring files??
>
>It's not exactly the same but I tend to see restoring this tape as
>being as risky as having hit reset at the time of the backup.
>

About 6 months ago we started performing backups of about 30 systems
(1 Sequent, 1 Pyramid, and the rest Suns) to an Exabyte tape drive.
They're all performed at night, from 11:00 p.m. to around 4:00 a.m.,
while the file systems are mounted.

I've done dozens of restores from single files to entire partitions 
and have had zero failures.

I use the dump & restore utilities.
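
A cron entry for this kind of nightly dump might look something like
this (the device, partition, and dump flags are illustrative, not
necessarily what we run):

	# root's crontab: level 0 of one partition to the Exabyte at 11 p.m.
	0 23 * * * /usr/etc/dump 0uf /dev/nrst0 /dev/rsd0g >> /usr/adm/dump.log 2>&1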

-- 
Steve Maraglia				   internet: srm@unify.com
Unify Corporation			   ..!{uunet,csusac,pyramid}!unify!srm
3870 Rosin Court    Sacramento, CA 95834   (916) 920-9092 

brossard@sasun1.epfl.ch (Alain Brossard EPFL-SIC/SII) (10/23/90)

In article <3955@awdprime.UUCP>, tif@doorstop.austin.ibm.com (Paul Chamberlain) writes:
> In article <1642@sirius.ucs.adelaide.edu.au> john@achilles.ua.oz (John Warburton) writes:
> >... we currently do our backups at midnight each
> >night WITHOUT shutting down the system. Is this likely to cause problems with
> >restoring files??
> 
> It's not exactly the same but I tend to see restoring this tape as
> being as risky as having hit reset at the time of the backup.
> 

	In theory, there is a risk if directory-affecting operations
are done just at the wrong time (between passes of dump), but in
practice I have never heard of an actual problem occurring.  If you
can't take that small, really tiny risk, then go to single-user mode.
Otherwise, in practice, you most probably will never see a failure,
especially if you do your dump overnight.  At the U. of Waterloo,
we used to do our dump (level 0) during the daytime (6-8 hours!)
and we never had a failure in the three years I was there.  And
that machine was heavily used (Computer Graphics...:-).
-- 

Alain Brossard, Ecole Polytechnique Federale de Lausanne,
	SIC/SII, EL-Ecublens, CH-1015 Lausanne, Suisse
brossard@sasun1.epfl.ch

verber@pacific.mps.ohio-state.edu (Mark Verber) (10/23/90)

> In theory, there is a risk if directory affecting operation
> are done just at the wrong time (between passes of dump), but in
> practice I have never heard of an actual problem occuring.  If you
> can't take that small/really tiny risk then go to single user mode.
> Otherwise, in practice, you most probably will never see a failure,
> especially if you do your dump overnight.

I have in practice seen a dump corrupted when run on an active file
system.  I will admit that I have seen this happen only once over a
10-year period.  But if that one failure has something critical on it,
you do not forget the experience of having users out for your blood.
I certainly never intend to repeat that experience.  My policy has
been to run incremental backups on an active file system late at
night (to Exabytes, from cron), but when I do level 0 dumps I drop to
single-user.  This is pretty painless for me, since the things I
regularly dump at level 0 will fit on a single Exabyte tape, so I
have my servers shut themselves down automatically in the middle of
the night, run dump, and then come back up to multi-user mode.

Cheers,
Mark

zwicky@sparkyfs.istc.sri.com (Elizabeth Zwicky) (10/24/90)

In article <1642@sirius.ucs.adelaide.edu.au> john@achilles.ua.oz (John Warburton) writes:
>Forgive my ignorance, but I have looked through TFM, and it says that you should
>not do a dump on an active file system. That's OK, but WHY??? I can't see any
>documentation as to what would happen if you did do a dump on a live file
>system.

Dump does not go through normal filesystem primitives to read files;
it builds a table and then reads inodes directly from disk. If the file
system changes between the time it reads the tables and the time it
reads the inodes, life is not good. This can *and* *does* produce
tapes that are unusable - not just the active file, but every file
after that inode number is lost to restore.

Some sites run a dump modified by Purdue to skip inodes that change
after the tables are read - this means that your chances of getting
a corrupt tape are significantly reduced (there are still a few really
obscure conditions that can screw life up) but on the other hand files
are likely to be missing altogether. This is not what I want out of a
level 0.

The danger of doing active dumps is *NOT* theoretical; I have seen
missing files more than once, and completely mangled tapes at least
once. It may be acceptable to you if you do frequent dumps at
low-usage times, but you should watch out for users or processes that
inadvertently get synced with your dump schedule, you should run a
Purdue modified dump if at all possible, and you should give serious
thought to running level 0s in single-user; you can do this
automatically in the middle of the night by using the same sort of
trick that fastboot uses, creating a magic file that the rc files
look for in the boot process to tell them to do backups.

Programs like tar that use normal file system accesses do not suffer
as badly from this problem - they are better able to tell when a file
is active - but they are slower and frequently have problems with odd
files. Given the way UNIX file systems work, even these programs cannot
guarantee correct dumps of active files (instead, they skip them). You
can get around that by rewriting the file system, but that's not always
an available option...

	Elizabeth Zwicky
	zwicky@erg.sri.com

koppenh@informatik.uni-stuttgart.de (Andreas Koppenhoefer) (10/24/90)

In article <32749@sparkyfs.istc.sri.com> zwicky@sparkyfs.istc.sri.com (Elizabeth Zwicky ) writes:

> [...]
> Some sites run a dump modified by Purdue to skip inodes that change
> after the tables are read - this means that your chances of getting
> a corrupt tape are significantly reduced [...].

Hmm..., sounds good. In my anonymous ftp lists there are a lot of
Purdue hosts. Would someone please tell me where I can get the
source? I suppose it's in the public domain.

Thank you,

	Andreas Koppenhoefer
--
Andreas Koppenhoefer, Student der Universitaet Stuttgart, BR Deutschland 
mail:   koppenh@dia.informatik.uni-stuttgart.dbp.de
privat: Belaustr. 5/3, 7000 Stuttgart 1
        Telefon: +49 711 694111 (Mo-Do 18-22h MEZ/MESZ)

emcguire@ccad.uiowa.edu (Ed McGuire) (10/24/90)

In article <32749@sparkyfs.istc.sri.com> zwicky@quetzalcoatl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:
> The danger of doing active dumps is *NOT* theoretical; I have seen
> missing files more than once, and completely mangled tapes at least
> once. It may be acceptable to you if you do frequent dumps at
> low-usage times

We do that.  Now is there any easy way to validate an active dump?
I have in mind something on the order of attempting an interactive
restore of the last file dumped.  I'd be curious to see just what
fraction of our dumps are invalid due to a change in the file system
during the dump.  If I find that there are too many, then I will
probably go to single-user mode dumps.
-- 
peace.  -- Ed
"Vote.  Because it's the Right Thing."

ping@cubmol.bio.columbia.edu (Shiping Zhang) (10/25/90)

In article <32749@sparkyfs.istc.sri.com> zwicky@quetzalcoatl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:
>
>Purdue modified dump if at all possible, and you should give serious
>thought to running level 0s in single-user; you can do this
>automatically in the middle of the night by using the same sort of
 ^^^^^^^^^^^^^
>trick that fastboot uses, creating a magic file that the rc files
>look for in the boot process to tell them to do backups.
>

I have seen this suggestion more than once in this newsgroup,
but I don't know how to do it. One question I have is about tape changes:
surely one tape is not enough for a level 0 backup.
How do you get around this problem?  Thanks for any enlightenment.

-ping

zwicky@sparkyfs.istc.sri.com (Elizabeth Zwicky) (10/25/90)

In article <1990Oct24.151840.25570@ccad.uiowa.edu> emcguire@ccad.uiowa.edu (Ed McGuire) writes:

>Now is there any easy way to validate an active dump?
>I have in mind something on the order of attempting an interactive
>restore of the last file dumped. 

No, there isn't any easy way to validate it - unless you consider
doing a full restore easy. There are basically four ways in which the
tape can be screwed:
	1) Some individual file may be missing or damaged; without
attempting to restore that particular file, you will never know.
	2) Some individual file may be damaged so that any attempt to
read it confuses restore permanently. Again, unless you attempt to
read *that* file, you will never know; things after it on the tape are
perfectly accessible, as long as you don't read it first. (Since
restore doesn't tell you what it's trying to restore, only what it has
finished restoring, if you run into one of these when trying to
restore, you get to play binary search, doing add and extracts on
subsets of your original file list until you have everything but the
bad one. Ick.)
	3) At some point, some file may be screwed enough to corrupt
everything after it - this one you will catch by interactively
restoring the last file. 
	4) There may be physical write or read errors on the tape.
These will generally be caught by the scanning necessary to find a
file, so you will usually see indication of them if you try the
interactive restore. However, you won't know what they've eaten.

So you only catch half the possible kinds of error; since error number
one is the most common case for active files, you end up catching less
than half of all the errors. There are methods that check a dump
against a file system without requiring you to do a restore, which are
really useful for testing modifications to dump, but worthless for
verifying live dumps, since they will report that the dump is
incorrect if it doesn't match the disk - which it certainly won't, if
the disk is active.

	Elizabeth Zwicky

zwicky@sparkyfs.istc.sri.com (Elizabeth Zwicky) (10/27/90)

In article <1990Oct24.210312.3271@cubmol.bio.columbia.edu> ping@cubmol.bio.columbia.edu (Shiping Zhang) writes:
>In article <32749@sparkyfs.istc.sri.com> zwicky@quetzalcoatl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:
>>You should give serious
>>thought to running level 0s in single-user; you can do this
>>automatically in the middle of the night by using the same sort of
>>trick that fastboot uses, creating a magic file that the rc files
>>look for in the boot process to tell them to do backups.

>I have seen this suggestion more than once in this newsgroup,
>but I don't know how to do it. One question I have is about tape changes:
>surely one tape is not enough for a level 0 backup.
>How do you get around this problem?  Thanks for any enlightenment.

Step 1: Set up your full saves so you don't have to change tapes.
The traditional way to do this is to buy high-capacity tape drives -
many people find that a whole machine can do a level 0 onto a 2-gigabyte
Exabyte tape. For those of us who no longer have that luxury, the choices
are to back up different file systems to different tape drives
on the same night, or back up different file systems to the same tape
drive on different nights. There are hooks in the script I use to
do either one, although I actually do the latter. If you have single
file systems that won't fit on single tapes, you will have to buy
new hardware, re-partition your disks, or give up.

Step 2: Modify whatever script your system runs during the
boot process (rc.local is a good choice on a Sun) to insert a check
for the existence of a special file - I use /backup - and execute the
backup script of your choice if it exists. You will need to figure out
where to put this test so that enough of the machine is running
for you to do the backups (we use remote tape drives, and also
need /usr/local, which is NFS-mounted on many machines, to get the
scripts from), but not so much that the users can make the file
systems active.
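
The test itself can be tiny. A sketch (the flag file name, script
path, and placement in rc.local are examples, not gospel):

	# near the end of rc.local, once the network and tape host are up:
	if [ -f /backup ]; then
		rm -f /backup			# don't redo this after the next crash
		sh /usr/local/adm/level0.sh	# whatever backup script you use
	fi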

Step 3: When you intend to do a backup, set an at or cron job to
create /backup and schedule a shutdown with about an hour's lead time.
Note: you don't want to create /backup at 5:00 when you leave, with the
shutdown set for later, because if the machine crashes at 5:15 it will
do the full save then, significantly delaying its reboot.
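
The at job itself then amounts to something like this (times and paths
are examples):

	# run from at(1) around 10 p.m.:
	touch /backup
	/usr/etc/shutdown -r +60 "going down in an hour for level 0 backups"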

This trick was invented by Steve Romig, at Ohio State, although
probably other people have also invented it independently; it's not
all that tricky. It is discussed, along with lots of other fascinating
backup tricks, in his paper in the latest Large Installation System
Administration conference proceedings; get the conference proceedings
from the Usenix association, or you can probably get just the paper
from him, romig@cis.ohio-state.edu.

	Elizabeth D. Zwicky (zwicky@erg.sri.com)

garyb@gallium.UUCP (Gary Blumenstein) (10/28/90)

In article <32749@sparkyfs.istc.sri.com> zwicky@quetzalcoatl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:

>[..]
>Some sites run a dump modified by Purdue to skip inodes that change
>after the tables are read  [..]

Does anyone have the sources for this?  Would somebody be willing to send it
to me if I snail-mail an 8mm or 1/4" tape?

Thanks,

--
Gary M. Blumenstein, UNIX System Administrator
______________________________________________________________________________
  ___         United Parcel Service, Research & Development 
 |___|         51-53 Kenosia Ave.  Danbury, CT  06810-7326
 |ups|   
 ` _ '     /* We run the tightest ship in the shipping business. */
------------------------------------------------------------------------------

edguer@charlie.CES.CWRU.Edu (Aydin Edguer) (10/30/90)

In article <32762@sparkyfs.istc.sri.com> zwicky@pterodactyl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:
>In article <1990Oct24.210312.3271@cubmol.bio.columbia.edu> ping@cubmol.bio.columbia.edu (Shiping Zhang) writes:
>>In article <32749@sparkyfs.istc.sri.com> zwicky@quetzalcoatl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:
>>>You should give serious
>>>thought to running level 0s in single-user; you can do this
>>>automatically in the middle of the night by using the same sort of
>>>trick that fastboot uses, creating a magic file that the rc files
>>>look for in the boot process to tell them to do backups.
>
>>I have seen this suggestion more than once in this newsgroup,
>>but I don't know how to do it. One question I have is about tape changes:
>>surely one tape is not enough for a level 0 backup.
>>How do you get around this problem?  Thanks for any enlightenment.

I would like to suggest that going to single-user for a level 0 backup is
unnecessary.  All that really needs to be done is to unmount the (4.2/ufs)
file system.  This means that the computer can continue to function and
even serve diskless clients (as long as you are not backing up /export).
This permits the whole thing to be done in a script from "cron" or "at" without
mucking with your rc files.  The only file system that may not be backed
up in this manner is the root partition.  /usr is normally (SunOS4.1) read-only
and thus does not need to be backed up.
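
As a sketch, the cron job for a single file system could be as simple
as this (device names and the mail-on-failure choice are examples):

	#!/bin/sh
	# unmount /home, dump the now-quiescent raw partition, remount it
	umount /home || { echo "/home busy; dump skipped" | mail root; exit 1; }
	/usr/etc/dump 0uf /dev/nrst0 /dev/rsd1g
	mount /home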

Aydin Edguer

dick@cca.ucsf.edu (Dick Karpinski) (10/31/90)

In article <32757@sparkyfs.istc.sri.com> zwicky@pterodactyl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:
>
>No, there isn't any easy way to validate it - unless you consider
>doing a full restore easy. There are basically four ways in which the
>tape can be screwed:

In principle, your dump program could take checksums of each block of
each file during its dump and then again from the disk after the dump.
This could catch flaws arising from dumping a live file system, though
fixing those flaws also requires a way to rewrite the dump tape or to
note that a file image is damaged.  Rereading the tape is necessary
and sufficient to detect writing and media errors.

Since almost all files are small enough to hold in RAM, you could even
employ two strategies, a quick one for little files and a slow, robust
one for the few large files.  Strategies that include file system mods
have more opportunities to be quick and efficient, but safe live dumps
need not require them.
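
Dump itself won't do any of this today, but you can approximate the
idea from outside: checksum everything before and after the dump, and
treat any file that changed in between as suspect on the tape.  A
rough (and slow) sketch, with illustrative names:

	#!/bin/sh
	# flag files that changed while a live dump was running
	cd /home || exit 1
	find . -type f -print | xargs sum > /tmp/sum.before
	/usr/etc/dump 0uf /dev/nrst0 /dev/rsd1g
	find . -type f -print | xargs sum > /tmp/sum.after
	diff /tmp/sum.before /tmp/sum.after	# anything listed here is suspect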

Dick

zwicky@sparkyfs.istc.sri.com (Elizabeth Zwicky) (10/31/90)

In article <1990Oct29.225451.29481@usenet.ins.cwru.edu> edguer@charlie.CES.CWRU.Edu (Aydin Edguer) writes:
>I would like to suggest that going to single-user for a level 0 backup is
>unnecessary.  All that really needs to be done is to unmount the (4.2/ufs)
>file system.  This means that the computer can continue to function and
>even serve diskless clients (as long as you are not backing up /export).
>This permits the whole thing to be done in a script from "cron" or "at" without
>mucking with your rc files.  The only file system that may not be backed
>up in this manner is the root partition.  /usr is normally (SunOS4.1) read-only
>and thus does not need to be backed up.

You can't unmount an active file system except by rebooting the
machine - so if you need to guarantee the unmount will succeed, you're
back to rebooting, and fiddling with your rc files. (This should
no longer be true in the 4.4BSD release, but that's not going to do you
much good.)

/usr is read only for *clients*, not usually on the server itself. You
don't have to level 0 it if you never change it, and you don't mind
doing an OS re-install if you lose the disk. I don't like OS
re-installs, and I do install OS patches. I back up everything - if
all my disks melt down I'm going to have enough to worry about without
trying to recall exactly how to reinstall and recustomize my OS.
Actually, I don't back up swap partitions, because most of our servers
have a gig devoted to client swap, and that's an awful lot of tape
when I don't actually care about the data; since all the swaps are
identical anyway, reconstructing them is no big deal for me. If we
were running a whole lot of different sizes, it might be worth the
tape even for that.

	Elizabeth Zwicky

perl@step.UUCP (Robert Perlberg) (11/01/90)

In article <1990Oct24.210312.3271@cubmol.bio.columbia.edu>, ping@cubmol.bio.columbia.edu (Shiping Zhang) writes:
> In article <32749@sparkyfs.istc.sri.com> zwicky@quetzalcoatl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:
> >
> >Purdue modified dump if at all possible, and you should give serious
> >thought to running level 0s in single-user; you can do this
> >automatically in the middle of the night by using the same sort of
>  ^^^^^^^^^^^^^
> >trick that fastboot uses, creating a magic file that the rc files
> >look for in the boot process to tell them to do backups.
> >
> 
> I have seen this suggestion more than once in this newsgroup,
> but I don't know how to do it. One question I have is about tape changes:
> surely one tape is not enough for a level 0 backup.
> How do you get around this problem?  Thanks for any enlightenment.
> 
> -ping

Well, if you really have to change tapes, you're probably out of luck.
I'm assuming that you are using 6250 BPI 1/2" tape.  I don't know what
kind of machine you are using, but you might be able to get an 8mm tape
drive for it.  An 8mm drive can store about 2.3 Gigabytes on one tape.
We have a network with 2 Sun servers with 1 Gig of disk each both of
which get backed up to one 8mm tape.  If one 8mm tape is not big enough
for you, there is a company which sells an 8mm "jukebox" -- an 8mm
drive with a carousel which allows a number of 8mm tapes to be changed
automatically.

Robert Perlberg
Dean Witter Reynolds Inc., New York
{murphy | philabs | chuo}!step!perl
	-- "I am not a language ... I am a free man!"

chris@mimsy.umd.edu (Chris Torek) (11/01/90)

In article <32749@sparkyfs.istc.sri.com> zwicky@sparkyfs.istc.sri.com
(Elizabeth Zwicky) answers the `subject' question.  Five articles later...

In <KOPPENH.90Oct24113316@dia.informatik.uni-stuttgart.de>
koppenh@informatik.uni-stuttgart.de (Andreas Koppenhoefer), and in
<339@gallium.UUCP> garyb@gallium.UUCP (Gary Blumenstein), ask for the
Purdue mods mentioned.

Equivalent mods are already included in recent versions of `dump' (as
distributed by Berkeley since 4.3-tahoe if not earlier, and Sun since
4.0.3 if not earlier, and presumably DEC by 2001 if not earlier :-) ).
The actual changes are:

 1. Add a `dirdump' routine to dumptraverse; use this in
    dumpmain for pass III (directory dump) by changing

	pass(dump, dirmap);

    to

	pass(dirdump, dirmap);

    where dirdump(ip) simply calls dump(ip) if and only if
    (ip->di_mode & IFMT) == IFDIR.

    This prevents `restore' from seeing a regular file in the middle
    of the directory listing, which hopelessly confuses old versions
    of restore (and possibly new ones as well).  Such things happen if
    a directory is deleted and its inode reused as a regular file before
    dump manages to reach it.  (More on this below.)

 2. Add code to dump() (also in dumptraverse.c) to skip a file if its
    mode (ip->di_mode) is 0, i.e., the inode is no longer in use.  This
    happens whenever a file or directory is deleted and the inode is
    *not* reused.

In <1990Oct24.210312.3271@cubmol.bio.columbia.edu>
ping@cubmol.bio.columbia.edu (Shiping Zhang) asks how to put a complete
backup onto no more than one tape.  This is easily accomplished by
buying an 8mm Exabyte drive, unless you have disks that hold more than
2 GB.  (DAT drives will also work but hold less data, and the things
cost more.  New Exabyte hardware that stores over 4 GB per tape is now,
or will soon be, available as well.)

In <1990Oct24.151840.25570@ccad.uiowa.edu> emcguire@ccad.uiowa.edu (Ed
McGuire) asks about validating a dump.  This is difficult, as Elizabeth
Zwicky describes in <32757@sparkyfs.istc.sri.com>:

>1) Some individual file may be missing or damaged; without
>attempting to restore that particular file, you will never know.

It would not be difficult, although restore does not do this now, to
write a program that compares the maps at the front with the inode
special records to verify that all files exist on the tape.  Files
that were removed and not replaced, or directory files that were removed
and were replaced with something other than another directory, will of
course be `missing'.

>2) Some individual file may be damaged so that any attempt to
>read it confuses restore permanently.

Any such thing points to a bug in restore.  Restore should be (but
perhaps is not) able to recover from such things.  Naturally, such a
damaged file will itself not be recoverable.

These events [>1)] and [>2)] are most likely to happen when a file
changes size while that file is being dumped.  (Dump reads the inode,
then the direct block contents, then the indirect blocks and their
contents, all the while assuming that this data is valid.)  This should
merely cause the tape data to be invalid, and should not give restore
fits.  Note that restoring such a file could breach security: e.g., the
sequence of events could be:
 A. dump discovers a 100 MB file
 B. dump begins dumping the file
 C. the file is truncated
 D. the blocks for that file are allocated to a new, high-security
    (mode 0600) file owned by someone else
 E. dump finishes dumping the file.
The resulting tape holds up to 100 MB of high-security file contents
attached to the original user id.  When restored, the 100 MB file
`reappears' but its contents differ from the original.

>(Since restore doesn't tell you what it's trying to restore, only what
>it has finished restoring, if you run into one of these when trying to
>restore, you get to play binary search, doing add and extracts on
>subsets of your original file list until you have everything but the
>bad one. Ick.)

Actually, you can run a `restore iv', add what you like, `extract', and
note the name and/or inode number of the last file printed.  Then run
`restore t | sort -n' and look at the next higher inode number.  This is
the file that is causing restore to hang up.  (`restore rv' will also work.
Be sure to use a CRT so as not to waste paper.)
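
That is (tape device illustrative):

	restore ivf /dev/nrst0			# add, extract; note last file printed
	restore tf /dev/nrst0 | sort -n		# the next higher inode is the culprit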

>3) At some point, some file may be screwed enough to corrupt
>everything after it ...

Again, this should never happen (but probably can).  In particular, this
used to happen with the 4.2BSD dump/restore when the pass(dump, dirmap)
wound up dumping a regular file (see `1.' near top of this article);
this has been fixed.

>4) There may be physical write or read errors on the tape.

Good hardware will detect these while the tape is being written, though
of course marginal defects may escape notice the first few times.

In another article which I foolishly forgot to note, Dick Karpinski
suggests that dump ought to be able to (slowly) produce a correct dump
even when the file system is active, perhaps (my interpretation) by
using some other algorithm.  The answer to this is `no and yes': it
could, but only by using a staging area at least as large as the final
backup, and potentially unbounded time.  The reason for this is simple,
though the details are not.

The tapes produced by dump are intended to be a complete snapshot of
the state of the file system, but are ordered so that restores are not
too difficult, without being ordered so strongly that dumps are slow.
(Some may argue with the latter statement. :-) )  To this end, the
contents of an infinitely long tape are:

 A. A `TS_TAPE' record naming dump time, level, etc.

 B. A bitmap of clear inodes (i.e., those that are not holding any file,
    of any kind).  This is used to tell which files have been removed
    since the previous dump (so that `restore r' can put things back as
    they were).  This is prefixed by a `TS_CLRI' record.

 C. A bitmap of set inodes (those that are holding files).  This is
    prefixed by a `TS_BITS' record.

 D. All the directories needed to produce complete path names to all the
    files on the tape.  These are a series of (TS_INODE,blocks,TS_ADDR,
    blocks,TS_ADDR,blocks,...) records, where each TS_INODE or TS_ADDR
    record contains enough information to tell how many `blocks' appear
    on the tape.  (Holes in files result in non-written `blocks', i.e.,
    a file consisting entirely of a hole has only TS_INODE and perhaps
    TS_ADDR records.)

 E. All the files being dumped (see item C above).

 F. A `TS_END' record.

The boundary between directories and files is defined implicitly by the
first non-directory on the tape.  This is why the `dirdump' routine is
so important for active dumps.  Restore would have to be made much
smarter to recover from `embedded' files in the directory area, and
would still have to read the entire dump, not just the directory part,
to be sure it got them all.

If a dump requires more than one tape, each tape after the first begins
with a TS_TAPE record followed by the same bitmap as in C above.  (In
theory this allows restore to `pick up' in the middle.  In actuality, a
data block which sufficiently resembles a TS_INODE record will fool a
restore that is doing this.  The 4.3-reno dump has a DR_NEWHEADER flag
and new fields in the TS_TAPE record that tell how far restore has to
go to get to a real TS_INODE record, which avoids this problem.)

Dump decides which files (including directories) to dump by checking
the inode times (atime, mtime, ctime, although the ctime alone should
suffice).  It reads a bunch of inodes from the raw disk device and
pokes through them, reads another bunch, etc., until it has read them
all.  Each file that must be dumped sets a bit in the `files to dump'
map.  This is `pass I (regular files)'.

Next dump scans through all the inodes again, this time checking to see
if it needs to add any parent directories so as to reach the marked
inodes.  It loops doing this secondary scan until nothing more is
marked.  This is `pass II (directories)', and this is why pass II is
usually run three or four times.  (To make it run lots of times, mkdir
a a/b a/b/c a/b/c/d a/b/c/d/e a/b/c/d/e/f a/b/c/d/e/f/g a/b/c/d/e/f/g/h
and do a full backup, then touch a/b/c/d/e/f/g/h/i and do an
incremental backup.)  I added a hack, included in the latest BSD dump,
that avoids pass II entirely if all directories are being dumped (this
speeds up all level 0 dumps).  (To make it pretty, it still claims to
run pass II.  You can tell that you have this version by the fact that
`dump 0 ...' prints `pass I', runs for a while, then prints `pass II'
and `pass III' without pausing in between.)  If a file with several
links changes, all directories leading to it are put on the tape.
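
Spelled out as commands, that experiment is (tape device illustrative):

	mkdir a a/b a/b/c a/b/c/d a/b/c/d/e a/b/c/d/e/f a/b/c/d/e/f/g a/b/c/d/e/f/g/h
	dump 0uf /dev/nrst0 /dev/rsd1g		# full backup
	touch a/b/c/d/e/f/g/h/i
	dump 1uf /dev/nrst0 /dev/rsd1g		# incremental: watch pass II repeat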

In pass III, dump actually writes all those directories it marked in
passes I and II to the tape, and in pass IV, dump writes all the other
files it marked (including devices and symlinks).

In order to make a consistent backup, dump would have to:

 1. Scan the disk for files to back up.
 2. Write the backup to a staging area.
 3. Use file-system calls (lstat()) to check up on everything
    written to the staging area.
 4. For each file changed since part 1, replace its backup in the
    staging area, and add any new directories required.  For each
    file deleted since part 1, effectively remove it from the staging
    area.  Repeat from 3. until no files have changed or been removed.
 5. Dump the staging area to the backup device.  The date of this
    dump would be the time at which the final scan in step 3 (the
    one that found no changes) began.

A much simpler method would be to freeze activity on the file system
being dumped.  A `freezefs' system call is being contemplated.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

del@thrush.mlb.semi.harris.com (Don Lewis) (11/01/90)

In article <27337@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>Dump decides which files (including directories) to dump by checking
>the inode times (atime, mtime, ctime, although the ctime alone should
>suffice).

One feature that I think would be useful would be to have another time
field in the inode (wtime?), that would only be updated by write()
(and ftruncate?).  Write() would also update ctime, as would link(),
unlink(), utimes(), chmod(), and chown().  If only ctime is newer than
the previous dump time, then only the inode itself and not the
file data would need to be dumped.  If wtime is newer than the previous
dump time, then both the inode and data would need to be dumped.  This
would reduce the amount of data that needs to be dumped if
{chown,chgrp,chmod} -R are used.  The downside would be that it would be
more difficult to locate the tape containing the latest version of a
file to be restored (and you might have to still chmod/chgrp/chown it).

chmod -R  B-(
-- 
Don "Truck" Lewis                      Harris Semiconductor
Internet:  del@mlb.semi.harris.com     PO Box 883   MS 62A-028
Phone:     (407) 729-5205              Melbourne, FL  32901

chris@mimsy.umd.edu (Chris Torek) (11/02/90)

In article <27337@mimsy.umd.edu> I wrote:
>... This prevents `restore' from seeing a regular file in the middle
>of the directory listing, which hopelessly confuses old versions
>of restore (and possibly new ones as well).

That really ought to be `probably' (I was thinking of hacking restore
to make it scan on a few more files after the first `file' to see if
maybe there are some more directories around---this would not recover
from all such errors, but would handle most).

>  E. All the files being dumped (see item C above).

That should have been `item D'.

>... which ... that ...

Oops, was not careful to use `that' when defining and `which' when
describing.  (I was pressed for time---wanted to be out in time to
join the grad students in a small Halloween excursion....)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

craig@attcan.UUCP (Craig Campbell) (11/03/90)

In article <32777@sparkyfs.istc.sri.com> zwicky@pterodactyl.erg.sri.com.UUCP (Elizabeth Zwicky) writes:
>In article <1990Oct29.225451.29481@usenet.ins.cwru.edu> edguer@charlie.CES.CWRU.Edu (Aydin Edguer) writes:
 
>You can't unmount an active file system except by rebooting the
>machine - so if you need to guarantee the unmount will succeed, you're
>back to rebooting, and fiddling with your rc files. (This should
>no longer be true in BSD 4.4 release, but that's not going to do you
>much good.)
 
>	Elizabeth Zwicky


Wait a minute! Halt!  Wow!!

While it is true that you can't unmount an active file system (that's sort of
part of the definition), you can MAKE a file system inactive.  The fuser(1M)
command (sys V) will kill (if requested) all processes using the specified
resource (file system).  Once the resource is inactive, it can be unmounted.

Obviously, one would not want to attempt this on the root (or boot) file
system.
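
On such a system the recipe might look like this (device, mount point,
and backup command are illustrative):

	# kill everything using the file system, then unmount it
	/etc/fuser -k /dev/dsk/0s2	# sends SIGKILL to processes using the fs
	/etc/umount /dev/dsk/0s2
	# ... dump the raw device here ...
	/etc/mount /dev/dsk/0s2 /home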


craig

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <27337@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
> A much simpler method would be to freeze activity on the file system
> being dumped.  A `freezefs' system call is being contemplated.

What would the semantics be? Presumably any process writing to that fs
would be paused in kernel mode until unfreezefs, and the disk would be
synced. Would freezes work like locks, and be released if the process
dies? What about deadlock detection? What happens to kernel writes, to,
e.g., accounting files?

If you're going to have freezefs, why not freezedir? freezefile? Why not
make mandatory write locks available throughout the system? What about
mandatory read locks? Do the applications (e.g., reliable ``find'')
outweigh the risks? Should you only be able to freeze files you own?
Should only root be able to freeze files?

Presumably NFS will muck this up, like most everything else. How bad
would the incompatibility be? Would it help if NFS were replaced by a
sane remote file system?

---Dan

les@chinet.chi.il.us (Leslie Mikesell) (11/07/90)

In article <12434:Nov603:25:4290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <27337@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>> A much simpler method would be to freeze activity on the file system
>> being dumped.  A `freezefs' system call is being contemplated.
>
>What would the semantics be? Presumably any process writing to that fs
>would be paused in kernel mode until unfreezefs, and the disk would be
>synced. Would freezes work like locks, and be released if the process
>dies? What about deadlock detection? What happens to kernel writes, to,
>e.g., accounting files?

A more pleasant scenario would be to have a 'readfrozen' system call
as well, to bypass the common disk buffer cache, which would be allowed
to grow and page into swap space for the duration.  'Unfreezefs' would
then perform a sync, collapsing the cache back to its normal size.
Other processes would not need to pause unless the swap space becomes
full. Some care would have to be taken to avoid pausing the program
that issued the freezefs call in this situation, but that doesn't sound
impossible.
 
>If you're going to have freezefs, why not freezedir? freezefile? Why not
>make mandatory write locks available throughout the system? What about
>mandatory read locks? Do the applications (e.g., reliable ``find'')
>outweigh the risks? Should you only be able to freeze files you own?

Freezedir would be pretty messy since you would essentially have to do
a "find" as an atomic operation.  Freezefile(list_of_files or inodes)
would suffice for the usual problem of getting a consistent snapshot
of a multi-file database, although only the application writing them
would know the proper time to issue the call (this, of course, applies
to freezefs as well and is probably the reason nobody bothers doing it).
SysVr3 has the mandatory locking scheme - has anyone used it?

>Presumably NFS will muck this up, like mostly everything else. How bad
>would the incompatibility be? Would it help if NFS were replaced by a
>sane remote file system?

Wouldn't freezefs have to be provided at the server end? 

Les Mikesell
  les@chinet.chi.il.us