[comp.unix.admin] SUMMARY: Backup while in multi-user mode

baier@unipas.fmi.uni-passau.de (Joern Baier) (05/20/91)

Two weeks ago I posted an article concerning the problems that can arise
when doing a dump in multi-user mode.

Nearly everyone who answered pointed out that at their site the backups are
run while in multi-user mode, but nobody has yet observed a serious error
as a result of this policy.

Problems may occur when a file or a directory has been deleted between two
passes of the dump.

Peter Renzland <peter@ontmoh.uucp> explained this in great detail, so I will
cite him here at full length:
>...
>One thing that can happen is that a file is deleted, and the space is
>re-allocated to a new file between the time the Inode for the old file
>is written to tape and the data for what used to be the old file is written.
>This data now belongs to another file.  This is no problem until you try
>to restore.  Now the old inode points to data blocks which not only contain
>data that could be none of the business of the owner of the old file, but
>there is now a file-system inconsistency, because two files (inodes) now
>point to the same data, and the old one could have had indirect blocks,
>which may result in filesystem corruption that extends beyond just the old
>file, and the (few) new file(s) that were created during the backup window
>of vulnerability.
>
>All this is possible because dump bypasses the filesystem, for added
>speed (and diminished integrity).
>...

According to Alain Brossard (brossard@sasun1.epfl.ch), it is also possible
that the entire dump will become unreadable if the freed inode had been
a directory and is now a file (or vice versa?), but this seems very
unlikely.

Thanks to all who answered.

Joern.
--
Joern Baier     (baier@unipas.fmi.uni-passau.de) 
Jesuitengasse 9 
D-W8390 Passau 
Tel.:   +49/851/35239 

zwicky@erg.sri.com (Elizabeth Zwicky) (05/21/91)

In article <1991May20.123129.14433@forwiss.uni-passau.de> baier@unipas.fmi.uni-passau.de (Joern Baier) writes:
>Nearly everyone who answered pointed out that at their site the backups are
>run while in multi-user mode, but nobody has yet observed a serious error
>as a result of this policy.

Sometimes I feel that I'm doomed to spend my life repeating this, but
here goes:

Yes, it does cause problems, I have seen it do so with my very own
eyes more than once. Depending on the version of dump you are running,
these problems range from files that are not on the tape to unusable
tapes. Most people do very few restores; many of the people who have
never had a problem have never done a full restore, either. 

I have seen files come up missing; I have also seen someone do a full
restore on a filesystem which fsck then deleted. If you cannot risk
having a backup be bad, don't do it in multi-user. You can probably
risk having your daily backups be bad. You probably can't risk all
your backups. This is why many of us do some backups in multi-user and
some in single-user. 

	Elizabeth Zwicky
	zwicky@erg.sri.com

jay@silence.princeton.nj.us (Jay Plett) (05/21/91)

In article <1991May20.204327.17694@erg.sri.com>, zwicky@erg.sri.com (Elizabeth Zwicky) writes:
> In article <1991May20.123129.14433@forwiss.uni-passau.de> baier@unipas.fmi.uni-passau.de (Joern Baier) writes:
> >Nearly everyone who answered pointed out that at their site the backups are
> >run while in multi-user mode, but nobody has yet observed a serious error
> >as a result of this policy.
 ...
> Yes, it does cause problems, I have seen it do so ...
> ... Most people do very few restores; many of the people who have
> never had a problem have never done a full restore, either. 
I have done full restores.  Not a lot of them, relative to the number
of dumps.  I have never had a problem restoring a dump made on a live
filesystem.  This does not imply that I never will.

> ... If you cannot risk
> having a backup be bad, don't do it in multi-user. 
Good advice.

> You can probably
> risk having your daily backups be bad.
Ah, there's the point.  If you can _risk_ losing one or two days' work,
then do daily level 0s on live filesystems.  This is the beauty of
Exabytes--it is feasible to do so.  If a tape is bad at restore time,
toss it and go back a day.  If that one is bad, go back another day.
The risk diminishes greatly with each day you go back.

Look at the odds.  The probability of a disk crash on any particular
day is really very small.  The probability of a bad level 0 done on a
live filesystem might be larger, but it's still small.  The probability
of two successive bad tapes is smaller.  Apply your favorite function
to calculate the probability of a bad tape coinciding with both of the
two days before a crash, and decide if that risk is acceptable to your
users.  Balance the risk against the cost to your users of routinely
shutting down for backups.  Don't forget to evaluate the possibility
that a dump of an idle system might also be unrestorable.
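
To make that concrete with made-up numbers (and assuming, for the moment,
that the failures are independent): if the chance of losing a disk on any
given day is 1 in 1000, and the chance that any one live level 0 is
unrestorable is 1 in 50, then the chance that a crash lands on a day when
*both* of the two most recent dumps are bad is about
1/1000 * 1/50 * 1/50 -- one day in 2.5 million.  Plug in your own numbers;
the independence assumption is the part worth arguing about.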

I believe that--for many sites--the advantages of dumping live filesystems
outweigh the disadvantages.

	...jay

tjc@ecs.soton.ac.uk (Tim Chown) (05/21/91)

In <1991May20.204327.17694@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes:

>I have seen files come up missing; I have also seen someone do a full
>restore on a filesystem which fsck then deleted. If you cannot risk
>having a backup be bad, don't do it in multi-user. You can probably
>risk having your daily backups be bad. You probably can't risk all
>your backups. This is why many of us do some backups in multi-user and
>some in single-user. 

Another solution, if you are backing up user files and only have
a few hundred MB to do overnight, is simply to back up with 'tar'
rather than 'dump'.  We use that method to do 300 MB of user files
from a Masscomp onto an Exabyte on a Sun and have never had a
problem - the tar takes nearly four hours (it would take less
on a local device).
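
The whole thing can be little more than a one-liner; a rough sketch
(hostnames, paths and the tape device here are made up for illustration,
not our real ones):

    rsh masscomp 'cd /home; tar cbf 20 - .' | dd of=/dev/nrst0 obs=20b

i.e. run tar on the remote machine, write the archive to its stdout, and
let dd on the Sun block it out to the Exabyte.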

	Tim
-- 

mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) (05/21/91)

In article <690@silence.princeton.nj.us> jay@silence.princeton.nj.us (Jay Plett) writes:
   Look at the odds.  The probability of a disk crash on any particular
   day is really very small.  The probability of a bad level 0 done on a
   live filesystem might be larger, but it's still small. The probability
   of two successive bad tapes is smaller.

This last statement is only true if you assume that bad dumps are
unrelated. This is a false assumption. Given that someone was doing
something that caused a dump to be bad one day, I'd say the
probability of them having done that the previous day is larger than
the probability that a dump would be bad.

And the longer they've been doing it, the more likely it is that the
dump is bad. For instance, if you find that every dump made on a
weekday night for the last month is bad, which way would you bet on
the dump for tonight, if it were a weekday?

You're right - a stable file system doesn't guarantee that you can do
a restore. It eliminates one source of problems, and one that can be
set up to occur on a regular basis, at that.

	<mike
--
But I'll survive, no you won't catch me,		Mike Meyer
I'll resist the urge that is tempting me,		mwm@pa.dec.com
I'll avert my eyes, keep you off my knee,		decwrl!mwm
But it feels so good when you talk to me.

zwicky@erg.sri.com (Elizabeth Zwicky) (05/22/91)

The "we'll just do live backups over and over again" theory suffers
from a common problem with security through redundancy; common mode
failures. All the backups may fail the same way if the same file is
always active when they're running. The easiest way to do this is to
accidentally get the backups synchronized with a cron job or a very
predictable human, but you can get the same effect with a very
long-running program. Thus, you can backup something a few hundred
times and have all the backups missing the same file. This is Not Fun.

Using tar instead of dump buys you extremely little. tar will skip
active files, which means they won't corrupt your backup. This is its
sole advantage, and it's only an advantage over some versions of dump.
It will *also* skip files with names that are too long; depending on
the version of tar you are running, it may also exhibit various other
nasty problems dump doesn't have. On the whole, dump is safer.

	Elizabeth Zwicky
	zwicky@erg.sri.com

russell@ccu1.aukuni.ac.nz (Russell J Fulton;ccc032u) (05/22/91)

zwicky@erg.sri.com (Elizabeth Zwicky) writes:

>Using tar instead of dump buys you extremely little. tar will skip
>active files, which means they won't corrupt your backup. This is its
>sole advantage, and it's only an advantage over some versions of dump.
>It will *also* skip files with names that are too long; depending on
>the version of tar you are running, it may also exhibit various other
>nasty problems dump doesn't have. On the whole, dump is safer.

Would some knowledgeable person care to comment on bru in light of Elizabeth's
comments above? We use bru to back up our SGI 4D/240S with 6GB of disk (five
1.2 GB drives). We back up one drive a night and do an incremental on the
rest. The system is usually fairly quiet when the backup is done (in the
small hours), with only a small number of batch jobs active.

We have had no trouble yet, and have had to restore a disk on two occasions
in the last year.

Cheers, Russell.

-- 
Russell Fulton, Computer Center, University of Auckland, New Zealand.
<rj_fulton@aukuni.ac.nz>

verber@pacific.mps.ohio-state.edu (Mark Verber) (05/22/91)

In article <690@silence.princeton.nj.us> jay@silence.princeton.nj.us (Jay Plett) writes:

   Ah, there's the point.  If you can _risk_ losing one or two days' work,
   then do daily level 0s on live filesystems.  This is the beauty of
   Exabytes--it is feasible to do so.  If a tape is bad at restore time,
   toss it and go back a day.  If that one is bad, go back another day.
   The risk diminishes greatly with each day you go back.

Don't bet on it.  Let's say that you are running your backups from cron
-- most of us with Exabytes do.  Suppose you have something else
running in cron, or, like my site, a user process that runs for days at
a time and does all sorts of i/o.  Let's say the i/o going on when
dump runs happens to be just the wrong kind -- i.e. your dump is
corrupted.  Every dump you take could be screwed -- redundancy didn't
win you much, did it?

I understand the desire to do dumps on active file systems for daily
incrementals... but what do people have against doing the level 0
dumps in single-user?  Can't you afford a few hours of downtime in the
middle of the night once a month to ensure a clean dump?  With Exabytes
you don't even have to be around while the dumps are running, since
you don't have to change tapes; if your full saves won't fit on the
drive(s) you have, get a stacker.

sigh,
mark

jeffl@NCoast.ORG (Jeff Leyser) (05/22/91)

In post <1991May21.172208.281@erg.sri.com>, zwicky@erg.sri.com (Elizabeth Zwicky) says:
!!  [Redundant dumps don't buy you much]
!!Using tar instead of dump buys you extremely little. tar will skip
!!active files, which means they won't corrupt your backup. This is its
!!sole advantage, and it's only an advantage over some versions of dump.
!!It will *also* skip files with names that are too long; depending on
!!the version of tar you are running, it may also exhibit various other
!!nasty problems dump doesn't have. On the whole, dump is safer.

What about find | cpio?  We do that here to back up 1.5GB to a single Exabyte
during a "quiet" period on Sunday.  Multi-user, but quiet.
-- 
Jeff Leyser                                     jeffl@ncoast.org
Opinions?  I thought this was typing practice!  leyser@tsa.attmail.com

rsk@gynko.circ.upenn.edu (Rich Kulawiec) (05/22/91)

In article <VERBER.91May21181220@avalon.mps.ohio-state.edu> verber@pacific.mps.ohio-state.edu (Mark Verber) writes:
>I understand the desire to do dumps on active file systems for daily
>incrementals... but what do people have against doing the level 0
>dumps in single-user?  Can't you afford a few hours of downtime in the
>middle of the night once a month to ensure a clean dump?

The answer to that last question, for some folks in certain environments,
is "no, we can't".  But if you are running 4.3BSD or a derivative thereof,
you are probably running a version of dump(8) that has modifications
made at BRL (by Doug Gwyn, I think), Purdue EE (George Goble),
Purdue CS (Dan Trinkle) and Purdue CC (me).  Some of those modifications
are intended to prevent dump from being confused by an active filesystem.

Those mods aren't bulletproof -- but in about five years of using this version
of dump on many machines (VAX, Sun-3, Sun-4, MIPS, Pmax, etc.) I've
never encountered a dump that I couldn't restore, i.e. that wasn't
self-consistent.  That doesn't mean that all such dumps were "complete",
especially since the meaning of "complete" gets fuzzy when we attempt
to apply that term to an active filesystem; but it does mean that I
had what I needed to recover from crashes.

--

---Rsk
rsk@gynko.circ.upenn.edu

verber@pacific.mps.ohio-state.edu (Mark Verber) (05/22/91)

In article <43617@netnews.upenn.edu> rsk@gynko.circ.upenn.edu (Rich
Kulawiec) writes in response to my question about doing level 0 dumps
in single-user mode, citing the BSD 4.3 dump + Purdue + BRL hacks, which
are less likely to have problems with an active file system:

    Those mods aren't bulletproof -- but in about five years of using this
    version of dump on many machines (VAX, Sun-3, Sun-4, MIPS, Pmax, etc.)
    I've never encountered a dump that I couldn't restore, i.e. that wasn't
    self-consistent.  That doesn't mean that all such dumps were
    "complete", especially since the meaning of "complete" gets fuzzy when
    we attempt to apply that term to an active filesystem; but it does
    mean that I had what I needed to recover from crashes.

I am glad for you.  On the other hand, I have seen restore fail
utterly when the dump was taken on an active file system.  The dump we
used had all the above patches installed!  We all know that even with all
those mods there are failure conditions: Chris Torek and others
have posted them from time to time, so I am not going to repeat them.
Murphy's Law indicates that when you are in critical need of a clean
dump...  that is when you happen to get an inconsistent one.  I
haven't found a corrupted dump tape often, but it has happened ...
which is enough to keep me doing level 0 in single-user.  I value my
users' data.  [This may come from an incredible three-day period of
time when the staff retyped an entire thesis for a graduating PhD
candidate who lost everything after a series of failures.]

Another thing to note is that most of us are seeing more and more disk
hanging off our machines.  There are more files and more dumps being
done.  A few years ago most of us had 1-2 GB of disk.  These days sites
with 5-10x that are common.  That increases the total number of
possible failures, because a lot more dumps are being run.  The
sites I know of that have seen inconsistent dumps are also the sites
that had 10-30 servers and 15-30 GB of disk a few years ago.

Mark Verber
Ohio State Physics Dept / Computing Services

jb3o+@andrew.cmu.edu (Jon Allen Boone) (05/23/91)

Ok, so I'll take the advice that you ought to do level 0 dumps in
single-user mode.

	    Question:

	    		Will the following scenario work?

	  2:00am (Dump Time!)
		 Cron on machine A (MA) says to do dumps.

		 Exabyte is on machine B (MB), so rsh the job to MB.

		 MB determines which filesystems need to be dumped at
		 what levels.  Let's say that there are 10 different
		 file systems.  Let's say that 1 needs a level 0 dump.
		 

		 machine C (MC) needs the level 0 dump.  Cron on MC
		 has a job scheduled which determines that it needs a
		 level 0 dump - so it shuts down.  Then, it dumps that
		 filesystem at level 0, rsh'ing the output to a dd
		 command on MB.  Once it's done, it reboots the
		 machine to multi-user mode.

		 Well?  (Note: Comments about the insecurity of rsh,
		 etc. are welcome - but probably already known.)
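
For concreteness, the dump step on MC might look something like this
(an untested sketch; the filesystem, device names and block size are
invented for illustration):

    dump 0uf - /dev/rsd0g | rsh MB dd of=/dev/nrst0 obs=20b

run after MC has dropped to single-user, with the reboot back to
multi-user once the dump finishes.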

If you can't do that, then WHAT can you do?

   IF you have many different file systems - too many to actually go
around and hang a unique Exabyte off of each one - and you can't
realistically change the location of the Exabyte each night, what do
you do other than multi-user mode backups (some of which are level 0)?

    Also, I just today did a multi-user backup/restore from one
machine to another - a level 0 dump of both / and /usr - restoring
each to another machine (with myself and another person logged in on
the dump machine) - and it seems to have worked just fine.


   -=> iain <=-

----------------------------------|++++++++++++++++++++++++++++++++++++++++
| "He divines remedies against injuries;   | "Words are drugs."           |
|  he knows how to turn serious accidents  |     -Antero Alli             |
|  to his own advantage; whatever does not |                              |
|  kill him makes him stronger."           | "Culture is for bacteria."   |
|                   - Friedrich Nietzsche  |     - Christopher Hyatt      |
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

olson@anchor.esd.sgi.com (Dave Olson) (05/23/91)

In <1991May21.213844.12302@ccu1.aukuni.ac.nz> russell@ccu1.aukuni.ac.nz (Russell J Fulton;ccc032u) writes:

| zwicky@erg.sri.com (Elizabeth Zwicky) writes:
| 
| >Using tar instead of dump buys you extremely little. tar will skip
| >active files, which means they won't corrupt your backup. This is its
| >sole advantage, and it's only an advantage over some versions of dump.
| >It will *also* skip files with names that are too long; depending on
| >the version of tar you are running, it may also exhibit various other
| >nasty problems dump doesn't have. On the whole, dump is safer.
| 
| Would some knowledgeable person care to comment on bru in light of Elizabeth's
| comments above? We use bru to back up our SGI 4D/240S with 6GB of disk (five
| 1.2 GB drives). We back up one drive a night and do an incremental on the
| rest. The system is usually fairly quiet when the backup is done (in the
| small hours), with only a small number of batch jobs active.
| 
| We have had no trouble yet, and have had to restore a disk on two occasions
| in the last year.

This should be no problem, as long as you don't run into filename length
limitations.  bru has a limitation of 127 chars for the pathname length.
bru behaves similarly to tar when files change size while they are being
backed up.

bru also tends to be somewhat :) wasteful of tape, using about 25% more
tape for typical files than tar, due to the checksums, etc. that it does.

Most versions of tar limit you to 100 chars of pathname (which
might be relative or full path); the POSIX version, which should be showing
up on various systems (such as IRIX 4.0), has a 255-char limit.  I have NEVER
seen a version of tar that skips 'active files'.  Files that grow between the
time tar stats them and finishes writing them will only have the original
length; those that get shorter via truncations and rewrites will be padded
with nulls to the original size.

One of the main limitations of bru and tar relative to dump (for some people)
is simply that they typically take longer, since they go through the filesystem.
On most Unix systems this is much more evident when many small files are
backed up, as the open time becomes dominant.

tar also suffers from the limitation (in many people's minds) that it is
difficult to do incremental backups.  bru has the ability to back up
files based on mtime, but this misses ctime-only changes, such as owner or
permission changes.
--

	Dave Olson

Life would be so much easier if we could just look at the source code.

FFAAC09@cc1.kuleuven.ac.be (Nicole Delbecque & Paul Bijnens) (05/23/91)

In article <1991May21.172208.281@erg.sri.com>, zwicky@erg.sri.com (Elizabeth
Zwicky) says:
>
>Using tar instead of dump buys you extremely little. tar will skip
>active files, which means they won't corrupt your backup.

What is meant by "active files" in the context of tar?  How does
tar know when a file is opened by another program (it cannot read
/dev/kmem, can it)?  Do you mean it skips the files, not putting them
on the tape?

I thought the fundamental difference between dump and tar/cpio was
that tar/cpio just read the files through the block-device while
dump reads the raw device.

With regard to the "internal" consistency of a backup made with tar
(i.e. whether the tape can always be restored), can somebody explain
what happens in these cases:
1. While tar reads through the file, the file grows at the end.
   e.g. log files frequently do this.  Where does tar stop?  At the
   old end of the file (so that the inode information on the tape is
   consistent with the length of the data following it) or when it
   encounters EOF of the disk-file (but now the tape is inconsistent)?
2. Someone truncates the file (to 0, or with the syscall truncate()).
   Tar just reads the block-device.
   Will tar add zero-filled blocks to match the recorded file-length?
3. Let's suppose a dbm-file. Tar just reads sequentially. Some other
   program updates the dbm-file.  I think the tape will contain
   some file (it will restore without problems), but you cannot
   do anything useful with the restored file.

Our version of cpio maintains its consistency: it can always restore
the files.  But, as in case 3, the restored file could be useless.
How does tar behave in this respect?

p.s. Another "advantage" of using tar/cpio for the backups is that you
     can restore the files on a different system (e.g. from a BSD
     to a SysV machine) without much hassle.
     This becomes more important for archived backups.  You
     never know what machine you will have 5 years from now.
--
Polleke   (Paul Bijnens)
Linguistics dept., K. University Leuven, Belgium
FFAAC09@cc1.kuleuven.ac.be

peter@ficc.ferranti.com (Peter da Silva) (05/24/91)

What I don't understand is why people are still using "dump" to do backups?
A pretty minimal script using "find -newer level-file" and "cpio" works just
fine on active file systems.
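
For example, something along these lines (a stripped-down, untested sketch;
the marker-file and tape-device names are only illustrative):

    touch /etc/backup/level.new
    find /u -newer /etc/backup/level.0 -print | cpio -ocvB > /dev/rmt0 &&
        mv /etc/backup/level.new /etc/backup/level.0

Touch the marker before the find starts, so anything modified while the
backup is running gets picked up again next time.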
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"

yar@cs.su.oz (Ray Loyzaga) (05/24/91)

In article <KJIBZ8B@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> What I don't understand is why people are still using "dump" to do backups?
> A pretty minimal script using "find -newer level-file" and "cpio" works just
> fine on active file systems.
> -- 
> Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
> Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"


Restore -i is pretty cute, particularly as our users rarely know the
complete path name of a file accurately, and dump is faster ...
What does cpio do if it receives a name from find that has just been removed?
How about directories?
Do you have to read the entire cpio file to
know if a file is on it (assuming no TOC held on a disk)?
Does -newer just check the modification time?  If so, you might miss some
files that have been touched backwards; it should use the inode change time.

rmtodd@servalan.uucp (Richard Todd) (05/24/91)

peter@ficc.ferranti.com (Peter da Silva) writes:

>What I don't understand is why people are still using "dump" to do backups?
>A pretty minimal script using "find -newer level-file" and "cpio" works just
>fine on active file systems.

1. "dump" preserves the access times on files, and "restore" restores the
files with the access times set correctly.  "cpio" neither records the access
times in its archive nor leaves the access times of the files on disk 
unaffected.  Thus, "cpio" screws up any schemes one may have for locating 
user files that haven't been accessed in, say, 6 months and automatically
moving them off to tape and deleting them.  

2.  "dump" handles files with holes in them correctly (the holes don't take
up space on the backup, and "restore" restores the files with holes correctly).
"cpio" doesn't.  Having all your dbm files suddenly explode in disk usage 
after having been brought back off of tape is considered bad form in some
circles...

3.  Just how were you planning to do restores of those incremental backups?
Seems to me that the naive approach (extracting the incremental cpio just
like the full cpio backups) won't work correctly on directories which have
had files deleted between the making of the full and the incremental
backup.  Say you've got a directory "foo", which had files "a","b", and "c"
in it at the time of the full backup, but between that backup and the
incremental someone deleted files "a" and "b".  In restoring the filesystem
after the crash, you read in the level-0 cpio backup, which puts "foo/a",
"foo/b", and "foo/c" on the disk.  Now you read in the incremental cpio
backup, which (because "foo" had files deleted, shows up with a newer mod
time than the level-0, and thus gets backed up) has the "foo" directory and
"foo/c" on it, and thus foo/c gets written to your disk--but "foo/a" and 
"foo/b" are not deleted.  You have not restored the filesystem to its state 
as of the time of the incremental backup.  This means that you need to do 
some extra work to make sure that all the stuff you got rid of once gets 
gotten rid of again.  (Note that if you're really unlucky, and had a lot of
old stuff deleted and new stuff added between the full and incr. backups, 
restoring the incremental cpio file will fill your disk.)  This just adds
to the hassle a sysadmin has to deal with when restoring a filesystem, when 
usually the sysadmin has entirely too much to deal with anyway...

  Basically, there's a subtle difference between the goal of "dump" and 
"cpio".  "Dump" is a *backup* program; its function is to save the state of
a filesystem in such a way that it can be restored exactly later.  "Cpio" is
an archiving program; like "tar" or "zoo", its function is to package up a
bunch of files in a halfway portable fashion so that they can be transported
about easily from one place to another, from one system to another.  You can
try to press "cpio" or "tar" into service as a backup program, but it's 
not really the same thing...
--
Richard Todd	rmtodd@uokmax.ecn.uoknor.edu  rmtodd@chinet.chi.il.us
	rmtodd@servalan.uucp

benseb@grumpy.sdsc.edu (Booker Bense) (05/24/91)

In article <1991May24.013214.2526@servalan.uucp> rmtodd@servalan.uucp (Richard Todd) writes:
>peter@ficc.ferranti.com (Peter da Silva) writes:
>
>>What I don't understand is why people are still using "dump" to do backups?
>>A pretty minimal script using "find -newer level-file" and "cpio" works just
>>fine on active file systems.
>
[stuff about dump being better ]
>--
>Richard Todd	rmtodd@uokmax.ecn.uoknor.edu  rmtodd@chinet.chi.il.us
>	rmtodd@servalan.uucp

- Well, I've been wrestling with this problem for some time now. I
sort of run things on a network that consists of 2 Ultrix DECstations,
3 VMS VAXstations and some X terminals. The VMS disks are visible from
the DECstations using UCX. We have one 1.2 gig DAT hanging off a
DECstation and I am attempting to implement a reasonable backup strategy.

- First you have to define why you are backing up. I have two goals in
mind.

1. Disk crashes

	- Need to recreate enough of the environment to be useful. 

2. Pilot Error 

	- Backups for accidental deletion by users. 

- These two objectives have totally different requirements, and I have come
to the conclusion that TWO different backup strategies are needed.

- To implement the first I do ``dump''s of the major filesystems once a
month. I come in on a Saturday and do this with no one on the machine.
After this discussion, I'll do it in single-user mode.

- For the second I have set NFS up so root on the machine with the DAT
can read any file on the network (either VMS or Ultrix). With
various combinations of find and egrep -v I create a list of files
from the ``user filesystems'' and use GNU tar to dump this list onto
the end of a tar archive. This job is run by cron every night. GNU tar
has enough flexibility that I can get only the ``latest version'' of
the file off the archive when necessary. I also have utilities that
will take care of converting VMS variable-record-length files to
stream-LF format. This has proven far more useful than the dump tapes
and is relatively automatic. (I only have to change tapes about once a
month.)
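
- The nightly cron job boils down to something like this (a simplified
sketch; the paths, exclude patterns and device name here are made up,
not my real ones):

    find /users -mtime -1 -type f -print | egrep -v '(core$|\.o$)' > /tmp/nightly.list
    gtar -rvf /dev/nrmt0h -T /tmp/nightly.list

i.e. build the list, then append (-r) it to the archive already on the
DAT; GNU tar's -T option reads the list of file names from a file.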

- The hard part has been convincing the kernel that the tape drive
really was capable of 1.2 gigs. Many thanks to Don Rice in
comp.unix.ultrix for the helpful advice.

- Booker C. Bense                    
prefered: benseb@grumpy.sdsc.edu	"I think it's GOOD that everyone 
NeXT Mail: benseb@next.sdsc.edu 	   becomes food " - Hobbes

peter@ficc.ferranti.com (Peter da Silva) (05/25/91)

In article <2458@cluster.cs.su.oz.au> yar@cluster.cs.su.oz (Ray Loyzaga) writes:
> Restore -i is pretty cute, particularly as our users rarely know the
> complete path name of a file accurately,

We can just grep the backup.log file. We have to do that to find the volume
number anyway.

> and dump is faster ...

That's probably true.

> What does cpio do if it receives a name from find that has just been removed?

Continues.

> How about directories?

Ditto.

> Do you have to read the entire cpio file to
> know if a file is on it (assuming no TOC held on a disk)?

We keep a list of files on disk.

> Does -newer just check the modification time?  If so, you might miss some
> files that have been touched backwards; it should use the inode change time.

Newer uses the modification time. I believe we use the inode change time
(we don't actually use find, but rather use a faster special-purpose program).
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"

peter@ficc.ferranti.com (Peter da Silva) (05/25/91)

In article <1991May24.013214.2526@servalan.uucp> rmtodd@servalan.uucp (Richard Todd) writes:
> 1. "dump" preserves the access times on files, and "restore" restores the
> files with the access times set correctly.  "cpio" neither records the access
> times in its archive nor leaves the access times of the files on disk 
> unaffected.  Thus, "cpio" screws up any schemes one may have for locating 
> user files that haven't been accessed in, say, 6 months and automatically
> moving them off to tape and deleting them.  

Your CPIO might have all those flaws. Ours doesn't. Ever hear of a program
by the name of "pax"?

> 2.  "dump" handles files with holes in them correctly (the holes don't take
> up space on the backup, and "restore" restores the files with holes correctly).
> "cpio" doesn't.  Having all your dbm files suddenly explode in disk usage 
> after having been brought back off of tape is considered bad form in some
> circles...

Again, a solved problem. (we don't use DBM, but our databases do have a
similar behaviour)

> 3.  Just how were you planning to do restores of those incremental backups?

We don't worry about deleted files reappearing, and it has not been a problem
in general. We do not restore en masse from major disasters anyway... it's
always a good chance to tidy up old software, bring a system to the latest
rev level of everything, and so on.

> This means that you need to do 
> some extra work to make sure that all the stuff you got rid of once gets 
> gotten rid of again.

This is a minor problem compared to the complexity of shutting down all the
systems for the daily backups. I don't think we could work that way in any
case, as we usually back up a lot of systems remotely over the network.

> You can try to press "cpio" or "tar" into service as a backup program, but
> it's not really the same thing...

Until UNIX ships with a version of dump we can use, we don't really have an
alternative. I'm really surprised that anyone with any significant number of
machines is still using it.
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"

rca@ingres.com (Bob Arnold) (05/25/91)

In article <KJIBZ8B@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>What I don't understand is why people are still using "dump" to do backups?
>A pretty minimal script using "find -newer level-file" and "cpio" works just
>fine on active file systems.

Is this a serious question?

If it is related to the discussion about potential problems with dump,
find | cpio is vulnerable to sync problems too, because the find is
running ahead of (and much faster than) the cpio.

A more generic answer:

1) dump will not traverse filesystem/NFS boundaries.  So just how am I
supposed to back up the root filesystem with cpio?  Like this ?!?!:
	cd / ; find . -print | cpio ....
Try that on a big NFS server/client sometime.  To make life even more
miserable, many systems without BSD dump also lack BSD find's "-xdev"
or "-fstype nfs -prune" options (a sketch of those workarounds, where
they do exist, appears after this list).
2) cpio's user interface is far inferior to dump/restore for both backup
and (especially) file retrieval.  dump/restore has all this, cpio doesn't:
	a) messages telling the user how much work has been done so far
	(yeah some people might not call this a feature, and dump does
	it in a funky way, but it's better than nothing if you like it)
	b) good media load error handling (write-protected media, etc)
	c) interactive selection of files for retrieval
	d) decent end-of-tape handling (do you want to restart the volume?)

3) dump provides services that cpio doesn't:
	a) tracking of multi-level backups
	b) lists of files that are supposed to be on the tape
	c) access to devices on remote hosts

4) many versions of cpio can't handle multi-volume backups.

5) various vendor weirdnesses with cpio (these are more braindamaged
than most cpio's):
	a) at least one new version of cpio doesn't handle device nodes
	b) at least one version of cpio can't write portable headers
	(the "c" option)

I say all this as someone who has written a backup script to put
multiple filesystems on a single tape.  It works on perhaps 30
UNIX variants.  The part of the script that actually puts backups
on tape is 18 lines when it uses dump and 282 lines when we are
forced to use cpio (both counts include comments).  The extra code
is basically to provide as much as I can of dump's functionality
(sort of as a wrapper for cpio).  Overall, I could probably eliminate
two thirds to three quarters of my code if I didn't have to deal
with cpio.  Now, if I only had the choice :-)

		Bob

--
  __   _    _   Bob Arnold		ASK / Ingres Product Division
|/  \ / \  / \| 			1080 Marina Village Parkway
|    /    /   |				Alameda, CA, 94501
|    \__/ \__/| rca@ingres.com		415/748-2819

bernie@metapro.DIALix.oz.au (Bernd Felsche) (05/25/91)

In <1991May24.013214.2526@servalan.uucp> rmtodd@servalan.uucp (Richard Todd) writes:

>peter@ficc.ferranti.com (Peter da Silva) writes:

>>What I don't understand is why people are still using "dump" to do backups?
>>A pretty minimal script using "find -newer level-file" and "cpio" works just
>>fine on active file systems.

>1. "dump" preserves the access times on files, and "restore" restores the
>files with the access times set correctly.  "cpio" neither records the access
>times in its archive nor leaves the access times of the files on disk 
>unaffected.  Thus, "cpio" screws up any schemes one may have for locating 
>user files that haven't been accessed in, say, 6 months and automatically
>moving them off to tape and deleting them.  

Depends on the cpio options... doesn't your cpio have -a ? Do you know
how to use it?

>2.  "dump" handles files with holes in them correctly (the holes don't take
>up space on the backup, and "restore" restores the files with holes correctly).
>"cpio" doesn't.  Having all your dbm files suddenly explode in disk usage 
>after having been brought back off of tape is considered bad form in some
>circles...

This is a problem with any backup utility I know of that accesses
files through the filesystem. The "holes" are not visible when you read
the file; you have to look at the blocks allocated to check for holes.
Mind you, there's nothing that stops a utility from doing that through
the filesystem, is there? After all, any run of NULLs in a file can be
represented as a relative seek. Heck, you could *shrink* files for backup.

>  Basically, there's a subtle difference between the goal of "dump" and 
>"cpio".  "Dump" is a *backup* program; its function is to save the state of
>a filesystem in such a way that it can be restored exactly later.  "Cpio" is
>an archiving program; like "tar" or "zoo", its function is to package up a
>bunch of files in a halfway portable fashion so that they can be transported
>about easily from one place to another, from one system to another.  You can
>try to press "cpio" or "tar" into service as a backup program, but it's 
>not really the same thing...

More than a subtle difference. We do an image copy of our root
filesystem every night onto a spare partition on another drive. Even
with the system idle, an fsck on the copy still reports problems...
nothing serious... mostly FIFO file size errors. The copy takes 3
minutes. This is archived to tape once a week.

-- 
Bernd Felsche,                 _--_|\   #include <std/disclaimer.h>
Metapro Systems,              / sale \  Fax:   +61 9 472 3337
328 Albany Highway,           \_.--._/  Phone: +61 9 362 9355
Victoria Park,  Western Australia   v   Email: bernie@metapro.DIALix.oz.au

peter@ficc.ferranti.com (Peter da Silva) (05/25/91)

In article <1991May25.003146.13982@ingres.Ingres.COM> rca@Ingres.COM (Bob Arnold) writes:
> In article <KJIBZ8B@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> >What I don't understand is why people are still using "dump" to do backups?
> >A pretty minimal script using "find -newer level-file" and "cpio" works just
> >fine on active file systems.

> Is this a serious question?

Yes.

> If it is related to the discussion about potential problems with dump,
> find | cpio is vulnerable to sync problems too, because the find is
> running ahead of (and much faster than) the cpio.

Yes, but cpio doesn't produce a bad archive when it gets out of sync.

> 1) dump will not traverse filesystem/NFS boundaries.  So just how am I
> supposed to back up the root filesystem with cpio?

We don't back up the root file system. We back up /sys, /etc and /net,
so we retain config files, but otherwise if the root file system gets blown
away we use it as an opportunity to copy in a clean one. We work hard on
keeping all our systems looking the same, and as a result backing up root
is a waste of time.

> 2) cpio's user interface is far inferior to dump/restore for both backup
> and (especially) file retrieval.

That's why we use pax.

> 3) dump provides services that cpio doesn't:
> 	a) tracking of multi-level backups

Trivial.

> 	b) lists of files that are supposed to be on the tape

Redirect the output of cpio to a file.

> 	c) access to devices on remote hosts

We're on a network that provides transparent UNIX file system semantics
on remote hosts, and another thing that puzzles us terribly is why people
put up with junk like NFS and RFS.

> 5) various vendor wierdnesses with cpio (these are more braindamaged
> than most cpio's):

That's why we use pax.

And another advantage of cpio is that it sticks to the traditional UNIX
tools approach. Why use one big integrated program when you get so much
more flexibility from a script built out of tools?
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"

les@chinet.chi.il.us (Leslie Mikesell) (05/26/91)

In article <1991May24.013214.2526@servalan.uucp> rmtodd@servalan.uucp (Richard Todd) writes:

>1. "dump" preserves the access times on files, and "restore" restores the
>files with the access times set correctly.  "cpio" neither records the access
>times in its archive nor leaves the access times of the files on disk 
>unaffected.  Thus, "cpio" screws up any schemes one may have for locating 
>user files that haven't been accessed in, say, 6 months and automatically
>moving them off to tape and deleting them.  

Most cpio's have the -a option to "fix" the disk access time to be the
same as it was before cpio read the file.  The problem is that if you
use it, it changes the ctime in the process of fixing the atime.  So
if you use ctime to test for the files that need to go on an
incremental backup (and you should...), they all appear to be new.
The only solution I've found is to back up across a read-only mount,
which isn't too difficult if you are going over a network anyway.
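
Something like this, roughly (the server name, mount point and tape
device are invented for the example):

    mount -r server:/home /backup/home
    cd /backup/home ; find . -print | cpio -ocvB > /dev/rmt0

the idea being that reads via the read-only mount don't push the times
around, so there's no need for -a and no ctime side effect.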

>2.  "dump" handles files with holes in them correctly (the holes don't take
>up space on the backup, and "restore" restores the files with holes correctly).
>"cpio" doesn't.  Having all your dbm files suddenly explode in disk usage 
>after having been brought back off of tape is considered bad form in some
>circles...

Afio (a cpio work-alike) will do this by seeking over blocks of nulls
during the restore.

>3.  Just how were you planning to do restores of those incremental backups?
>Seems to me that the naive approach (extracting the incremental cpio just
>like the full cpio backups) won't work correctly on directories which have
>had files deleted between the making of the full and the incremental
>backup. 

GNUtar is the only thing I've seen that gets this right (running AT&T
SysV, I haven't used dump/restore).  It (optionally) makes an entry
containing the current contents of each directory as it goes by,
and can (optionally) delete files that aren't supposed to be there
as you restore.
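
In GNU tar terms that's roughly (option spellings from memory, so check
your version; the date and device are just examples):

    gtar -c -G -N '20 May 1991' -f /dev/rmt0 /home   # incremental, recording directory contents
    gtar -x -G -f /dev/rmt0                          # restore, removing files not in the dump

where -G is the incremental (dumpdir) option.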

Les Mikesell
  les@chinet.chi.il.us

les@chinet.chi.il.us (Leslie Mikesell) (05/26/91)

In article <W3KBZID@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:

>Yes, but cpio doesn't produce a bad archive when it gets out of sync.

It doesn't die if files have been deleted, but some very nasty things
happen if a file is truncated between the time cpio generates its
header and the time it actually reads the file.  I think most versions
even get confused if the file grows in this interval (i.e. they put
the data up to EOF in the archive even though it is not consistent with
the header length field).

>We're on a network that provides transparent UNIX file system semantics
>on remote hosts, and another thing that puzzles us terribly is why people
>put up with junk like NFS and RFS.

What's wrong with RFS, other than killing all the processes that were
using it when a link goes down?

Les Mikesell
  les@chinet.chi.il.us

terry@jgaltstl.UUCP (terry linhardt) (05/27/91)

In article <1991May24.013214.2526@servalan.uucp>, rmtodd@servalan.uucp (Richard Todd) writes:
> 1. "dump" preserves the access times on files, and "restore" restores the
> files with the access times set correctly.  "cpio" neither records the access
> times in its archive nor leaves the access times of the files on disk 
> unaffected.  Thus, "cpio" screws up any schemes one may have for locating 
> user files that haven't been accessed in, say, 6 months and automatically
> moving them off to tape and deleting them.  
> 
This statement is not necessarily correct. If you use cpio with
the -a option, access times are *not* reset.

-- 
|---------------------------------------------------------------------|
|  Terry Linhardt      The Lafayette Group      uunet!jgaltstl!terry  | 
|---------------------------------------------------------------------|

acsiv@menudo.uh.edu (Duck @ U of Houston) (05/27/91)

In article <W3KBZID@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>
>That's why we use pax.
>
>-- 
>Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
>Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"

I'll bite...what's pax?

Don

-- 
Donald M. Harper   (713) 749-6283 
University of Houston Academic User Services, User Services Specialist II   
Cannata Research Computing Center, Operator
Internet : DHarper@uh.edu | Bitnet : DHarper@UHOU  | THEnet : UHOU::DHARPER

dan@gacvx2.gac.edu (05/27/91)

In article <1991May26.180717.22463@menudo.uh.edu>, acsiv@menudo.uh.edu (Duck @ U of Houston) writes:
> In article <W3KBZID@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>>
>>That's why we use pax.
> 
> I'll bite...what's pax?

archie> whatis pax
 
pax                     Reads and writes tar(1) and "cpio" formats, both
                        traditional and IEEE 1003.1 (POSIX) extended.  Handles
                        multi-volume archives and automatically determines
                        format while reading.  Has tar(1), "cpio", and "pax"
                        interfaces.  "pax" interface is based on IEEE 1003.2
                        Draft 7


-- 
Dan Boehlke                    Internet:  dan@gac.edu
Campus Network Manager         BITNET:    dan@gacvax1.bitnet
Gustavus Adolphus College
St. Peter, MN 56082 USA        Phone:     (507)933-7596

adeboer@gjetor.geac.COM (Anthony DeBoer) (05/27/91)

In article <1991May24.013214.2526@servalan.uucp> rmtodd@servalan.uucp (Richard Todd) writes:
>1. "dump" preserves the access times on files, and "restore" restores the
>files with the access times set correctly.  "cpio" neither records the access
>times in its archive nor leaves the access times of the files on disk 
>unaffected.  Thus, "cpio" screws up any schemes one may have for locating 
>user files that haven't been accessed in, say, 6 months and automatically
>moving them off to tape and deleting them.  

If your cpio implements the "a" parameter, you can do a backup without
affecting the access times of the files on the disk:

# find . -print | cpio -ovBca > /dev/rmt0

Of course, if you ever have to restore these files, the access time would
get munged at that point.
-- 
Anthony DeBoer NAUI#Z8800 | adeboer@gjetor.geac.com   | Programmer (n): One who
Geac Canada Ltd.          | uunet!geac!gjetor!adeboer | makes the lies the 
Toronto, Ontario          | #include <disclaimer.h>   | salesman told come true.

torek@elf.ee.lbl.gov (Chris Torek) (05/28/91)

[NB: I am assuming you do incremental backups, not just full-system
backups.  If you have a few dozen gigabytes of disk, you almost
certainly rely on incremental backups.]

Cpio, like any utility that works through the file system, is not well
suited as an `exact backup' program for most if not all Unix systems.
(There is a `trick' to get around this, but it typically does not work
well anyway.)  Here is why:

In article <1991May27.132333.26592@gjetor.geac.COM>
adeboer@gjetor.geac.COM (Anthony DeBoer) writes:
>If your cpio implements the "a" parameter, you can do a backup without
>affecting the access times of the files on the disk:
>
># find . -print | cpio -ovBca > /dev/rmt0

Reading a file through the file system updates the file's access time
(and reading a directory updates the directory's access time).  The
only way to change the time back, through the file system, is with the
utime or utimes call (which call to use depends on which Unix system;
some support both---the only real difference is that utimes uses more
precise timestamps).

Using utime/utimes on a file will update the file's `ctime'
(inode-change time), since it changes the inode information.

Any exact-backup program must write out every file whose ctime is
greater than that of the last backup.  Otherwise an important change,
such as `file 1234 went from rw-r--r-- to rw-------', will not appear
on the tape.  It cannot quit after writing just the inode information
since the utime(s) system call(s) can be used to make a new file look
older; when this is done, only the ctime tells the truth: that the
file needs to be backed-up.

Thus, with a through-the-file-system backup program, you have your
choice of either clobbering access times (by reading everything that
is being backed up) or making incrementals impossible.

Here is the trick:  When operating on a read-only file system, read
calls do not update the access time.  Thus, if you can unmount the file
system and remount it read-only, you can use cpio or other `ordinary'
utilities to make a backup without affecting the inode timestamps.  It
does not work well because you typically find that you cannot unmount
the file system.  If you could, you could unmount it and run `dump'
anyway.  (You can remount read-only; dump does not care if file systems
are mounted, only whether they change `non-atomically' to dump.)
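
Concretely, the trick looks something like this (a sketch; the device
and mount point are invented):

    umount /u
    mount -r /dev/sd0g /u
    cd /u ; find . -print | cpio -ocvB > /dev/rst0
    umount /u
    mount /dev/sd0g /u

and the window during which /u cannot be written is, of course, exactly
why it rarely helps: if you can take the filesystem away from your users
for that long, you might as well just run dump on it while it is quiet.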

If inode access times are unimportant to you, this argument collapses.
We happen to like them, and the fact that dump can write a gigabyte
to an Exabyte in just over 66 minutes (or 33 with the 500KB/s model)
does not hurt either.  Dump is currently limited by tape drive data
rates; this is all too often untrue for file system operations.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

khushro@caen.engin.umich.edu (Khushro Shahookar) (05/29/91)

We have had so many responses to "Re: SUMMARY: Backup while in
multi-user mode".  Could a kind soul please post a summary of the
summary, listing all the possible holes in dump, tar, cpio, whatever...

-KHUSHRO SHAHOOKAR   khushro@eecs.umich.edu

So much news,  so little time ...

bill@unixland.uucp (Bill Heiser) (05/30/91)

In article <EDJB+TC@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>
>Your CPIO might have all those flaws. Ours doesn't. Ever hear of a program
>by the name of "pax"?

Does pax have the problem that cpio has, where if it encounters a file
on an NFS-mounted partition that it doesn't have read permission on,
it causes the whole cpio run to fail (rather than just skipping the file
with maybe a warning)?

I'm leery of something non-standard like PAX.  What happens 5 years from
now when we need to restore something from a tape created on a Sun/386i
(for example) to a Sparc-L? :-)




-- 
bill@unixland.natick.ma.us	The Think_Tank BBS & Public Access Unix
    ...!uunet!think!unixland!bill       bill@unixland
    ..!{uunet,bloom-beacon,esegue}!world!unixland!bill
508-655-3848 (2400)   508-651-8723 (9600-HST)   508-651-8733 (9600-PEP-V32)

karish@mindcraft.com (Chuck Karish) (05/30/91)

In article <1991May30.002422.14775@unixland.uucp> bill@unixland.uucp
(Bill Heiser) writes:
>In article <EDJB+TC@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>>
>>Your CPIO might have all those flaws. Ours doesn't. Ever hear of a program
>>by the name of "pax"?
>
>I'm leery of something non-standard like PAX.  What happens 5 years from
>now when we need to restore something from a tape created on a Sun/386i
>(for example) to a Sparc-L? :-)

Non-standard?  pax was written specifically to support the tar and
cpio formats immortalized by the POSIX.1 standard.  Its behavior is
specified by the draft POSIX.2 standard.  It knows how to read
traditional tar and cpio formats.

Five years from now you'll be able to read pax archives by using
pax.

-- 

	Chuck Karish		karish@mindcraft.com
	Mindcraft, Inc.		(415) 323-9000

mills@ccu.umanitoba.ca (Gary Mills) (05/30/91)

In <1991May30.002422.14775@unixland.uucp> bill@unixland.uucp (Bill Heiser) writes:

>In article <EDJB+TC@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>>
>>Your CPIO might have all those flaws. Ours doesn't. Ever hear of a program
>>by the name of "pax"?

>Does pax have the problem that CPIO has, where if it encounters a file

If you have SunOS 4.1.1, check out /usr/5bin/paxcpio.
-- 
-Gary Mills-         -Networking Group-          -U of M Computer Services-

bill@unixland.uucp (Bill Heiser) (05/31/91)

In article <675574289.1712@mindcraft.com> karish@mindcraft.com (Chuck Karish) writes:
>
>Non-standard?  pax was written specifically to support the tar and
>cpio formats immortalized by the POSIX.1 standard.  Its behavior is
>specified by the draft POSIX.2 standard.  It knows how to read
>traditional tar and cpio formats.
>
>Five years from now you'll be able to read pax archives by using
>pax.
>

OK -- my knowledge of PAX is obviously very limited.  Basically, I 
know it exists ...  

My point is that I may want to read tapes later on a machine that 
doesn't happen to have PAX available.  Will ordinary tars and cpios
read tapes written by PAX?


-- 
bill@unixland.natick.ma.us	The Think_Tank BBS & Public Access Unix
    ...!uunet!think!unixland!bill       bill@unixland
    ..!{uunet,bloom-beacon,esegue}!world!unixland!bill
508-655-3848 (2400)   508-651-8723 (9600-HST)   508-651-8733 (9600-PEP-V32)