[comp.unix.questions] Backups on active file system

FFAAC09@cc1.kuleuven.ac.be (Nicole Delbecque & Paul Bijnens) (03/12/91)

You wrote:
> I am planning to use the command
>
>find /users | cpio ....

I use a variation of "find ... | cpio ..." every day here
to make our backup.  We make the backup in multi-user mode.
You should make sure your cpio can handle two things correctly.
When it reads the name of the next file to back up from its standard
input, it does an open() and then an fstat().  The size information
from that fstat() is (with the other i-node info) written to the tape.
While cpio is reading a file that another process has open, three
things can happen.
  1. Most of the time, the file grows.
     A good implementation of cpio should not read until EOF,
     but write only as many data bytes as indicated by the header
     that was already written to the tape.
  2. The file shrinks.  This can be the result of a chsize()
     system call, if it is implemented, or of a truncation followed
     by some write()s.  To be consistent in this case, cpio
     should write as many zero bytes to the tape as are needed to
     reach the file size in the header.
  3. Some parts in the middle of the file are changed.
     (Opened for "w+", and read, seek, write in the program).
     Cpio then just writes part of the old data, followed by
     updated data.
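
Cases 1 and 2 can be sketched in a few lines of Python.  This is
only an illustration of the behaviour described above, not the code
of any real cpio (we have no source to look at).  The name
copy_member and the after_stat hook are made up for the example;
the hook exists only so that a change to the file between the
fstat() and the reads can be simulated.

```python
import os

BLOCK = 8192  # arbitrary read size for this sketch


def copy_member(path, out, after_stat=None):
    """Write exactly the fstat() size of `path` to `out`.

    Returns the size that would have been recorded in the header.
    `after_stat` is a hypothetical hook, called once after fstat(),
    used only to simulate another process changing the file.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size  # size as written in the header
        if after_stat is not None:
            after_stat()

        remaining = size
        while remaining > 0:
            # Case 1: never read past the recorded size, even if the
            # file has grown in the meantime.
            chunk = f.read(min(BLOCK, remaining))
            if not chunk:
                break  # Case 2: EOF came early, the file shrank
            out.write(chunk)
            remaining -= len(chunk)

        # Case 2: pad with zero bytes up to the recorded size, so the
        # archive stays in sync with its own header.
        while remaining > 0:
            n = min(BLOCK, remaining)
            out.write(b"\0" * n)
            remaining -= n
    return size
```

With a 10-byte file that grows by 5 bytes after the fstat(), only
the first 10 bytes are written.  If it is truncated to 4 bytes
instead, the output is those 4 bytes followed by 6 zero bytes.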
In any of the above cases, cpio retains internal consistency.
I mean: the data on the tape can be read again with cpio without
"getting out of sync" or other trouble.
The restored data can still be damaged (again, the three cases above):
  1. You just backed up the old data.  The new data is never
     backed up.  However, the next day it will be backed up...
  2. Restoring the file may result in a huge file with the last
     part all nulls.  Waste of disk-space, but easy to fix with
     a 10-line C-program (or one line perl).
  3. If the data in the file should be in an internally consistent
     state (e.g. a database file), then you have trouble with this
     file.  Restoring the file can give a useless piece of junk.
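
The fix for case 2 could look something like this in Python.  It is
a sketch, not the C or perl program mentioned above, and the name
strip_trailing_nulls is made up.  It assumes that every zero byte
at the end of the file is padding; a file that legitimately ends in
zero bytes would be shortened as well, so use it only on files you
know were padded.

```python
import os


def strip_trailing_nulls(path, block=65536):
    """Truncate `path` just after its last non-zero byte.

    Returns the new size.  Scans backwards in blocks so that a huge
    padded file does not have to be read into memory at once.
    """
    with open(path, "r+b") as f:
        end = f.seek(0, os.SEEK_END)  # current file size
        while end > 0:
            start = max(0, end - block)
            f.seek(start)
            data = f.read(end - start)
            stripped = data.rstrip(b"\0")
            if stripped:
                # Found real data: cut right after it.
                end = start + len(stripped)
                break
            end = start  # block was all zeros, keep scanning backwards
        f.truncate(end)
    return end
```

Zero bytes in the middle of the file are left alone; only the run
of nulls at the very end is removed.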
On our system we do the backup in the evening or morning, when
there aren't many people logged on.  We have never had any problems
like the above.  However, I did manage to set up an experimental
situation to fool cpio, so I could verify that our cpio maintains
its internal consistency (we have no source to look at).
I decided to do our backup in multi-user mode because:
  a. Our cpio maintains its internal consistency.
  b. None of our files are updated constantly.
     (Our database program has its own backup-facility.)
  c. Having some "damaged" file on tape one day is not a problem
     because the next day it will be backed up again with the
     next incremental backup (make sure to set the time-stamp
     of your backup to the BEGINNING of the operation: that way,
     files modified during the backup are newer than the
     time-stamp and get picked up again).
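
The time-stamp trick in point c can be sketched like this in
Python.  It is only an illustration under assumptions: the names
files_changed_since and incremental_run, the stamp file, and the
archive callback (standing in for the "find ... | cpio ..." step)
are all made up for the example.  The stamp file should live
outside the tree being backed up.

```python
import os
import time


def files_changed_since(root, since):
    """Yield regular-file paths under `root` with mtime >= `since`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime >= since:
                    yield path
            except OSError:
                continue  # file vanished while we were walking


def incremental_run(root, stamp_file, archive):
    """Back up files changed since the last run; return their paths.

    `archive` is a hypothetical callback that does the real work.
    """
    started = time.time()  # taken BEFORE any file is read
    try:
        since = os.stat(stamp_file).st_mtime
    except FileNotFoundError:
        since = 0.0  # no previous run: take everything

    selected = list(files_changed_since(root, since))
    archive(selected)

    # Record the START of this run, not the end.  A file modified
    # while the backup was running then has an mtime later than the
    # stamp and is selected again by the next run.
    with open(stamp_file, "a"):
        pass
    os.utime(stamp_file, (started, started))
    return selected
```

If the stamp were set at the end of the run instead, a file changed
halfway through the backup would look older than the stamp and
would be skipped the next day.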

bye
--
Polleke   (Paul Bijnens)
Linguistics dept., K. University Leuven, Belgium
FFAAC09@cc1.kuleuven.ac.be

FLYNN%EVALUN11.BITNET@cunyvm.cuny.edu (Mark F. Flynn) (03/13/91)

I'm setting up to do backups of the /users directory on a regular basis.
The problem is the following. The system is an HP Apollo 400t running HP-UX
(probably not important) used for developing and running biggish numerical
applications. During the day, I expect to have lots of access to the source
codes (which is the really important bit), and during the night, lots of
CPU- (and possibly I/O-) hungry background processes. Telling people to not
run jobs a certain day or whatever is not acceptable. What I want to know is
if there will be any problems with backing up files which may be open for
either input or output. I am planning to use the command

find /users | cpio ....

Any ideas?

Mark Flynn
FLYNN@EVALUN11.(EARN BITNET)

Departamento de Fisica Atomica y Nuclear
Universidad de Valencia
Spain