[comp.unix.wizards] Compressed Backups

adeboer@gjetor.geac.COM (Anthony DeBoer) (04/09/91)

A while back there was some discussion of doing compressed backups, roughly
along the lines of:

# find . -print | cpio -ovc | compress -v | dd bs=64k of=/dev/rmt0

At the time, there were some warnings posted that, with the usual compression
algorithm, a tape error would make the whole rest of the tape unusable since
uncompress would lose sync and the rest of the data stream would be garbage.

Now, it seems to me that the failure mode with every bad QIC tape I've ever
encountered has been that the whole rest of the tape was inaccessible anyway.
I'd like to inquire of the net: have people in fact had tapes with errors or
other such glitches that they were able to read past, getting at the
rest of the tape?  And does this have anything to do with QIC versus 9-track
versus other media?  It seems to me that with some of the low-level stuff
that happens on QIC, 9-track might be more amenable to recovery.  Can anyone
who knows low-level tape drivers comment?

What I'm getting at is, if in fact any tape error stops you dead anyway, then
there's no reason not to pipe a whole backup through compress (making sure you
have an uncompressed copy of uncompress available for recovery :-), since if
you can't get at the remaining data anyway it's academic whether you can
uncompress it.
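
For the record, the matching restore would be roughly the same pipeline run
backwards (same block size, same device):

# dd if=/dev/rmt0 bs=64k | uncompress | cpio -ivcd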

Also, if you can back up the same data in half the tape, you can probably back
up the system twice as often and be better covered with the same volume of
tape.  In fact, you could show mathematically that you're just as well covered
this way with say a single-tape compressed backup daily as opposed to a
two-tape regular backup every other day, so long as the probability of the
compression itself preventing restore is less than 50% (the actual probability
would be considerably less!).  The exact ratio would vary with the "density"
of your disk contents, but we have some large data files that give better than
90% compression.
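
(One way to see the 50% figure: let p be the chance that compression alone
ruins a restore.  With a daily compressed backup, yesterday's data is
restorable with probability 1-p; on an every-other-day schedule it is on
tape at all only half the time.  The compressed scheme wins whenever
1-p > 1/2, i.e. p < 50%.)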

There's also the point that if your most recent backup was corrupt but allowed
you to continue and recover most of the system, you'd probably still go back
another generation if that restored perfectly, just so you'd have something
you trusted.  Since this is the one situation where compression would be an
issue, perhaps it wouldn't be after all.

I'm sure I don't need to request net.comment, but I will anyway.
-- 
Anthony DeBoer NAUI#Z8800 | adeboer@gjetor.geac.com   | Programmer (n): One who
Geac J&E Systems Ltd.     | uunet!geac!gjetor!adeboer | makes the lies the 
Toronto, Ontario, Canada  | #include <disclaimer.h>   | salesman told come true.

david@talgras.UUCP (David Hoopes) (04/09/91)

In article <1991Apr8.194026.29651@gjetor.geac.COM> adeboer@gjetor.geac.COM (Anthony DeBoer) writes:
>A while back there was some discussion of doing compressed backups, roughly
>along the lines of:
>
># find . -print | cpio -ovc | compress -v | dd bs=64k of=/dev/rmt0
>
>At the time, there were some warnings posted that, with the usual compression
>algorithm, a tape error would make the whole rest of the tape unusable since
>uncompress would lose sync and the rest of the data stream would be garbage.
>
>Now, it seems to me that the failure mode with every bad QIC tape I've ever
>encountered has been that the whole rest of the tape was inaccessible anyway.

I think you are confusing a tape error (some section of tape is unreadable)
with a user overwriting the beginning of the tape.  Overwriting the tape is
generally not recoverable on QIC drives.  However, reading past a tape
error is possible.  It is more a function of the software than of the drive:
tar will choke and die, but some newer versions of cpio will resync and
continue the restore after a bad block.  You lose that file, or maybe a
couple of files, but most of the backup is good.

I do not recommend the compression scheme above.  If you lose one byte off
the tape, the rest of the tape is gone.

>I'd like to inquire of the net: have people in fact had tapes with errors or
>other such glitches that they were able to read past, getting at the
>rest of the tape?  And does this have anything to do with QIC versus 9-track

I work with tapes and tape drivers.  I have induced tape errors with a
magnet on several QIC drives and on our DAT drive.  It is possible to read
past an error if the software can handle it.
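
Even when the archiver itself can't cope, you can sometimes front-end it
with dd, which will keep reading after a hard error if you ask it to (a
sketch only; whether the driver will actually hand back data past the bad
spot depends on the drive):

# dd if=/dev/rmt0 bs=64k conv=noerror,sync | cpio -ivcd

conv=noerror keeps dd going after a read error, and sync pads the short
block out so the byte count stays right for whatever is downstream.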


>Also, if you can back up the same data in half the tape, you can probably back
>up the system twice as often and be better covered with the same volume of
>tape.  In fact, you could show mathematically that you're just as well covered

Try figuring out how much time this will add to doing your backup.  And don't
forget that the restore will take longer also.  You can do the same thing by
buying a couple of extra tapes.

>There's also the point that if your most recent backup was corrupt but allowed
>you to continue and recover most of the system, you'd probably still go back
>another generation if that restored perfectly, just so you'd have something
>you trusted.  Since this is the one situation where compression would be an
>issue, perhaps it wouldn't be after all.

What if you are just doing a selective restore?  Then if the error does not
fall in the files that you are restoring, who cares?
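
For example, on a plain (uncompressed) cpio tape you can pull out just the
files you want by naming patterns, and a bad block elsewhere on the tape
never enters into it.  The path here is purely illustrative:

# cpio -ivcd "usr/lib/uucp/*" < /dev/rmt0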


-- 
---------------------------------------------------------------------
David Hoopes                              Tallgrass Technologies Inc. 
uunet!talgras!david                       11100 W 82nd St.          
Voice: (913) 492-6002 x323                Lenexa, Ks  66214        

bill@bilver.uucp (Bill Vermillion) (04/11/91)

In article <1991Apr8.194026.29651@gjetor.geac.COM> adeboer@gjetor.geac.COM (Anthony DeBoer) writes:
>A while back there was some discussion of doing compressed backups, roughly
>along the lines of:
 
># find . -print | cpio -ovc | compress -v | dd bs=64k of=/dev/rmt0
 
>At the time, there were some warnings posted that, with the usual compression
>algorithm, a tape error would make the whole rest of the tape unusable since
>uncompress would lose sync and the rest of the data stream would be garbage.

Sure.  But if you change it around: find the files, compress each one, and
then feed them to cpio as individual compressed files.  A compression
failure would then result in the loss of that file only, would it not?

In other words, isolate failure points so a failure will affect only one
file, not all of the files stuffed into one volume.
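
One way to arrange that, assuming you have the scratch space for a staging
copy (the paths here are purely illustrative):

# mkdir /tmp/stage
# find . -print | cpio -pdm /tmp/stage           # copy the tree aside
# find /tmp/stage -type f -exec compress {} \;   # squeeze each copy in place
# (cd /tmp/stage ; find . -print | cpio -ovc) | dd bs=64k of=/dev/rmt0

Each file is then its own compressed stream, so a bad spot on the tape
costs you that one file instead of everything after it.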
 
>Also, if you can back up the same data in half the tape, you can probably back
>up the system twice as often and be better covered with the same volume of
>tape.  In fact, you could show mathematically that you're just as well covered
>this way with say a single-tape compressed backup daily as opposed to a
>two-tape regular backup every other day, so long as the probability of the
>compression itself preventing restore is less than 50% (the actual probability
>would be considerably less!).  The exact ratio would vary with the "density"
>of your disk contents, but we have some large data files that give better than
>90% compression.

I have a couple of '386 sites using Xenix that have CTAR installed.

One used to take two and a fraction tapes.  Ctar has a compress option that
compresses the files before tar'ing them to tape.  Doing this, they went
from 2.25 60-meg tapes to about 2/3 of one tape.  Of course the compression
takes its toll, so the backup onto the one tape takes about as long as, or
longer than, the backup to the 2+ tapes.

But all is well.  We just tell the system to back up at 11pm, hours after
the last workaholic goes home.  Then it reads the entire tape back to
verify itself, and then mails the results to the administrator.
Requiring almost no effort from the administrator makes sure that tapes
get made every day.
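
For anyone without ctar, the moral equivalent with stock tools would look
something like this (a sketch only; device names, cpio flags, and mailer
details vary from system to system):

#!/bin/sh
# nightly.backup -- dump, read the tape back, mail the evidence
cd / || exit 1
find usr home -print | cpio -ovc > /dev/rmt0 2> /tmp/backup.log
mt -f /dev/rmt0 rewind
cpio -ivct < /dev/rmt0 > /tmp/verify.log 2>&1   # the -t listing reads every block
mail root < /tmp/verify.log

run out of cron with an entry like "0 23 * * * /usr/local/bin/nightly.backup".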


-- 
Bill Vermillion - UUCP: uunet!tarpit!bilver!bill
                      : bill@bilver.UUCP

bernie@metapro.DIALix.oz.au (Bernd Felsche) (04/14/91)

In <1991Apr10.231744.1037@bilver.uucp> bill@bilver.uucp (Bill Vermillion) writes:

>In article <1991Apr8.194026.29651@gjetor.geac.COM> adeboer@gjetor.geac.COM (Anthony DeBoer) writes:
>>A while back there was some discussion of doing compressed backups, roughly
>>along the lines of:
> 
>># find . -print | cpio -ovc | compress -v | dd bs=64k of=/dev/rmt0
> 
>>At the time, there were some warnings posted that, with the usual compression
>>algorithm, a tape error would make the whole rest of the tape unusable since
>>uncompress would lose sync and the rest of the data stream would be garbage.

>Sure.  But if you change it around: find the files, compress each one, and
>then feed them to cpio as individual compressed files.  A compression
>failure would then result in the loss of that file only, would it not?

One disadvantage is that compressing the files individually is far
less effective than compressing a large archive containing them.

Furthermore, if you have many small files, archiving and recovery
become painfully slow, because compress gets started once per file.

SunOS's bar utility has a compress option which actually calls
compress via system(3).  (At least it used to, a year or so ago.)  This
has the additional undesirable side effect of bombing out when trying
to extract files like "p&l", because the shell interprets
"compress p&l" as something totally unintended.  It also forks
insanely if you have many small files.
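
The "p&l" failure is easy to reproduce, since system(3) just hands the
string to sh -c.  You get something like:

$ sh -c 'compress p&l'
compress: p: No such file or directory
sh: l: not found

The & quietly backgrounds "compress p", and the shell then tries to run a
command named l.  Quoting the filename, or forking compress directly
without a shell, avoids the whole class of problem.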

Obviously, it's a balancing act between available media sizes,
convenience, cost and data security. With unlimited cash, I'd probably
run an FDDI link to another building, containing a dumb front-end, and
a redundant mirrored disk array. Most convenient :-) 
-- 
Bernd Felsche,                 _--_|\   #include <std/disclaimer.h>
Metapro Systems,              / sale \  Fax:   +61 9 472 3337
328 Albany Highway,           \_.--._/  Phone: +61 9 362 9355
Victoria Park,  Western Australia   v   Email: bernie@metapro.DIALix.oz.au