[comp.unix.questions] Backups using compress

dpence@redstone-emh2.army.mil ( Dwayne Pence) (11/27/90)

Does anyone know of a way to compress files before sending them through 
cpio or tar to tape without actually creating a ".Z" file on disk.  I am 
trying to create backup tapes which are compressed versions of our files 
on disk.  Any help would be appreciated.


Dwayne Pence

gwyn@smoke.brl.mil (Doug Gwyn) (11/27/90)

In article <25106@adm.brl.mil> dpence@redstone-emh2.army.mil ( Dwayne Pence) writes:
>Does anyone know of a way to compress files before sending them through 
>cpio or tar to tape without actually creating a ".Z" file on disk.

No, because cpio and tar open the files individually and have no provisions
for per-file filtration.  What most people do is to compress the archive,
not the individual files; that also should generally be more compact.
In this method you can simply pipe the archive through "compress".

les@chinet.chi.il.us (Leslie Mikesell) (11/28/90)

In article <14582@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:

>>Does anyone know of a way to compress files before sending them through 
>>cpio or tar to tape without actually creating a ".Z" file on disk.

>No, because cpio and tar open the files individually and have no provisions
>for per-file filtration.  What most people do is to compress the archive,
>not the individual files; that also should generally be more compact.
>In this method you can simply pipe the archive through "compress".

There are 2 problems with this method, though.  If you have a single
media error anywhere in the archive it will be pretty much impossible
to recover any of it past the error, and if most of the files on the
disk were already compressed the output may actually expand when
piped through compress.

Has anyone worked on a "packetizing" version of compress that would
be able to pass chunks that would not compress any further through
unchanged (except for a packet header) and provide identifiable
restart points that could be used for error recovery?
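
A minimal sketch of the "packetizing" idea asked about here, not an existing
tool.  The function names, the "PKT" header format, and the COMPRESS_CMD and
UNCOMPRESS_CMD variables are all hypothetical.  Each chunk is compressed on
its own and stored raw when compression does not shrink it, so every header
is a point where decoding starts afresh.  The writer stages its chunks in a
scratch directory, and the reader assumes a seekable file rather than a pipe.

```shell
# packetize [CHUNK_BYTES]: stdin -> stdout as a sequence of packets.
# Each packet is a text header "PKT <Z|R> <length>\n" followed by <length>
# bytes: Z = compressed chunk, R = raw chunk that would not shrink.
packetize() {
    scratch=$(mktemp -d) || return 1
    split -a 4 -b "${1:-65536}" - "$scratch/chunk."
    for chunk in "$scratch"/chunk.*; do
        [ -f "$chunk" ] || continue          # empty input: glob matched nothing
        ${COMPRESS_CMD:-compress -c} < "$chunk" > "$scratch/packed"
        raw_len=$(( $(wc -c < "$chunk") ))
        packed_len=$(( $(wc -c < "$scratch/packed") ))
        if [ "$packed_len" -lt "$raw_len" ]; then
            printf 'PKT Z %d\n' "$packed_len"; cat "$scratch/packed"
        else
            printf 'PKT R %d\n' "$raw_len"; cat "$chunk"
        fi
    done
    rm -rf "$scratch"
}

# depacketize FILE: decode a packet file written by packetize to stdout.
# It stops with an error at the first position that is not a valid header.
depacketize() {
    total=$(( $(wc -c < "$1") ))
    offset=0
    while [ "$offset" -lt "$total" ]; do
        header=$(tail -c +"$((offset + 1))" "$1" | head -n 1)
        magic=${header%% *}; rest=${header#* }
        kind=${rest%% *};    len=${rest#* }
        [ "$magic" = PKT ] || { echo "lost sync at byte $offset" >&2; return 1; }
        offset=$((offset + ${#header} + 1))  # skip header text and its newline
        if [ "$kind" = Z ]; then
            tail -c +"$((offset + 1))" "$1" | head -c "$len" |
                ${UNCOMPRESS_CMD:-uncompress -c}
        else
            tail -c +"$((offset + 1))" "$1" | head -c "$len"
        fi
        offset=$((offset + len))
    done
}
```

A real recovery tool would scan forward for the next header after a bad
packet instead of stopping; that part is left out of this sketch.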

Les Mikesell
  les@chinet.chi.il.us

itkin@mrspoc.Transact.COM (Steven M. List) (11/28/90)

dpence@redstone-emh2.army.mil ( Dwayne Pence) writes:

>Does anyone know of a way to compress files before sending them through 
>cpio or tar to tape without actually creating a ".Z" file on disk.  I am 
>trying to create backup tapes which are compressed versions of our files 
>on disk.  Any help would be appreciated.

CTAR from Microlite (Pennsylvania, I think - I'm home and don't have the
info here) does compression as it writes the tape, and does it in memory.
It's a pretty slick product, and I recommend it highly.
-- 
 +----------------------------------------------------------------------------+
 :                Steven List @ Transact Software, Inc. :^>~                  :
 :           Chairman, Unify User Group of Northern California                :
 :     {apple,coherent,limbo,mips,pyramid,ubvax}!itkin@guinan.Transact.COM    :

wd@distel.pcs.com (wd) (11/28/90)

In <14582@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:

>                       ...What most people do is to compress the archive,
>not the individual files; that also should generally be more compact.
>In this method you can simply pipe the archive through "compress".

*** WARNING ***
If you ever want to be able to restore the files, then *NEVER EVER*
compress archive files! Any I/O error will cause you to lose all
your files after the error, because there is *NO WAY* to re-synchronize
in a compressed archive file.

It would be a nice feature if tar or cpio could compress individual
files, but the archive headers *MUST NOT* be compressed. Otherwise
you could archive your data to a WOM-device (Write Only Memory) -
this saves a lot of time! :-) :-)

*** YOU HAVE BEEN WARNED ***

Wolfgang

==================================================================
Name    : Wolfgang Denk
Company : PCS GmbH, Pfaelzer-Wald-Str. 36, 8000 Munich W-Germany.
UUCP    : ..[pyramid ;uunet!unido]!pcsbst!wd  (PYRAMID PREFERRED!!)
DOMAIN  : wd@pcsbst.pcs.[ COM From rest of world; DE From Europe ]
######## The purpose of computing is insight, not numbers! ########

chip@tct.uucp (Chip Salzenberg) (11/29/90)

According to gwyn@smoke.brl.mil (Doug Gwyn):
>What most people do is to compress the archive, not the individual
>files; that also should generally be more compact.
>In this method you can simply pipe the archive through "compress".

I would think twice before using this method for backups.  LZW
compression (i.e. /usr/bin/compress), like most kinds of compression,
is _extremely_ unforgiving of trashed data.  If you lose ONE BYTE, the
rest of the archive will probably be lost permanently.
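
The claim can be tried out with a small experiment.  This is a sketch only:
flip_byte, survives_damage, and the COMPRESS_CMD and UNCOMPRESS_CMD variables
are hypothetical names.  It compresses a file, inverts a single byte of the
compressed copy, and checks whether decompression still reproduces the
original.

```shell
# flip_byte FILE OFFSET: invert every bit of the byte at OFFSET (0-based),
# in place, leaving the rest of FILE untouched.
flip_byte() {
    old=$(dd if="$1" bs=1 skip="$2" count=1 2>/dev/null | od -An -tu1 | tr -d ' \n')
    new=$(( old ^ 255 ))
    printf "\\$(printf '%03o' "$new")" |
        dd of="$1" bs=1 seek="$2" conv=notrunc 2>/dev/null
}

# survives_damage FILE OFFSET: compress FILE, damage one byte of the
# compressed copy at OFFSET, decompress, and succeed only if the result is
# still identical to FILE.
survives_damage() {
    work=$(mktemp -d) || return 1
    ${COMPRESS_CMD:-compress -c} < "$1" > "$work/packed"
    flip_byte "$work/packed" "$2"
    ${UNCOMPRESS_CMD:-uncompress -c} < "$work/packed" > "$work/restored" 2>/dev/null
    if cmp -s "$1" "$work/restored"; then rc=0; else rc=1; fi
    rm -rf "$work"
    return "$rc"
}
```

Calling "survives_damage somefile 100" is expected to fail for any file whose
compressed form is longer than 100 bytes.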
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
    "I've been cranky ever since my comp.unix.wizards was removed
         by that evil Chip Salzenberg."   -- John F. Haugh II

dold@mitisft.Convergent.COM (Clarence Dold) (11/30/90)

in article <1990Nov27.191110.5314@chinet.chi.il.us>, les@chinet.chi.il.us (Leslie Mikesell) says:

> Has anyone worked on a "packetizing" version of compress that would
> be able to pass chunks that would not compress any further through
> unchanged (except for a packet header) and provide identifiable
> restart points that could be used for error recovery?

PCS (215) 226-2220
specializes in backup routines with error recovery, compressing, 
all the good stuff, 8mm, 4mm, on systems that don't offer vendor support, 
etc, etc...

-- 
---
Clarence A Dold - dold@tsmiti.Convergent.COM            (408) 435-5293
               ...pyramid!ctnews!tsmiti!dold        FAX (408) 435-3105
               P.O.Box 6685, San Jose, CA 95150-6685         MS#10-007

smith@urbana.mcd.mot.com (Bill Smith) (11/30/90)

In article <25106@adm.brl.mil> dpence@redstone-emh2.army.mil ( Dwayne Pence) writes:
>Does anyone know of a way to compress files before sending them through 
>cpio or tar to tape without actually creating a ".Z" file on disk.  I am 
>trying to create backup tapes which are compressed versions of our files 
>on disk.  Any help would be appreciated.
>
>
>Dwayne Pence

To write the tape record(s) you ask for, do this:

	find /usr -print | cpio -ocB | compress -c > /dev/<tape_drive>

To extract, do this:

	uncompress < /dev/<tape_drive> | cpio -icBumd
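
A minimal sketch of the same whole-archive pipeline with tar in place of
cpio, wrapped in two hypothetical shell functions (backup_tree and
restore_tree are illustrative names, as are the COMPRESS_CMD and
UNCOMPRESS_CMD variables).  The dd stage is an assumption: it regroups the
compressor's output into fixed-size writes, which many tape drivers expect,
and the 10k block size is only an example.

```shell
# backup_tree SRC_DIR DEST: archive SRC_DIR with tar, compress the stream,
# and have dd write it to DEST (a tape device or a plain file) in 10k blocks.
backup_tree() {
    ( cd "$1" && tar cf - . ) |
        ${COMPRESS_CMD:-compress -c} |
        dd of="$2" obs=10k 2>/dev/null
}

# restore_tree SRC DEST_DIR: read SRC in 10k blocks, uncompress the stream,
# and unpack it with tar inside DEST_DIR.
restore_tree() {
    mkdir -p "$2" &&
    dd if="$1" ibs=10k 2>/dev/null |
        ${UNCOMPRESS_CMD:-uncompress -c} |
        ( cd "$2" && tar xf - )
}
```

For example, "backup_tree /usr /dev/<tape_drive>" followed later by
"restore_tree /dev/<tape_drive> /tmp/restored".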

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Bill Smith, Motorola MCD, Urbana Design Center, Urbana, IL 61801
	Email:	smith@urbana.mcd.mot.com, uiucuxc!udc!smith
			Phone: 217 384 8525
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

bill@camco.Celestial.COM (Bill Campbell) (11/30/90)

In <1990Nov28.005522.7258@mrspoc.Transact.COM> itkin@mrspoc.Transact.COM (Steven M. List) writes:

:dpence@redstone-emh2.army.mil ( Dwayne Pence) writes:

:>Does anyone know of a way to compress files before sending them through 
:>cpio or tar to tape without actually creating a ".Z" file on disk.  I am 
:>trying to create backup tapes which are compressed versions of our files 
:>on disk.  Any help would be appreciated.

:CTAR from Microlite (Pennsylvania, I think - I'm home and don't have the
:info here) does compression as it writes the tape, and does it in memory.
:It's a pretty slick product, and I recommend it highly.

Microlite's e-mail address is uunet!mlite!tom (Tom Podnar).  I
don't believe CTAR compresses in memory, but uses the standard
compress program.  If you are using a seekable device for your
backup medium (floppy, Bernoulli Box...) it compresses directly
on the backup media.  If you are using tape, it makes a
compressed backup in a temporary directory, then copies that file
to the tape.

CTAR includes routines to properly manage backups, both full and
incremental, and has a good menu system for those who can't type.

Bill.
-- 
INTERNET:  bill@Celestial.COM   Bill Campbell; Celestial Software
UUCP:   ...!thebes!camco!bill   6641 East Mercer Way
             uunet!camco!bill   Mercer Island, WA 98040; (206) 947-5591

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/30/90)

In article <27551FBF.2222@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
> I would think twice before using this method for backups.  LZW
> compression (i.e. /usr/bin/compress), like most kinds of compression,
> is _extremely_ unforgiving of trashed data.  If you lose ONE BYTE, the
> rest of the archive will probably be lost permanently.

If you want to correct errors, use an error-correcting code. Sheesh.

---Dan

arthur@sgi.com (Arthur Evans) (12/01/90)

In article <1022@pcsbst.pcs.com> wd@distel.pcs.com (wd) writes:
>
>*** WARNING ***
>If you ever want to be able to restore the files, then *NEVER EVER*
>compress archive files! Any I/O error will cause you to lose all
>your files after the error, because there is *NO WAY* to re-synchronize
>in a compressed archive file.

Well, I don't know about you, but I restore from compressed 
archive files all the time.  It's a fairly common way of 
storing things on ftp archives, and it works just fine.
I don't know why it doesn't work for you--perhaps it's 
specific to the type of machine you use.  

-arthur

--
----
Arthur Evans 					arthur@rawlinson.wpd.sgi.com

chip@tct.uucp (Chip Salzenberg) (12/04/90)

According to brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
>In article <27551FBF.2222@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>> I would think twice before using this method for backups.  LZW
>> compression (i.e. /usr/bin/compress), like most kinds of compression,
>> is _extremely_ unforgiving of trashed data.  If you lose ONE BYTE, the
>> rest of the archive will probably be lost permanently.
>
>If you want to correct errors, use an error-correcting code.

Sure, error correction is very nice.  But sometimes data are lost,
period, no recourse, from the *middle* of a backup.  And in those
cases, if you've compressed your archive, you're SOL: everything from
the point of failure to the end of the archive is gone forever.  If
you've compressed individual files, at least you can recover the files
on the untrashed portions of the archive, both before and after the
point of failure.

>Sheesh.

My word(s) exactly.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
      "I'm really sorry I feel this need to insult some people..."
            -- John F. Haugh II    (He thinks HE'S sorry?)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/04/90)

In article <275A875A.3AB0@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
> According to brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> >In article <27551FBF.2222@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
> >> I would think twice before using this method for backups.  LZW
> >> compression (i.e. /usr/bin/compress), like most kinds of compression,
> >> is _extremely_ unforgiving of trashed data.  If you lose ONE BYTE, the
> >> rest of the archive will probably be lost permanently.
> >If you want to correct errors, use an error-correcting code.
> Sure, error correction is very nice.  But sometimes data are lost,
> period, no recourse, from the *middle* of a backup.

So what? Do you mean to say that error-correcting codes can't correct
errors?

``But if someone bombs your computer center and all your offsite storage
locations then you probably won't be able to recover the backup.''
Great, Chip. I think I'll waste half the space on every tape just in
case that happens.

---Dan

throop@aurs01.UUCP (Wayne Throop) (12/05/90)

>> chip@tct.uucp (Chip Salzenberg)
>>>,> brnstnd@kramden.acf.nyu.edu (Dan Bernstein)

>>> If you want to correct errors, use an error-correcting code.
>> Sure, error correction is very nice.  But sometimes data are lost,
>> period, no recourse, from the *middle* of a backup.
> ``But if someone bombs your computer center and all your offsite storage
> locations then you probably won't be able to recover the backup.''

I thought it perfectly clear that Chip was talking about losing a block
from the middle of a tape from "bit rot" or "alpha particles", or
losing a reel from the middle of a multi-reel backup by misfiling,
exposure to a kitchen magnet, or whatever.  That is, he was talking
about simple, plausible ways of losing many thousands of consecutive
bits, which error correcting codes cannot reasonably deal with.

I don't know about the archiving Dan depends upon, but these types of
errors are all too common in my experience.  Certainly such
occurrences are not nearly as remote as the example Dan uses in
attempting to downplay the likelihood of such losses.

On the other hand, I'd think a little thought and a slick shell
script or two could arrange to use (say) tar and compress to write
compressed archives with "sync points" built in, so that missing
data's impact is limited.
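
One way such a script might look, as a rough sketch under stated assumptions:
each top-level entry of the source tree becomes its own compressed tar
segment, so damage to one segment leaves the others readable.  The function
names and the COMPRESS_CMD and UNCOMPRESS_CMD variables are hypothetical.
Segments are written as files in a directory here; on tape each would
presumably be a separate tape file on a non-rewinding device, which is not
shown because device names vary.

```shell
# segmented_backup SRC_DIR DEST_DIR: write one independently compressed tar
# segment per top-level entry of SRC_DIR (dot files at the top are skipped).
segmented_backup() {
    mkdir -p "$2" || return 1
    for entry in "$1"/*; do
        [ -e "$entry" ] || continue
        name=${entry##*/}
        ( cd "$1" && tar cf - "./$name" ) |
            ${COMPRESS_CMD:-compress -c} > "$2/$name.tar.Z"
    done
}

# segmented_restore DEST_DIR OUT_DIR: unpack every segment that still
# decodes, and report the ones that do not instead of giving up.
# A damaged segment may still leave some partially restored files behind.
segmented_restore() {
    mkdir -p "$2" || return 1
    for seg in "$1"/*.tar.Z; do
        [ -f "$seg" ] || continue
        if ! ${UNCOMPRESS_CMD:-uncompress -c} < "$seg" 2>/dev/null |
                ( cd "$2" && tar xf - 2>/dev/null ); then
            echo "segment damaged, skipped: $seg" >&2
        fi
    done
}
```

The trade-off is a somewhat worse compression ratio, since each segment
starts with an empty dictionary.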
Wayne Throop       ...!mcnc!aurgate!throop

chip@tct.uucp (Chip Salzenberg) (12/05/90)

According to brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
>In article <275A875A.3AB0@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>>Sure, error correction is very nice.  But sometimes data are lost,
>>period, no recourse, from the *middle* of a backup.
>
>So what? Do you mean to say that error-correcting codes can't correct
>errors?

I meant "loss despite best efforts at error correction."  Obviously,
error-correcting codes can't fix everything.  If I lose a few bytes in
a tape block, then depending on the method used, I'm probably covered.
But if a media defect spans three tape blocks, those data are *gone*.

Furthermore, as I wrote (but was not quoted by Dan):

>>If you've compressed individual files, at least you can recover the
>>files on the untrashed portions of the archive, both before and after
>>the point of failure.

So you see, Dan, I am not attempting to proscribe compression during
archiving.  Rather, I am putting forth the position that files should
be compressed individually as they are archived, instead of having the
archive compressed as a whole.  This approach gains tape real estate
while minimizing the damage caused by dropped data in the middle of an
archive.
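
A rough sketch of the layout argued for here: every file compressed on its
own, inside an archive whose headers stay uncompressed.  Note that this
sketch stages the compressed copies in a scratch directory, so it does not
avoid temporary ".Z" files on disk; it only illustrates the resulting
archive.  perfile_backup, perfile_restore, COMPRESS_CMD, and UNCOMPRESS_CMD
are hypothetical names, and file names containing newlines are not handled.

```shell
# perfile_backup SRC_DIR ARCHIVE: compress each regular file separately into
# a scratch tree, then tar the scratch tree without further compression.
perfile_backup() {
    stage=$(mktemp -d) || return 1
    ( cd "$1" && find . -type d -print ) | while IFS= read -r dir; do
        mkdir -p "$stage/$dir"
    done
    ( cd "$1" && find . -type f -print ) | while IFS= read -r file; do
        ${COMPRESS_CMD:-compress -c} < "$1/$file" > "$stage/$file.Z"
    done
    ( cd "$stage" && tar cf - . ) > "$2"
    rm -rf "$stage"
}

# perfile_restore ARCHIVE OUT_DIR: unpack the tar, then expand each ".Z"
# member on its own.  A member that fails to expand is left as it is.
perfile_restore() {
    mkdir -p "$2" && ( cd "$2" && tar xf - ) < "$1" || return 1
    ( cd "$2" && find . -type f -name '*.Z' -print ) |
    while IFS= read -r packed; do
        ${UNCOMPRESS_CMD:-uncompress -c} < "$2/$packed" > "$2/${packed%.Z}" &&
            rm -f "$2/$packed"
    done
}
```

Because tar's own headers are stored uncompressed, a damaged member costs
only that member; the archiver can pick up again at the next intact header.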
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
      "I'm really sorry I feel this need to insult some people..."
            -- John F. Haugh II    (He thinks HE'S sorry?)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/05/90)

In article <59378@aurs01.UUCP> throop@aurs01.UUCP (Wayne Throop) writes:
> I thought it perfectly clear that Chip was talking about losing a block
> from the middle of a tape from "bit rot" or "alpha particles", or
> losing a reel from the middle of a multi-reel backup by misfiling,
> exposure to a kitchen magnet, or whatever.

Yes, it was quite clear. And there's no reason an error-correcting code
can't detect and correct shift errors to handle losing a block. Losing a
tape is rare---I don't think it's happened here, anyway. But then again,
we've never had the computer center and all backup sites bombed, either.
Maybe your experience differs.

> On the other hand, I'd think a little thought and a slick shell
> script or two could arrange to use (say) tar and compress to write
> compressed archives with "sync points" built in, so that missing
> data's impact is limited.

Hate to tell you this, but that's a (primitive) error-correcting code.

---Dan

throop@aurs01.UUCP (Wayne Throop) (12/07/90)

> brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
> Maybe your experience differs.

Apparently so.  I've had tapes lost out of a set.  
Also several blocks out of tapes.  And not as rarely as I'd hope.

> Hate to tell you this, but that's a (primitive) error-correcting code.

Hate me not, for I'm eager to learn.  For example, the usage above is
unknown to me: since the error was accommodated instead of corrected,
I'd always called (and heard this type of thing called) error recovery,
not correction.  Just how muddy does common usage make this distinction?

Of course, error-correcting codes (and to a lesser extent
error-recovery codes) and compression are necessarily somewhat at odds,
since the redundant information required for the former conflicts with
the goal of the latter (and is why people settle for error recovery and
less redundancy).
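
To illustrate the low-redundancy end of that trade-off, here is a small
sketch of error detection (not correction): one checksum line per backup
segment, so a restore can tell which segments to skip.  It assumes the
backup is stored as separate segment files in one directory and that the
manifest is kept outside that directory; make_manifest and damaged_segments
are hypothetical names.

```shell
# make_manifest DIR: print one "CRC SIZE NAME" line (cksum format) for every
# regular file in DIR.
make_manifest() {
    ( cd "$1" && for f in *; do
          [ -f "$f" ] && cksum "$f"
      done )
}

# damaged_segments DIR MANIFEST: print the name of every segment that is
# missing or whose checksum or size no longer matches the manifest.
damaged_segments() {
    while read -r crc size name; do
        now=$( cd "$1" && cksum "$name" 2>/dev/null )
        [ "$now" = "$crc $size $name" ] || echo "$name"
    done < "$2"
}
```

The cost is a few dozen bytes per segment, which can only say that a segment
is bad, not repair it.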

Wayne Throop       ...!mcnc!aurgate!throop