[comp.unix.questions] Automatic file compression?

root@rdb1.UUCP (Robert Barrell) (10/10/90)

     Is there any way, or any library of routines, which would automatically
compress data when it is stored into a file, and uncompress it when it is
retrieved?  I'd like to be able to read and write compressed files directly
without having to run them through uncompress or zcat all the time, and still
have file positioning work relative to the uncompressed data. 
     Is such a thing possible?  It may be necessary to write a separate library
to handle it, such as Zopen(), Zclose(), Zprintf(), Zscanf(), Zread(), Zwrite(),
Zseek(), etc.  This may be a difficult or impossible task if one only uses the
existing library calls, and pipes to compress, but would someone with a thorough
knowledge of compression algorithms be able to do it?
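
[Editor's sketch] One way a Zseek()-style routine could position relative to
the uncompressed data is to compress the file in independent fixed-size blocks
and keep a table of where each compressed block starts. This is an assumption,
not something the post proposes: the names ZBLOCK, zindex, zpos and zlocate are
hypothetical, and the compressor itself is left out. The sketch only shows the
offset arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#define ZBLOCK 4096L    /* hypothetical uncompressed block size */

/* Hypothetical index: where each compressed block starts in the file. */
struct zindex {
    long nblocks;        /* number of blocks in the file */
    const long *start;   /* start[i] = file offset of compressed block i */
};

/* Result of translating an uncompressed offset. */
struct zpos {
    long block;      /* which block holds the requested byte */
    long file_off;   /* where that compressed block begins on disk */
    long skip;       /* bytes to discard after decompressing the block */
};

/* Map an uncompressed offset to a block.
 * Returns 0 on success, -1 if the offset is outside the indexed blocks.
 * The final block may be short; a fuller version would also record the
 * total uncompressed length and check against it. */
int zlocate(const struct zindex *ix, long uoff, struct zpos *out)
{
    assert(ix != NULL && out != NULL);
    if (uoff < 0 || uoff / ZBLOCK >= ix->nblocks)
        return -1;
    out->block = uoff / ZBLOCK;
    out->file_off = ix->start[out->block];
    out->skip = uoff % ZBLOCK;
    return 0;
}
```

A seek then costs one lseek() to file_off, one block decompression, and a
discard of skip bytes, rather than a pass over the whole file. The price is a
somewhat worse compression ratio, since each block starts with no history.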

-- 
Robert Barrell      |     rbarrell@rdb1.UUCP      | Phillips Consulting Group
Milo's Meadow BBS   | uunet!pcgbase!rdb1!rbarrell | 282 North Shore Drive
login: bbs or nuucp |     "... Pooh just IS."     | Ormond Beach, FL   32176
(904) 441-5028      |     -- The Tao of Pooh      | (904) 672 - 3856

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/11/90)

In article <5@rdb1.UUCP> root@rdb1.UUCP (Robert Barrell) writes:
>      Is there any way, or any library of routines, which would automatically
> compress data when it is stored into a file, and uncompress it when it is
> retrieved?

It shouldn't be hard to stick this into any filesystem implemented
outside the kernel. You store all files with a compression type: either
no compression or a choice of available methods. You keep an MRU cache
of uncompressed files, including all the files open at the moment. You
might keep a priority queue of LRU files to switch from uncompressed to
compressed, or you might have all files compressed immediately upon
close().
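
[Editor's sketch] The bookkeeping described above might look roughly like
this. Everything here is hypothetical: the ztype values, the zfile fields and
zpick_victim() are illustrative names, and a linear scan stands in for the
priority queue to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

enum ztype { Z_NONE, Z_LZW };   /* hypothetical compression types */

struct zfile {
    enum ztype stored_as;     /* how the file currently sits on disk */
    int open_count;           /* nonzero while anyone has it open */
    unsigned long last_use;   /* logical clock value at last access */
};

/* Return the index of the least recently used file that is both
 * uncompressed and closed, i.e. the next candidate to compress.
 * Returns -1 if no file qualifies. */
int zpick_victim(const struct zfile *tab, size_t n)
{
    int victim = -1;
    size_t i;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* Already compressed, or still open: leave it alone. */
        if (tab[i].stored_as != Z_NONE || tab[i].open_count > 0)
            continue;
        if (victim < 0 || tab[i].last_use < tab[victim].last_use)
            victim = (int)i;
    }
    return victim;
}
```

Compressing immediately upon close() corresponds to calling the compressor
directly when open_count drops to zero instead of waiting for this scan.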

Anyone want to try to stick this into RFS?

Note that making this sort of transparent change becomes very, very
difficult if the filesystem is hidden inside the kernel. comp.std.unix
readers know what I'm referring to.

---Dan

mju@mudos.ann-arbor.mi.us (Marc Unangst) (10/12/90)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
[automatically compressing files transparently]
> Anyone want to try to stick this into RFS?

This is a bad, bad idea, for the same reason that compressing backups
before writing them to tape is a bad idea.  The difficulty of
recovering a trashed filesystem increases by several orders of
magnitude when you need to reconstruct a compressed file; in fact, I'd
say it's almost impossible to recover the undamaged portions of a
compressed file (especially if it was the key table that got trashed).

Maybe your disks never fail.  But mine do on occasion, and I like to
have at least half a chance of bringing my data back.

--
Marc Unangst               |
mju@mudos.ann-arbor.mi.us  | "Bus error: passengers dumped"
...!umich!leebai!mudos!mju | 

root@rdb1.UUCP (Robert Barrell) (10/12/90)

In article <23653:Oct1019:40:1190@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> It shouldn't be hard to stick this into any filesystem implemented
> outside the kernel. You store all files with a compression type: either
> no compression or a choice of available methods. You keep an MRU cache
> of uncompressed files, including all the files open at the moment. You
> might keep a priority queue of LRU files to switch from uncompressed to
> compressed, or you might have all files compressed immediately upon
> close().

Dan, 
     I understand the gist of what you said, but also realize that such a
filesystem implementation is far beyond my current knowledge of *nix.  Even so,
MUST such a thing be incorporated into a filesystem?  Is it not possible for
appropriate library routines to handle the [un]compression then hand the data
to other, lower-level file i/o routines?  Or is that, by definition, a 
"filesystem implemented outside the kernel?"  
     The whole key to what I want is to eliminate the need for an entire file
to exist in its uncompressed form on the disk at any time.  Rather than taking
the time and disk space to uncompress a file before accessing it, then just
compressing it again when finished,  I'd like to see routines where I'd be able
to say:

Zfseek(fp,1000L,0);
Zgets(string,101,fp);

and have the file positioned at byte offset 1000 of the uncompressed file instead
of the compressed file.  At the moment, when I only need to read information
from compressed files, without having to seek or write, I use:

sprintf(cmd,"zcat %s",filename);
fp = popen(cmd,"r");
...

which works fine.  The problem arises when I want to try to seek, especially if
I wish to seek backwards into the file.
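
[Editor's sketch] A crude interim approach, assuming read-only access, is to
wrap the popen() stream and emulate seeking: read and discard to move forward,
and rerun the command from the start to move backward. ZPIPE, Zpopen(),
Zgetc(), Zpseek() and Zpclose() are hypothetical names for illustration, not
an existing library.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle: a read-only pipe plus our position in its output. */
typedef struct {
    FILE *fp;    /* stream returned by popen(), or NULL after a failure */
    char *cmd;   /* command to rerun when we must rewind */
    long pos;    /* bytes of uncompressed output consumed so far */
} ZPIPE;

ZPIPE *Zpopen(const char *cmd)
{
    ZPIPE *z = malloc(sizeof *z);
    if (z == NULL)
        return NULL;
    z->cmd = malloc(strlen(cmd) + 1);
    if (z->cmd == NULL) { free(z); return NULL; }
    strcpy(z->cmd, cmd);
    z->fp = popen(z->cmd, "r");
    if (z->fp == NULL) { free(z->cmd); free(z); return NULL; }
    z->pos = 0;
    return z;
}

/* Read one byte of uncompressed data, tracking the position. */
int Zgetc(ZPIPE *z)
{
    int c;
    assert(z != NULL);
    if (z->fp == NULL)
        return EOF;
    c = getc(z->fp);
    if (c != EOF)
        z->pos++;
    return c;
}

/* Seek to an absolute uncompressed offset; 0 on success, -1 on failure. */
int Zpseek(ZPIPE *z, long target)
{
    assert(z != NULL);
    if (target < 0)
        return -1;
    if (target < z->pos || z->fp == NULL) {
        /* Backwards: restart the decompressor from byte 0. */
        if (z->fp != NULL)
            pclose(z->fp);
        z->fp = popen(z->cmd, "r");
        z->pos = 0;
        if (z->fp == NULL)
            return -1;
    }
    while (z->pos < target)          /* forwards: read and discard */
        if (Zgetc(z) == EOF)
            return -1;
    return 0;
}

void Zpclose(ZPIPE *z)
{
    if (z == NULL)
        return;
    if (z->fp != NULL)
        pclose(z->fp);
    free(z->cmd);
    free(z);
}
```

With cmd built as in the "zcat %s" example above, Zpseek(z, 1000L) leaves the
next Zgetc() at uncompressed offset 1000. Every backward seek decompresses
the file again from the beginning, so this only suits small files or mostly
sequential access, and it does nothing for writing.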

     Your ideas sound wonderful, and such an implementation would mean that
all the regular library commands, dbm functions, etc. could be used directly.
Still, isn't there possibly a simpler interim solution?
     Of course, if anyone out there CAN and DOES implement something which
works either way, I'd like to see it.
