[comp.compression] Compressing more than one file in one compression run?

hst@mh_co2.mh.nl (Klaas Hemstra) (05/16/91)

I have an idea for all of you archive-builders.
It would be nice if archivers were able to compress multiple files
in one compression run. This way it would be possible to obtain much higher
compression ratios for certain sets of files.
For example: if you have a lot of C sources in one directory and you want
to compress them using PKZIP or ARC or ZOO, each file is compressed separately.
If you make a tar archive (with unix tar) and then compress it with unix
compress, the compression ratio will be better, because the compressor can
exploit redundancy across the files.
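The effect is easy to measure today. A small sketch (not from the original post, and using gzip as a stand-in for unix compress; the file names and contents are made up for illustration):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"

# Three similar "C source" files: a per-file compressor cannot share a
# dictionary across them, but one run over a tar stream can.
for f in aaa bbb ccc; do
    for i in $(seq 1 50); do
        echo "int ${f}_func_$i(void) { return $i; } /* shared boilerplate */"
    done > "$f.c"
done

# Per-file: compress each source on its own and sum the sizes.
for f in aaa bbb ccc; do
    gzip -c -9 "$f.c" > "$f.c.gz"
done
per_file=$(cat aaa.c.gz bbb.c.gz ccc.c.gz | wc -c)

# Combined: pack all sources into one tar stream, then compress once.
tar cf - aaa.c bbb.c ccc.c | gzip -9 > all.tar.gz
combined=$(wc -c < all.tar.gz)

echo "per-file total: $per_file bytes; tar-then-gzip: $combined bytes"
```

On similar files the combined archive comes out noticeably smaller, even though tar adds a 512-byte header per file, because the cross-file redundancy (and the per-file compressor header overhead) dominates.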

I recently obtained the new ARJ archiver. It has an option which lets you
store a file in an archive (.ARJ file) without compressing it. By first
using this option on all the files and then compressing the resulting ARJ
archive, the compression ratio improved drastically on those archives which
contain a lot of similar or small files.

So I would like to suggest an option with the meaning:
- Compress all files with the same extension in one run.
(or some other similar condition, like small files, C+H files, etc.)

Of course it will be more difficult to extract a single file from such an
archive (the preceding files have to be decompressed too), but that's a minor
technical problem.
I use PKZIP and ARJ mostly to back up programs, i.e. directory trees.
Therefore I seldom extract only one file from an archive.
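A sketch of that extraction cost, again with modern gzip and tar standing in (file names are invented): pulling one member out of a tar-then-compress archive works fine, but the decompressor still has to stream through everything stored before it.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
mkdir proj
echo 'int main(void) { return 0; }' > proj/aaa.c
echo 'void helper(void) { }'        > proj/bbb.c
tar cf - proj | gzip > proj.tar.gz

# Extract only bbb.c: gzip decompresses the whole stream up to the member,
# but tar writes out just the one file we asked for.
mkdir restore
gzip -dc proj.tar.gz | (cd restore && tar xf - proj/bbb.c)
```

So single-file extraction costs decompression time proportional to the archive prefix, not just to the file — exactly the trade-off the post calls a minor problem for backup use.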

Example:
Suppose you want to make an archive containing the following files:
	AAA.EXE BBB.C  DDD.EXE EEEE.C FFF.EXE GGGG.C
This would be compressed on a per-file basis by most current archive
programs.
I would like the files to be compressed like this:
	BBB.C   	\
	EEEE.C		 \
	GGGG.C		  \   One compression (LZW ?) run
	AAA.EXE		\
	DDD.EXE		 \
	FFF.EXE		  \   Another compression (LZW ?) run
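With unix tools the suggested grouping can already be approximated by hand — one compression run per extension, so all the .C files share one compressor state and all the .EXE files share another. A sketch with made-up file contents:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"

# The files from the example above (contents invented for illustration).
echo '/* source 1 */ int a;' > BBB.C
echo '/* source 2 */ int b;' > EEEE.C
echo '/* source 3 */ int c;' > GGGG.C
echo 'stub program 1' > AAA.EXE
echo 'stub program 2' > DDD.EXE
echo 'stub program 3' > FFF.EXE

# One compression run per extension, as the post suggests.
tar cf - BBB.C EEEE.C GGGG.C    | gzip -9 > sources.tar.gz
tar cf - AAA.EXE DDD.EXE FFF.EXE | gzip -9 > programs.tar.gz
```

An archiver with the suggested option would do this grouping automatically inside a single archive file.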


If I missed some existing archiver option that can do this, please excuse me
(I don't like reading manuals anyway).

Any comments ?
								Klaas

Klaas Hemstra  (hst@mh.nl)                      |   /  / ,~~~ ~~/~~
uucp: ..{uunet!}hp4nl!mh.nl!hst                 |  /--/  `-,   /  ___  |_/ |__|
Multihouse Automatisering B.V. Gouda,Netherlands| /  / ___/   /   ---  | \ |  |
"Most of us mindreaders are atheist, you know" A song for Lya: George Martin

james@jack.sns.com (James Hwang) (05/25/91)

In article <5655@mhres.mh.nl> hst@mh_co2.mh.nl (Klaas Hemstra) writes:
=I have an idea for all of you archive-builders.
=It would be nice if the archivers would be able to compress multiple files
=in one compression-run. This way it would be possible to obtain much higher
=compression ratios for certain sets of files.
=For example: If you have a lot of C sources in one directory and you want
=to compress it using PKZIP or ARC or ZOO, each file is compressed separately.
=If you make a tar archive (with unix tar) and then compress it with unix
=compress, the compression ratio will be better.

Yes, I have had the same experience. The problem is that almost no archiver
can read from stdin or write to stdout.

For example:

If we have those options below, it would be really nice.

tar cvf - files | lharc a filename -

lharc x filename - | tar xvf -

lharc x filename - | tar xvf - single_file_to_be_extracted

Also, lharc handles directory trees very poorly, but tar has a nice
way of doing it.


-- 
| Disclaimer  : Author bears full responsibility for contents of this article |
| Smart Mailer: james@jack.sns.com | hwang@postgres.berkeley.edu              |
| Real world  : K. James Hwang                            % flame > /dev/null |
| "The job of a citizen is to keep his mouth open."    --- Gunter Grass       |

churchh@ut-emx.uucp (Henry Churchyard) (05/29/91)

 In article <5655@mhres.mh.nl> hst@mh_co2.mh.nl (Klaas Hemstra) writes:
> I have an idea for all of you archive-builders.
> It would be nice if the archivers would be able to compress multiple files
> in one compression-run. This way it would be possible to obtain much higher
> compression ratios for certain sets of files.
> For example: If you have a lot of C sources in one directory and you want
> to compress it using PKZIP or ARC or ZOO, each file is compressed separately.
> If you make a tar archive (with unix tar) and then compress it with unix
> compress, the compression ratio will be better.

   But if you treated all the files in an archive as a single block
for the purpose of compression (rather than compressing each file in
an archive separately in isolation), then what would happen when you
deleted a file in the middle of an archive?  You would basically have
to uncompress the whole archive, delete the uncompressed file, and
then recompress everything else from scratch.  Not very efficient :-(
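The deletion problem can be shown concretely. A sketch assuming GNU tar (whose --delete option works only on an uncompressed archive) and gzip, with invented file names: dropping one member really does mean undoing and redoing the whole compression run.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
echo one   > a.txt
echo two   > b.txt
echo three > c.txt
tar cf - a.txt b.txt c.txt | gzip > arch.tar.gz

# Dropping b.txt from the solid archive:
gzip -dc arch.tar.gz > arch.tar        # 1. decompress everything
tar --delete -f arch.tar b.txt         # 2. delete the member (GNU tar only)
gzip -9 -f arch.tar                    # 3. recompress everything from scratch
```

A per-file archiver like PKZIP can instead drop a member and rewrite only the archive's directory — the price it pays is the lower compression ratio discussed above.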

--
         --Henry Churchyard     churchh@emx.cc.utexas.edu

u9039899@cs.uow.edu.au (Darrin Jon Smart) (05/29/91)

churchh@ut-emx.uucp (Henry Churchyard) writes:

>an archive separately in isolation), then what would happen when you
>deleted a file in the middle of an archive?  You would basically have
>to uncompress the whole archive, delete the uncompressed file, and
>then recompress everything else from scratch.  Not very efficient :-(

No, but it'd be great for distributing finished software, or for any
application that requires the whole archive contents at once.

 - Darrin

brad@looking.on.ca (Brad Templeton) (05/29/91)

Well, my archiver does exactly what was suggested, and my answer to the
deletion problem was simple -- don't do it.

In fact, these days, with disk space as cheap as it is, I suspect that
fewer and fewer people are using archivers in the old ARC fashion -- keeping
the only online copy of a file in the archive, and doing inserts, deletes
and updates.

These days the prime use of this is file transportation, not file storage
(other than storage for the sake of transportation). As such, you build the
archive as needed.

Object libraries are an exception to this.  They, of course, are not
compressed, though they could be.  There's just not much call for it.

And in the case of a file-combining mode (which is what people get when they
do a tar and then compress), you do it to get maximal compression, and
sometimes you sacrifice some things for maximal compression.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473