[comp.binaries.ibm.pc.d] Compressors

nyh@GAUSS.TECHNION.AC.IL (Nadav Har'El) (06/19/91)

I have tested 10 of the most widely available compressors (all of which can be
downloaded from garbo.uwasa.fi). For the archived files I arbitrarily chose
pak's distribution file, which contains docs, exes, and other files, all
together 203497 bytes. I've classified the compressors by the number of
bytes after compression.
Extension	Compressor	Bytes after compression	Rank
------------------------------------------------------------------
.arj		arj -jm		85907			Great
------------------------------------------------------------------
.lzh		lha		89594			Very good
------------------------------------------------------------------
.hyp		hyper		94250			Good
.zip		pkzip		94635			Good
.pak		pak		95554			Good
.lzh		lharc		95981			Good
------------------------------------------------------------------
.arc		pkpak		124013			Bad
.dwc		dwc		124013			Bad
.zoo		zoo		126182			Bad
.arc		arc		128299			Bad
------------------------------------------------------------------
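
For anyone who wants to reproduce the measurement on their own files, here
is a minimal sketch. Python is used purely as glue, and the command
templates and the test file name are assumptions rather than the exact
invocations used for the table, so check each tool's docs for its
maximum-compression switch:

#!/usr/bin/env python3
# Sketch of the benchmark above: run each archiver over the same input
# and rank the results by output size.
import os
import subprocess
import tempfile

# (extension, command template); {src} is the input file, {dst} the
# archive to create.  These invocations are illustrative, not verified.
COMPRESSORS = [
    ("arj", ["arj", "a", "-jm", "{dst}", "{src}"]),
    ("lzh", ["lha", "a", "{dst}", "{src}"]),
    ("zip", ["pkzip", "-ex", "{dst}", "{src}"]),
    ("zoo", ["zoo", "ah", "{dst}", "{src}"]),
]

def benchmark(src):
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for ext, template in COMPRESSORS:
            dst = os.path.join(tmp, "test." + ext)
            cmd = [arg.format(src=src, dst=dst) for arg in template]
            subprocess.run(cmd, check=True, capture_output=True)
            results.append((os.path.getsize(dst), ext))
    for size, ext in sorted(results):
        print(f"{ext:4s} {size:8d} bytes")

benchmark("pak250.exe")   # hypothetical name for pak's distribution file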

Notice the great difference between the compressors I've classified as Bad
and the others. Zoo, which ranks among the lowest, is the compressor most
widely used in the c.b.i.p group. Also, pkzip, the format most widely used on
ftp sites, is only fourth out of ten, with a very big difference between
it and arj.
Therefore, I suggest ftp sites should use the arj format instead of any of the
others. I know it is hard work converting all the existing files to arj, but
sites could at least make all new archives in the arj format.
I know that someone might say that arj doesn't work on a UNIX computer. Well,
this is true, at least until Robert Jung makes a UNIX version. Until then,
there is zip, for which there is UNIX source and which is a lot better than
zoo, the format used in c.b.i.p. Also, someone might say that although arj
compresses better than zip, it uncompresses somewhat slower. Well, this is
irrelevant to me. When I get a program from an ftp site, I don't care if it
takes 30 or 60 seconds to unpack. Also, I don't really need to unpack it on a
UNIX machine, because it wouldn't run on it anyway!
The best solution is for Robert Jung to make a UNIX arj. But even if he
doesn't, I think arj should be the preferred archiver for ftp sites.
Also, arj is the most complete archiver I have ever seen - it has many
options and is very sophisticated. If anyone thinks that arj should not be
the preferred archiver, he can post a reply. Maybe some site administrators
could post a reply explaining why they prefer other archivers.

-----------------------------------------------------------------------------
Nadav Har'El                                         nyh@gauss.technion.ac.il
-----------------------------------------------------------------------------

jwbirdsa@amc.com (James Birdsall) (06/19/91)

[a reasonable discussion of the merits of various archivers deleted]

   The problem is that it isn't a static situation. For example, there are
supposed to be greatly-improved versions of both zoo and zip on the way
even now. The bottom line is that the best archiver tends to change every
few months.

   Furthermore, the effort involved in porting each latest-and-greatest to
all the different OSes and architectures [remember that the FTP archives
themselves typically run UNIX or something exotic, and hence need
archive-manipulation programs that run on the FTP host; furthermore,
there's a lot of value in being able to check an archive before you spend a
lot of time downloading it] should not be underestimated. It's a lot of
effort just to get something to run on a reasonably wide variety of UNIX
systems. And once the port is complete, the results must be distributed and
installed on thousands of systems all over the world. An archive is no good
if nobody can unpack it.

   In light of all this, it is hardly surprising that the maintainers and
moderators of the world have chosen to preserve their time, effort, and
sanity by picking a format that is "good enough" and sticking to it. Which
is not to say that nothing ever changes; if a new format makes a _large_
difference, it is adopted. Witness the changeover of most FTP sites from
arc to zip. But it takes a lot of time and effort to make such a change,
and it isn't worth going through that every time there's a new
latest-and-greatest that squeezes out another five K.

-- 
James W. Birdsall   WORK: jwbirdsa@amc.com   {uunet,uw-coco}!amc-gw!jwbirdsa
HOME: {uunet,uw-coco}!amc-gw!picarefy!jwbirdsa OTHER: 71261.1731@compuserve.com
"The OS shouldn't die every time the controller drools on a sector." -- a sysop
=========== "For it is the doom of men that they forget." -- Merlin ===========

pjh@mccc.edu (Pete Holsberg) (06/20/91)

In article <10002@discus.technion.ac.il> nyh@GAUSS.TECHNION.AC.IL (Nadav Har'El) writes:
=I have tested 10 of the most widely available compressors (all of which can be

But you didn't test for speed.

=Until then,
=there is zip, for which there is UNIX source and which is a lot better than
=zoo, the format used in c.b.i.p.

Um, there's unzip source for UNIX but no zip.

Pete
-- 
Prof. Peter J. Holsberg      Mercer County Community College
Voice: 609-586-4800          Engineering Technology, Computers and Math
FAX: 609-586-6944            1200 Old Trenton Road, Trenton, NJ 08690
Internet: pjh@mccc.edu	     TCF 92 - April ??-??, 1992

w8sdz@rigel.acs.oakland.edu (Keith Petersen) (06/21/91)

pjh@mccc.edu (Pete Holsberg) writes:
>Um, there's unzip source for UNIX but no zip.

That's not exactly true.  It should say:

"zip for UNIX is in the final stages of Beta test and freely-distributable
source will be released to the public in a few weeks".

Join the Info-ZIP mailing list if you would like to help in porting
ZIP and UNZIP to minicomputers and mainframes.  Send e-mail to:

     Info-ZIP-Request@WSMR-SIMTEL20.ARMY.MIL

to be added to the list.

Keith
--
Keith Petersen
Maintainer of the MSDOS, MISC and CP/M archives at SIMTEL20 [192.88.110.20]
Internet: w8sdz@WSMR-SIMTEL20.Army.Mil    or     w8sdz@vela.acs.oakland.edu
Uucp: uunet!umich!vela!w8sdz                          BITNET: w8sdz@OAKLAND

pjh@mccc.edu (Pete Holsberg) (06/23/91)

In article <7436@vela.acs.oakland.edu> w8sdz@wsmr-simtel20.army.mil writes:
=pjh@mccc.edu (Pete Holsberg) writes:
=>Um, there's unzip source for UNIX but no zip.
=
=That's not exactly true.  It should say:
=
="zip for UNIX is in the final stages of Beta test and freely-distributable
=source will be released to the public in a few weeks".

Yes, and it *would* have said that had I but known!  Seriously, Keith,
my answer *is* true for the vast majority of users, right?  I mean, we
all know about vaporware, huh?  ;-)
=
=Join the Info-ZIP mailing list if you would like to help in porting
=ZIP and UNZIP to minicomputers and mainframes.

What, and write programs for machines that are passé?  Not me.  I can
barely program 386s!

Pete
-- 
Prof. Peter J. Holsberg      Mercer County Community College
Voice: 609-586-4800          Engineering Technology, Computers and Math
FAX: 609-586-6944            1200 Old Trenton Road, Trenton, NJ 08690
Internet: pjh@mccc.edu	     TCF 92 - April ??-??, 1992

reichert@dino.ulowell.edu (Bastard) (06/24/91)

In article <10002@discus.technion.ac.il> nyh@GAUSS.TECHNION.AC.IL (Nadav Har'El) writes:
>I have tested 10 of the most widely available compressors (all of which can be
>downloaded from garbo.uwasa.fi). For the archived files I arbitrarily chose
>pak's distribution file, which contains docs, exes, and other files, all
>together 203497 bytes. I've classified the compressors by the number of
>bytes after compression.
>
>  [Extensive table of results]
>
>Therefore, I suggest ftp sites should use the arj format instead of any of the
>others. I know it is hard work converting all the existing files to arj, but
>sites could at least make all new archives in the arj format.

Comprehensive though this experiment was, another useful attribute of various
compressors is their speed.  I realize that sheer size is definitely the
_biggest_ (no joke intended) factor, but as an end user, slow decompressions
grate on my nerves.

Out of curiosity: I know that some of the tested compressors have options for
optimizing for size vs. speed.  Were the appropriate options used for these
tests?
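
A rough way to put a number on unpack time, for what it's worth (Python
purely as glue here, and the arj command line is an assumption, so
substitute whatever extractor and switches you actually use):

# Time the unpack step the same way the sizes were measured.
import subprocess
import time

def time_unpack(cmd):
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

elapsed = time_unpack(["arj", "x", "test.arj"])   # assumed extract syntax
print(f"unpacked in {elapsed:.1f} s")
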
bastard@dragon.cpe.ulowell.cpu			   Brian (you Bastard) Reichert

USnail: 85 Gershom Ave. #2
	Lowell, MA 01854		"Intel architecture: the left hand path"

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (06/26/91)

In article <1991Jun24.065433.10109@ulowell.ulowell.edu> reichert@dino.ulowell.edu (Bastard) writes:
| In article <10002@discus.technion.ac.il> nyh@GAUSS.TECHNION.AC.IL (Nadav Har'El) writes:

| >Therefore, I suggest ftp sites should use the arj format instead of any of the
| >others. I know it is hard work converting all the existing files to arj, but
| >sites could at least make all new archives in the arj format.
| 
| Comprehensive though this experiment was, another useful attribute of various
| compressors is their speed.  I realize that sheer size is definitely the
| _biggest_ (no joke intended) factor, but as an end user, slow decompressions
| grate on my nerves.
| 
| Out of curiosity: I know that some of the tested compressors have options for
| optimizing for size vs. speed.  Were the appropriate options used for these
| tests?

  The big problem with arj is that it works for DOS only. There are
UNIX versions of unzip, arc, and lharc; arj and lha are DOS-only for the
moment. The new version of zoo, due out in ten days or so, is portable
to UNIX, VMS, and Amiga as well as DOS.

  I ran tests on all the archivers I could find, and I believe that arj
beats lha by about a percent on average, lha beats new zoo by about a
percent on average, and new zoo beats zip by about 4 percent on average.
No other archiver is in the ballpark, and there's a new version of zip
threatened, I mean promised, which will probably get better compression
and be DOS only again, at least for a while.

  The point is that the author of arj has made a decision not to have a
full-function UNIX version, and I bet there will never be a TOPS-10
version, so I would not expect arj to become the archiver of
choice.

  I think zip will continue to be used on many sites, while zoo will be
used here because going to a new version won't break all the moderator
and unpack scripts, and will provide somewhat better compression than
zip.

  One of the problems with changing archivers is that while moving the
data is fairly easy, many zoo/zip/arj archives have important
information in the comments, which should be preserved.
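
  To make that concrete for zip at least, the archive-level comment is
metadata that a repack script must copy across explicitly. A minimal
sketch using Python's standard zipfile module, offered only as an
illustration (per-file zip comments, and zoo/arj comments, would need
their own format-specific handling):

import zipfile

# Read the archive-level comment from the old zip...
with zipfile.ZipFile("old.zip") as zf:
    saved = zf.comment            # bytes; b"" if the archive has none

# ...and write it into the repacked archive; it is written out when
# the file is closed.
with zipfile.ZipFile("new.zip", "a") as zf:
    zf.comment = saved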
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
  GE Corp R&D Center, Information Systems Operation, tech support group
  Moderator comp.binaries.ibm.pc and 386-users digest.

elmo@troi.cc.rochester.edu (Eric Cabot) (06/27/91)

In <3476@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

>  I ran tests on all the archivers I could find, and I believe that arj
>beats lha by about a percent on average, lha beats new zoo by about a
>percent on average, and new zoo beats zip by about 4 percent on average.
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  I think zip will continue to be used on many sites, while zoo will be
>used here because going to a new version won't break all the moderator
>and unpack scripts, and will provide somewhat better compression than
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>zip.
^^^^
Not on my PC it doesn't. I have routinely unzooed every file that
I have gotten from comp.binaries.ibm.pc and recompressed them with
pkzip 1.1, and have reclaimed on the order of 5k for every 30k of
zoo file.  I have my pkzip set for maximum compression, not maximum
speed.
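
The repacking loop itself is easy to script. A sketch, assuming zoo and
pkzip are on the path and using the switches mentioned in this thread
(Python purely as glue; adjust the commands to your own setup):

import glob
import os
import subprocess
import tempfile

total_old = total_new = 0
for zoofile in glob.glob("*.zoo"):
    zipname = zoofile[:-4] + ".zip"
    with tempfile.TemporaryDirectory() as tmp:
        # Extract into a scratch directory, then rezip the extracted
        # files at maximum compression (-ex).
        subprocess.run(["zoo", "x", os.path.abspath(zoofile)],
                       cwd=tmp, check=True, capture_output=True)
        subprocess.run(["pkzip", "-ex", os.path.abspath(zipname)]
                       + os.listdir(tmp),
                       cwd=tmp, check=True, capture_output=True)
    total_old += os.path.getsize(zoofile)
    total_new += os.path.getsize(zipname)

if total_old:
    saved = total_old - total_new
    print(f"reclaimed {saved} bytes ({100 * saved / total_old:.1f}%)")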

-EC

--
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=
Eric Cabot                              |    elmo@uhura.cc.rochester.edu
      "insert your face here"           |    elmo@uordbv.bitnet
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=

valley@gsbsun.uchicago.edu (Doug Dougherty) (06/27/91)

elmo@troi.cc.rochester.edu (Eric Cabot) writes:

>In <3476@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

>>  I ran tests on all the archivers I could find, and I believe that arj
>>beats lha by about a percent on average, lha beats new zoo by about a
>>percent on average, and new zoo beats zip by about 4 percent on average.
------>			 ^^^^^
>                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>  I think zip will continue to be used on many sites, while zoo will be
>>used here because going to a new version won't break all the moderator
>>and unpack scripts, and will provide somewhat better compression than
>                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>zip.
>^^^^
>Not on my PC it doesn't. I have routinely unzooed every file that
>I have gotten from comp.binaries.ibm.pc and recompressed them with
>pkzip 1.1, and have reclaimed on the order of 5k for every 30k of
>zoo file.  I have my pkzip set for maximum compression, not maximum
>speed.

The new zoo isn't generally available yet.  I'd bet my hat you don't
have it...
--

	(Another fine mess brought to you by valley@gsbsun.uchicago.edu)

richard@mungarra.asis.unimelb.edu.au (Richard Begg) (06/27/91)

elmo@troi.cc.rochester.edu (Eric Cabot) writes:

>In <3476@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

>>  I ran tests on all the archivers I could find, and I believe that arj
>>beats lha by about a percent on average, lha beats new zoo by about a
>>percent on average, and new zoo beats zip by about 4 percent on average.


[ stuff deleted to keep the brain-dead news system from complaining ]


>Not on my PC it doesn't. I have routinely unzooed every file that
>I have gotten from comp.binaries.ibm.pc and recompressed them with
>pkzip 1.1, and have reclaimed on the order of 5k for every 30k of
>zoo file.  I have my pkzip set for maximum compression, not maximum
>speed.

Er... didn't he say the "new" zoo?  Am I right in saying that it hasn't
been officially released yet?

BTW: When is the official release due?


--
Richard Begg (richard@asis.unimelb.edu.au)
Programmer ASIS/ITS - University of Melbourne

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (06/27/91)

In article <1991Jun26.200005.15317@galileo.cc.rochester.edu> elmo@troi.cc.rochester.edu (Eric Cabot) writes:

| Not on my PC it doesn't. I have routinely unzooed every file that
| I have gotten from comp.binaries.ibm.pc and recompressed them with
| pkzip 1.1, and have reclaimed on the order of 5k for every 30k of
| zoo file.  I have my pkzip set for maximum compression, not maximum
| speed.

  See the word "new" in my original posting. You are comparing zoo v2.01
with zip 1.10 or so, written some years apart. zoo v2.10 provides
slightly better compression than zip v1.10; comparisons of other
versions are not germane.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
  GE Corp R&D Center, Information Systems Operation, tech support group
  Moderator comp.binaries.ibm.pc and 386-users digest.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (06/29/91)

In article <richard.677975220@mungarra> richard@mungarra.asis.unimelb.edu.au (Richard Begg) writes:

| Er... didn't he say the "new" zoo?  Am I right in saying that it hasn't
| been officially released yet?
| 
| BTW: When is the official release due?

  It is scheduled for next week, but might slip a week or so beyond
that. I have had zero problems in the last month. I did provide some
changes to the user interface and better help, but Rahul's new
compressor works very well.

  It appears that the official 2.10 release will have the current
version of the compressor, since work on the faster version has not been
going well and there would be little time to test it. This leaves the
possibility that in a few months another release will be available,
producing the same compressed output but running three times faster or
so. The current UNIX version is somewhat faster than the beta version of
zip for UNIX.

  A later version might also have some of the MS-DOS parts coded in
assembler, or, like lha, have the assembler output of the C compiler
polished by hand rather than writing new code.

  The current version has adequate speed and great compression; I think
the object now is to get it out the door, and then go for speed
improvements. I seriously believe that I know how to get another few
percent, although Rahul isn't convinced, but it's not needed.

  I've been repacking my old archives, and the disk usage has been
shrinking faster than some guy on late night TV ads for Slim-Fast!
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
  GE Corp R&D Center, Information Systems Operation, tech support group
  Moderator comp.binaries.ibm.pc and 386-users digest.