[comp.sys.ibm.pc] Comparison of compaction routines

loci@csccat.UUCP (Chuck Brunow) (04/24/88)

	The test of the various compaction routines, titled ...

>                  TEST OF MS-DOS COMPRESSION PROGRAMS 
>
>                              Release 1.0
>
>                Tests performed 4-21-88 by Erik Talvola
>
>                 Article Copyright 1988 by Erik Talvola

	provided only a cursory look at the programs by name. It would
	be well to specify the method of compaction used in each case.
	Setting that aside, the author misses the principal point of
	compaction: to make it small. Speed is not a good measure of
	this function except as it relates to transmission speed, i.e.
	how small is it. The 16-bit compress clearly blew everything
	else away in overall terms. Further, because the routines of
	compress are widely available, and Unix compatible, that method
	can be integrated into a program specially suited to the task
	at hand, and need not burden the user with an archiver (unpacking
	yet another layer is BS). The author's clear bias toward bells
	and whistles blinds him to the real point.

ralf@b.gp.cs.cmu.edu (Ralf Brown) (04/25/88)

In article <483@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
}
}	The test of the various compaction routines, titled ...
}
}>                  TEST OF MS-DOS COMPRESSION PROGRAMS 
}
}	provided only a cursory look at the programs by name. It would
}	be well to specify the method of compaction used in each case.

ARC: 	    12-bit LZW
PKARC: 	    13-bit LZW
PKARC -oct: 12-bit LZW
ZOO: 	    12-bit? LZW
SQPC: 	    Huffman
Compress:   16-bit LZW
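
A rough way to read those code widths (just a sketch; the exact memory cost
per dictionary entry depends on the implementation):

	# number of dictionary codes implied by each LZW code width
	for bits in 12 13 16; do
	    echo "$bits-bit LZW: $(( 1 << bits )) codes"
	done
	# prints 4096, 8192 and 65536, which is why the 13-bit and 16-bit
	# variants need noticeably more memory than plain 12-bit LZW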

}	Setting that aside, the author misses the principal point of
}	compaction: to make it small. Speed is not a good measure of
}	this function except as it relates to transmission speed, i.e.
}	how small is it. The 16-bit compress clearly blew everything
}	else away in overall terms. 

But what if the compression/decompression takes longer than the time saved
by sending a smaller file? (This is a definite concern with 9600 bps modems.)
Also, available memory is a definite concern on MS-DOS machines.  16-bit
compress needs ~500K to run (~450K for the compression tables, ~25K for
the executable, plus other overhead).  This basically means unloading all
TSRs, exiting any operating environment (such as DESQview/Windows/etc.), even
on a 640K machine.
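
To put rough numbers on both points, here is a back-of-the-envelope sketch
in shell; the file sizes and the 7-bytes-per-table-entry figure are
assumptions, not measurements:

	# time saved on the wire: a 9600 bps modem moves at most ~960 bytes/sec,
	# so shaving an assumed 20000 bytes off a file buys about 20 seconds
	echo $(( 20000 / 960 ))
	# memory for a 16-bit LZW code table, assuming roughly 7 bytes per
	# entry (prefix code, suffix byte, hash overhead): about 448K, in
	# line with the ~450K for tables quoted above
	echo $(( (1 << 16) * 7 / 1024 ))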

}                                    Further, because the routines of
}	compress are widely available, and Unix compatible, that method
}	can be integrated into a program specially suited to the task

ARC's LZW code *is* the compress LZW code! 


-- 
{harvard,uunet,ucbvax}!b.gp.cs.cmu.edu!ralf -=-=- AT&T: (412)268-3053 (school) 
ARPA: RALF@B.GP.CS.CMU.EDU |"Tolerance means excusing the mistakes others make.
FIDO: Ralf Brown at 129/31 | Tact means not noticing them." --Arthur Schnitzler
BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA -=-=- DISCLAIMER? I claimed something?

gudeman@arizona.edu (David Gudeman) (04/25/88)

In article  <483@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
>
>	The test of the various compaction routines, titled ...
>
>>                  TEST OF MS-DOS COMPRESSION PROGRAMS ...
>>
>>                Tests performed 4-21-88 by Erik Talvola
>
>       ... the author misses the principal point of compaction: to
>       make it small. Speed is not a good measure of this function
>       except as it relates to transmission speed, i.e.  how small is
>       it.

You may not mind waiting 3 minutes for a file to unpack, but a lot of
us feel otherwise.

>	The 16-bit compress clearly blew everything else away in
>	overall terms.

Only in terms of compaction.  I would hardly call that "overall".

>	Further, because the routines of compress are widely
>	available, and Unix compatible...

So are zoo and arc formats.

>	... and need not burden the user with an archiver...

I thought the archiving ability was a bonus.  It seems that most
programs posted to this group have consisted of several files
(executable, docs, sometimes special libraries and sources), and the
ability to archive makes posting several related files much easier.

>	The author's clear bias toward bells and whistles blinds him
>	to the real point. 

The author obviously has different priorities than you do; that hardly
makes him blind to the REAL POINT as defined by you.  Archiving and
speed are hardly "bells and whistles", and your intolerant attitude
doesn't add anything constructive to the conversation.

I prefer zoo because it handles directory structure.  That isn't a
bell or a whistle either.

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/26/88)

In article <483@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
| >                Tests performed 4-21-88 by Erik Talvola
| >
| >                 Article Copyright 1988 by Erik Talvola
| 
| 	provided only a cursory look at the programs by name. It would
| 	be well to specify the method of compaction used in each case.
| 	Setting that aside, the author misses the principal point of
| 	compaction: to make it small. Speed is not a good measure of
| 	this function except as it relates to transmission speed, i.e.
| 	how small is it. The 16-bit compress clearly blew everything

  Excuse me? I don't disagree that transmission speed is important, or
that it must be the most important thing to you. However, I think many
of us use archivers for things other than sending files, such as
saving them on our hard disks.

  I thought the article was useful, although there were a few points
which I communicated to the author by mail. I don't think the author
misses the point at all; DOS software is not a background item where you
don't care how long it takes.

  I'm really mystified about why compress was so slow. On Xenix it runs
slightly faster than zoo. I'm rerunning some tests and will share the
results with the original author for his next posting.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

rce229@uxa.cso.uiuc.edu (04/26/88)

I know of many people who complain that PKARC with its 13-bit
"Squash" method should not be used due to the extra memory required
(over the normal 12-bit method).  How much memory does the Compress
port require?              

caf@omen.UUCP (Chuck Forsberg WA7KGX) (04/27/88)

I ran some comparisons on a 157741-byte ASCII (shar archive) file.
Compress and zoo were compiled 32-bit; arc is an old version
compiled for the 286 small model with packed structures.  The new
squashing arc on the net won't compile on SYS V systems yet.
The zoo I use is compiled from sources about a year old.

User	Sys	Real	File	Program(s)	(times in seconds, file size in bytes)
5.34	1.98	7.98	75083	time (tar cvf baz rzsz.sh;compress baz)
12.82	0.68	16.36	84375	time zoo a foo rzsz.sh
24.86	1.96	29.26	95735	time arc a foo rzsz.sh
4.52	0.86	5.48	74697	time compress rzsz.sh
4.60	0.90	5.72	85306	time compress -b13 rzsz.sh

keithe@tekirl.TEK.COM (Keith Ericson) (04/28/88)

In article <669@omen.UUCP> caf@omen.UUCP (Chuck Forsberg WA7KGX) writes:
< I ran some [timing] comparisons on a 157741-byte ASCII (shar archive)
< file.
...
< Program(s)
< (tar cvf baz rzsz.sh;compress baz)
< zoo a foo rzsz.sh
< arc a foo rzsz.sh
< compress rzsz.sh
< compress -b13 rzsz.sh

A crucial feature missing from this comparison is the ability (or lack
thereof) of the particular program(s) to retain a directory structure in
the compressed result. Tar does it by itself; zoo can do it when used
in conjunction with "stuff" (or at least that's what the documentation
tells me); another combination would be "find" coupled with "cpio" and
"compress", which would also retain the directory structure (see the
sketch below).

For my money this is the primary failing of arc. Consequently I plan
to use zoo whenever necessary. It works, and works well, on both MS-DOS and UNIX.
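
For concreteness, here is a sketch of the two Unix-side combinations
mentioned above (directory and archive names are made up for the example;
the zoo-plus-"stuff" route is left out):

	# tar records the directory structure itself; compress the result
	tar cvf - srcdir | compress > srcdir.tar.Z
	# find + cpio + compress likewise preserves the pathnames it is fed
	find srcdir -print | cpio -o | compress > srcdir.cpio.Z
	# and to unpack:
	zcat srcdir.tar.Z  | tar xvf -
	zcat srcdir.cpio.Z | cpio -id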

keith

laba-5ac@web7f.berkeley.edu (Erik Talvola) (04/28/88)

  Here is a response to the numerous postings about my original comparison
of the compression programs on the PC.  First - about the compress program.
This program was obtained from SIMTEL20.ARPA as the file COMPRESS16.ARC, which
is in the PD1:<MSDOS.SQ-USQ> directory.  I used their executable file.  Maybe
this used hideous source code or a hideous C compiler - I just ran the
program as it was and it came out very slow.  Second - about the points
I stressed.  One of the first replies claimed that Compress seemed like
the winner because it compressed the best.  I still disagree - I think a
balance between ease of use and compression rate should be the important
factor.  If a program gets even 5% better compression, but takes 5 times
as long to run, then it is not convenient to use.  PKARC seemed to be
the best - it generated very small compressed files and ran the quickest.

  Also, one user noted that perhaps compression should not be used at
all in transferring files, as compression is already applied automatically
by UseNET, and compressing already-compressed files actually increases
their size.  I have no information on this, so if someone could confirm
it, I would appreciate it.  However, since the files still have to
be UUencoded (or processed with something to convert binary to ASCII),
I am not sure whether this is still valid.  Perhaps compressing a UUencoded
compressed file works well; perhaps not.  I repeat - I have little knowledge
of the UseNET network.
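
Some rough arithmetic on the uuencode step (the 60000-byte file is just an
assumed example; standard uuencode turns every 45 input bytes into a
62-character output line):

	# size after uuencoding a 60000-byte compressed file: about 38% bigger
	echo $(( 60000 * 62 / 45 ))
	# compressing that uuencoded text again tends mostly to win back the
	# encoding overhead; the underlying data is already close to random,
	# so there is little left to squeeze out of it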

  Finally, I do not have benchmarks for the Unix versions of the same
compression programs.  When I get this information, I will pass it along
also.  In my opinion, Zoo or a Squashing Arc (when a Sys V version
comes along) seems to be the way to go, with Zoo being better (maybe)
in that it handles more things, such as saving directory structures,
even though it appears to be a bit slower than PKARC on MS-DOS.  

  Well, this should provide some new material for people to flame about.
I don't know when an updated version of my test will be released, but
I will post it when it is ready.

---------------------------------------------------
Erik Talvola          laba-5ac@widow.berkeley.edu

"...death is an acquired trait." -- Woody Allen
---------------------------------------------------

nelson@sun.soe.clarkson.edu (Russ Nelson) (04/29/88)

From article <9331@agate.BERKELEY.EDU>, by laba-5ac@web7f.berkeley.edu (Erik Talvola):
> ... PKARC seemed to be
> the best - it generated very small compressed files and ran the quickest.

That was the conclusion in my article in the May 1987 Dr. Dobbs.  In fact,
PKWARE is now quoting that article in its ads :-)
-- 
char *reply-to-russ(int network) {
if(network == BITNET) return "NELSON@CLUTX";
else return "nelson@clutx.clarkson.edu"; }

loci@csccat.UUCP (Chuck Brunow) (05/02/88)

In article <824@sun.soe.clarkson.edu> nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
>From article <9331@agate.BERKELEY.EDU>, by laba-5ac@web7f.berkeley.edu (Erik Talvola):
>> ... PKARC seemed to be
>> the best - it generated very small compressed files and ran the quickest.
>
>That was the conclusion in my article in the May 1987 Dr. Dobbs.  In fact,
>PKWARE is now quoting that article in its ads :-)
>-- 

	I believe that a distinction should be made between compaction
	and archiving. This was a flaw in the original posting, which
	purported to compare compaction routines. All it really did
	was to compare LZW to LZW to LZW ... Sometimes it's 12 bit,
	sometimes 13 bit, sometimes 16 bit, but nearly always Lempel-
	Ziv.

	That is where any reasonable uniformity ends: it isn't a
	question of which archiver is the best, because everyone must
	have each and every one of them in order to follow this group.
	There is no order, rhyme or reason to it. That was what prompted
	my earlier posting about the insanity of "(shar(uuencode(arc)))".
	I don't want them all; I don't even want one: I find another
	layer of unpacking to be a pain in the @@@.

	Another point that I raised in my previous posting was that
	the authors of "shareware" archivers must be rich or they've
	been robbed. Well, the results are in: grand larceny is rampant.
	Anyone who writes software for shareware distribution should
	heed the advice of B. Franklin ("Poor Richard's Almanac"):
	"Expect nothing and you'll never be disappointed".

	Now, before you leap to your keyboards to send me your knee-
	jerk reaction to why I'm wrong, why your favorite archiver
	is best, or anything else, forget it. I already know that
	the best compaction routine is LZW, the best archiver hasn't
	been written yet, and the best OS is Unix.