[comp.compression] How StuffIt works

hsu_wh@jhunix.HCF.JHU.EDU (William H Hsu) (06/26/91)

	First of all, why is StuffIt so relatively inefficient?  I can't see
why being a Mac application should carry such a high overhead (Compact Pro
seems to do fine).  There also seems to be an "expensive" header for
small files (< 5K).  Even the compression ratio is unimpressive.
	According to the October 1990 MacTutor article on LZW on the Mac,
StuffIt uses the UNIX compress 14-bit scheme.  From the graph recently
posted, I see that compress ranks fairly high.  Why the rather large
discrepancy?

	Another question: how does StuffIt (and other compression programs),
in Ray Lau's words, "determine the characteristics of the input data"?  Can
someone please direct me to such scanning code -- the kind that determines
whether a file is binary/non-, text/non-, color/TIFF&RIFF/PICT/etc graphics,
DSAD'able data, an image (fit for lossy compression), or fractal data (if
a standard format exists)?

d88-jwa@byse.nada.kth.se (Jon Wätte) (06/29/91)

hsu_wh@jhunix.HCF.JHU.EDU (William H Hsu) writes:

	   First of all, why is StuffIt so relatively inefficient?  I can't see

	   According to the October 1990 MacTutor article on LZW on the Mac,
   StuffIt uses the UNIX compress 14-bit scheme.  From the graph recently

Well, 14 bits is less than UNIX compress uses: it goes up to 16-bit codes,
with a hash table that takes about half a meg! StuffIt may use the same
algorithm, but you have to spend some memory to make it behave, too!

	   Another question: how does StuffIt (and other compression programs),
   in Ray Lau's words, "determine the characteristics of the input data"?  Can

StuffIt tries several different methods and chooses the one that works
best. Trying Huffman doesn't mean actually doing the coding; just collecting
frequencies and building the tree is enough to know how large the coded
output will be.
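
To make that concrete, here is a minimal Python sketch (my own illustration,
not StuffIt's code; the function name is made up). It counts byte
frequencies and merges the two smallest weights repeatedly, as Huffman tree
construction does. The sum of all merged weights equals the coded size in
bits, so no actual encoding is needed. It ignores the cost of storing the
tree itself, which a real archiver would have to add.

```python
import heapq
from collections import Counter

def huffman_coded_bits(data: bytes) -> int:
    """Return the Huffman-coded payload size in bits, without encoding."""
    freq = Counter(data)
    if not freq:
        return 0
    if len(freq) == 1:
        # Only one distinct symbol: assume one bit per symbol.
        return len(data)
    heap = list(freq.values())
    heapq.heapify(heap)
    total_bits = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # Each merge pushes every symbol under it one level deeper,
        # i.e. adds one bit per occurrence of those symbols.
        total_bits += a + b
        heapq.heappush(heap, a + b)
    return total_bits

if __name__ == "__main__":
    sample = b"aaaabbc"
    print(huffman_coded_bits(sample), "bits vs", 8 * len(sample), "bits raw")
```

A "try everything" compressor could compare this estimate against the sizes
the other methods produce and only run the winner for real.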

Getting the data "type" from a set of data is shaky, but you can set up
some rules for what characterizes different kinds of data. Some formats
include magic numbers/headers which can be recognized. On the Mac this is
easy, since it has a typed file system; the info's already there.
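
A rough Python sketch of the magic-number approach, for systems without
file types (again only an illustration, not what StuffIt does). The table
holds just a few well-known signatures; the function name and the 95%
printable-character threshold for "text" are arbitrary assumptions.

```python
# A few well-known leading signatures; a real table would be much longer.
MAGIC = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"\x1f\x9d", "UNIX compress output"),
]

def guess_kind(head: bytes) -> str:
    """Guess a data kind from the first bytes of a file."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    if not head:
        return "empty"
    # Fallback heuristic (assumed threshold): mostly printable ASCII
    # plus tab/LF/CR is called text, anything else binary.
    printable = sum(1 for b in head if 32 <= b < 127 or b in (9, 10, 13))
    return "text" if printable / len(head) > 0.95 else "binary"

if __name__ == "__main__":
    print(guess_kind(b"GIF89a\x01\x00\x01\x00"))
    print(guess_kind(b"Hello, world\n"))
```

The fallback is exactly the shaky part: it will misjudge 8-bit text and
short binary files, so treat its answer as a hint only.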

--
						Jon Wätte
						h+@nada.kth.se
						- Speed !