[comp.theory] Huffmann compression string length question

jmunkki@hila.hut.fi (Juri Munkki) (10/06/90)

I just wrote a Huffmann compression system for sound files. I decided to limit
the length of bit strings to 32 bits. A few quick calculations seem to indicate
that quite a lot of data will be needed before strings exceed 32 bits. I'm only
going to compress up to 300KB of data, but it wouldn't hurt to know the limits
of my program.

My question is:

Assuming that your string length is N, how much data is needed before this
is insufficient to represent the tree structure?

A worst case scenario for N=3 looks like this:

			  A
			 / \
			/   \
			0    B
			    / \
			   /   \
			  10	C
			       / \
			      /   \
			    110	   111

It is easy to prove that if the frequency of 0 is equal to the frequency of
B, you get the worst case scenario. This simple case requires 2^N bytes for
the worst case (N=3, requires 8 bytes).

The minimum amount of data is:
	111	1 byte
	110	1 byte	(C=2 bytes)
	10	1 byte	(B=3 bytes)
	0	3 bytes	(A=6 bytes)
	___________________________
	total	6 bytes
	
This is assuming that the program tries to minimize maximum string length by
trying to eliminate leaf nodes first. If this isn't done, then you only need
a 2 byte leaf node frequence for point 0 to get the worst case scenario.

If N=4, add one more leaf node with a frequency of 4. The rule would seem to
be that every new leaf node has to have a frequency that is 1+the amount of
data for the case N-1. For N=32, I get the result 7 881 195, which is well
over 7 MB of data (>>300KB).

One technique to make the algorithm work for larger amounts of data
would be to lower the resolution of the frequency count enough to make
the longest string shorter than 32 bits.

I hope this isn't too trivial for this group...I'm just a comp.sci. student.

   ____________________________________________________________________________
  / Juri Munkki	    /  Helsinki University of Technology   /  Wind  / Project /
 / jmunkki@hut.fi  /  Computing Center Macintosh Support  /  Surf  /  STORM  /
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~