jmunkki@hila.hut.fi (Juri Munkki) (10/06/90)
I just wrote a Huffmann compression system for sound files. I decided to limit the length of bit strings to 32 bits. A few quick calculations seem to indicate that quite a lot of data will be needed before strings exceed 32 bits. I'm only going to compress up to 300KB of data, but it wouldn't hurt to know the limits of my program. My question is: Assuming that your string length is N, how much data is needed before this is insufficient to represent the tree structure? A worst case scenario for N=3 looks like this: A / \ / \ 0 B / \ / \ 10 C / \ / \ 110 111 It is easy to prove that if the frequency of 0 is equal to the frequency of B, you get the worst case scenario. This simple case requires 2^N bytes for the worst case (N=3, requires 8 bytes). The minimum amount of data is: 111 1 byte 110 1 byte (C=2 bytes) 10 1 byte (B=3 bytes) 0 3 bytes (A=6 bytes) ___________________________ total 6 bytes This is assuming that the program tries to minimize maximum string length by trying to eliminate leaf nodes first. If this isn't done, then you only need a 2 byte leaf node frequence for point 0 to get the worst case scenario. If N=4, add one more leaf node with a frequency of 4. The rule would seem to be that every new leaf node has to have a frequency that is 1+the amount of data for the case N-1. For N=32, I get the result 7 881 195, which is well over 7 MB of data (>>300KB). One technique to make the algorithm work for larger amounts of data would be to lower the resolution of the frequency count enough to make the longest string shorter than 32 bits. I hope this isn't too trivial for this group...I'm just a comp.sci. student. ____________________________________________________________________________ / Juri Munkki / Helsinki University of Technology / Wind / Project / / jmunkki@hut.fi / Computing Center Macintosh Support / Surf / STORM / ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~