[net.math] Huffman decompression?

mwm@ea.UUCP (09/05/84)

#N:ea:6900008:000:1266
ea!mwm    Sep  5 09:34:00 1984

While working on a hypothetical id problem, we encountered an interesting
problem. Given a huffman encoded text, without the tree, can the original
text be recreated?

For those not familiar with huffman compression, the technique works in
three phases:

1. Scan the text to get character occurrence counts.

2. Given a list of characters/nodes, choose the two characters/nodes of
    lowest count.  Make a tree node with those two characters as the left and
    right subtree, and give the node the sum of the two occurrence counts as
    its occurrence count. Label one arc 0 and the other arc 1.  Delete the two
    characters from the list and put the node in the list.  Repeat this step
    until the list is a single tree containing all the characters as leaves.

3. Scan the text, and as you see each character, walk the tree from the
    root to the leaf with that character on it, outputting the bits that the
    arcs are labelled with.

The resulting output stream is the encoded text. If the original input was
unstructured binary data of such a nature that you could recognize "valid"
text if you saw it (say it was a video image of a Picasso painting, or a
retina scan) , is it possible to get the original text back *without*
having the tree?

	<mike