mwm@ea.UUCP (09/05/84)
#N:ea:6900008:000:1266 ea!mwm Sep 5 09:34:00 1984 While working on a hypothetical id problem, we encountered an interesting problem. Given a huffman encoded text, without the tree, can the original text be recreated? For those not familiar with huffman compression, the technique works in three phases: 1. Scan the text to get character occurrence counts. 2. Given a list of characters/nodes, choose the two characters/nodes of lowest count. Make a tree node with those two characters as the left and right subtree, and give the node the sum of the two occurrence counts as its occurrence count. Label one arc 0 and the other arc 1. Delete the two characters from the list and put the node in the list. Repeat this step until the list is a single tree containing all the characters as leaves. 3. Scan the text, and as you see each character, walk the tree from the root to the leaf with that character on it, outputting the bits that the arcs are labelled with. The resulting output stream is the encoded text. If the original input was unstructured binary data of such a nature that you could recognize "valid" text if you saw it (say it was a video image of a Picasso painting, or a retina scan) , is it possible to get the original text back *without* having the tree? <mike