Best huffman tree for optimal compression
-
29-10-2019 - |
Question
I am coding an Huffman string compressor and I would like to have a confirmation I am doing the optimal compression with my tree.
I am using this kind of tree:
Instead of this kinda tree:
I think that over 10 single characters, it's not possible to compress on 8 bits..
Is the first image really the optimal one?
Solution
The very basic idea is to add the two smallest nodes, creating a new node which value is the sum of its 2 children.
Respecting this rule up to the root of the tree guarantee that the tree produced will be optimal.
Therefore, you have no control on the shape of the tree : it entirely depends on the probability distribution of characters. It may end up being a degenerated tree (one branch per level) if the probability distribution looks like a Fibonacci serie.
Creating Huffman tree with a pre-set maximum depth is therefore more complex, and requires to break the usual rule of always adding the 2 smallest nodes. The resulting tree will obviously not be optimal.