Question

We have to derive an optimal binary encoding for the infinite set of symbols $\{s_1, s_2, \dots\}$. Their distribution is given by $$p(s_i) = 9 \cdot 10^{-i}.$$
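As a quick sanity check, the tail probabilities form a geometric series:

$$ \sum_{i=k+1}^{\infty} 9 \cdot 10^{-i} = 9 \cdot \frac{10^{-(k+1)}}{1 - 10^{-1}} = 10^{-k}, $$

so for $k = 0$ the probabilities sum to $1$, and for $k = 1$ the remaining symbols $s_2, s_3, \dots$ together have probability $0.1$.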

My intuition was to use a Huffman encoding. Obviously the Huffman algorithm first needs to find the two least probable symbols, which is impossible here since there is no least probable symbol. But if we imagine the last step of the Huffman algorithm, it would merge the two elements $s_1$ and the supersymbol $s_2 s_3 s_4 \dots$ with probabilities $0.9$ and $0.1$ respectively. Thus (with left as $0$ and right as $1$), symbol $s_1$ would get the codeword $0$, and the other symbols would all get codewords starting with $1$. This tree is clearly self-similar: if we represent binary trees as tuples, with the empty tree written as $\varnothing$, then this tree is given by

$$ T = (\varnothing, T) $$

That is, a left branch which is empty and a right branch which is identical to the whole tree. Thus, we get the codewords

$$ s_1 \mapsto 0, s_2 \mapsto 10, s_3 \mapsto 110, s_4 \mapsto 1110, \dots $$
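For a concrete check, here is a minimal Python sketch that builds a finite truncation of this code and numerically verifies prefix-freeness, the Kraft sum, and the expected length (the helper names `codeword` and `prob` are just illustrative):

```python
# Minimal sketch: verify the proposed code on a finite truncation.
# `codeword` and `prob` are illustrative helper names, not from any library.

def codeword(i: int) -> str:
    """Codeword for s_i: (i - 1) ones followed by a single zero."""
    return "1" * (i - 1) + "0"

def prob(i: int) -> float:
    """p(s_i) = 9 * 10^(-i)."""
    return 9 * 10.0 ** (-i)

n = 30  # truncation level; the tail beyond s_30 has probability 10^(-30)
codes = [codeword(i) for i in range(1, n + 1)]

# Prefix-freeness: no codeword is a prefix of another one.
assert not any(a != b and b.startswith(a) for a in codes for b in codes)

# Kraft sum and expected length over the truncation.
kraft = sum(2.0 ** -len(c) for c in codes)
avg_len = sum(prob(i) * len(codeword(i)) for i in range(1, n + 1))

print(f"Kraft sum (first {n} codewords):  {kraft:.12f}")    # tends to 1
print(f"Expected length (truncated sum): {avg_len:.12f}")   # tends to 10/9
```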

Now, I know that these codewords are optimal: you can do a proof by contradiction showing that changing any of these codeword lengths while still satisfying the Kraft inequality causes the expected length to increase.
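Concretely, with codeword lengths $\ell_i = i$ the Kraft inequality holds with equality and the expected length is

$$ \sum_{i=1}^{\infty} 2^{-\ell_i} = \sum_{i=1}^{\infty} 2^{-i} = 1, \qquad \mathbb{E}[\ell] = \sum_{i=1}^{\infty} i \cdot 9 \cdot 10^{-i} = 9 \cdot \frac{10^{-1}}{(1 - 10^{-1})^2} = \frac{10}{9} \text{ bits.} $$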

My question is: is it a coincidence that this infinite Huffman tree arrives at the optimal code, or is there a way to formalize Huffman encoding for an infinite set of symbols?

No correct solution
