Question

Let's say I have an alphabet $$\Sigma = \{A, B, C, D, E\}$$

with probabilities $$P(A) = P(B) = P(C) = 0.25 \text{ and } P(D)=P(E) = 0.125.$$

I know that the entropy then is: $$H(\Sigma) = 3 \cdot 0.25 \cdot \log_2 4 + 2 \cdot 0.125 \cdot \log_2 8 = 2.25 \text{ bits per symbol}.$$
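For reference, this value can be checked with a short Python snippet (the probability table is just the one stated above):

```python
import math

# Symbol probabilities for the alphabet above
probs = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.125, "E": 0.125}

# Shannon entropy in bits per symbol: H = -sum(p * log2(p))
H = -sum(p * math.log2(p) for p in probs.values())
print(H)  # 2.25
```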

My question now is: What does this mean for the lower limit of compression? How many bits will I need, at minimum, to compress a text made up of symbols from this alphabet?

Solution

The idea is to encode the more frequently used symbols with fewer bits than the less frequently used ones.

So in your example, you can compress better by encoding A, B, C with fewer bits than D and E, rather than giving every symbol a code of the same length. With Huffman coding, A, B, and C are each represented by 2 bits (that is $\log_2 4$), while D and E take 3 bits each ($\log_2 8$). Since you expect 0.25 of the characters in the file to be A, 0.25 to be B, and so on, down to 0.125 each for D and E, the expected code length per symbol is $3 \cdot 0.25 \cdot 2 + 2 \cdot 0.125 \cdot 3 = 2.25$ bits, which matches the entropy. The expected size of the compressed text is therefore $2.25$ times the text length in symbols, and by Shannon's source coding theorem this is also the lower limit: no lossless code can use fewer than $H(\Sigma) = 2.25$ bits per symbol on average.
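Here is a minimal Python sketch of this, assuming a standard heap-based Huffman construction (the variable names and the 0/1 edge convention are my own); it reproduces the 2-bit and 3-bit codewords and the 2.25-bit expected length:

```python
import heapq

probs = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.125, "E": 0.125}

# Build the Huffman tree: repeatedly merge the two least probable subtrees.
# Each heap entry is (probability, tie_breaker, symbol_or_subtree).
heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    p1, _, left = heapq.heappop(heap)
    p2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (p1 + p2, counter, (left, right)))
    counter += 1

# Walk the tree to assign codewords: left edge = "0", right edge = "1".
def assign(node, prefix, codes):
    if isinstance(node, tuple):
        assign(node[0], prefix + "0", codes)
        assign(node[1], prefix + "1", codes)
    else:
        codes[node] = prefix

codes = {}
assign(heap[0][2], "", codes)
print(codes)  # e.g. {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}

# Expected code length in bits per symbol
print(sum(probs[s] * len(c) for s, c in codes.items()))  # 2.25
```

The exact codewords depend on how ties are broken, but every optimal code for these probabilities has the same length distribution: three 2-bit codewords and two 3-bit codewords.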

I hope this makes it clear...

Licensed under: CC-BY-SA with attribution