Question

I have compressed a binary file using Huffman encoding. Now i am trying to find the compression efficiency.

In my Binary file i have symbols(bunch of 0 & 1) and frequency(repetition of symbols). suppose i have :

symbol :0 freq : 173
symbol :1 freq : 50
symbol :2 freq : 48
symbol :3 freq : 45 

size of file would be (173+50+48+45)*8=2528 (If my way of calculating the size is correct? please correct me if i am wrong. (On debugging i get 2536) (8 more i don't know why ?)

After compression i got encoding like this

symbol : 0 Code : 1
symbol : 1 Code : 00
symbol : 2 Code : 011
symbol : 3 Code : 010

Could some one please tell me how to get Huffman compression efficiency of this binary file using these information ? (I tried searching on google but there is no sample of binary file they have some frequency of float type which i am not able to understand how to relate them with my Binary file). Thanks a lot for this . Algorithm (c/c++/c#) to do that is also appreciated.

Was it helpful?

Solution

Given your symbol table:

symbol : 0 Code : 1
symbol : 1 Code : 00
symbol : 2 Code : 011
symbol : 3 Code : 010

and your byte counts:

symbol :0 freq : 173
symbol :1 freq : 50
symbol :2 freq : 48
symbol :3 freq : 45 

You then multiply the number of occurrences of each symbol by the number of bits for that symbol. For example, symbol 0 requires 1 bit to encode, so the number of bits would be 173. You have:

(1 * 173) + (2 * 50) + (3 * 48) + (3 * 45)

That count is in bits. Divide by 8 to give you the number of bytes, and round up. That will tell you how many bytes for the encoded data.

You also have to store the Huffman table, which in this case you could do in 8 bytes. Actually, 9 bytes because you have to store the size. The general case of storing the Huffman tree is somewhat more involved.

OTHER TIPS

Once you have your Huffman table you can calculate the size of the compressed image in bits by multiplying the bit encoding length of each symbol with that symbol's frequency.

On top of that you then need to add the size of the Huffman tree itself, which is of course needed to un-compress.

So for you example the compressed length will be

173 * 1 + 50 * 2 + 48 * 3 + 45 * 3 = 173 + 100 + 144 + 135 = 552 bits ~= 70 bytes.

The size of the table depends on how you represent it.

compression rate = ((1 * 173) + (2 * 50) + (3 * 48) + (3 * 45)) / (173+50+48+45) = 1.746 bits entropy rate = sum of Plog2P = 1.439 bits

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top