How to pack bits when doing Huffman compression?

https://stackoverflow.com/questions/8415701

huffman-code

29-10-2019
|

Question

I am implementing a program to perform Huffman compression/decompression (for the sake of learning/having fun, so I don't want to use existing libraries/programs).

I managed to build the compression three, so I have a table with all the characters and their respective compressed representation in bits. For example:

a = 0010 b = 01101 c = 0011 d = 1101 e = 101

Now my idea is to store the bits into a container (e.g., a char or int variable) and then to output them to a file.

I know how to pack/unpack bits into a char or int using bitwise operations. The problem I am facing, however, is that the number of bits in the compressed version won't match the number of bits I have available.

Suppose I want to compress the string "abc" using the table above. I would start by compressing 'a', so packing 0010 into a char variable. Next I would compress 'b', but that requires 5 bits, and I only have 4 bits left on my char variable. I could use another variable, but it would become a mess to keep track what variable is using how many bits.

Using int would give me 32 bits to work with, but the same problem would happen once I get close to the limit.

Solution

There is no way around. You have to keep track of the bits left in your storing structure.

It's not really a mess. It's in fact relatively easy to try. Just store the upper bits in the remaining storage, and then store the lower ones into the new storage. At each step, you need to know how many bits are left.

I would also suggest to use uint32 types instead of char, due to their superior storage capacity. It will require less shuffling, and therefore improve speed.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow