Question

I have a binary array in C and I want to compress it. Kindly suggest an algorithm that can compress a binary array. I have used the Lempel–Ziv–Welch (LZW) algorithm, but it is not suitable for me because there is no repetition in my data.

Solution

You may have no repetition, but there could still be a pattern in the data that can be taken advantage of. Exploiting it, though, requires knowing more about the data than merely that there is no repetition.

If your data is actually (or nearly) randomly distributed, then compressing it is going to run into the pigeonhole problem. This states that if you only have X pigeons and Y holes to put them in, and X > Y, then you don't have enough room. In compression this means that you can't save space by not storing pigeons which are identical twins of one already in a hole, and just leaving a note telling the decompression algorithm to clone that pigeon. In Huffman coding, all pigeons are clones of pigeons in the pigeon library. In several other compression schemes some pigeons may be mega-pigeons made up of other pigeons.

OTHER TIPS

Why not just use libz's deflate? As an added bonus, libz is available on pretty much every existing platform.
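
For illustration, here is a minimal sketch of a one-shot round trip with zlib's compress()/uncompress() helpers (link with -lz); the input buffer is a hypothetical stand-in for your binary array, and error handling is kept minimal:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    /* Hypothetical input: in practice this would be your binary array. */
    unsigned char src[1024];
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i & 0xFF);

    /* compressBound() gives the worst-case size of the compressed output. */
    uLongf comp_len = compressBound(sizeof src);
    unsigned char *comp = malloc(comp_len);
    if (!comp || compress(comp, &comp_len, src, sizeof src) != Z_OK) {
        fprintf(stderr, "compression failed\n");
        return 1;
    }
    printf("%zu bytes -> %lu bytes\n", sizeof src, (unsigned long)comp_len);

    /* Round-trip to verify the data survives intact. */
    unsigned char back[1024];
    uLongf back_len = sizeof back;
    if (uncompress(back, &back_len, comp, comp_len) != Z_OK ||
        back_len != sizeof src || memcmp(back, src, sizeof src) != 0) {
        fprintf(stderr, "round-trip failed\n");
        return 1;
    }
    free(comp);
    return 0;
}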

Or the newer LZMA? It beats bzip2 at binary data compression.
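
If you go the LZMA route, liblzma (the xz library) exposes a similar one-shot encoder; a minimal sketch, assuming the data is already in a buffer (link with -llzma):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <lzma.h>

int main(void)
{
    /* Hypothetical input buffer standing in for the binary array. */
    uint8_t src[1024] = {0};

    /* Worst-case size of the .xz container output. */
    size_t out_cap = lzma_stream_buffer_bound(sizeof src);
    uint8_t *out = malloc(out_cap);
    size_t out_pos = 0;

    /* One-shot encode at the default preset (6); higher presets trade
       speed for compression ratio. */
    lzma_ret rc = lzma_easy_buffer_encode(LZMA_PRESET_DEFAULT, LZMA_CHECK_CRC64,
                                          NULL, src, sizeof src,
                                          out, &out_pos, out_cap);
    if (rc != LZMA_OK) {
        fprintf(stderr, "lzma encode failed: %d\n", (int)rc);
        return 1;
    }
    printf("%zu bytes -> %zu bytes\n", sizeof src, out_pos);
    free(out);
    return 0;
}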

You can cut the space in half easily!

Since your binary data has NO repetition, your only options are [0, 1] and [1, 0]. Anything more would repeat either a zero or a one. Therefore, you can just represent the first set with a 0 and the second set with a 1. Encoding would look something like this...

encode [0, 1] = 0
encode [1, 0] = 1

And decoding would be...

decode 0 = [0, 1]
decode 1 = [1, 0]

Sorry for the Haskell syntax; it's just so much more readable in this case. This turns your two-element array into a one-element array, which can be stored in half the space! Magic.

EDIT: This ignores the trivial case of [0] and [1]. If those need to be handled (although you shouldn't really be compressing 1 bit), it is impossible to get a better compression ratio than 100%.

If you have binary data, you will most likely treat it as something like a char[]. In your question and comment you state that there is (almost) no repetition, which is only possible if you have no more than 256 data items: a char can take only 256 distinct values, so with more items some value must repeat.

But I guess you have more data than that, so compression is possible. If the frequencies of your data items are not evenly distributed, you may have some luck with simple Huffman coding.
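
As a quick check of whether Huffman coding could pay off, you can histogram the byte values first; a strongly skewed histogram suggests it will help, a flat one suggests it won't. A small sketch with an illustrative helper, assuming the data is already in memory:

#include <stdio.h>
#include <stddef.h>

/* Count how often each byte value occurs in the data. */
static void byte_histogram(const unsigned char *data, size_t len,
                           size_t counts[256])
{
    for (size_t v = 0; v < 256; v++)
        counts[v] = 0;
    for (size_t i = 0; i < len; i++)
        counts[data[i]]++;
}

int main(void)
{
    unsigned char sample[] = {1, 1, 1, 2, 3, 1, 1, 2};  /* hypothetical data */
    size_t counts[256];
    byte_histogram(sample, sizeof sample, counts);
    for (size_t v = 0; v < 256; v++)
        if (counts[v])
            printf("value %zu occurs %zu times\n", v, counts[v]);
    return 0;
}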

To give you more precise advice we need more details about the kind of data you want to compress.

Alternatively: your binary data represents certain values. You could reduce the bit count of all values. You need to know the possible range, and then write and read the data bitwise. This can save a lot of space if you, for example, store values in a uint32 that only need a few bits.
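
As an illustration of that idea, here is a minimal sketch with hypothetical put_bits/get_bits helpers that packs values known to fit in 10 bits (the range and bit width are assumptions for the example):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append the low `bits` bits of `value` to `buf` at bit offset *pos. */
static void put_bits(uint8_t *buf, size_t *pos, uint32_t value, unsigned bits)
{
    for (unsigned i = 0; i < bits; i++, (*pos)++)
        if (value & (1u << i))
            buf[*pos / 8] |= (uint8_t)(1u << (*pos % 8));
}

/* Read `bits` bits from `buf` at bit offset *pos. */
static uint32_t get_bits(const uint8_t *buf, size_t *pos, unsigned bits)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < bits; i++, (*pos)++)
        if (buf[*pos / 8] & (1u << (*pos % 8)))
            value |= 1u << i;
    return value;
}

int main(void)
{
    /* Hypothetical: values are known to be < 1024, so 10 bits each
       suffice instead of the 32 bits a uint32_t array would use. */
    uint32_t values[] = {5, 1000, 42, 777};
    uint8_t buf[16];
    memset(buf, 0, sizeof buf);

    size_t wpos = 0;
    for (size_t i = 0; i < 4; i++)
        put_bits(buf, &wpos, values[i], 10);
    printf("packed 4 values into %zu bytes\n", (wpos + 7) / 8);

    size_t rpos = 0;
    for (size_t i = 0; i < 4; i++)
        printf("%u\n", get_bits(buf, &rpos, 10));
    return 0;
}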

Compression is not magic. If your data is completely random, no compression algorithm can make it smaller.

Most data is not completely random, but it is up to you to discover the optimum way to express it so that the patterns can be detected. Images and sound are common enough that standard algorithms have been developed, but no more can be said about your specific problem without getting many more details.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow