Question

I have a buffer of, let's say, 4KB containing data in a JSON-like format. I need to add significantly more information (up to roughly 3x more) to it, but I have to fit within this small chunk of memory. I was thinking about using zlib to compress the text, but I'm afraid it won't perform well since the data consists mostly of unique substrings. What would you recommend in this situation? Thanks, Chris


Solution

Consider a fixed dictionary containing up to 32K of strings that you expect to appear in your data. You would use zlib's deflateSetDictionary() and inflateSetDictionary() on each end (the sender and receiver of the data respectively) with the same dictionary on both ends. That may get you the compression you're looking for. Without a dictionary, you are unlikely to get that sort of compression with such a small amount of data.
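Below is a minimal sketch of that approach in C, assuming a hypothetical dictionary built from JSON keys you expect to recur in the buffer (the key names are placeholders, not from the question). The preset-dictionary calls themselves, deflateSetDictionary() and inflateSetDictionary(), are the real zlib API.

```c
#include <string.h>
#include <zlib.h>

/* Hypothetical fixed dictionary (up to 32 KB of expected strings).
   Put the most frequently occurring strings towards the END, since
   matches near the end of the dictionary are cheaper to encode. */
static const char kDict[] =
    "\"description\":\"" "\"timestamp\":\"" "\"status\":\"" "\"name\":\"";

/* Sender side: compress in[0..in_len) into out using the shared dictionary.
   Returns 0 on success and stores the compressed size in *out_len. */
int compress_with_dict(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap, size_t *out_len)
{
    z_stream s;
    memset(&s, 0, sizeof(s));
    if (deflateInit(&s, Z_BEST_COMPRESSION) != Z_OK)
        return -1;
    /* Prime the 32 KB sliding window before compressing any data. */
    if (deflateSetDictionary(&s, (const Bytef *)kDict, sizeof(kDict) - 1) != Z_OK) {
        deflateEnd(&s);
        return -1;
    }
    s.next_in  = (Bytef *)in;   s.avail_in  = (uInt)in_len;
    s.next_out = out;           s.avail_out = (uInt)out_cap;
    int rc = deflate(&s, Z_FINISH);      /* single shot: all input at once */
    *out_len = s.total_out;
    deflateEnd(&s);
    return rc == Z_STREAM_END ? 0 : -1;  /* anything else: output didn't fit */
}

/* Receiver side: decompress with the byte-identical dictionary. */
int decompress_with_dict(const unsigned char *in, size_t in_len,
                         unsigned char *out, size_t out_cap, size_t *out_len)
{
    z_stream s;
    memset(&s, 0, sizeof(s));
    if (inflateInit(&s) != Z_OK)
        return -1;
    s.next_in  = (Bytef *)in;   s.avail_in  = (uInt)in_len;
    s.next_out = out;           s.avail_out = (uInt)out_cap;
    int rc = inflate(&s, Z_FINISH);
    if (rc == Z_NEED_DICT) {
        /* The stream header records the dictionary's Adler-32 checksum;
           supply the same dictionary the sender used, then continue. */
        if (inflateSetDictionary(&s, (const Bytef *)kDict, sizeof(kDict) - 1) != Z_OK) {
            inflateEnd(&s);
            return -1;
        }
        rc = inflate(&s, Z_FINISH);
    }
    *out_len = s.total_out;
    inflateEnd(&s);
    return rc == Z_STREAM_END ? 0 : -1;
}
```

The dictionary must be byte-identical on both ends, so treat it as part of your protocol and version it if the expected strings ever change.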

Other tips

If you really want to stick with compression, a compression algorithm with a custom dictionary tailored to the specific structure of your data will perform best. I implemented something just like that with SharpZipLib.

If you want to store more data in the buffer and aren't set on compressing text-like data, consider a binary protocol such as Google's Protocol Buffers.
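To illustrate why a binary encoding helps, here is a small C sketch of the tag-plus-varint idea that the Protocol Buffers wire format is built on. It is not a protobuf implementation, and the field number and value are made up; it only shows how a numeric field that costs ~18 bytes as JSON text fits in a handful of bytes in binary form.

```c
#include <stdint.h>
#include <stddef.h>

/* Write v as a little-endian base-128 varint; returns bytes written. */
static size_t write_varint(uint8_t *out, uint64_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)((v & 0x7F) | 0x80);  /* low 7 bits + continuation */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;                        /* final byte, no continuation */
    return n;
}

/* Encode a hypothetical field number 1 (wire type 0 = varint). */
static size_t encode_example(uint8_t *buf)
{
    size_t n = 0;
    buf[n++] = (1 << 3) | 0;                 /* tag byte: field 1, varint */
    n += write_varint(buf + n, 1234567890u); /* 5 bytes for this value */
    return n;                                /* 6 bytes vs. ~18 bytes of
                                                "value":1234567890 in JSON */
}
```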

Update

@Mark's answer outlines how to use a custom dictionary with zlib.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow