Question

Does anyone know of a lossless compression algorithm that produces headerless output? For example, one that does not store the Huffman tree used to compress the data. I am not talking about hard-coded Huffman trees; I would like to know whether there is any algorithm that can compress and decompress input without storing any metadata in its output. Or is this theoretically impossible?

Solution

Adaptive Huffman coding does exactly that. More generally, the term adaptive coding is used to describe entropy codes with this property. Some dictionary codes have this property too, e.g. run-length encoding (RLE) and Lempel-Ziv-Welch (LZW).
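
To make the LZW case concrete, here is a minimal sketch in Python (my own illustrative code, not taken from any particular implementation). Both the compressor and the decompressor start from the same fixed 256-entry byte alphabet and grow their dictionaries in lockstep, so the output is nothing but the code stream; no tree, table, or header is stored. For clarity the codes are left as a list of integers rather than packed into variable-width bit fields, as a real implementation would do.

    # Minimal LZW sketch: both sides rebuild the same dictionary as they go,
    # so the code stream itself is the entire output.
    def lzw_compress(data: bytes) -> list:
        dictionary = {bytes([i]): i for i in range(256)}  # fixed starting alphabet
        next_code = 256
        w = b""
        codes = []
        for byte in data:
            wc = w + bytes([byte])
            if wc in dictionary:
                w = wc
            else:
                codes.append(dictionary[w])
                dictionary[wc] = next_code  # dictionary grows on the fly
                next_code += 1
                w = bytes([byte])
        if w:
            codes.append(dictionary[w])
        return codes

    def lzw_decompress(codes: list) -> bytes:
        if not codes:
            return b""
        dictionary = {i: bytes([i]) for i in range(256)}  # same fixed start
        next_code = 256
        w = dictionary[codes[0]]
        out = bytearray(w)
        for code in codes[1:]:
            if code in dictionary:
                entry = dictionary[code]
            elif code == next_code:          # the classic "KwKwK" corner case
                entry = w + w[:1]
            else:
                raise ValueError("invalid LZW code")
            out += entry
            dictionary[next_code] = w + entry[:1]  # mirror the compressor's update
            next_code += 1
            w = entry
        return bytes(out)

    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    assert lzw_decompress(lzw_compress(sample)) == sample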

OTHER TIPS

Of course it is possible. Among others, the LZ family of compressors doesn't need to output anything apart from the compressed data itself, because the dictionary is built on-line as compression (or decompression) progresses. There are many reference implementations of those LZ-type algorithms, for example LZMA, a component of 7zip.
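
As a concrete example of that, Python's standard lzma module can emit a raw LZMA2 stream with no container header at all (a small sketch of my own, assuming the stdlib lzma module; the filter settings shown are arbitrary). The catch, and the point, is that the decompressor has to be given the same filter chain out of band, because that information is no longer stored in the output:

    import lzma

    # FORMAT_RAW emits a bare LZMA2 stream: no magic bytes, no header, no
    # end-of-stream index. The filter chain is the out-of-band agreement
    # both sides must share, since it is not stored in the output.
    FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

    def compress_raw(data: bytes) -> bytes:
        return lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS)

    def decompress_raw(blob: bytes) -> bytes:
        return lzma.decompress(blob, format=lzma.FORMAT_RAW, filters=FILTERS)

    payload = b"headerless " * 100
    raw = compress_raw(payload)
    assert decompress_raw(raw) == payload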

Run-length encoding (RLE) would be one example.
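
For instance, a minimal byte-oriented RLE sketch in Python (one common variant among several; the 255-byte run cap is my own choice for illustration) produces nothing but (run length, byte value) pairs, so there is no header or table to store:

    # Minimal run-length encoding sketch: the output is a flat sequence of
    # (run length, byte value) pairs and nothing else. Runs are capped at 255
    # so that every pair fits in exactly two bytes.
    def rle_encode(data: bytes) -> bytes:
        out = bytearray()
        i = 0
        while i < len(data):
            run = 1
            while i + run < len(data) and data[i + run] == data[i] and run < 255:
                run += 1
            out += bytes([run, data[i]])
            i += run
        return bytes(out)

    def rle_decode(blob: bytes) -> bytes:
        out = bytearray()
        for i in range(0, len(blob), 2):
            count, value = blob[i], blob[i + 1]
            out += bytes([value]) * count
        return bytes(out)

    msg = b"aaaaabbbccccccccd"
    assert rle_decode(rle_encode(msg)) == msg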

LZO springs to mind. It's used in OpenVPN, with great results.

Why are you looking for compression algorithms with headerless compressed output?

Perhaps (a) you have a system like two-way telephony that needs low-latency streaming compression/decompression. The adaptive coding category of compression algorithms mentioned by Zach Scrivena and the LZ family of dictionary compression algorithms mentioned by Diego Sevilla and Javier are excellent for this kind of application. Practical implementations of these algorithms usually do have a byte or two of metadata at the beginning (making them useless for (b) applications), but that has little or no effect on latency.
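
To illustrate the streaming case, here is a small sketch using Python's stdlib zlib module (the chunking and parameters are my own illustrative choices): passing a negative wbits value requests a raw DEFLATE stream with no zlib header or checksum trailer, and Z_SYNC_FLUSH makes each chunk decodable as soon as it arrives, which is what keeps latency low.

    import zlib

    # wbits=-15 requests a raw DEFLATE stream: no zlib header, no checksum
    # trailer. Z_SYNC_FLUSH pushes each chunk out immediately, so the
    # receiver can decode it without waiting for the rest of the stream.
    def streaming_demo(chunks):
        comp = zlib.compressobj(level=6, wbits=-15)
        decomp = zlib.decompressobj(wbits=-15)
        received = bytearray()
        for chunk in chunks:
            wire = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
            received += decomp.decompress(wire)  # decodable chunk by chunk
        return bytes(received)

    parts = [b"hello, ", b"low-latency ", b"streaming ", b"world"]
    assert streaming_demo(parts) == b"".join(parts)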

Perhaps (b) you are mainly interested in cryptography, and you have heard that compress-before-encrypt gives some improved security properties, as long as the compressed text does not contain a fixed metadata header that would act as a "crib". Modern encryption algorithms aren't (as far as we know) vulnerable to such "cribs", but if you're paranoid you might be interested in "bijective compression". It's not possible to detect errors in transmission (flipped bits, inserted bits, deleted bits, etc.) when a receiver gets such compressed output (which makes these algorithms not especially useful for (a) applications).

Perhaps (c) you are interested in headerless compression for some other reason. Sounds fascinating -- what is that reason?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow