Question

I am trying to create a valid gzip file (one that can be decompressed using the standard Linux gzip) with data encoded using the DEFLATE algorithm with a static/preset dictionary.

I've read the specifications for both DEFLATE and gzip, and it looks like this is impossible. As I understand the DEFLATE specification, there are two types of encoding for compressed data blocks:

  • Compressed with a dynamic dictionary (sliding window); such blocks start with a header with the FDICT flag set to 0.
  • Compressed with a static (preset) dictionary, with FDICT = 1.

But I found no way to actually write such a dictionary to the file. Is it possible to add some header with my dictionary/dictionaries, or in some other way make gzip decompress data encoded with FDICT = 1?

P.S. I am trying to accomplish this using Java's Deflater class, but I am interested in whether gzip actually supports blocks compressed in this way.

Solution

You are conflating two different concepts, so I'm not sure which you are talking about.

There are deflate blocks that use a static (fixed) Huffman code, which are generally used when compressing very small amounts of data. Normally dynamic Huffman codes are used, where a code optimized for that particular block is sent at the start of the block. For small amounts of data, e.g. 100 bytes, the overhead of that code description would dominate the size of the output. Instead, a static code is used, which avoids that overhead at the cost of a less well-fitted code, and the overall result is smaller. All deflate applications (gzip, zlib, PNG, etc.) support all deflate block types.
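
To see which kind of block a compressor picked, you can look at the first byte of a raw deflate stream: bit 0 is BFINAL and bits 1-2 are BTYPE (0 = stored, 1 = static Huffman, 2 = dynamic Huffman). Here is a minimal sketch using Java's java.util.zip.Deflater with nowrap = true so that no zlib header precedes the block. The class name BlockTypeDemo, the method firstBlockType and the sample inputs are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

class BlockTypeDemo {
    /** Returns BTYPE of the first deflate block: 0 = stored, 1 = static Huffman, 2 = dynamic Huffman. */
    static int firstBlockType(byte[] input) {
        // nowrap = true: raw deflate, so the stream starts directly with a block header
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length + 64];
        int n = deflater.deflate(out);
        deflater.end();
        if (n == 0) {
            throw new IllegalStateException("no compressed output produced");
        }
        // Bit 0 of the first byte is BFINAL, bits 1-2 are BTYPE
        return (out[0] >> 1) & 0x3;
    }

    public static void main(String[] args) {
        byte[] small = "hello".getBytes(StandardCharsets.UTF_8);

        // Hypothetical larger input: 20000 characters drawn from a skewed alphabet
        String alphabet = "eeeeeeeettttttaaaaooiinnsshhrrdlu   ";
        Random rnd = new Random(42);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        byte[] large = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("BTYPE for 5 bytes:     " + firstBlockType(small));
        System.out.println("BTYPE for 20000 bytes: " + firstBlockType(large));
    }
}
```

With a typical zlib build the five-byte input should report 1 (static), while larger inputs with skewed symbol frequencies usually report 2 (dynamic). The exact choice is up to the compressor, so treat the second result as an observation rather than a guarantee.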

The other concept is a pre-defined dictionary, which is a chunk of up to 32K of data that preloads the sliding dictionary in which matching strings are searched for. That is only supported by the zlib format. It is not possible to provide a pre-defined dictionary for a gzip stream. Your link for "deflate" is actually a link to the zlib format, which is where FDICT is defined.
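
For the zlib format, a pre-defined dictionary can be tried from Java roughly as follows. This is a sketch, not a recommendation of any particular dictionary: the class name PresetDictionaryDemo, the helper names and the sample strings are made up for illustration, while Deflater.setDictionary, Inflater.needsDictionary and Inflater.setDictionary are the standard java.util.zip calls. Note that the dictionary itself is never written to the stream. Only the FDICT bit (0x20 in the second header byte) and an Adler-32 identifier are, so the decompressing side must already have the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class PresetDictionaryDemo {
    /** Compresses data into a zlib stream that refers to the given preset dictionary. */
    static byte[] compress(byte[] data, byte[] dictionary) {
        Deflater deflater = new Deflater(); // default constructor: zlib wrapper, not gzip
        deflater.setDictionary(dictionary); // must be set before any data is compressed
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses a zlib stream, supplying the dictionary when the inflater asks for it. */
    static byte[] decompress(byte[] compressed, byte[] dictionary) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0) {
                if (inflater.needsDictionary()) {
                    // The header had FDICT = 1: hand over the same dictionary bytes
                    inflater.setDictionary(dictionary);
                } else if (inflater.needsInput()) {
                    throw new DataFormatException("truncated zlib stream");
                }
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    /** True if the FDICT bit is set in the FLG byte of a zlib header. */
    static boolean hasFdict(byte[] zlibStream) {
        return (zlibStream[1] & 0x20) != 0;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] dict = "the quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] data = "the lazy dog jumps over the quick brown fox".getBytes(StandardCharsets.UTF_8);

        byte[] packed = compress(data, dict);
        System.out.println("FDICT set: " + hasFdict(packed));
        System.out.println(new String(decompress(packed, dict), StandardCharsets.UTF_8));
    }
}
```

The gzip header has no equivalent of FDICT, which is why the same trick cannot be signalled in a file that standard gzip will decompress.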

Licensed under: CC-BY-SA with attribution