Question

It's well known that GZIP or DEFLATE (or any compression mechanism) can sometimes increase file size. Is there a maximum (either a percentage or a constant) by which a file can be increased? What is it?

If a file is X bytes, I'm going to gzip it, and I need to budget for file space in advance, what's the worst-case scenario?

UPDATE: There are two overheads: GZIP adds a header, typically 18 bytes but essentially arbitrarily long. What about DEFLATE? That can expand content by a multiplicative factor, which I don't know. Does anyone know what it is?


Solution

gzip will add a header and trailer of at least 18 bytes. The header can also contain a path name, which adds the length of the name in bytes plus a terminating zero byte.
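
A quick way to see the cost of the name field is to gzip the same data twice with Python's standard `gzip` module, once with a file name and once without. This is a sketch assuming CPython's `gzip.GzipFile`, which writes the base name (Latin-1 encoded, with any trailing `.gz` stripped) followed by a zero byte; the payload and the name `report.txt` are made up for illustration.

```python
import gzip
import io


def gzip_bytes(data: bytes, filename: str) -> bytes:
    """Gzip `data` in memory; a non-empty `filename` is stored in the FNAME header field."""
    buf = io.BytesIO()
    # mtime=0 keeps the output deterministic so the two results can be compared.
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf, mtime=0) as f:
        f.write(data)
    return buf.getvalue()


data = b"example payload " * 100
name = "report.txt"  # hypothetical file name

without_name = gzip_bytes(data, "")
with_name = gzip_bytes(data, name)

# Same compressed body; only the header grows, by the name plus one zero byte.
print(len(with_name) - len(without_name))  # 11 == len("report.txt") + 1
```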

The deflate implementation in gzip can store up to 16383 bytes per block uncompressed, with an overhead of five bytes per block, and it always does so when compressing the block would take more bytes. So the maximum number of DEFLATE bytes for n input bytes (before adding the gzip header and trailer) is:

n+5(floor(n/16383)+1)
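
The bound can be checked against a hand-built DEFLATE stream. The sketch below is not gzip's code: `stored_only_deflate` is a hypothetical encoder that writes every block in stored form, each block costing one header byte (BFINAL bit and BTYPE=00, padded to a byte boundary) plus a 2-byte LEN and its 2-byte one's complement NLEN. `deflate_bound` and `gzip_bound` are illustrative helpers encoding the formula; `gzip_bound` adds the 18-byte framing and an optional name field but ignores the other optional header fields (extra field, comment, header CRC).

```python
import zlib

STORED_BLOCK = 16383  # block size quoted in the answer; DEFLATE itself allows up to 65535


def deflate_bound(n: int) -> int:
    """Worst-case raw DEFLATE size for n input bytes: n + 5 * (floor(n / 16383) + 1)."""
    return n + 5 * (n // STORED_BLOCK + 1)


def gzip_bound(n: int, filename: str = "") -> int:
    """Add the 18-byte gzip header/trailer and an optional FNAME field (name + zero byte).
    Optional FEXTRA, FCOMMENT and FHCRC fields are not accounted for."""
    name_len = len(filename.encode("latin-1")) + 1 if filename else 0
    return deflate_bound(n) + 18 + name_len


def stored_only_deflate(data: bytes, block: int = STORED_BLOCK) -> bytes:
    """Encode `data` as raw DEFLATE using only stored (uncompressed) blocks."""
    out = bytearray()
    chunks = [data[i:i + block] for i in range(0, len(data), block)] or [b""]
    for i, chunk in enumerate(chunks):
        final = i == len(chunks) - 1
        # 1 byte: BFINAL in bit 0, BTYPE=00 in bits 1-2, rest is padding
        out.append(1 if final else 0)
        # LEN and NLEN (one's complement of LEN), both 2 bytes little-endian
        length = len(chunk)
        out += length.to_bytes(2, "little")
        out += (length ^ 0xFFFF).to_bytes(2, "little")
        out += chunk
    return bytes(out)


data = bytes(range(256)) * 400  # 102400 bytes
stream = stored_only_deflate(data)
assert zlib.decompress(stream, wbits=-15) == data  # zlib accepts it as raw DEFLATE
print(len(stream), deflate_bound(len(data)))  # 102435 102435
```

The 16383-byte block size is taken from the answer; other DEFLATE implementations may choose a different stored-block size, which changes the per-block term of the bound.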

Other answers

Compressed files always have a header indicating how to decompress them.

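That header accounts for most of the worst-case overhead when compressing a file that cannot be compressed (because the data has no order or pattern; it is random). DEFLATE-based formats such as gzip also add a few bytes per block when they store incompressible data uncompressed.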
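
To observe this on real data, one can gzip random bytes and compare sizes. A minimal sketch using Python's `gzip.compress`; the exact overhead depends on the zlib build in use, so no specific output is listed here.

```python
import gzip
import os

n = 100_000
data = os.urandom(n)  # random bytes: no pattern for the compressor to exploit
compressed = gzip.compress(data)

# Bytes added on top of the input: the gzip framing plus DEFLATE's own
# per-block overhead for data it could not shrink.
print(len(compressed) - n)
```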

The header varies by format and may also contain variable-length information, such as a list of the files in the archive.

gzip has at least 18 bytes of overhead (a 10-byte header plus an 8-byte footer holding the CRC-32 and the uncompressed size), and may optionally contain the original file name, a comment, and an extra field.

http://en.wikipedia.org/wiki/Gzip#File_format
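
The fixed part of that layout can be checked by splitting a `gzip.compress` result into its pieces. This sketch assumes the output has no optional fields (FLG = 0), which is what CPython's `gzip.compress` writes; it unpacks the 10-byte header and the 8-byte trailer (CRC-32, then the input length modulo 2^32) following RFC 1952 and confirms that the middle is a raw DEFLATE stream.

```python
import gzip
import struct
import zlib

data = b"hello, gzip " * 50
blob = gzip.compress(data)

# Fixed 10-byte header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
id1, id2, cm, flg, mtime, xfl, os_byte = struct.unpack("<BBBBIBB", blob[:10])
assert (id1, id2, cm) == (0x1F, 0x8B, 8)  # gzip magic; CM=8 means DEFLATE
assert flg == 0  # no optional fields, so the header is exactly 10 bytes

# 8-byte trailer: CRC-32 of the uncompressed data, then its size mod 2**32
crc, isize = struct.unpack("<II", blob[-8:])
assert crc == zlib.crc32(data)
assert isize == len(data) % 2**32

# Everything in between is raw DEFLATE (wbits=-15: no zlib/gzip wrapper)
body = blob[10:-8]
assert zlib.decompress(body, wbits=-15) == data
print(len(blob) - len(body))  # 18 bytes of gzip framing
```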

Note that in special situations, custom compression schemes can reduce or eliminate the header overhead. For example, I have used a custom compression dictionary known to both the compressing and decompressing software to compress short texts, so that no header was needed. That was a rather rare use case, and probably not useful in most situations (given that storage and bandwidth are relatively cheap).
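
The answer does not say how its custom dictionary worked. One standard way to get a similar effect with DEFLATE is zlib's preset dictionary (`zdict` in Python) combined with raw mode (`wbits=-15`), which writes no header or checksum. A minimal sketch; `SHARED_DICT`, `compress_short`, `decompress_short` and the sample message are hypothetical, and both sides must use exactly the same dictionary bytes.

```python
import gzip
import zlib

# Hypothetical shared dictionary: bytes both sides agree on ahead of time,
# ideally phrases that occur often in the short texts being compressed.
SHARED_DICT = b"temperature humidity pressure sensor reading status ok error "


def compress_short(text: bytes) -> bytes:
    # wbits=-15 produces raw DEFLATE: no header and no checksum are written
    c = zlib.compressobj(level=9, wbits=-15, zdict=SHARED_DICT)
    return c.compress(text) + c.flush()


def decompress_short(blob: bytes) -> bytes:
    # The decompressor must be given the identical dictionary
    d = zlib.decompressobj(wbits=-15, zdict=SHARED_DICT)
    return d.decompress(blob) + d.flush()


msg = b"sensor reading ok: temperature 21.5, humidity 40"
packed = compress_short(msg)
assert decompress_short(packed) == msg
print(len(msg), len(packed), len(gzip.compress(msg)))
```

Dropping the header also drops the integrity check, so corrupted data is not detected; that trade-off is one more reason this only pays off in narrow cases.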

Licensed under: CC-BY-SA with attribution