Question

I'm compressing some packets of data using the GZipStream class within the .NET framework. Everything works fine and the compression ratio is OK, but when I peeked at the compressed data using a hex editor I noticed that as much as a third of each compressed packet is trailing zeroes. Is this normal?

Presumably GZipStream is a block-based compressor and the output is padded to align with some block size? Are there alternatives that are equally stable and as well supported, but without this issue? (I figure I can gain another 10-30% compression by using a similar compression algorithm that doesn't pad so excessively.)

Solution

Using GZipStream shouldn't add an excessive amount of trailing zeros.

But if you use MemoryStream incorrectly, it can cause this effect. MemoryStream stores its data in an internal byte[]. To reduce the number of allocations, this internal buffer can be larger than the data written so far. If you call GetBuffer() you get back the full array, and it is your responsibility to use only the first Length bytes of it.

Alternatively you can call ToArray() which returns a new byte array with exactly Length bytes.

To quote the documentation for GetBuffer():

Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
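
The difference can be checked with a minimal sketch like the one below. The class name GzipBufferDemo and the helper Compress are made up for illustration, and the exact capacity of the internal buffer is an implementation detail, so the printed sizes will vary between runtimes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical demo class, not part of the original question's code.
static class GzipBufferDemo
{
    // Compresses data and returns the raw internal buffer, an exact-length
    // copy, and the number of valid bytes in the stream.
    public static (byte[] RawBuffer, byte[] Exact, long Length) Compress(byte[] data)
    {
        using var ms = new MemoryStream();

        // leaveOpen: true keeps the MemoryStream usable after the GZipStream is disposed.
        using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        {
            gz.Write(data, 0, data.Length);
        } // Disposing the GZipStream flushes the last block and the gzip trailer.

        // GetBuffer() exposes the whole backing array, including unused capacity.
        // ToArray() copies only the first Length bytes.
        return (ms.GetBuffer(), ms.ToArray(), ms.Length);
    }

    public static void Main()
    {
        byte[] input = Encoding.UTF8.GetBytes(new string('a', 1000));
        var (raw, exact, length) = Compress(input);

        Console.WriteLine($"Length = {length}");
        Console.WriteLine($"GetBuffer().Length = {raw.Length}");
        Console.WriteLine($"ToArray().Length = {exact.Length}");
    }
}
```

If the copy made by ToArray() is a concern, the raw buffer can still be used safely by passing the valid range explicitly, for example output.Write(raw, 0, (int)length), where output is whatever stream the packet is being sent to.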

Other tips

The gzip format does not have trailing zeros, other than at most three trailing zero bytes for small files, since the length of the uncompressed data (modulo 2^32) is stored at the end as a four-byte little-endian integer.

Something else is putting those zeros there.
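
As a rough illustration of that trailer, the sketch below compresses five bytes and reads the last four bytes of the result as a little-endian integer. The names GzipTrailerDemo, Gzip and ReadIsize are hypothetical, and the sketch assumes the output is a single gzip member taken with ToArray(), so that the length field really is the final four bytes.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical demo class for inspecting the gzip trailer.
static class GzipTrailerDemo
{
    // Compresses data and returns exactly the bytes that were written.
    public static byte[] Gzip(byte[] data)
    {
        using var ms = new MemoryStream();
        using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        {
            gz.Write(data, 0, data.Length);
        }
        return ms.ToArray();
    }

    // Reads the last four bytes as a little-endian unsigned integer:
    // the uncompressed length modulo 2^32.
    public static uint ReadIsize(byte[] gzipBytes)
    {
        int n = gzipBytes.Length;
        return (uint)gzipBytes[n - 4]
             | (uint)gzipBytes[n - 3] << 8
             | (uint)gzipBytes[n - 2] << 16
             | (uint)gzipBytes[n - 1] << 24;
    }

    public static void Main()
    {
        byte[] packed = Gzip(new byte[] { 1, 2, 3, 4, 5 });

        // For a five-byte input the length field is 05 00 00 00,
        // which accounts for the three legitimate trailing zero bytes.
        Console.WriteLine($"Stored length = {ReadIsize(packed)}");
        Console.WriteLine($"Last four bytes = {BitConverter.ToString(packed, packed.Length - 4)}");
    }
}
```

If a packet ends in many more zero bytes than this, they were most likely added after compression, for example by sending the whole array returned from GetBuffer().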

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow