Question

I am definitely missing something very obvious, but can anyone explain why the compression ratio is so much better in the second case?

Case 1: very low compression and sometimes even growth in size.

using (var memoryStream = new System.IO.MemoryStream())
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
{
  new BinaryFormatter().Serialize(gZipStream, obj);
  gZipStream.Close();
  return memoryStream.ToArray();
}

Case 2: much better compression, and the output never grew in size.

using (MemoryStream msCompressed = new MemoryStream())
using (GZipStream gZipStream = new GZipStream(msCompressed, CompressionMode.Compress))
using (MemoryStream msDecompressed = new MemoryStream())
{
  new BinaryFormatter().Serialize(msDecompressed, obj);
  byte[] byteArray = msDecompressed.ToArray();

  gZipStream.Write(byteArray, 0, byteArray.Length);
  gZipStream.Close();
  return msCompressed.ToArray();
}

I have written the mirrored decompression for each case, and in both cases I can deserialize the result back into the source object without any issues.

Here are some stats:

UncSize: 58062085B, Comp1: 46828139B, ratio 0.81

UncSize: 58062085B, Comp2: 31326029B, ratio 0.54

UncSize: 7624735B, Comp1: 7743947B, ratio 1.02

UncSize: 7624735B, Comp2: 5337522B, ratio 0.70

UncSize: 1237628B, Comp1: 1265406B, ratio 1.02

UncSize: 1237628B, Comp2: 921695B, ratio 0.74


Solution

You don't say which version of .NET you're using. In versions prior to 4.0, GZipStream compresses data on a per-write basis. That is, each buffer you pass to Write is compressed on its own. In your first example, the Serialize method is likely writing very small buffers to the stream (one field at a time), so the compressor never sees enough data at once to find much redundancy, which would also explain why the output is sometimes larger than the input. In your second example, Serialize writes the entire object to the memory stream, and then the memory stream's buffer is written to the GZipStream in one big chunk. GZipStream does much better when it has a larger buffer (64K is close to optimum) to work with.

This may still be the case in .NET 4.0. I don't remember if I tested it.
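One way to check whether write size matters on a particular runtime is to push the same payload through GZipStream in chunks of different sizes and compare the output lengths. The following is only a minimal sketch: the class name, the `SamplePayload` helper, and the repetitive test data are made up for illustration, and the numbers it prints will depend on the .NET version, so no expected output is given.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ChunkedGzipExperiment
{
    // Writes `data` to a GZipStream in pieces of at most `chunkSize` bytes
    // and returns the total size of the compressed output.
    public static long CompressedSize(byte[] data, int chunkSize)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps `output` usable after the GZipStream is disposed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                for (int offset = 0; offset < data.Length; offset += chunkSize)
                {
                    int count = Math.Min(chunkSize, data.Length - offset);
                    gzip.Write(data, offset, count);
                }
            } // disposing the GZipStream writes out any remaining compressed data
            return output.Length;
        }
    }

    // Hypothetical test payload: repetitive text, standing in for serialized object data.
    public static byte[] SamplePayload(int lines)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms))
            {
                for (int i = 0; i < lines; i++)
                    writer.WriteLine("record " + (i % 100) + ": some repetitive field data");
            }
            // ToArray still works after the MemoryStream has been closed.
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        byte[] payload = SamplePayload(50000);
        foreach (int chunkSize in new[] { 4, 64, 4096, 65536, payload.Length })
        {
            Console.WriteLine("chunk={0,8} bytes -> compressed={1} bytes",
                chunkSize, CompressedSize(payload, chunkSize));
        }
    }
}
```

If the compressed size is roughly the same for every chunk size, the runtime's GZipStream is not sensitive to write size and the extra buffering is unnecessary there.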

The way I've handled this in the past is with a BufferedStream:

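using (var mstream = new MemoryStream())
{
    // BufferedStream collects the serializer's small writes and passes them to GZipStream in 64K blocks.
    using (var bstream = new BufferedStream(new GZipStream(mstream, CompressionMode.Compress), 65536))
    {
        new BinaryFormatter().Serialize(bstream, obj);
    }
    // Disposing bstream flushes it and closes the GZipStream, so mstream holds the complete output.
    return mstream.ToArray();
}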

That way, the compressor gets a 64K buffer to work with.
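The same pattern can be wrapped in a small helper that works with any code that writes to a stream, together with the mirrored decompression. This is a sketch under a few assumptions, not the code from the original answer: the `BufferedGzip` class and its method names are hypothetical, `Stream.CopyTo` requires .NET 4.0 or later, and a `BinaryWriter` stands in for the serializer because `BinaryFormatter` is obsolete in current .NET releases.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class BufferedGzip
{
    // Runs `serialize` against a BufferedStream placed in front of a GZipStream,
    // so many small writes reach the compressor in blocks of up to `bufferSize` bytes.
    public static byte[] Compress(Action<Stream> serialize, int bufferSize = 65536)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            using (var buffered = new BufferedStream(gzip, bufferSize))
            {
                serialize(buffered);
            } // buffered is disposed first (flushing into gzip), then gzip finishes the stream
            return output.ToArray();
        }
    }

    // Mirror of Compress: inflates the whole payload back into a byte array.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }

    public static void Main()
    {
        // Stand-in for a serializer that emits one small field (4 bytes) at a time.
        byte[] compressed = Compress(stream =>
        {
            var writer = new BinaryWriter(stream);
            for (int i = 0; i < 100000; i++)
                writer.Write(i % 50);
            writer.Flush();
        });

        byte[] roundTripped = Decompress(compressed);
        Console.WriteLine("raw={0} bytes, compressed={1} bytes",
            roundTripped.Length, compressed.Length);
    }
}
```

Taking the serializer as an `Action<Stream>` keeps the buffering decision in one place, so the buffer size can be changed or the BufferedStream removed without touching the calling code.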

Prior to .NET 4.0, there was no benefit to providing a buffer larger than 64K for GZipStream. I've seen some information indicating that the compressor in .NET 4.0 can do a better job of compression with a larger buffer. However, I've not tested that myself.

Licensed under: CC-BY-SA with attribution