Question

I have a big object in memory that I want to save as a blob in a database. I want to compress it before saving because the database server is usually not local.

This is what I have at the moment:

using (var memoryStream = new MemoryStream())
{
  using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
  {
    BinaryFormatter binaryFormatter = new BinaryFormatter();
    binaryFormatter.Serialize(gZipStream, obj);

    return memoryStream.ToArray();
  }
}

However, when I zip the same bytes with Total Commander, it always cuts the size by at least 50%. With the above code, 58MB compresses only to 48MB, and anything smaller than 15MB gets even bigger.

Should I use a third-party zip library, or is there a better way of doing this in .NET 3.5? Are there any other alternatives for my problem?

EDIT:

Just found a bug in the code above. Thanks to Angelo for the fix.

GZipStream compression is still not great: I get an average of 35% compression with GZipStream, compared to 48% with TC.

I have no idea what kind of bytes I was getting out with previous version :)

EDIT2:

I have found how to improve compression from 20% to 47%: I had to use two memory streams instead of one! Can anyone explain why this is the case?

Here is the code with two memory streams, which compresses a lot better:

using (MemoryStream msCompressed = new MemoryStream())
using (GZipStream gZipStream = new GZipStream(msCompressed, CompressionMode.Compress))
using (MemoryStream msDecompressed = new MemoryStream())
{
  new BinaryFormatter().Serialize(msDecompressed, obj);
  byte[] byteArray = msDecompressed.ToArray();

  gZipStream.Write(byteArray, 0, byteArray.Length);
  gZipStream.Close();
  return msCompressed.ToArray();
}
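Nobody in this thread confirms why the second MemoryStream helps on .NET 3.5. One plausible explanation, and it is only an assumption, is that BinaryFormatter issues many small Write calls and the 3.5 compressor handles small writes poorly, while the two-stream version hands GZipStream one large buffer. If that is the cause, a BufferedStream between the serializer and the GZipStream should have a similar effect without holding a second full copy of the serialized data in memory. A minimal sketch; the `BufferedGzip`/`CompressBuffered` names and the 1 MB default buffer are illustrative choices, not from the original:

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class BufferedGzip
{
    // Hypothetical helper: 'writePayload' writes the uncompressed data (for example
    // a serializer call) into the stream it is given. The BufferedStream collects
    // those writes and forwards them to the GZipStream in blocks of up to
    // 'bufferSize' bytes, so the compressor never sees tiny writes.
    public static byte[] CompressBuffered(Action<Stream> writePayload, int bufferSize = 1 << 20)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress, true)) // leaveOpen: true
        using (var buffered = new BufferedStream(gzip, bufferSize))
        {
            writePayload(buffered);
        } // disposing 'buffered' flushes it into 'gzip'; disposing 'gzip' writes the final block and footer
        return output.ToArray();
    }
}
```

On 3.5 the call would look like `BufferedGzip.CompressBuffered(s => new BinaryFormatter().Serialize(s, obj))`. Whether it matches the two-MemoryStream numbers is something to measure on your own data; this thread does not establish it.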

Solution

GZipStream in .NET 3.5 doesn't let you set the compression level. That parameter was introduced in .NET 4.5, but I don't know whether it will give you a better result or whether upgrading is an option for you. The built-in algorithm is not very good, due to patents AFAIK. So in 3.5, the only way to get better compression is to use a third-party library, such as the SDK provided by 7-Zip, or SharpZipLib. You should probably experiment a little with different libraries to find what compresses your data best.
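For readers who can move to .NET 4.5 or later, the level is passed through the `GZipStream(Stream, CompressionLevel, bool leaveOpen)` constructor, which does not exist in 3.5. A minimal sketch; the `GzipLevels` class and its method names are illustrative, not part of the framework:

```csharp
using System.IO;
using System.IO.Compression;

static class GzipLevels
{
    // Compresses 'data' at the requested level (.NET Framework 4.5+ / .NET Core).
    public static byte[] Compress(byte[] data, CompressionLevel level)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, level, leaveOpen: true))
        {
            gzip.Write(data, 0, data.Length);
        } // closing the GZipStream writes the final block and the gzip footer
        return output.ToArray();
    }

    // Inverse operation, used to verify the round trip.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }
}
```

`CompressionLevel.Fastest` trades ratio for speed, and `NoCompression` only wraps the data in gzip framing. On .NET 6 and later, `CompressionLevel.SmallestSize` is also available, which is the closest built-in way to ask GZipStream to try harder.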

OTHER TIPS

You have a bug in your code, and the explanation is too long for a comment, so I present it as an answer even though it doesn't answer your real question.

You need to call memoryStream.ToArray() only after closing the GZipStream; otherwise you create compressed data that you will not be able to deserialize.

Fixed code follows:

using (var memoryStream = new System.IO.MemoryStream())
{
  using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
  {
    BinaryFormatter binaryFormatter = new BinaryFormatter();
    binaryFormatter.Serialize(gZipStream, obj);
  }
  return memoryStream.ToArray();
}

GZipStream writes to the underlying stream in chunks. It holds back the last compressed block and appends the gzip footer to the end of the stream only at the moment you close it.
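A small sketch that makes this visible by comparing the MemoryStream before and after the GZipStream is closed. Per RFC 1952, the gzip trailer is 8 bytes: a CRC-32 followed by ISIZE, the uncompressed length modulo 2^32 in little-endian order. The class and method names are illustrative, and `leaveOpen: true` keeps the MemoryStream usable after the GZipStream is disposed:

```csharp
using System.IO;
using System.IO.Compression;

static class GzipFooterDemo
{
    // Writes 'data' into a GZipStream and reports how many compressed bytes had
    // reached the MemoryStream before Close/Dispose, plus the complete output after it.
    public static (long BytesBeforeClose, byte[] FinalOutput) Compress(byte[] data)
    {
        var output = new MemoryStream();
        var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true);
        gzip.Write(data, 0, data.Length);
        long before = output.Length;   // the 8-byte trailer cannot be here yet
        gzip.Dispose();                // flushes the last deflate block and appends the trailer
        return (before, output.ToArray());
    }

    // ISIZE: last 4 bytes of a gzip member, little-endian, uncompressed length mod 2^32.
    public static uint ReadIsize(byte[] gz)
    {
        int n = gz.Length;
        return (uint)(gz[n - 4] | gz[n - 3] << 8 | gz[n - 2] << 16 | gz[n - 1] << 24);
    }
}
```

Calling `ToArray()` at the `before` point is exactly what the buggy code does: the result is missing at least the trailer, so it cannot be decompressed.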

You can easily prove this by running the following code sample:

byte[] compressed;
int[] integers = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var mem1 = new MemoryStream();
using (var compressor = new GZipStream(mem1, CompressionMode.Compress))
{
    new BinaryFormatter().Serialize(compressor, integers);
    compressed = mem1.ToArray();
}

var mem2 = new MemoryStream(compressed);
using (var decompressor = new GZipStream(mem2, CompressionMode.Decompress))
{
    // The next line will throw SerializationException
    integers = (int[])new BinaryFormatter().Deserialize(decompressor);
}

The default CompressionLevel is Optimal, at least according to http://msdn.microsoft.com/en-us/library/as1ff51s, so there is no way to tell GZipStream to "try harder". It seems to me that a third-party library would be better.

I personally never considered GZipStream to be 'good' in terms of compression; probably the effort went into minimizing the memory footprint or maximizing speed. Judging by how Windows XP/Vista/7 handle ZIP files natively in Explorer, I can say that handling is neither fast nor good at compression. I would not be surprised if Explorer in Windows 7 actually used GZipStream: after all, they implemented it and put it into the framework, so they probably use it in many places (e.g., it seems to be used in HTTP GZIP handling). So I would stay away from it if I needed efficient processing. I've never done any serious research on this topic, as my company bought a good zip handler many years ago, when .NET was in its early days.

edit:

More refs:
http://dotnetzip.codeplex.com/workitem/7159, though it was marked "closed/resolved" in 2009. Maybe you will find something interesting in that code?

Heh, after a few minutes of googling, it seems that 7-Zip exposes some C# bindings: http://www.splinter.com.au/compressing-using-the-7zip-lzma-algorithm-in/

edit #2:

Just an FYI about .NET 4.5: https://stackoverflow.com/a/9808000/717732

The original question was about .NET 3.5. Three years later, .NET 4.5 is much more likely to be in use, and my answer is only valid for 4.5. As others mentioned earlier, the compression algorithm got significant improvements in .NET 4.5.

Today I wanted to compress my data set to save some space, so my situation is similar to the original question, but on .NET 4.5. Because I remembered using the same double-MemoryStream trick many years ago, I gave it a try. My data set is a container object with many HashSets and lists of custom objects with string/int/DateTime properties. It contains about 45,000 objects, and when serialized without compression it produces a 3,500 kB binary file.

Now, with GZipStream (with a single or a double MemoryStream, as described in the question) or with DeflateStream (which uses zlib in 4.5), I always get an 818 kB file. So I want to stress that the double-MemoryStream trick is useless with .NET 4.5.
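To check this on your own data, one option is to feed the same bytes to GZipStream once as a single Write (what the double-MemoryStream trick does) and once as many small writes (roughly what a serializer writing straight into the GZipStream does), then compare the sizes. A sketch; the 16-byte chunk size is an arbitrary stand-in for a serializer's small writes, and the names are illustrative:

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class WritePatternCheck
{
    // Compresses 'data', handing it to GZipStream in writes of at most 'chunkSize' bytes.
    public static byte[] Compress(byte[] data, int chunkSize)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            for (int offset = 0; offset < data.Length; offset += chunkSize)
                gzip.Write(data, offset, Math.Min(chunkSize, data.Length - offset));
        }
        return output.ToArray();
    }

    // Prints the raw size next to the compressed size for both write patterns.
    public static void Report(byte[] data)
    {
        int single = Compress(data, data.Length).Length;   // one big Write
        int chunked = Compress(data, 16).Length;           // many 16-byte writes
        Console.WriteLine($"raw: {data.Length}, single write: {single}, 16-byte writes: {chunked}");
    }
}
```

If the observation above holds, the two compressed sizes should be close on .NET 4.5 and later, whereas the question's numbers suggest the chunked pattern came out noticeably larger on 3.5.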

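Eventually, my generic code is as follows:

    // Requires: System, System.Diagnostics, System.IO, System.IO.Compression,
    // System.Runtime.Serialization.Formatters.Binary
    public static byte[] SerializeAndCompress<T, TStream>(T objectToWrite, Func<TStream> createStream, Func<TStream, byte[]> returnMethod, Action catchAction)
        where T : class
        where TStream : Stream
    {
        if (objectToWrite == null || createStream == null)
        {
            return null;
        }
        byte[] result = null;
        try
        {
            using (var outputStream = createStream())
            {
                // Disposing the GZipStream flushes the last block and writes the gzip footer;
                // it also closes outputStream, because leaveOpen defaults to false.
                using (var compressionStream = new GZipStream(outputStream, CompressionMode.Compress))
                {
                    var formatter = new BinaryFormatter();
                    formatter.Serialize(compressionStream, objectToWrite);
                }
                // MemoryStream.ToArray() still works after the stream has been closed.
                if (returnMethod != null)
                    result = returnMethod(outputStream);
            }
        }
        catch (Exception ex)
        {
            // The original logged through a project-specific helper
            // (Exceptions.ExceptionFormat.Serialize); ex.ToString() stands in for it here.
            Trace.TraceError(ex.ToString());
            catchAction?.Invoke();
        }
        return result;
    }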

so that I can use different TStream types, e.g.:

    public static void SerializeAndCompress<T>(T objectToWrite, string filePath) where T : class
    {
        //var buffer = SerializeAndCompress(collection);
        //File.WriteAllBytes(filePath, buffer);
        SerializeAndCompress(objectToWrite, () => new FileStream(filePath, FileMode.Create), null, () =>
        {
            if (File.Exists(filePath))
                File.Delete(filePath);
        });
    }

    public static byte[] SerializeAndCompress<T>(T collection) where T : class
    {
        return SerializeAndCompress(collection, () => new MemoryStream(), st => st.ToArray(), null);
    }

You can use a custom formatter:

public class GZipFormatter : IFormatter
{
    IFormatter formatter;
    public GZipFormatter()
    {
        this.formatter = new BinaryFormatter();
    }
    public GZipFormatter(IFormatter formatter)
    {
        this.formatter = formatter; 
    }
    ISurrogateSelector IFormatter.SurrogateSelector { get => formatter.SurrogateSelector; set => formatter.SurrogateSelector = value; }
    SerializationBinder IFormatter.Binder { get => formatter.Binder; set => formatter.Binder = value; }
    StreamingContext IFormatter.Context { get => formatter.Context; set => formatter.Context = value; }

    object IFormatter.Deserialize(Stream serializationStream)
    {
        using (GZipStream gZipStream = new GZipStream(serializationStream, CompressionMode.Decompress))
        {
            return formatter.Deserialize(gZipStream);                
        }
    }
    void IFormatter.Serialize(Stream serializationStream, object graph)
    {
        using (GZipStream gZipStream = new GZipStream(serializationStream, CompressionMode.Compress))
        using (MemoryStream msDecompressed = new MemoryStream())
        {
            formatter.Serialize(msDecompressed, graph);
            byte[] byteArray = msDecompressed.ToArray();

            gZipStream.Write(byteArray, 0, byteArray.Length);
            gZipStream.Close();                
        }
    }
}

Then you can use it like this:

IFormatter formatter = new GZipFormatter();
using (Stream stream = new FileStream(path...)){
   formatter.Serialize(stream, obj); 
}        
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow