Question

I used the code below to compress files and they keep growing instead of shrinking. I compressed a 4 KB file and it became 6 KB. That is understandable for a small file because of the compression overhead. But I then tried a 400 MB file and it became 628 MB after compressing. What is wrong? See the code. (.NET 2.0)

Public Sub Compress(ByVal inFile As String, ByVal outFile As String)
    Dim sourceFile As FileStream = File.OpenRead(inFile)
    Dim destFile As FileStream = File.Create(outFile)

    Dim compStream As New GZipStream(destFile, CompressionMode.Compress)

    Dim myByte As Integer = sourceFile.ReadByte()
    While myByte <> -1
        compStream.WriteByte(CType(myByte, Byte))
        myByte = sourceFile.ReadByte()
    End While

    sourceFile.Close()
    destFile.Close()
End Sub

Solution

Are you sure that writing byte by byte to the stream is really a good idea? It certainly won't have ideal performance characteristics, and maybe that's what confuses the gzip compression algorithm too.

Also, it might be that the data you are trying to compress is simply not very compressible. If I were you I would try your code with a text document of the same size, as text documents tend to compress much better than random binary data.

Also, you could try using a plain DeflateStream instead of a GZipStream. They both use the same compression algorithm (deflate); the only difference is that gzip adds some additional data (such as error checking), so a DeflateStream may yield slightly smaller results.

My VB.NET is a bit rusty, so I'd rather not try to write a code example in VB.NET. Instead, here's how you should do it in C#; it should be relatively straightforward to translate to VB.NET for someone with a bit of experience (or maybe someone who is good at VB.NET could edit my post and translate it):

// inFile and outFile are the same paths as in the VB.NET routine above.
FileStream sourceFile = File.OpenRead(inFile);
FileStream destFile = File.Create(outFile);
GZipStream compStream = new GZipStream(destFile, CompressionMode.Compress);

// Copy in 64 KB blocks instead of byte by byte.
byte[] buffer = new byte[65536];
int bytesRead;
while ((bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0)
{
    compStream.Write(buffer, 0, bytesRead);
}

// Closing the GZipStream flushes the final block and closes destFile as well.
compStream.Close();
sourceFile.Close();
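
In that spirit, here is a rough VB.NET translation of the same idea, which also swaps in the DeflateStream mentioned above. Treat it as an untested sketch rather than a drop-in solution; the routine name is only illustrative:

Imports System.IO
Imports System.IO.Compression

Public Sub CompressWithDeflate(ByVal inFile As String, ByVal outFile As String)
    Dim sourceFile As FileStream = File.OpenRead(inFile)
    Dim destFile As FileStream = File.Create(outFile)

    ' Same algorithm as GZipStream, without the gzip header and CRC.
    Dim compStream As New DeflateStream(destFile, CompressionMode.Compress)

    ' Copy in 64 KB blocks instead of byte by byte.
    Dim buffer(65535) As Byte
    Dim bytesRead As Integer = sourceFile.Read(buffer, 0, buffer.Length)
    While bytesRead > 0
        compStream.Write(buffer, 0, bytesRead)
        bytesRead = sourceFile.Read(buffer, 0, buffer.Length)
    End While

    ' Closing the compression stream flushes the final block and closes destFile too.
    compStream.Close()
    sourceFile.Close()
End Sub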

OTHER TIPS

If the underlying file is itself highly unpredictable (already compressed or largely random) then attempting to compress it will cause the file to become bigger.

Going from 400 MB to 628 MB sounds highly improbable as an expansion factor, since the deflate algorithm (used by GZip) tends towards a maximum expansion of about 0.03%. The overhead of the GZip header should be negligible.
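
As a rough sanity check on that figure: 400 MB × 1.0003 ≈ 400.12 MB, so even a completely incompressible 400 MB input should grow by only about a tenth of a megabyte (plus the small gzip header), nowhere near 628 MB.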

Edit: The .NET 4.0 release indicates that the compression libraries have been improved so that they no longer cause significant expansion of incompressible data, which suggests the earlier versions were not implementing the "fall back to raw (stored) blocks" mode. As a quick test, try SharpZipLib instead; it should give you output close to the original size when the stream is incompressible by deflate. If it does, consider moving to that library, or wait for the 4.0 release for a better-performing BCL implementation. Note that the lack of compression you are getting strongly suggests there is no point in trying to compress this data further anyway.
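
As a sketch of that quick test, only the construction of the compression stream needs to change; the buffered copy loop stays the same (this assumes SharpZipLib's GZipOutputStream API and a destination FileStream named destFile, as in the question's code):

' SharpZipLib's gzip writer wraps the destination stream directly.
Dim compStream As Stream = New ICSharpCode.SharpZipLib.GZip.GZipOutputStream(destFile)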

This is a known anomaly with the built-in GZipStream (and DeflateStream).
I can think of two workarounds:

  • use an alternative compressor.
  • build some logic that examines the size of the "compressed" output and compares it to the size of the input; if it is larger, discard the output and just store the original data (a sketch follows below).
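
A minimal sketch of the second workaround, assuming the Compress routine from the question (a real implementation would also need to record somewhere, for example in a header byte, whether the stored file is compressed or raw):

Imports System.IO

Public Sub CompressOrStore(ByVal inFile As String, ByVal outFile As String)
    ' Try normal compression first.
    Compress(inFile, outFile)

    ' If the "compressed" result is larger than the input, discard it and store a plain copy.
    Dim outInfo As New FileInfo(outFile)
    Dim inInfo As New FileInfo(inFile)
    If outInfo.Length >= inInfo.Length Then
        File.Copy(inFile, outFile, True)
    End If
End Sub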

DotNetZip includes a "fixed" GZipStream based on a managed port of zlib. (It takes approach #1 from above). The Ionic.Zlib.GZipStream can replace the built-in GZipStream in your apps with a simple namespace swap.
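
For example, with DotNetZip referenced, the only change to the original routine would be the construction of the compression stream (assuming the Ionic.Zlib namespace that DotNetZip ships; the constructor mirrors the built-in one):

' Swap the BCL GZipStream for DotNetZip's managed-zlib port; the rest of the routine is unchanged.
Dim compStream As New Ionic.Zlib.GZipStream(destFile, Ionic.Zlib.CompressionMode.Compress)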

Thank you all for the good answers. Earlier on I tried to compress .wmv files and one text file. I changed the code to use DeflateStream and it seems to work now. Cheers.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow