Question

I have created simple CLR functions for compressing and decompressing NVARCHAR columns. This is the compression function:

[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
public static SqlBinary Compress( string str ){
    if( str == null ){return new SqlBinary();}

    if( String.IsNullOrEmpty( str ) ){str = " ";}

    byte[] bytes = Encoding.Unicode.GetBytes( str );
    using( MemoryStream msi = new MemoryStream( bytes ) ){
        using( MemoryStream mso = new MemoryStream() ){
            using( GZipStream gs = new GZipStream( mso, CompressionMode.Compress ) ){
                msi.CopyTo( gs );
            }
            return new SqlBinary( mso.ToArray() );
        }
    }
}

The compression ratio I get is about 4:1; that is, 1024 KB of uncompressed data becomes about 256 KB of compressed data. I am aware that the ratio depends on the data itself and its size, but I want to get a better ratio.

As I am using SQL Server 2012 and .NET 4.0, is there a chance that the compression is not giving the expected ratio because of issues like this?

And is there an alternative class I can use in the SQL CLR function? There are such alternatives, but they are not currently supported.

Solution

Here are some thoughts on this:

  1. Do you know that you should get better compression on the strings that you have tested with? Have you tested those same strings by gzipping them outside of .NET, for example on Linux, with Cygwin or a DOS port of the UNIX utilities, or with PHP?

  2. If you have updated your system to .NET 4.5, then you are already using the updated GZipStream, because it lives in System.dll, which is a supported library. You can verify this by using the new constructor that accepts a CompressionLevel: just change CompressionMode.Compress to CompressionLevel.Optimal. SQL Server is bound to a specific version of the CLR, not to a specific version of the .NET Framework (.NET 4.5 is an in-place update that still runs on CLR 4.0). This means that any new functionality in any of the supported libraries is usable, as long as every server you deploy your code to has had its .NET Framework updated.

    This does not mean you will get better compression. I tested this code and it came up with the same 31 bytes for "Hello World" as PHP and Fiddler generated, as noted in one of the questions you linked to: https://stackoverflow.com/questions/11435200/why-does-my-c-sharp-gzip-produce-a-larger-file-than-fiddler-or-php .

    I just tested again with a string of 3405 random characters (i.e. "fsdkjf skdj f..."). I declared the variable as NVARCHAR(4000) and ran it through your code, after making the changes that I suggest here. The length of the compressed binary was 211 bytes. I then copied and pasted that same string into Notepad++, made sure that the encoding was set to "UCS-2 Little Endian", and saved it. Windows Explorer reported the file as 6812 bytes (6810 bytes of data, as also reported by DATALENGTH of the variable, plus 2 for the byte order mark). I transferred it by FTP to a Linux server in binary mode, where the file size was still 6812 bytes. I then ran gzip -9 on it (i.e. max compression; the default is -6). Compressed size? 231 bytes. So the .NET GZipStream actually did slightly better.

  3. CompressionMode.Compress and CompressionLevel.Optimal are functionally equivalent. Each one is the assumed default when specifying the other.

  4. Don't use string for the input param; use SqlString.

  5. Get rid of the byte[] bytes line.

  6. Change new MemoryStream( bytes ) in the first using block to be:
    new MemoryStream(str.GetUnicodeBytes())

  7. You can get rid of the if( str == null ){return new SqlBinary();} line. Instead of handling this in the .NET code, just add WITH RETURNS NULL ON NULL INPUT to the CREATE FUNCTION. This way SQL Server won't even invoke your code if the input is NULL :). Just keep in mind that when a function has multiple input parameters, this option returns NULL if any of them is NULL. If at least one of the parameters legitimately needs to accept NULL, then you do have to handle that case in your code.

  8. Replace this line if( String.IsNullOrEmpty( str ) ){str = " ";} -- which actually returns a compressed space which is not an empty string -- with:

    if (str.Value.Length == 0)
    {
        return SqlBinary.Null;
    }
    
  9. I haven't tried "zlib" yet, but unfortunately both "SharpZipLib" and "DotNetZip" are buggy, have not been updated in several years, and show no indication that they will be. However, the bugs in "DotNetZip" seem to be mostly around zip file archives and not the GZip functionality (which has been working quite well in SQL# :-) ).
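
Putting points 2 and 4 through 8 together, the function might end up looking roughly like the sketch below. It is only an illustration under a few assumptions: the server has .NET 4.5 or later (needed for the CompressionLevel constructor), the T-SQL CREATE FUNCTION is declared WITH RETURNS NULL ON NULL INPUT (so the code never sees a NULL input), and the class name CompressionFunctions is a made-up placeholder.

```csharp
using System.Data.SqlTypes;
using System.IO;
using System.IO.Compression;
using Microsoft.SqlServer.Server;

// "CompressionFunctions" is a hypothetical class name used only for this sketch.
public static class CompressionFunctions
{
    // Assumption: the T-SQL wrapper is created WITH RETURNS NULL ON NULL INPUT,
    // so SQL Server never calls this method with a NULL SqlString.
    [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
    public static SqlBinary Compress(SqlString str)
    {
        // Return NULL for an empty string instead of compressing a single space.
        if (str.Value.Length == 0)
        {
            return SqlBinary.Null;
        }

        // GetUnicodeBytes() returns the UTF-16 LE bytes, replacing the
        // separate Encoding.Unicode.GetBytes() step.
        using (MemoryStream input = new MemoryStream(str.GetUnicodeBytes()))
        using (MemoryStream output = new MemoryStream())
        {
            // The CompressionLevel overload exists only in .NET 4.5 and later.
            using (GZipStream gzip = new GZipStream(output, CompressionLevel.Optimal))
            {
                input.CopyTo(gzip);
            }

            // The GZipStream must be disposed before reading the result so the
            // final block and the gzip trailer are flushed; ToArray() still
            // works on a closed MemoryStream.
            return new SqlBinary(output.ToArray());
        }
    }
}
```

If the function is deployed without WITH RETURNS NULL ON NULL INPUT, reading str.Value on a NULL input throws, so in that case a check on str.IsNull would need to come first.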

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange