Problem

Right now we have a database table (SQL Server 2008 R2) that stores an uploaded file (PDF, DOC, TXT, etc.) in an image type column. A user uploads this file from an ASP.NET application. My project is to get a handle on the rate at which this table is growing, and I've come up with a couple of questions along the way.

  1. On the database side, I've discovered the image column type is supposedly deprecated. Will I gain any benefits by switching over to varbinary(max), or should I say varbinary(5767168) because that is my file size cap, or might I just as well leave it as an image type as far as space efficiency is concerned?

  2. On the application side, I want to compress the byte array. Microsoft's built-in GZip sometimes made the file bigger instead of smaller. I switched over to SharpZipLib, which is much better, but I still occasionally run into the same problem. Is there a way to estimate the average compression savings before I implement it on a wide scale? I'm having a hard time finding out what the underlying algorithm is that they use.

  3. Would it be worth writing a Huffman coding algorithm of my own, or will that present the same problem, where the compressed file is occasionally larger than the original?

For reference, in case it matters, here's the code in my app:

    using ICSharpCode.SharpZipLib.GZip;

    private static byte[] Compress(byte[] data)
    {
        MemoryStream output = new MemoryStream();

        using (GZipOutputStream gzip = new GZipOutputStream(output))
        {
            // Keep the underlying MemoryStream open so it can still be
            // read after the GZip stream is disposed.
            gzip.IsStreamOwner = false;
            gzip.Write(data, 0, data.Length);
        }
        // Disposing the GZip stream flushes the final compressed block.
        return output.ToArray();
    }

    private static byte[] Decompress(byte[] data)
    {
        MemoryStream output = new MemoryStream();

        // Wrap the compressed bytes directly instead of copying them
        // into a second stream by hand.
        using (MemoryStream input = new MemoryStream(data))
        using (GZipInputStream gzip = new GZipInputStream(input))
        {
            // A larger buffer means fewer Read calls on big files.
            byte[] buff = new byte[4096];
            int read;

            while ((read = gzip.Read(buff, 0, buff.Length)) > 0)
            {
                output.Write(buff, 0, read);
            }
        }
        return output.ToArray();
    }
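
A quick round-trip check of the pair looks like this (the file name is just a placeholder):

    using System;
    using System.IO;   // File
    using System.Linq; // SequenceEqual

    byte[] original = File.ReadAllBytes("sample.pdf"); // placeholder file
    byte[] compressed = Compress(original);
    byte[] restored = Decompress(compressed);

    Console.WriteLine("Original:   {0:N0} bytes", original.Length);
    Console.WriteLine("Compressed: {0:N0} bytes", compressed.Length);
    Console.WriteLine("Round-trip OK: {0}", restored.SequenceEqual(original));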

Thanks in advance for any help. :)

Solution 2

I hate to be a jerk and answer my own question, but I thought I'd summarize my findings into a complete answer for anyone else looking to store file/image data space-efficiently within a database:

* Using varbinary(MAX) versus Image?

There are many reasons for using varbinary(MAX), but top among them is that image is deprecated and will be removed altogether in a future version of SQL Server. Not starting any new projects with it just nips a future problem in the bud.

According to the info in this question: SQL Server table structure for storing a large number of images, varbinary(MAX) has more operations available to be used on it.
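
For anyone making the same switch, here's a minimal migration sketch; the table, column, and connection-string names are hypothetical. The follow-up UPDATE rewrites each value so existing rows are converted to the new format up front rather than lazily on a later update:

    using System.Data.SqlClient;

    // One-time migration; run during a maintenance window.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Change the column type in place.
        using (var alter = new SqlCommand(
            "ALTER TABLE dbo.UploadedFiles ALTER COLUMN FileData varbinary(max)", conn))
        {
            alter.CommandTimeout = 0; // large tables can take a while
            alter.ExecuteNonQuery();
        }

        // Rewriting each value converts the old image-format pages
        // immediately instead of leaving them to convert on a later update.
        using (var rewrite = new SqlCommand(
            "UPDATE dbo.UploadedFiles SET FileData = FileData", conn))
        {
            rewrite.CommandTimeout = 0;
            rewrite.ExecuteNonQuery();
        }
    }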

varbinary(MAX) is easy to stream from a .NET application by using a SqlParameter. Passing -1 as the length means 'MAX'. Like so:

    SQLCommand1.Parameters.Add("@binaryValue", SqlDbType.VarBinary, -1).Value = compressedBytes;
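
In context, a fuller sketch of the insert (the table, column, and connection-string names are placeholders):

    using System.Data;
    using System.Data.SqlClient;

    // Insert the compressed bytes into a varbinary(max) column.
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO dbo.UploadedFiles (FileName, FileData) VALUES (@name, @binaryValue)", conn))
    {
        cmd.Parameters.Add("@name", SqlDbType.NVarChar, 260).Value = fileName;
        cmd.Parameters.Add("@binaryValue", SqlDbType.VarBinary, -1).Value = compressedBytes;

        conn.Open();
        cmd.ExecuteNonQuery();
    }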

* What compression algorithm to use?

I'm really not much closer to a decent answer on this one. I used ICSharpCode.SharpZipLib.GZip and found it compressed better than the built-in GZip classes, simply by running both on a bunch of files and comparing the results.

My results:

I reduced my total file size by about 20%. Unfortunately, a lot of the files I had were PDFs, which don't compress that well, but there was still some benefit. Not much luck (obviously) with file types that were already compressed.
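
If you want to measure the savings on your own data before committing, here's a minimal sketch that runs Compress over a sample set and reports the aggregate ratio (the directory path is a placeholder):

    using System;
    using System.IO;

    // Measure aggregate compression savings over a representative sample.
    long originalTotal = 0, compressedTotal = 0;

    foreach (string path in Directory.GetFiles(@"C:\SampleUploads")) // placeholder path
    {
        byte[] data = File.ReadAllBytes(path);
        byte[] compressed = Compress(data);

        originalTotal += data.Length;
        // Count whichever is smaller, so a "compressed" file never grows.
        compressedTotal += Math.Min(data.Length, compressed.Length);
    }

    Console.WriteLine("Savings: {0:P1}", 1.0 - (double)compressedTotal / originalTotal);

Storing whichever version is smaller (with a flag column noting whether a row is compressed) also sidesteps the occasional file that grows under GZip.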

Other tips

That's not a byte array; that's a BLOB. Ten years ago, you would have used the IMAGE datatype.

These days, it's more efficient to use VARBINARY(MAX). I really recommend that people use FILESTREAM for VarBinary(MAX), as it makes backing up the database (without the blobs) quite easy.
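
For the curious, reading a FILESTREAM column from .NET typically goes through SqlFileStream rather than pulling the whole value through the query pipeline. A minimal sketch, assuming a FILESTREAM-enabled FileData column on a hypothetical dbo.UploadedFiles table:

    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    // Read a FILESTREAM value via the Win32 file API.
    // Table/column names and connection string are placeholders.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // SqlFileStream only works inside a transaction.
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            var cmd = new SqlCommand(
                "SELECT FileData.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                "FROM dbo.UploadedFiles WHERE Id = @id", conn, tx);
            cmd.Parameters.AddWithValue("@id", fileId);

            string path;
            byte[] txContext;
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                reader.Read();
                path = reader.GetString(0);
                txContext = (byte[])reader[1];
            }

            using (var fs = new SqlFileStream(path, txContext, FileAccess.Read))
            using (var output = new MemoryStream())
            {
                fs.CopyTo(output);
                // output.ToArray() now holds the file bytes.
            }

            tx.Commit();
        }
    }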

Keep in mind that storing the native formats (without compression) will allow full-text search, which is pretty incredible if you think about it. You have to install an iFilter from Adobe to search inside PDFs, but it's a killer feature; I can't live without it.
