Question

I'm compressing a log file as data is written to it, something like:

using (var fs = new FileStream("Test.gz", FileMode.Create, FileAccess.Write, FileShare.None))
{
  using (var compress = new GZipStream(fs, CompressionMode.Compress))
  {
    for (int i = 0; i < 1000000; i++)
    {
      // Clearly this isn't what is happening in production, just
      // a simple example
      byte[] message = RandomBytes();
      compress.Write(message, 0, message.Length);

      // Flush to disk (in production we will do this every x lines, 
      // or x milliseconds, whichever comes first)
      if (i % 20 == 0)
      {
        compress.Flush();
      }
    }
  }
}

What I want to ensure is that if the process crashes or is killed, the archive is still valid and readable. I had hoped that everything up to the last flush would be safe, but instead I am just ending up with a corrupt archive.

Is there any way to ensure I end up with a readable archive after each flush?

Note: it isn't essential that we use GZipStream, if something else will give us the desired result.


Solution

An option is to let Windows handle the compression. Just enable compression on the folder where you're storing your log files. There are some performance considerations to be aware of when copying the compressed files, and I don't know how well NTFS compression performs in comparison to GZipStream or other compression options. You'll probably want to compare compression ratios and CPU load.

There's also the option of opening a compressed file, if you don't want to enable compression on the entire folder. I haven't tried this, but you might want to look into it: http://social.msdn.microsoft.com/forums/en-US/netfxbcl/thread/1b63b4a4-b197-4286-8f3f-af2498e3afe5

Other tips

Good news: GZip is a streaming format. Therefore corruption at the end of the stream cannot affect the beginning, which was already written.

So even if your streaming writes are interrupted at an arbitrary point, most of the stream is still good. You can write yourself a little tool that reads from it and just stops at the first exception it sees.
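Such a recovery tool is only a few lines. Here is an illustrative sketch in Python, where the gzip behavior is easy to demonstrate; the function name and chunk size are my own choices, and the same idea ports directly to .NET via a deflate/zlib binding:

```python
import zlib

def recover_gzip_prefix(data: bytes, chunk_size: int = 4096) -> bytes:
    """Decompress as much of a possibly truncated or corrupt gzip
    stream as possible, stopping at the first decoding error."""
    d = zlib.decompressobj(wbits=31)   # 31 = gzip wrapper + 32 KB window
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        try:
            out += d.decompress(data[i:i + chunk_size])
        except zlib.error:
            break                      # hit corruption: keep what we have
    return bytes(out)
```

A truncated file simply yields a shorter prefix of the original data; a file with a corrupt byte yields everything decoded before the error was detected.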

If you want an error-free solution I'd recommend splitting the log into one file every x seconds (maybe x = 1 or 10?). Write into a file with the extension ".gz.tmp" and rename it to ".gz" once the file has been completely written and closed.
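That rotation protocol can be sketched in a few lines. Below is an illustrative Python version; the function name and file-naming scheme are my own, and `os.replace` provides the atomic rename, so readers only ever see complete `.gz` files:

```python
import gzip
import os

def write_log_segment(directory: str, lines, segment_id: int) -> str:
    """Write one log segment atomically: compress into a .gz.tmp file,
    then rename it to .gz only after it is fully written and closed."""
    tmp = os.path.join(directory, f"log.{segment_id:06d}.gz.tmp")
    final = tmp[:-4]                      # strip the ".tmp" suffix
    with gzip.open(tmp, "wb") as f:
        for line in lines:
            f.write(line)
    os.replace(tmp, final)                # atomic rename on the same filesystem
    return final
```

A crash mid-write leaves only a stray `.gz.tmp` behind, which a startup sweep can delete; every `.gz` file is guaranteed complete.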

Yes, but it's more involved than just flushing. Take a look at gzlog.h and gzlog.c in the zlib distribution. It does exactly what you want, efficiently adding short log entries to a gzip file, and always leaving a valid gzip file behind. It also has protection against crashes or shutdowns during the process, still leaving a valid gzip file behind and not losing any log entries.
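gzlog itself is C, and as I understand it uses a more involved scheme (recent entries kept in an uncompressed tail, recompressed in batches, with crash recovery). Its key property, always leaving a complete gzip stream on disk, can be approximated more simply: write each flushed batch as its own gzip member, since concatenated gzip members form one valid gzip file. A hedged sketch in Python (`append_entries` is my own name, not part of zlib or gzlog):

```python
import gzip

def append_entries(path: str, entries) -> None:
    """Append a batch of log entries as one complete, self-contained gzip
    member. Because concatenated gzip members decode as a single stream,
    the file stays readable after every append, even if a later append
    is cut off mid-write."""
    with open(path, "ab") as raw:                      # append-only; never rewrites
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            for entry in entries:
                gz.write(entry)
```

The trade-off versus gzlog is a slightly worse compression ratio (each member carries its own header/trailer and resets the dictionary), in exchange for a much simpler implementation.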

I recommend not using GZipStream. It is buggy and does not provide the necessary functionality. Use DotNetZip instead as your interface to zlib.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow