This is a problem with the way GZipStream handles gzip files that contain multiple gzip members (entries): it decompresses the first member and treats everything after it as garbage. (Interestingly, utilities like gzip and WinZip handle such files correctly by decompressing all of the members into one output file.) There are a couple of workarounds, or you can use a third-party library like DotNetZip (http://dotnetzip.codeplex.com/).
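To reproduce the problem, you can build a multi-member file by concatenating the output of separate GZipStream instances. This is a minimal sketch: the class and method names (MultiMemberGzipDemo, CreateMultiMemberGzip) are hypothetical, the parts are assumed to be UTF-8 text, and it only writes the data. It makes no claim about how a particular .NET version reads the result back.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class MultiMemberGzipDemo
{
    // Hypothetical helper: compresses each part as its own gzip member and
    // concatenates the members, the same layout `cat a.gz b.gz > ab.gz` produces.
    public static byte[] CreateMultiMemberGzip(params string[] parts)
    {
        using (var output = new MemoryStream())
        {
            foreach (string part in parts)
            {
                byte[] raw = Encoding.UTF8.GetBytes(part);
                // leaveOpen: true, so disposing the GZipStream finishes this member
                // (including its 8-byte trailer) without closing the shared output
                using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
                {
                    gzip.Write(raw, 0, raw.Length);
                }
            }
            return output.ToArray();
        }
    }
}
```

The second member starts right where the first one ends, with its own 0x1F 0x8B header.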
Perhaps the easiest workaround is to scan the file for all of the gzip headers, then manually move the stream to each one and decompress its content. You can locate candidate headers by looking for ID1 (0x1F), ID2 (0x8B), and CM = 0x8 (the Deflate compression method) in the raw file bytes; see the specification: http://www.gzip.org/zlib/rfc-gzip.html. Because those bytes can also occur by chance inside compressed data, they aren't enough to guarantee that you're looking at a gzip header, so you should read the rest of the header (or at least the first ten bytes) to verify it:
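using System.IO;

// Wrapper class (name is illustrative) so the snippet compiles on its own
public static class GzipHeader
{
    const int Id1 = 0x1F;
    const int Id2 = 0x8B;
    const int DeflateCompression = 0x8;
    const int GzipFooterLength = 8;     // CRC32 + ISIZE trailer of each member (not used by this check)
    const int ReservedFlagMask = 0xE0;  // FLG bits 5-7 are reserved and must be zero

    /// <summary>
    /// Returns true if the stream could be a valid gzip header at the current position.
    /// The stream must be seekable; its position is restored before returning.
    /// </summary>
    /// <param name="stream">The stream to check.</param>
    /// <returns>Returns true if the stream could be a valid gzip header at the current position.</returns>
    public static bool IsHeaderCandidate(Stream stream)
    {
        // Read the fixed ten-byte header; Read may return fewer bytes than requested, so loop
        byte[] header = new byte[10];
        int bytesRead = 0;
        while (bytesRead < header.Length)
        {
            int n = stream.Read(header, bytesRead, header.Length - bytesRead);
            if (n == 0) break; // end of stream
            bytesRead += n;
        }
        // Rewind so the caller's position is unchanged
        stream.Seek(-bytesRead, SeekOrigin.Current);
        if (bytesRead < header.Length)
        {
            return false;
        }
        // Check the ID bytes (ID1, ID2) and the compression method (CM = 8, Deflate)
        if (header[0] != Id1 || header[1] != Id2 || header[2] != DeflateCompression)
        {
            return false;
        }
        // Only the low 5 FLG bits (FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT) are defined
        if ((header[3] & ReservedFlagMask) != 0)
        {
            return false;
        }
        // XFL is 2 (maximum compression) or 4 (fastest) for Deflate; 0 is also common in practice
        if (header[8] != 0x0 && header[8] != 0x2 && header[8] != 0x4)
        {
            return false;
        }
        return true;
    }
}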
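To find every member, the same check has to be applied at each byte offset. The sketch below scans an in-memory copy of the file instead of seeking a FileStream one byte at a time; the names GzipScanner and FindHeaderCandidates are hypothetical, and it repeats the ten-byte checks from IsHeaderCandidate inline so it compiles on its own. Every offset it returns is only a candidate, since the same byte pattern can appear inside compressed data.

```csharp
using System.Collections.Generic;

public static class GzipScanner
{
    // Returns every offset in `data` where a plausible gzip header begins, using the
    // same checks as IsHeaderCandidate: ID1, ID2, CM = 8, reserved FLG bits clear,
    // and XFL in {0, 2, 4}.
    public static List<int> FindHeaderCandidates(byte[] data)
    {
        var offsets = new List<int>();
        // The fixed header is ten bytes, so the last possible start is data.Length - 10
        for (int i = 0; i + 10 <= data.Length; i++)
        {
            if (data[i] != 0x1F || data[i + 1] != 0x8B || data[i + 2] != 0x08)
                continue;
            if ((data[i + 3] & 0xE0) != 0) // FLG bits 5-7 are reserved
                continue;
            byte xfl = data[i + 8];
            if (xfl != 0 && xfl != 2 && xfl != 4)
                continue;
            offsets.Add(i);
        }
        return offsets;
    }
}
```

Reading the whole file into a byte array keeps the scan simple and fast, at the cost of holding the file in memory.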
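Note that GZipStream reads from the underlying stream in buffered chunks, so if you pass it the FileStream directly, the file position after decompressing one member may end up well past that member (possibly at the end of the file). Instead, copy each part (from one header offset to the next) into a MemoryStream and decompress each part individually in memory.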
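A minimal sketch of that approach, assuming you already have the member offsets (for example from a header scan) and that every offset is a real member boundary rather than a false match. Each slice gets its own MemoryStream and GZipStream, and the outputs are appended in order. GzipMembers and DecompressMembers are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class GzipMembers
{
    // Decompresses data[offsets[k] .. offsets[k + 1]) as separate gzip members and
    // concatenates the output, the way gzip does for multi-member files.
    // Assumes offsets are sorted and each one is the true start of a member.
    public static byte[] DecompressMembers(byte[] data, IList<int> offsets)
    {
        using (var result = new MemoryStream())
        {
            for (int k = 0; k < offsets.Count; k++)
            {
                int start = offsets[k];
                int end = k + 1 < offsets.Count ? offsets[k + 1] : data.Length;
                // A separate read-only MemoryStream per member, so GZipStream's
                // read-ahead cannot consume bytes of the next member
                using (var part = new MemoryStream(data, start, end - start, writable: false))
                using (var gzip = new GZipStream(part, CompressionMode.Decompress))
                {
                    gzip.CopyTo(result);
                }
            }
            return result.ToArray();
        }
    }
}
```

If a false-positive offset slips through, the affected slices will be truncated or start mid-stream, so decompression will typically fail or return incomplete data for that part.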
An alternative approach is to modify the gzip headers to record the length of each member (for example, in the optional FEXTRA field), so that you can compute the offset of each member programmatically instead of scanning the file for headers. This requires diving a bit deeper into the gzip spec.
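A rough sketch of that idea, under stated assumptions: RFC 1952 lets a header carry an extra field (FLG bit 2, FEXTRA) made of subfields, each consisting of two ID bytes, a two-byte little-endian length, and the data. The code below inserts a subfield holding the total member length into a member produced by GZipStream and reads it back. The subfield ID 'L','N' and all names are made up for illustration (not a registered ID), and insertion only works when the original header has no optional fields, which the code checks.

```csharp
using System;

public static class GzipLengthField
{
    // Hypothetical subfield ID; not a registered gzip extra-field ID
    const byte Si1 = (byte)'L', Si2 = (byte)'N';
    const int FExtra = 0x04;

    // Inserts an extra field recording the whole member's length into a single member
    // whose header has none of FHCRC, FEXTRA, FNAME, FCOMMENT set (so no header CRC breaks).
    public static byte[] AddLengthField(byte[] member)
    {
        if (member.Length < 18 || (member[3] & 0x1E) != 0)
            throw new ArgumentException("Expected a member with a plain ten-byte header.");
        const int extraLength = 2 + 8; // XLEN + (SI1, SI2, LEN, 4 data bytes)
        int total = member.Length + extraLength;
        var result = new byte[total];
        Array.Copy(member, 0, result, 0, 10);
        result[3] |= FExtra;            // set FLG.FEXTRA
        result[10] = 8; result[11] = 0; // XLEN = 8, little-endian
        result[12] = Si1; result[13] = Si2;
        result[14] = 4; result[15] = 0; // LEN = 4, little-endian
        for (int i = 0; i < 4; i++)     // member length as uint32, little-endian
            result[16 + i] = (byte)((uint)total >> (8 * i));
        // Deflate data and the CRC32/ISIZE trailer are unchanged
        Array.Copy(member, 10, result, 20, member.Length - 10);
        return result;
    }

    // Reads the hypothetical length subfield of the member starting at `offset`, if present.
    public static bool TryReadMemberLength(byte[] data, int offset, out int length)
    {
        length = 0;
        if (offset + 12 > data.Length || (data[offset + 3] & FExtra) == 0)
            return false;
        int xlen = data[offset + 10] | (data[offset + 11] << 8);
        int pos = offset + 12, end = pos + xlen;
        if (end > data.Length) return false;
        // Walk the subfields: SI1, SI2, LEN (2 bytes LE), then LEN bytes of data
        while (pos + 4 <= end)
        {
            int len = data[pos + 2] | (data[pos + 3] << 8);
            if (data[pos] == Si1 && data[pos + 1] == Si2 && len == 4 && pos + 8 <= end)
            {
                length = data[pos + 4] | (data[pos + 5] << 8) | (data[pos + 6] << 16) | (data[pos + 7] << 24);
                return true;
            }
            pos += 4 + len;
        }
        return false;
    }
}
```

With the length stored in each header, the next member starts at offset + length, so a reader can hop from member to member without scanning.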