GZIP Java vs .NET

https://stackoverflow.com/questions/2737471

02-10-2019

Question

I'm using the following Java code to compress/decompress a byte[] to/from GZIP. First, text bytes to gzip bytes:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public static byte[] fromByteToGByte(byte[] bytes) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // try-with-resources closes (and therefore finishes) the gzip stream,
    // which writes the CRC32/size trailer before baos is read
    try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
        gzos.write(bytes);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return baos.toByteArray();
}

Then the method that goes the other way, from compressed bytes to uncompressed bytes:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public static byte[] fromGByteToByte(byte[] gbytes) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // try-with-resources makes sure the GZIPInputStream is closed
    try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(gbytes))) {
        byte[] buffer = new byte[1024];
        int len;
        // read() returns -1 at the end of the stream
        while ((len = gzis.read(buffer)) != -1) {
            baos.write(buffer, 0, len);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return baos.toByteArray();
}

Do you think it has any effect that I'm not writing out to an actual gzip file?
Also, I noticed that in the standard C# function, BitConverter reads the first four bytes, and then MemoryStream's Write method is called with an offset of 4 and a length of the input buffer's length minus 4. Does that affect the validity of the header?
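For reference, the layout that such a C# function expects (four bytes read by BitConverter, gzip data starting at offset 4) can be sketched in Java. This is a minimal sketch, not the C# code itself: it assumes the prefix holds the uncompressed length and is little-endian (BitConverter.ToInt32 on little-endian hardware). The class and method names are hypothetical, and `readAllBytes` needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class LengthPrefixedGzip {

    // Hypothetical helper: builds [4-byte little-endian length][gzip stream].
    // Assumption: the prefix is the uncompressed length, stored little-endian.
    public static byte[] compressWithPrefix(byte[] data) throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(gz)) {
            gzos.write(data);
        } // closing the stream writes the gzip trailer
        byte[] gzipBytes = gz.toByteArray();
        ByteBuffer out = ByteBuffer.allocate(4 + gzipBytes.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(data.length); // the 4 bytes BitConverter.ToInt32 would read
        out.put(gzipBytes);      // the real gzip header (1f 8b ...) starts here
        return out.array();
    }

    // Hypothetical inverse: read the prefix, then decompress from offset 4.
    public static byte[] decompressWithPrefix(byte[] prefixed) throws IOException {
        int declared = ByteBuffer.wrap(prefixed, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN).getInt();
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(prefixed, 4, prefixed.length - 4))) {
            byte[] result = in.readAllBytes(); // Java 9+
            if (result.length != declared) {
                throw new IOException("length prefix does not match data");
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "asdf".getBytes(StandardCharsets.US_ASCII);
        byte[] prefixed = compressWithPrefix(text);
        System.out.println(new String(decompressWithPrefix(prefixed), StandardCharsets.US_ASCII));
    }
}
```

Conversely, if raw Java gzip output (no prefix) is passed to a function that expects this layout, on a little-endian machine the first four bytes 1f-8b-08-00 are read as the length 559903, and decompression starts at the fifth byte, which no longer begins with the gzip magic bytes.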

Jim

Solution

I tried it out, and I can't reproduce your 'Invalid GZip Header' issue. Here is what I did:

Java side

I took your Java compression method together with this Java snippet:

public static String ToHexString(byte[] bytes) {
    StringBuilder hexString = new StringBuilder();
    for (int i = 0; i < bytes.length; i++) {
        // separate bytes with '-' and always print two lowercase hex digits
        hexString.append((i == 0 ? "" : "-") +
            Integer.toString((bytes[i] & 0xff) + 0x100, 16).substring(1));
    }
    return hexString.toString();
}

So this minimal Java application, which takes the bytes of a test string, compresses them, and converts the compressed data to a hex string...:

public static void main(String[] args){
    System.out.println(ToHexString(fromByteToGByte("asdf".getBytes())));
}

... outputs the following (I added annotations):

1f-8b-08-00-00-00-00-00-00-00-4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00
^------- GZip Header -------^ ^--- DEFLATE ---^ ^- CRC32 -^ ^- ISIZE -^
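The last eight bytes of a gzip stream are the trailer defined in RFC 1952: a CRC32 of the uncompressed data followed by its length modulo 2^32, both little-endian (here 04-00-00-00, i.e. 4 bytes for "asdf"). Below is a minimal Java sketch that checks both against freshly compressed data. It assumes the plain 10-byte header with no optional fields that GZIPOutputStream writes; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

public class GzipLayout {

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(data);
        }
        return baos.toByteArray();
    }

    // ID1/ID2 magic bytes at the start of every gzip member
    static boolean hasGzipMagic(byte[] gz) {
        return gz.length >= 18 && (gz[0] & 0xff) == 0x1f && (gz[1] & 0xff) == 0x8b;
    }

    // Unsigned little-endian 32-bit value at the given offset
    static long uint32LE(byte[] b, int offset) {
        return ByteBuffer.wrap(b, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xffffffffL;
    }

    // Trailer: CRC32 in the 4 bytes before the last 4, ISIZE in the last 4
    static long trailerCrc32(byte[] gz) { return uint32LE(gz, gz.length - 8); }
    static long trailerSize(byte[] gz)  { return uint32LE(gz, gz.length - 4); }

    public static void main(String[] args) throws IOException {
        byte[] data = "asdf".getBytes(StandardCharsets.US_ASCII);
        byte[] gz = gzip(data);
        CRC32 crc = new CRC32();
        crc.update(data);
        System.out.println(hasGzipMagic(gz));                   // true
        System.out.println(trailerCrc32(gz) == crc.getValue()); // true
        System.out.println(trailerSize(gz));                    // 4
        // Assuming FLG = 0 (no optional fields), DEFLATE data sits between
        // the 10-byte header and the 8-byte trailer
        System.out.println("deflate bytes: " + (gz.length - 18));
    }
}
```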

C# side

I wrote two methods for compressing and decompressing a byte array into another byte array (the compression method is just for completeness and my own testing):

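using System.IO;
using System.IO.Compression;

public static byte[] Compress(byte[] uncompressed)
{
    using (MemoryStream ms = new MemoryStream())
    using (GZipStream gzs = new GZipStream(ms, CompressionMode.Compress))
    {
        gzs.Write(uncompressed, 0, uncompressed.Length);
        // Close the GZipStream first so the gzip trailer is written;
        // MemoryStream.ToArray still works after the stream is closed
        gzs.Close();
        return ms.ToArray();
    }
}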

public static byte[] Decompress(byte[] compressed)
{
    byte[] buffer = new byte[4096];
    using (MemoryStream ms = new MemoryStream(compressed))
    using (GZipStream gzs = new GZipStream(ms, CompressionMode.Decompress))
    using (MemoryStream uncompressed = new MemoryStream())
    {
        int r;
        // Read returns 0 once the end of the gzip stream is reached
        while ((r = gzs.Read(buffer, 0, buffer.Length)) > 0)
            uncompressed.Write(buffer, 0, r);
        return uncompressed.ToArray();
    }
}

Together with a small function that takes a hex string and turns it back to a byte array... (also just for testing purposes):

public static byte[] ToByteArray(string hexString)
{
    hexString = hexString.Replace("-", "");
    int NumberChars = hexString.Length;
    byte[] bytes = new byte[NumberChars / 2];
    for (int i = 0; i < NumberChars; i += 2)
        bytes[i / 2] = Convert.ToByte(hexString.Substring(i, 2), 16);
    return bytes;
}

... I did the following:

// Hard-coded output of the Java program, converted back to a byte[]
byte[] fromjava = ToByteArray("1f-8b-08-00-00-00-00-00-00-00-" + 
                  "4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00");

// Decompress it with the method above
byte[] uncompr = Decompress(fromjava);

// Get the string out of the byte[] and print it
Console.WriteLine(System.Text.Encoding.ASCII.GetString(uncompr));

Et voila, the output is:

asdf

Works perfectly for me. Maybe you should check the decompression method in your C# application.
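The same hard-coded bytes can also be decompressed on the Java side, which takes .NET out of the picture entirely. A minimal sketch, assuming Java 9+ for `readAllBytes`; `fromHex` and `gunzip` are hypothetical helpers, with `fromHex` mirroring the C# `ToByteArray` above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class JavaSideCheck {

    // Hypothetical helper: "1f-8b-..." -> byte[] (same idea as the C# ToByteArray)
    static byte[] fromHex(String hex) {
        String digits = hex.replace("-", "");
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static byte[] gunzip(byte[] gz) throws IOException {
        // GZIPInputStream also verifies the CRC32 and size in the trailer
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return in.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fromJava = fromHex("1f-8b-08-00-00-00-00-00-00-00-"
                + "4b-2c-4e-49-03-00-bd-f3-29-51-04-00-00-00");
        System.out.println(new String(gunzip(fromJava), StandardCharsets.US_ASCII));
    }
}
```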

You said in your previous question that you are storing those byte arrays in a database, right? Maybe you want to check whether the bytes come back from the database exactly as you put them in.
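One way to do that check is to compare what was written with what was read back, and to look at where the gzip magic bytes 1f-8b appear. A minimal Java sketch; `BlobCheck`, `classify` and `firstDifference` are hypothetical names, the classification is only a heuristic, and `Arrays.mismatch` needs Java 9+.

```java
import java.util.Arrays;

public class BlobCheck {

    public enum Layout { RAW_GZIP, LENGTH_PREFIXED_GZIP, UNKNOWN }

    // True if the gzip magic bytes 1f 8b appear at the given offset
    static boolean gzipMagicAt(byte[] b, int offset) {
        return b.length >= offset + 2
                && (b[offset] & 0xff) == 0x1f
                && (b[offset + 1] & 0xff) == 0x8b;
    }

    // Heuristic: raw gzip starts with 1f 8b; a 4-byte length prefix pushes it to offset 4
    public static Layout classify(byte[] blob) {
        if (gzipMagicAt(blob, 0)) return Layout.RAW_GZIP;
        if (gzipMagicAt(blob, 4)) return Layout.LENGTH_PREFIXED_GZIP;
        return Layout.UNKNOWN;
    }

    // Index of the first differing byte, or -1 if the arrays are identical
    public static int firstDifference(byte[] written, byte[] readBack) {
        return Arrays.mismatch(written, readBack);
    }

    public static void main(String[] args) {
        byte[] raw = {0x1f, (byte) 0x8b, 0x08, 0x00};
        // illustrative 4-byte prefix in front of the same header bytes
        byte[] prefixed = {4, 0, 0, 0, 0x1f, (byte) 0x8b, 0x08, 0x00};
        System.out.println(classify(raw));             // RAW_GZIP
        System.out.println(classify(prefixed));        // LENGTH_PREFIXED_GZIP
        System.out.println(firstDifference(raw, raw)); // -1
    }
}
```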

OTHER TIPS

Posting this as an answer so the code is formatted properly. Note a couple of things.
First, the round trip to the database did not appear to have any effect: Java on both sides produced exactly what I put in. Java in, C# out worked fine with the Ionic API, as did C# in, Java out. Second, my original decompress method was along these lines:

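public static string Decompress(byte[] gzBuffer)
{
    using (MemoryStream ms = new MemoryStream())
    {
        // Expects a 4-byte length prefix in front of the gzip data. Raw Java
        // gzip output has none, so its first bytes (1f-8b-08-00) would be read
        // as the length and decompression would start at the fifth byte,
        // which is not a gzip header.
        int msgLength = BitConverter.ToInt32(gzBuffer, 0);
        ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
        byte[] buffer = new byte[msgLength];
        ms.Position = 0;
        using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
        {
            // A single Read may return fewer bytes than requested, so loop
            int total = 0;
            while (total < buffer.Length)
            {
                int read = zip.Read(buffer, total, buffer.Length - total);
                if (read == 0) break;
                total += read;
            }
        }
        return Encoding.UTF8.GetString(buffer);
    }
}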

That version depended on the byte count stored in the first four bytes, whereas yours reads the whole stream regardless of that value. I don't know what the Ionic algorithm does. Yours works the same way as the Java methods I've used; that's the only difference I see. Thanks very much for doing all that work. I will remember that way of doing it. Thanks, Jim

Licensed under: CC-BY-SA with attribution
