Question

I've got a bit of code I've been using for a while to fetch data from a web server. A few months ago, I added compression support, which seems to work well for "regular" HTTP responses where the whole document is contained in the response. It does not seem to work when I use a Range header, though.

Here is the code doing the real work:

    InputStream in = null;

    int bufferSize = 4096;

    int responseCode = conn.getResponseCode();

    boolean error = 5 == responseCode / 100
        || 4 == responseCode / 100;

    int bytesRead = 0;

    try
    {
        if(error)
            in = conn.getErrorStream();
        else
            in = conn.getInputStream();

        // Buffer the input
        in = new BufferedInputStream(in);

        // Handle compressed responses
        if("gzip".equalsIgnoreCase(conn.getHeaderField("Content-Encoding")))
            in = new GZIPInputStream(in);
        else if("deflate".equalsIgnoreCase(conn.getHeaderField("Content-Encoding")))
            in = new InflaterInputStream(in, new Inflater(true));

        int n;
        byte[] buffer = new byte[bufferSize];

        // Now, just write out all the bytes
        while(-1 != (n = in.read(buffer)))
        {
            bytesRead += n;
            out.write(buffer, 0, n);
        }
    }
    catch (IOException ioe)
    {
        System.err.println("Got IOException after reading " + bytesRead + " bytes");
        throw ioe;
    }
    finally
    {
        if(null != in) try { in.close(); }
        catch (IOException ioe)
        {
            System.err.println("Could not close InputStream");
            ioe.printStackTrace();
        }
    }

Hitting a URL with the header Accept-Encoding: gzip,deflate,identity works just great: I can see that the data is returned by the server in compressed format, and the above code decompressed it nicely.
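The connection setup isn't shown above, but for context, a minimal sketch of how those request headers might be set looks like this (the class and method names here are placeholders, not the original code; `conn` in the snippet above would be the connection this returns):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangedRequest
{
    // Open a connection with the request headers discussed above.
    // Pass null for `range` to request the whole document.
    static HttpURLConnection open(String url, String range) throws IOException
    {
        HttpURLConnection conn = (HttpURLConnection)new URL(url).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip,deflate,identity");
        if(null != range)
            conn.setRequestProperty("Range", range); // e.g. "bytes=0-50"
        return conn;
    }
}
```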

If I then add a Range: bytes=0-50 header, I get the following exception:

Got IOException after reading 0 bytes
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at [my code]([my code]:511)

Line 511 in my code is the line containing the in.read() call. The response includes the following headers:

Content-Type: text/html
Content-Encoding: gzip
Content-Range: bytes 0-50/751
Content-Length: 51

I have verified that, if I don't attempt to decompress the response, I actually get 51 bytes in the response... it's not a server failure (at least that I can tell). My server (Apache httpd) does not support "deflate", so I can't test another compression scheme (at least not right now).

I've also tried to request much more data (like 700 bytes of the total 751 bytes in the target resource) and I get the same kind of error.

Is there something I'm missing?

Update: Sorry, I forgot to mention that I'm hitting Apache/2.2.22 on Linux. There aren't any server bugs I'm aware of. I may have a bit of trouble verifying the compressed bytes that I retrieve from the server, as the "gzip" Content-Encoding is quite bare... e.g. I believe I can't just run "gunzip" on the command line to decompress those bytes. I'll give it a try, though.
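For what it's worth, a body sent with `Content-Encoding: gzip` is a complete gzip stream, magic number and all, so a full response saved to a file should be decompressible with gunzip. A quick sketch (not the original code) showing that Java's own GZIPOutputStream emits the standard 0x1f 0x8b header that gunzip/zcat expect:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipMagic
{
    // Compress a payload and return the raw gzip stream, which begins
    // with the two-byte gzip magic number 0x1f 0x8b.
    static byte[] gzip(byte[] payload) throws IOException
    {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try(GZIPOutputStream gz = new GZIPOutputStream(bos))
        {
            gz.write(payload);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        byte[] stream = gzip("Hello, world".getBytes("UTF-8"));
        System.out.printf("magic: %02x %02x%n",
            stream[0] & 0xff, stream[1] & 0xff); // prints: magic: 1f 8b
    }
}
```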


Solution 2

Sigh... switching to another server (which happens to be running Apache/2.2.25) shows that my code does in fact work. The original target server appears to be affected by AWS's current outage in the US-EAST region. I'm going to chalk this up to network errors and close this question. Thanks to those who offered suggestions.

OTHER TIPS

You can use gunzip to decompress it; just keep in mind that the first 50 bytes probably aren't enough for gzip to decompress anything (headers, dictionaries, etc.). Try this with your URL to see whether plain gzip works where your code fails:

    wget -O- -q <URL> | head -c 50 | zcat
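The same failure mode can be reproduced from the Java side. Here is a small sketch (again, not the original code; names are made up) that gzips a payload, truncates the compressed stream the way a byte range would, and feeds it to GZIPInputStream; the truncated copy fails with the same "Unexpected end of ZLIB input stream" EOFException seen in the stack trace above, while the full stream decompresses fine:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TruncatedGzip
{
    // Gzip the payload, keep only the first `limit` bytes, and try to
    // decompress the result the same way the code in the question does.
    static String inflate(byte[] payload, int limit)
    {
        try
        {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try(GZIPOutputStream gz = new GZIPOutputStream(bos))
            {
                gz.write(payload);
            }
            byte[] compressed = bos.toByteArray();
            byte[] truncated =
                Arrays.copyOf(compressed, Math.min(limit, compressed.length));

            GZIPInputStream in =
                new GZIPInputStream(new ByteArrayInputStream(truncated));
            byte[] buffer = new byte[4096];
            while(-1 != in.read(buffer))
                ; // drain the stream
            return "ok";
        }
        catch (EOFException eofe)
        {
            return "EOFException: " + eofe.getMessage();
        }
        catch (IOException ioe)
        {
            return "IOException: " + ioe.getMessage();
        }
    }

    public static void main(String[] args)
    {
        byte[] payload = new byte[751]; // same size as the target resource
        new Random(0).nextBytes(payload);
        System.out.println(inflate(payload, Integer.MAX_VALUE)); // whole stream
        System.out.println(inflate(payload, 51)); // first 51 bytes only
    }
}
```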

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow