Question

I had expected this to work:

>>> import urllib.request as r
>>> import zlib
>>> r.urlopen( r.Request("http://google.com/search?q=foo", headers={"User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", "Accept-Encoding": "gzip"}) ).read()
b'af0\r\n\x1f\x8b\x08...(long binary string)'
>>> zlib.decompress(_)
Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    zlib.decompress(x)
zlib.error: Error -3 while decompressing data: incorrect header check

But it doesn't. Dive Into Python uses StringIO in this example, but that seems to be missing from Python 3. What's the right way of doing it?

Solution

It works fine with the gzip module. gzip and zlib use the same compression (DEFLATE) but wrap it in different headers, which is what the "incorrect header check" in your error message is telling you.

import gzip
import urllib.request

request = urllib.request.Request(
    "http://google.com/search?q=foo",
    headers={
        "Accept-Encoding": "gzip",
        "User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", 
    })
response = urllib.request.urlopen(request)
# Wrap the response so GzipFile decompresses the body as it is read.
gzip_file = gzip.GzipFile(fileobj=response)
data = gzip_file.read()
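
To see the header difference without touching the network, here is a minimal offline sketch. The `payload` bytes are made up for illustration, and `gzip.compress` / `gzip.decompress` assume Python 3.2 or later. It shows that plain `zlib.decompress` rejects gzip data, while passing `16 + zlib.MAX_WBITS` as the `wbits` argument makes zlib expect a gzip wrapper.

```python
import gzip
import zlib

# Hypothetical payload standing in for a response body.
payload = b"hello, gzip " * 10

gzip_bytes = gzip.compress(payload)  # gzip wrapper: starts with bytes 1f 8b
zlib_bytes = zlib.compress(payload)  # zlib wrapper: a different 2-byte header

# Plain zlib.decompress expects a zlib header, so gzip data is rejected.
try:
    zlib.decompress(gzip_bytes)
    plain_zlib_failed = False
except zlib.error:
    plain_zlib_failed = True

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
via_zlib = zlib.decompress(gzip_bytes, 16 + zlib.MAX_WBITS)

# The gzip module handles the wrapper itself.
via_gzip = gzip.decompress(gzip_bytes)
```

Both `via_zlib` and `via_gzip` should hold the original payload, so either route works once the wrapper is accounted for.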

OTHER TIPS

In Python 3, StringIO is a class in the io module, but it only accepts text (str). The compressed response body is bytes, so use io.BytesIO instead.

So for the example you linked to, change:

import StringIO
compressedstream = StringIO.StringIO(compresseddata)

to:

import io
compressedstream = io.BytesIO(compresseddata)

and it ought to work.
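
A rough offline sketch of the in-memory buffer approach follows. Here `compresseddata` is generated locally as a stand-in for the bytes read from the response, and `gzipper` is just an illustrative name. It assumes the buffer must be `io.BytesIO`, because compressed data is bytes and `io.StringIO` raises `TypeError` when given bytes.

```python
import gzip
import io

# Stand-in for the compressed bytes read from an HTTP response.
compresseddata = gzip.compress(b"<html>example</html>")

# io.StringIO only accepts str, so handing it bytes raises TypeError.
try:
    io.StringIO(compresseddata)
    stringio_rejected = False
except TypeError:
    stringio_rejected = True

# io.BytesIO gives GzipFile a binary file-like object to read from.
compressedstream = io.BytesIO(compresseddata)
with gzip.GzipFile(fileobj=compressedstream) as gzipper:
    data = gzipper.read()
```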

For anyone using Python 3.2 or later, there is an even simpler way to decompress a response than any of the answers here:

import gzip
import urllib.request

request = urllib.request.Request(
    "http://example.com/",
    headers={"Accept-Encoding": "gzip"})
response = urllib.request.urlopen(request)
result = gzip.decompress(response.read())
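
One caveat: a server is free to ignore "Accept-Encoding" and send an uncompressed body, in which case decompressing would fail. A possible guard, sketched below, checks the Content-Encoding value first. The helper name `decode_body` is hypothetical and not part of any library.

```python
import gzip
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Return the body, gunzipped only if the server says it is gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    # No (or a different) Content-Encoding: hand the bytes back untouched.
    return body
```

With a real response this might be called as `decode_body(response.read(), response.headers.get("Content-Encoding"))`.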
Licensed under: CC-BY-SA with attribution