What is the best way to decompress a gzip'ed server response in Python 3?
23-08-2019
Question
I had expected this to work:
>>> import urllib.request as r
>>> import zlib
>>> r.urlopen( r.Request("http://google.com/search?q=foo", headers={"User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", "Accept-Encoding": "gzip"}) ).read()
b'af0\r\n\x1f\x8b\x08...(long binary string)'
>>> zlib.decompress(_)
Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    zlib.decompress(x)
zlib.error: Error -3 while decompressing data: incorrect header check
But it doesn't. Dive Into Python uses StringIO in this example, but that seems to be missing from Python 3. What's the right way of doing it?
Solution
It works fine with the gzip module.
(gzip and zlib use the same DEFLATE compression but wrap it in different headers and trailers. By default zlib.decompress expects a zlib header, which is why your error message says "incorrect header check".)
import gzip
import urllib.request
request = urllib.request.Request(
    "http://google.com/search?q=foo",
    headers={
        "Accept-Encoding": "gzip",
        "User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11",
    })
response = urllib.request.urlopen(request)
gzipFile = gzip.GzipFile(fileobj=response)
gzipFile.read()
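To illustrate the header difference, here is a minimal offline sketch. It assumes sample data compressed locally with gzip.compress instead of a real server response, and the names payload and gzipped are only for illustration. Passing 16 + zlib.MAX_WBITS as the wbits argument tells zlib to expect a gzip wrapper, which is an alternative to going through the gzip module.

```python
import gzip
import zlib

# Hypothetical payload standing in for a server response body.
payload = b"hello, gzip" * 10
gzipped = gzip.compress(payload)

# gzip streams start with the magic bytes 0x1f 0x8b.
starts_with_magic = gzipped[:2] == b"\x1f\x8b"

# With default settings zlib expects a zlib header and rejects gzip data.
try:
    zlib.decompress(gzipped)
    default_failed = False
except zlib.error:
    default_failed = True

# 16 + MAX_WBITS makes zlib expect a gzip header and trailer instead.
via_zlib = zlib.decompress(gzipped, 16 + zlib.MAX_WBITS)

# The gzip module handles the same wrapper directly.
via_gzip = gzip.decompress(gzipped)
```

Both routes should recover the original bytes; which one to use is mostly a matter of taste.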
OTHER TIPS
In Python 3, StringIO is a class in the io module, but it only holds text (str). Compressed data is bytes, so the class you need here is io.BytesIO.
So for the example you linked to, if you change:
import StringIO
compressedstream = StringIO.StringIO(compresseddata)
to:
import io
compressedstream = io.BytesIO(compresseddata)
it ought to work.
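Because the compressed body is bytes, the in-memory buffer handed to gzip.GzipFile has to be a binary one. The following is a minimal sketch of the whole round trip, assuming locally compressed sample data in place of response.read(); the variable names follow the snippet above and the sample body is made up.

```python
import gzip
import io

# Hypothetical stand-in for the bytes returned by response.read().
original = b"<html>example body</html>"
compresseddata = gzip.compress(original)

# io.StringIO only accepts str, so passing bytes raises TypeError.
try:
    io.StringIO(compresseddata)
    stringio_rejected = False
except TypeError:
    stringio_rejected = True

# io.BytesIO gives the in-memory bytes a binary file-like interface.
compressedstream = io.BytesIO(compresseddata)

# GzipFile reads from any binary file-like object passed as fileobj.
with gzip.GzipFile(fileobj=compressedstream) as gzipper:
    data = gzipper.read()
```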
For anyone using Python 3.2 or later, there is an even simpler way to decompress a response than any of the answers here:
import gzip
import urllib.request
request = urllib.request.Request(
    "http://example.com/",
    headers={"Accept-Encoding": "gzip"})
response = urllib.request.urlopen(request)
result = gzip.decompress(response.read())
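One caveat: a server is free to ignore Accept-Encoding and reply uncompressed, in which case gzip.decompress would raise an error. A cautious sketch is to check the Content-Encoding response header first. The helper name decode_body is hypothetical and not part of any library.

```python
import gzip


def decode_body(body, content_encoding):
    """Return body decompressed if it was sent gzip-encoded (hypothetical helper)."""
    # Header values are case-insensitive and may carry stray whitespace.
    if content_encoding is not None and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    # No gzip encoding announced: hand the bytes back untouched.
    return body


# Offline demonstration with locally compressed sample data.
compressed_result = decode_body(gzip.compress(b"payload"), "gzip")
plain_result = decode_body(b"payload", None)
```

With a real response this would be called as decode_body(response.read(), response.headers.get("Content-Encoding")).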
Licensed under: CC-BY-SA with attribution