Question

I use the requests module in Python 2.7 to POST a large chunk of data to a service I can't change. Since the data is mostly text, it is large but would compress quite well. The server would accept gzip or deflate encoding, but I don't know how to instruct requests to perform the POST and encode the data automatically.

Is there a minimal example that shows how this is possible?

Solution

# Works if the backend supports gzip
import json
import zlib
import requests

additional_headers = {'content-encoding': 'gzip'}
request_body = zlib.compress(json.dumps(post_data))  # NB: zlib framing, not a gzip header; see the last answer below
r = requests.post('http://post.example.url', data=request_body, headers=additional_headers)
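
A quick local sanity check (a hedged sketch; post_data stands in for whatever dict you send): zlib.compress produces zlib-framed data, which zlib.decompress round-trips.

import json
import zlib

post_data = {'hello': 'world'}  # hypothetical payload, just for the check
request_body = zlib.compress(json.dumps(post_data).encode('utf-8'))
# zlib.decompress understands the zlib framing that zlib.compress emits
assert json.loads(zlib.decompress(request_body).decode('utf-8')) == post_data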

Other tips

I've tested the solution proposed by Robᵩ with some modifications and it works.

Pseudocode (sorry, I extrapolated it from my code, so I had to cut some parts out and haven't tested it, but you should get the idea):

import gzip
import StringIO
import requests

additional_headers = {'content-encoding': 'gzip'}
s = StringIO.StringIO()
g = gzip.GzipFile(fileobj=s, mode='w')
g.write(json_body)  # json_body: the JSON string to send
g.close()
request_body = s.getvalue()

r = requests.post(endpoint_url, data=request_body, headers=additional_headers)
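
If you'd rather avoid the StringIO/GzipFile dance, zlib.compressobj can emit gzip framing directly (a sketch; note that on Python 2.7 compressobj only takes positional arguments):

import zlib

# level 6, DEFLATED, wbits=16+MAX_WBITS -> gzip framing
compressor = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
request_body = compressor.compress(json_body) + compressor.flush()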

For Python 3:

from io import BytesIO
import gzip

import requests


def zip_payload(payload: str) -> bytes:
    btsio = BytesIO()
    g = gzip.GzipFile(fileobj=btsio, mode='w')
    g.write(payload.encode('utf8'))
    g.close()
    return btsio.getvalue()


headers = {
    'Content-Encoding': 'gzip'
}
zipped_payload = zip_payload(payload)
requests.post(url, data=zipped_payload, headers=headers)
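
On Python 3.2+, the helper above collapses to the standard library's gzip.compress (a sketch, assuming the same payload, url, and headers):

import gzip
import requests

zipped_payload = gzip.compress(payload.encode('utf8'))  # gzip-framed bytes in one call
requests.post(url, data=zipped_payload, headers={'Content-Encoding': 'gzip'})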

I can't get this to work, but you might be able to insert the gzip data into a prepared request:

# UNPROVEN
import gzip
import StringIO

import requests

r = requests.Request('POST', 'http://httpbin.org/post', data={"hello": "goodbye"})
p = r.prepare()
s = StringIO.StringIO()
g = gzip.GzipFile(fileobj=s, mode='w')
g.write(p.body)  # p.body is the form-encoded string built by prepare()
g.close()
p.body = s.getvalue()
p.headers['content-encoding'] = 'gzip'
p.headers['content-length'] = str(len(p.body))  # Not sure about this
r = requests.Session().send(p)

I needed my posts to be chunked, since I had several very large files being uploaded in parallel. Here is a solution I came up with.

import requests
import zlib

"""Generator that reads a file in chunks and compresses them"""
def chunked_read_and_compress(file_to_send, zlib_obj, chunk_size):
    compression_incomplete = True
    with open(file_to_send,'rb') as f:
        # The zlib might not give us any data back, so we have nothing to yield, just
        # run another loop until we get data to yield.
        while compression_incomplete:
            plain_data = f.read(chunk_size)
            if plain_data:
                compressed_data = zlib_obj.compress(plain_data)
            else:
                compressed_data = zlib_obj.flush()
                compression_incomplete = False
            if compressed_data:
                yield compressed_data

"""Post a file to a url that is content-encoded gzipped compressed and chunked (for large files)"""
def post_file_gzipped(url, file_to_send, chunk_size=5*1024*1024, compress_level=6, headers={}, requests_kwargs={}):
    headers_to_send = {'Content-Encoding': 'gzip'}
    headers_to_send.update(headers)
    zlib_obj = zlib.compressobj(compress_level, zlib.DEFLATED, 31)
    return requests.post(url, data=chunked_read_and_compress(file_to_send, zlib_obj, chunk_size), headers=headers_to_send, **requests_kwargs)

resp = post_file_gzipped('http://httpbin.org/post', 'somefile')
resp.raise_for_status()
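
As a side note on why no Content-Length is needed here: when data is a generator, requests falls back to chunked transfer encoding. A sketch probing the prepared request (requests 2.x behavior):

import requests

def gen():
    yield b'abc'

p = requests.Request('POST', 'http://httpbin.org/post', data=gen()).prepare()
# requests cannot know the body length up front, so it marks the transfer chunked
assert p.headers['Transfer-Encoding'] == 'chunked'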

The accepted answer is probably wrong, because the compressed body carries the wrong stream header (or none) for the declared encoding:

additional_headers['content-encoding'] = 'gzip'
request_body = zlib.compress(json.dumps(post_data))

Using the zlib module's compressobj function, which provides the wbits argument for specifying the header format, should work. The default value, MAX_WBITS = 15, selects the zlib header format, which is correct for Content-Encoding: deflate. The compress function does not expose this argument, and unfortunately its documentation does not mention which header (if any) it uses.

For Content-Encoding: gzip, wbits should be 16 plus a value between 9 and 15, so 16 + zlib.MAX_WBITS is a good choice.

I checked how urllib3 decodes the response for these two cases: it implements a trial-and-error mechanism for deflate (it tries both the raw and the zlib header formats). That could explain why some people had problems with the accepted answer's solution while others didn't.
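
The three framings are easy to inspect with zlib itself (a Python 3 sketch; there compressobj accepts wbits as a keyword):

import zlib

data = b'x' * 1000

# zlib framing (wbits=15): the compressobj() default; first byte is 0x78
c = zlib.compressobj(wbits=zlib.MAX_WBITS)
zlib_body = c.compress(data) + c.flush()

# gzip framing (wbits=16+15): what Content-Encoding: gzip expects;
# starts with the gzip magic bytes 1f 8b
c = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
gzip_body = c.compress(data) + c.flush()

# raw deflate (wbits=-15): no header at all; what some servers mean by "deflate"
c = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_body = c.compress(data) + c.flush()

assert zlib_body[0:1] == b'\x78'
assert gzip_body[:2] == b'\x1f\x8b'

# The decompressor must be told the framing; 32+MAX_WBITS auto-detects
# zlib vs gzip, which is roughly what urllib3's fallback achieves
assert zlib.decompress(gzip_body, 32 + zlib.MAX_WBITS) == data
assert zlib.decompress(raw_body, -zlib.MAX_WBITS) == data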


tl;dr

gzip

additional_headers['Content-Encoding'] = 'gzip'
compress = zlib.compressobj(wbits=16+zlib.MAX_WBITS)
body = compress.compress(data) + compress.flush()

deflate

additional_headers['Content-Encoding'] = 'deflate'
compress = zlib.compressobj()
body = compress.compress(data) + compress.flush()
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow