Question

I am using the Python Requests package to write a simple REST client. Here is my code:

r = requests.get(url, auth=(user, passwd), stream=True, verify=False)
print('headers: ')
pprint.pprint(r.headers)
print('status: ' + str(r.status_code))
print('text: ' + r.text)

Here is the output -

headers: 
    {'content-type': 'text/xml;charset=UTF-8',
     'date': 'Thu, 16 May 2013 03:26:06 GMT',
     'server': 'Apache-Coyote/1.1',
     'set-cookie': 'JSESSIONID=779FC39...5698; Path=/; Secure; HttpOnly',
     'transfer-encoding': 'chunked'}
status: 200

Traceback (most recent call last):
  File "C:\...\client.py", line 617, in _readinto_chunked
    chunk_left = self._read_next_chunk_size()
  File "C:\...\client.py", line 562, in _read_next_chunk_size
    return int(line, 16)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte

The response to that request is XML, and it looks like it is chunked. Is there a special way to read a chunked response? I would like to put the entire XML response in one string.


Solution

Use stream=True only when you plan to iterate over the content of the response. If you print the response content immediately, stream=True gives you no performance benefit: it only defers loading the body into memory until you access r.text or r.content, at which point the whole body is loaded anyway. If you want to avoid loading the entire content into memory, see below. For the other issue, try this:

print('text:')
print(r.text)

or

print('text: ' + r.content)

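If you're on Python 2.x, r.text is a unicode object that may not be encodable as ASCII when you print it. Note that the second form only works on 2.x: on Python 3, r.content is bytes, and concatenating it with a str raises a TypeError.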

I'm not quite sure why chunked responses wouldn't work without stream=True, but the only way to use stream=True properly (without downloading everything at once, as r.content and r.text do) is to use either iter_content or iter_lines. To collect all of the response content into one string, you can do the following:

# iter_content yields bytes, so join with b'' (a plain '' fails on Python 3)
contents = b''.join(r.iter_content(224))  # stole the number from your comment
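As a minimal sketch of the same idea, the chunks can be joined as bytes and decoded once at the end. The helper name read_all_text and the UTF-8 fallback are assumptions made for illustration; they are not part of Requests.

```python
def read_all_text(response, chunk_size=224, fallback_encoding="utf-8"):
    """Join every chunk of a streamed response and decode the result once.

    `response` is expected to behave like a requests.Response obtained
    with stream=True. The function name and the fallback encoding are
    hypothetical choices for this sketch.
    """
    # iter_content yields bytes by default, so join with a bytes separator.
    raw = b"".join(response.iter_content(chunk_size))
    # response.encoding comes from the Content-Type header and may be None.
    encoding = response.encoding or fallback_encoding
    return raw.decode(encoding)
```

With the request from the question, this would be called as read_all_text(r). Decoding only after joining avoids splitting a multi-byte character across two chunks.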

On a related note: calling the decode method on the returned string can give highly inconsistent results. If your API allows it, send the Accept-Encoding header so you can always be sure to get back data you can decode.

You're not already doing that, so I didn't suggest it, but if you insist on printing the information, then you're going to need it, especially if it is an API for an internationally popular website.
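One reason decoding can be inconsistent is that a chunk boundary may fall in the middle of a multi-byte character, so calling decode on each chunk separately can fail or produce garbage. The sketch below uses the standard library's incremental decoder to carry partial characters over to the next chunk. The function name decode_chunks is hypothetical, and UTF-8 is assumed only because the response headers in the question declare it.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Yield text from an iterable of byte chunks without breaking characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        # Incomplete trailing bytes are buffered until the next chunk arrives.
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True raises an error if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

With Requests this could be used as ''.join(decode_chunks(r.iter_content(224))). Requests also accepts iter_content(decode_unicode=True), which decodes incrementally when the response has an encoding set.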

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow