Question

To avoid MemoryErrors in Python, I am trying to read in chunks. I've been searching for half a day on how to read chunks from a RESTResponse, but to no avail.

The source is a file-like object from the Dropbox SDK for Python.

Here's my attempt:

import dropbox
from filechunkio import FileChunkIO
import math

file_and_metadata = dropbox_client.metadata(path)

hq_file = dropbox_client.get_file(file_and_metadata['path'])

source_size = file_and_metadata['bytes']
chunk_size = 4194304
chunk_count = int(math.ceil(source_size / chunk_size))
for i in range(chunk_count + 1):
    offset = chunk_size * i
    bytes = min(chunk_size, source_size - offset)
    with FileChunkIO(hq_file, 'r', offset=offset,
                     bytes=bytes) as fp:
        with open('tmp/testtest123.mp4', 'wb') as f:
            f.write(fp)
            f.flush()

This results in "TypeError: coercing to Unicode: need string or buffer, RESTResponse found".

Any clues or solutions would be greatly appreciated.

Solution

Without knowing anything about FileChunkIO, or even knowing where your code is raising an exception, it's hard to be sure, but my guess is that it needs a real file-like object. Or maybe it does something silly, like checking the type so it can decide whether you're looking to chunk up a string or chunk up a file.

Anyway, according to the docs, RESTResponse isn't a full file-like object, but it does implement read and close. And you can easily chunk anything that implements read without any fancy wrappers. A file-like object's read method is guaranteed to return b'' once you hit EOF, and it can return fewer bytes than you asked for, so you don't need to guess how many reads you'll need or special-case a short read at the end. Just do this:

chunk_size = 4194304
with open('tmp/testtest123.mp4', 'wb') as f:
    while True:
        buf = hq_file.read(chunk_size)
        if not buf:  # read() returns an empty result at EOF
            break
        f.write(buf)

(Notice that I moved the open outside of the loop. Otherwise, each chunk would reopen and truncate the file before writing, so at the end you'd be left with just the last chunk.)
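
If you really did want to reopen the file per chunk for some reason, a minimal sketch of one way to do it (reusing the hq_file and chunk_size names from above) would be to truncate the file once up front and then append:

open('tmp/testtest123.mp4', 'wb').close()  # truncate the file once up front
while True:
    buf = hq_file.read(chunk_size)
    if not buf:
        break
    # 'ab' appends to the existing file instead of truncating it like 'wb'
    with open('tmp/testtest123.mp4', 'ab') as f:
        f.write(buf)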

If you want a chunking wrapper, there's a perfectly good builtin function, iter, that can do it for you:

chunk_size = 4194304
chunks = iter(lambda: hq_file.read(chunk_size), '')
with open('tmp/testtest123.mp4', 'wb') as f:
    f.writelines(chunks)

Note that the exact same code works in Python 3.x if you change that '' to b'' (the b'' literal, however, breaks Python 2.5).
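
For reference, the Python 3.x spelling of the same thing would be:

chunk_size = 4194304
# In Python 3, read() on a binary stream returns bytes, so the EOF sentinel is b''
chunks = iter(lambda: hq_file.read(chunk_size), b'')
with open('tmp/testtest123.mp4', 'wb') as f:
    f.writelines(chunks)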

This might be a bit of an abuse of writelines, because we're writing an iterable of strings that aren't actually lines. If you don't like it, an explicit loop is just as simple and not much less concise.
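
For example, the explicit-loop version of the same code would be:

chunk_size = 4194304
with open('tmp/testtest123.mp4', 'wb') as f:
    # write each chunk as it arrives, stopping at the '' EOF sentinel
    for chunk in iter(lambda: hq_file.read(chunk_size), ''):
        f.write(chunk)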

I usually write that as partial(hq_file.read, chunk_size) rather than lambda: hq_file.read(chunk_size), but it's really a matter of preference; read the docs on functools.partial and you should be able to understand why they ultimately have the same effect, and decide which one you prefer.
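
For comparison, here's what the partial spelling looks like (note that partial lives in functools):

from functools import partial

chunk_size = 4194304
# partial(hq_file.read, chunk_size) builds a no-argument callable,
# just like the lambda, that calls hq_file.read(chunk_size)
chunks = iter(partial(hq_file.read, chunk_size), '')
with open('tmp/testtest123.mp4', 'wb') as f:
    f.writelines(chunks)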

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow