Question

I'm developing a web app in Python where one of the use-cases is for a user to:

  • Upload a large file over HTTP POST and

  • Simultaneously download and display a response, which is a processed version of the file of a similar size.

The client is developed by us in C++, but I would like to use HTTP. The server doesn't need the whole file before it begins generating its response; it can start processing once the first 250 KB or so has arrived. The latency between the start of the upload and the first pieces of the response should be as low as possible (for example, within 100 ms of what you might reach with raw sockets).

Presumably it would be ideal to use chunked transfer encoding rather than multiple small HTTP requests? The total length of the request and response can't be known ahead of time, but I suppose it could be split into multiple requests/responses of known size. Is there a web server that would happily stream (rather than buffer and then deliver) those chunks as they're being uploaded?

I've heard Twisted is good with chunked transfer encoding, but I'd prefer to use a more conventional web framework if possible, especially for the rest of my application (which, outside of this use case, doesn't need anything fancy like this).


Solution

WSGI supports this, I believe. Here we'll echo back whatever the client sends us:

def application(environ, start_response):
    # Mirror the request's content type and, if it is known, its length.
    content_type = environ.get('CONTENT_TYPE') or 'text/plain'
    headers = [('Content-Type', content_type)]
    if environ.get('CONTENT_LENGTH'):
        headers.append(('Content-Length', environ['CONTENT_LENGTH']))
    start_response('200 OK', headers)
    stream = environ.get('wsgi.input')  # named to avoid shadowing input()
    if stream is None:
        yield b''  # WSGI response bodies are bytes
        return
    while True:
        datum = stream.read(4096)  # or so
        if not datum:
            return
        yield datum

Web servers may elect to use each yield as a Transfer-Encoding: chunked chunk, though they are not required to.
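
One caveat with reading until an empty result: PEP 3333 says a server should simulate end-of-file once the declared Content-Length has been consumed, but not every server appears to do so, and a read past that point can then block. The following is a minimal sketch of an echo that counts down from CONTENT_LENGTH instead, plus a helper that calls a WSGI app in-process so the streaming can be observed without a server. The names bounded_echo, run_in_process and BLOCK_SIZE are illustrative and not part of WSGI, and the sketch assumes the client sends a Content-Length (a chunked request body would need server-specific handling).

```python
import io

BLOCK_SIZE = 4096  # assumed read size, same order as the example above


def bounded_echo(environ, start_response):
    """Echo the request body without reading past CONTENT_LENGTH."""
    try:
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        remaining = 0
    start_response('200 OK', [
        ('Content-Type', environ.get('CONTENT_TYPE') or 'text/plain'),
        ('Content-Length', str(remaining)),
    ])
    stream = environ['wsgi.input']
    while remaining > 0:
        block = stream.read(min(BLOCK_SIZE, remaining))
        if not block:
            break  # the client stopped sending before the declared length
        remaining -= len(block)
        yield block  # each block is handed to the server as soon as it is read


def run_in_process(app, body, content_length=None):
    """Call a WSGI app directly with an in-memory body; return (status, blocks)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    if content_length is None:
        content_length = len(body)
    environ = {
        'REQUEST_METHOD': 'POST',
        'CONTENT_TYPE': 'application/octet-stream',
        'CONTENT_LENGTH': str(content_length),
        'wsgi.input': io.BytesIO(body),
    }
    blocks = list(app(environ, start_response))
    return captured['status'], blocks
```

Calling run_in_process(bounded_echo, data) returns the status line and the list of yielded blocks, which makes it easy to check that the body comes back in BLOCK_SIZE pieces rather than as one buffered string. Whether those pieces reach the client immediately still depends on the server in front of the app.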

OTHER TIPS

Have a look at: https://github.com/jakobadam/plupload-backends which has a Python WSGI implementation for plupload.

It works (IIRC) by combining multiple large requests, which may or may not use chunked transfer encoding, into one file.

I like web.py for simplicity. It can do chunked-transfer encoding.

http://webpy.org/cookbook/streaming_large_files

But...

That will only work for sending the response in multiple parts. If you want to "stream" your data up to the server from your client, you'll need your client to do multiple smaller POSTs. And you can't handle multiple responses from different POSTs via a single response... Possibly obviously?
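
The multiple-POST approach might look roughly like this on the client side. It is sketched here in Python with the standard library for brevity, although the client in the question is C++. The part size, the X-Part-Index header and the function names are all hypothetical, and the server is assumed to accept one part per request and return the processed part in that request's response.

```python
import io
import urllib.request

PART_SIZE = 256 * 1024  # assumed; roughly the 250 KB mentioned in the question


def iter_parts(fileobj, part_size=PART_SIZE):
    """Yield (index, data) pairs until the file object is exhausted."""
    index = 0
    while True:
        data = fileobj.read(part_size)
        if not data:
            return
        yield index, data
        index += 1


def build_part_request(url, index, data):
    """Build one POST carrying a single part (the header name is made up)."""
    return urllib.request.Request(
        url,
        data=data,
        method='POST',
        headers={
            'Content-Type': 'application/octet-stream',
            'X-Part-Index': str(index),
        },
    )


def upload_in_parts(url, fileobj, part_size=PART_SIZE):
    """POST each part in turn and yield each response body as it arrives."""
    for index, data in iter_parts(fileobj, part_size):
        request = build_part_request(url, index, data)
        with urllib.request.urlopen(request) as response:
            yield response.read()
```

Each part gets its own response, so the client sees output after every POST instead of through one long-lived response. The cost is a request round trip per part, which may or may not fit the latency target in the question.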

This is a harder problem to solve than it may appear to be at first, but I'd still suggest building a RESTful interface using web.py or some similar lightweight framework.

Licensed under: CC-BY-SA with attribution