Question

I am trying to build an intelligent Python proxy server that can stream a large request body from the client to internal storage (Amazon S3, Swift, FTP, or something similar). Before streaming, the server should query an internal API server that determines the parameters for the upload to the internal storage. The main restriction is that it must all happen in a single HTTP operation using the PUT method. It should also work asynchronously, because there will be a lot of file uploads.

What solution lets me read chunks of the uploaded content and start streaming them to internal storage before the user has uploaded the whole file? All the Python web servers that I know of wait for the whole body to be received before handing control to the WSGI application.

One solution I found is the Tornado fork https://github.com/nephics/tornado . But it is unofficial, and the Tornado developers are in no hurry to merge it into the main branch. So maybe you know of some existing solutions to my problem? Tornado? Twisted? gevent?


Solution 2

It seems I have a solution using the gevent library and monkey patching:

from gevent.monkey import patch_all
patch_all()
from gevent.pywsgi import WSGIServer


def stream_to_internal_storage(data):
    pass


def simple_app(environ, start_response):
    bytes_to_read = 1024

    # Read the body in small pieces as it arrives instead of buffering it all.
    while True:
        readbuffer = environ["wsgi.input"].read(bytes_to_read)
        if not readbuffer:
            break
        stream_to_internal_storage(readbuffer)

    start_response("200 OK", [("Content-type", "text/html")])
    # WSGI response bodies must be bytes on Python 3.
    return [b"hello world"]


def run():
    config = {'host': '127.0.0.1', 'port': 45000}

    server = WSGIServer((config['host'], config['port']), application=simple_app)
    server.serve_forever()


if __name__ == '__main__':
    run()

It works well when I try to upload a huge file:

curl -i -X PUT --progress-bar --verbose --data-binary @/path/to/huge/file "http://127.0.0.1:45000"
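
As a rough sketch (not part of the original answer), the read loop in simple_app could be factored into a generator that also respects CONTENT_LENGTH, since the WSGI spec says an application should not read past it. The helper name iter_body_chunks and the fake environ below are hypothetical, and io.BytesIO only stands in for the real environ["wsgi.input"] so the example runs without gevent:

```python
import io


def iter_body_chunks(environ, chunk_size=1024):
    """Yield the request body from environ["wsgi.input"] piece by piece."""
    stream = environ["wsgi.input"]
    # CONTENT_LENGTH may be missing or empty (e.g. chunked uploads); in that
    # case fall back to reading until the stream reports end of input.
    raw_length = environ.get("CONTENT_LENGTH") or ""
    remaining = int(raw_length) if raw_length.isdigit() else None

    while remaining is None or remaining > 0:
        to_read = chunk_size if remaining is None else min(chunk_size, remaining)
        chunk = stream.read(to_read)
        if not chunk:
            break
        if remaining is not None:
            remaining -= len(chunk)
        yield chunk


if __name__ == "__main__":
    # Hypothetical stand-in for a real WSGI environ.
    fake_environ = {
        "wsgi.input": io.BytesIO(b"x" * 2500),
        "CONTENT_LENGTH": "2500",
    }
    print([len(c) for c in iter_body_chunks(fake_environ)])
```

Inside simple_app the loop would then become "for chunk in iter_body_chunks(environ): stream_to_internal_storage(chunk)", which keeps the storage code separate from the details of reading the socket.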

OTHER TIPS

Here's an example of a server that does streaming upload handling written with Twisted:

from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString

from twisted.web.server import Request, Site
from twisted.web.resource import Resource

from twisted.application.service import Application
from twisted.application.internet import StreamServerEndpointService

# Define a Resource class that doesn't really care what requests are made of it.
# This simplifies things since it lets us mostly ignore Twisted Web's resource
# traversal features.
class StubResource(Resource):
    isLeaf = True

    def render(self, request):
        return b""

class StreamingRequestHandler(Request):
    def handleContentChunk(self, chunk):
        # `chunk` is part of the request body.
        # This method is called as the chunks are received.
        Request.handleContentChunk(self, chunk)
        # Unfortunately you have to use a private attribute to learn where
        # the content is being sent.
        path = self.channel._path

        print("Server received %d more bytes for %s" % (len(chunk), path))

class StreamingSite(Site):
    requestFactory = StreamingRequestHandler

application = Application("Streaming Upload Server")

factory = StreamingSite(StubResource())
endpoint = serverFromString(reactor, "tcp:8080")
StreamServerEndpointService(endpoint, factory).setServiceParent(application)

This is a tac file (put it in streamingserver.tac and run twistd -ny streamingserver.tac).

Because it relies on the private attribute self.channel._path, this isn't a completely supported approach. The API overall is pretty clunky as well, so this is more a demonstration that it's possible than an example of a good way to do it. There has long been an intent to make this sort of thing easier (http://tm.tl/288) but it will probably be a long while yet before this is accomplished.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow