Question

I am writing a web service in Django to handle image/video streams, but most of the work is done by an external program. For instance:

  1. The client requests /1.jpg?size=300x200
  2. Python code parses 300x200 in Django (or another WSGI app)
  3. Python calls convert (part of ImageMagick) using the subprocess module, with the parameter 300x200
  4. convert reads 1.jpg from local disk and resizes it accordingly
  5. convert writes the result to a temp file
  6. Django builds an HttpResponse() and reads the whole temp file content as the body

As you can see, writing a temp file and then reading it back is inefficient. I need a generic way to handle external programs like this: not only convert, but also others such as cjpeg, ffmpeg, or even proprietary binaries.

I want to implement it this way:

  1. Python gets the stdout fd of the convert child process
  2. Python chains it to the WSGI socket fd for output

I've done my homework: Google says this kind of zero-copy transfer can be done with the splice() system call, but it's not available in Python. So how can I maximize performance in Python for this kind of scenario?

  1. Call splice() using ctypes?
  2. Hack memoryview() or buffer()?
  3. subprocess has stdout, which has readinto(); could this be utilized somehow?
  4. How could we get the fd number for any WSGI app?

I am kind of a newbie at this, so any suggestion is appreciated. Thanks!

Solution 2

I found that WSGI can actually take a file object (and so the fd behind it) as the iterator response.

Example WSGI app:

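import subprocess

def image_app(environ, start_response):
    # A WSGI app must not set hop-by-hop headers such as Connection (PEP 3333),
    # so only Content-Type is sent here.
    start_response('200 OK', [('Content-Type', 'image/jpeg')])
    proc = subprocess.Popen([
        'convert',
        '1.jpg',
        '-thumbnail', '200x150',
        '-',  # write the result to stdout
    ], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    # stderr is discarded: an unread stderr pipe could fill up and block convert
    return proc.stdout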

This sends the child's stdout to the client as the HTTP response body through a pipe, with no temp file involved.
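Returning proc.stdout directly leaves the chunking to the server and never reaps the child process. A minimal sketch of a more explicit variant follows, assuming Python 3. The names iter_stdout, stream_command_app and CHUNK_SIZE are made up for illustration, and the 64 KiB chunk size is an arbitrary choice.

```python
import subprocess

CHUNK_SIZE = 64 * 1024  # bytes per read; arbitrary value for this sketch


def iter_stdout(proc, chunk_size=CHUNK_SIZE):
    """Yield the child's stdout in fixed-size chunks, then reap the child."""
    try:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # empty bytes means EOF: the child closed stdout
                break
            yield chunk
    finally:
        # Runs on normal completion and when the server closes the iterator early
        proc.stdout.close()
        proc.wait()


def stream_command_app(command):
    """Build a WSGI app that streams the stdout of `command` (hypothetical helper)."""
    def app(environ, start_response):
        proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL)
        start_response('200 OK', [('Content-Type', 'application/octet-stream')])
        return iter_stdout(proc)
    return app
```

For the convert case this might be used as stream_command_app(['convert', '1.jpg', '-thumbnail', '200x150', '-']), with the Content-Type adjusted to image/jpeg. In Django, the same generator could be handed to StreamingHttpResponse instead of being returned from a raw WSGI app.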

OTHER TIPS

If the goal is to increase performance, you ought to examine the bottlenecks on a case-by-case basis, rather than taking a "one solution fits all" approach.

For the convert case, assuming the images aren't insanely large, the bottleneck there will most likely be spawning a subprocess for each request.

I'd suggest avoiding the subprocess and the temporary file altogether, and doing the whole thing in the Django process using PIL with something like this...

import os
from PIL import Image
from django.http import HttpResponse

IMAGE_ROOT = '/path/to/images'

# A Django view which returns a resized image
# Example parameters: image_filename='1.jpg', width=300, height=200
def resized_image_view(request, image_filename, width, height):
    full_path = os.path.join(IMAGE_ROOT, image_filename)
    source_image = Image.open(full_path)
    resized_image = source_image.resize((width, height))
    response = HttpResponse(content_type='image/jpeg')
    resized_image.save(response, 'JPEG')
    return response

You should be able to get results close to ImageMagick's by passing the right scaling filter as the second argument to resize(). In general that is ANTIALIAS (named LANCZOS in current Pillow releases) when the rescaled image is less than 50% of the size of the original, and BICUBIC in all other cases.
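The filter rule above could be sketched as follows. This is only an illustration: the helper names are hypothetical, and "less than 50% of the size" is interpreted here as both width and height being under half of the original, which is an assumption on my part.

```python
def choose_resample_filter(src_size, dst_size):
    """Return the name of the PIL filter suggested by the 50% rule of thumb.

    Assumption: "less than 50% of the size" means both width and height
    of the target are below half of the source dimensions.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    if dst_w < 0.5 * src_w and dst_h < 0.5 * src_h:
        return 'LANCZOS'  # the filter older PIL versions call ANTIALIAS
    return 'BICUBIC'


def resize_with_rule(image, dst_size):
    """Resize a PIL image using the filter picked above (hypothetical helper)."""
    # Imported lazily so choose_resample_filter() works without Pillow installed
    from PIL import Image
    resample = getattr(Image, choose_resample_filter(image.size, dst_size))
    return image.resize(dst_size, resample=resample)
```

In the view above, source_image.resize((width, height)) would then become resize_with_rule(source_image, (width, height)).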

For videos, if you're returning a transcoded video stream, the bottleneck will likely be either CPU time or network bandwidth.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow