Threaded vs. asynchronous image processing?

https://stackoverflow.com/questions/4844166

27-10-2019
|

문제

I have a Python function which generates an image once it is accessed. I can either invoke it directly upon a HTTP request, or do it asynchronously using Gearman. There are a lot of requests.

Which way is better:

Inline - create an image inline, will result in many images being generated at once
Asynchronous - queue jobs (with Gearman) and generate images in a worker

Which option is better?

In this case "better" would mean the best speed / load combinations. The image generation example is symbolical, as this can also be applied to Database connections and other things.

해결책

I have a Python function which generates an image once it is accessed. I can either invoke it directly upon a HTTP request, or do it asynchronously using Gearman. There are a lot of requests.

You should not do it inside you request because then you can't throttle(your server could get overloaded). All big sites use a message queue to do the processing offline.

Which option is better?

In this case "better" would mean the best speed / load combinations. The image generation example is symbolical, as this can also be applied to Database connections and other things.

You should do it asynchronous because the most compelling reason to do it besides it speeds up your website is that you can throttle your queue if you are on high load. You could first execute the tasks with the highest priority.

I believe forking processes is expensive. I would create a couple worker processes(maybe do a little threading inside process) to handle the load. I would probably use redis because it is fast, actively developed(antirez/pietern commits almost everyday) and has a very good/stable python client library. blpop/rpush could be used to simulate a queue(job)

다른 팁

If your program is CPU bound in the interpreter then spawning multiple threads will actually slow down the result even if there are enough processors to run them all. This happens because the GIL (global interpreter lock) only allows one thread to run in the interpreter at a time.

If most of the work happens in a C library it's likely the lock is not held and you can productively use multiple threads.

If you are spawning threads yourself you'll need to make sure to not create too many - 10K threads at one would be bad news - so you'd need to setup a work queue that the threads read from instead of just spawning them in a loop.

If I was doing this I'd just use the standard multiprocessing module.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow