Question

I have a Pylons web application served by Apache (mod_wsgi, prefork). Because of Apache, there are multiple separate processes running my application code concurrently. Some of the non-critical tasks that the application does I want to defer for processing in background to improve "live" response times. So I'm thinking of task queue, many Apache processes adding tasks to this queue, a single separate Python process processing them one-by-one and removing from queue.

The queue should preferably be persisted to disk so queued unprocessed tasks are not lost because of power outage, server restart etc. The question is what would be a reasonable way to implement such queue?

As for the things I've tried: I started with simple SQLite database and single table in it for storing queue items. In load testing, when increasing level of concurrency, I started getting "database locked" errors, as expected. The quick'n'dirty fix was to replace SQLite with MySQL--it handles concurrency issues well but feels like an overkill for the simple thing I need to do. Queue-related DB operations also show up prominently in my profiling reports.

Was it helpful?

Solution

A message broker like Apache's ActiveMQ is an ideal solution here.

The pipeline could be following:

  • Application process that is responsible for handling HTTP requests generates replies quickly and sends low-priority, heavy tasks to AMQ queue.
  • One or more another processes are subscribed to consume AMQ queue and do what is intended to do with these heavy tasks.

The requirement of queue persistence is fulfilled out of the box since ActiveMQ stores messages that are not yet consumed in persistent storage. Furthermore it scales quite well since you're free to deploy multiple HTTP-apps, multiple consumer apps and AMQ itself on different machines each.

We use something like this in our project written in Python utilizing STOMP as underlying communication protocol.

OTHER TIPS

A web server (any web server) is multi-producer, single-consumer process.

A simple solution is to build a wsgiref or Werkzeug backend server to handle your backend requests.

Since this "backend" server is build using WSGI technology, it's very, very similar to the front-end web server. Except. It doesn't produce HTML responses (JSON is usually simpler). Other than that, it's very straightforward.

You design RESTful transactions for this backend. You use all of the various WSGI features for URI parsing, authorization, authentication, etc. You -- generally -- don't need session management, since RESTful servers don't usually offer sessions.

If you get into serious scalability issues, you simply wrap your backend server in lighttpd or some other web engine to create a multi-threaded backend.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top