Question

Suppose that I've written a WSGI application. I run it on Apache2 on Linux with a multi-threaded mod_wsgi configuration, so that my application runs in several threads per process:

WSGIDaemonProcess mysite processes=3 threads=2 display-name=mod_wsgi
WSGIProcessGroup mysite
WSGIScriptAlias / /some/path/wsgi.py

The application code is:

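def application(environ, start_response):
    from foo import racer  # imported on every request
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    # Does calling racer() here create a race condition?
    # (On Python 3, WSGI requires the response body to be bytes.)
    return [racer().encode('utf-8')]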

Module foo.py:

a = 1
def racer():
    global a
    a = a + 1
    return str(a)

Did I just create a race condition with the variable a? I assume a is a module-level variable that lives in foo.py and is shared among threads?

More theoretical questions derived from this:

  1. Concurrent threads within the same process access and modify the same a variable, so my example is not thread-safe?
  2. If my web server is Apache, is each thread of my application on Linux created at the C level with the pthreads API, with the function each pthread executes being some kind of Python interpreter main function? Or does Apache somehow protect me from this error?
  3. What if I were running this on a web server written in Python, such as Tornado's HTTPServer? Such a server implements threads as Python-level threading.Thread objects and runs the application function in each thread, so I suppose it's a race condition too? (I also suppose that in this case I can ignore the C-level pthreads underneath the threading.Thread implementation and worry only about Python functions, because the interpreter won't let me modify C-level shared data and break its internals. So the only way for me to break thread safety is through global variables? Is that right?)

Solution

Yes, you have a race condition there, but it's not related to the imports. The global state in foo.a is subject to a data race between the read in a + 1 and the write in a = ...: two threads can read the same value of a and thus compute the same successor, so one of the two increments is lost.
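In practice the bad interleaving is timing-dependent and may not show up in any given run. The sketch below forces it deterministically for illustration: racer_split is a hypothetical variant of racer() (not part of the original code) that separates the read from the write and uses a threading.Barrier so that both threads read a before either writes it.

```python
import threading

a = 1
# The barrier makes both threads finish reading before either one writes,
# reproducing the unlucky interleaving on every run.
barrier = threading.Barrier(2)
results = []

def racer_split():
    """Hypothetical racer() with the read and the write made explicit."""
    global a
    current = a          # read the shared value
    barrier.wait()       # both threads have now read the same value
    a = current + 1      # write back: one update overwrites the other
    results.append(a)

threads = [threading.Thread(target=racer_split) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a, sorted(results))  # 2 [2, 2]: two calls, but a grew by only one
```

Without the barrier the same lost update can still happen whenever the interpreter switches threads between the read and the write.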

The import machinery itself does protect against duplicate imports by multiple threads. In Python 2 it uses a process-wide import lock (see imp.lock_held()); since Python 3.3 it uses per-module import locks, and the imp module was removed in Python 3.12. Although this could, in theory, lead to a deadlock, this almost never happens, because few Python modules lock other resources at import time.
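A small sketch of that guarantee, assuming Python 3 and using importlib.import_module: five threads import the same module at once. The module, slow_mod, is hypothetical; it is written to a temporary directory, logs each execution of its body to a file, and sleeps to widen the race window. Its body should run exactly once, and every thread should receive the same module object.

```python
import importlib
import sys
import tempfile
import threading
from pathlib import Path

# Write the hypothetical module "slow_mod" to a temporary directory. Its body
# appends a line to a log file on each execution, then sleeps.
tmp = Path(tempfile.mkdtemp())
log = tmp / "loads.log"
(tmp / "slow_mod.py").write_text(
    "import time\n"
    f"with open({str(log)!r}, 'a') as f:\n"
    "    f.write('loaded\\n')\n"
    "time.sleep(0.2)\n"
    "VALUE = 42\n"
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

modules = []  # list.append is safe to call from several threads in CPython

def do_import():
    modules.append(importlib.import_module("slow_mod"))

threads = [threading.Thread(target=do_import) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The module body ran once; all threads got the same module object.
print(log.read_text().count("loaded"), len({id(m) for m in modules}))  # 1 1
```

Threads that arrive while slow_mod is still executing wait on its import lock instead of running the body a second time.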

This also suggests that it's probably safe to modify sys.path at will, since that usually happens only at import time (to make further imports possible): the importing thread already holds the import lock, so other threads cannot trigger imports that would modify the same state.

Fixing the race in racer() is quite easy, though:

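import threading

a = 1
a_lock = threading.Lock()  # guards every read and write of a

def racer():
    global a
    with a_lock:
        # Read, increment and write back as one atomic step; keep a local
        # copy so the returned value is the one this call produced.
        my_a = a = a + 1
    return str(my_a)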

The same pattern is needed for any global mutable state under your control.

OTHER TIPS

Read the mod_wsgi documentation on the various process/thread configurations, in particular what it says about data sharing.

It states:

Where global data in a module local to a child process is still used, for example as a cache, access to and modification of the global data must be protected by local thread locking mechanisms.

Licensed under: CC-BY-SA with attribution