Question

Here is what we have currently:

  1. We retrieve a cached Django model instance; the cache key includes the model name and the instance id. Django's standard memcached backend is used. This lookup is part of a common routine used very widely, not only in Celery.
  2. Sometimes (randomly and/or very rarely) cache.get(key) returns the wrong object: either an int or a different model instance, and even a same-model-different-id case has appeared. We catch this by checking that the model name and id match the cache key (see the sketch after this list).
  3. The bug appears only in the context of three of our Celery tasks and never reproduces in the Python shell or in other Celery tasks. UPD: it appears only under long-running, CPU- and RAM-intensive tasks.
  4. The cache stores the correct value (we checked that manually at the moment the bug appeared).
  5. Calling the same task again with the same arguments may not reproduce the issue, although the probability is much higher then, so bug appearances tend to "group" within the same period of time.
  6. Restarting Celery solves the issue for a random period of time (minutes to weeks).
  7. *NEW* This is not connected with memory overflow: we always have at least 2 GB of free RAM when it happens.
  8. *NEW* We have cache_instance = cache.get_cache("cache_entry") in static code. During investigation I found that at the moment the bug happens, cache_instance.get(key) returns the wrong value, although get_cache("cache_entry").get(key) on the next line returns the correct one. This means either the bug disappears too quickly, or the cache_instance object somehow gets corrupted. Isn't the cache instance object returned by Django's cache thread-safe?
  9. *NEW* We logged a very strange case: as another wrong object from the cache, we got a model instance without its id set. This means the instance was never saved to the DB and therefore could never have been cached. (I hope.)
  10. *NEW* At least one MemoryError was logged during those days.
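
For reference, the check from point 2 looks roughly like this (a simplified sketch, not our exact code; get_cached_instance and the key format are illustrative):

from django.core.cache import cache

def get_cached_instance(model_cls, instance_id):
    # The key encodes the model name and id, e.g. "Article:42".
    key = "{}:{}".format(model_cls.__name__, instance_id)
    value = cache.get(key)
    # Sanity check: the cached value must be an instance of the
    # expected model with the expected id; otherwise treat it as
    # a miss and log the mismatch.
    if not isinstance(value, model_cls) or value.pk != instance_id:
        return None
    return value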

I know, all of this sounds like some sort of magic. Really, any ideas on how this is possible, or how to debug it, would be very much appreciated.

PS: My current assumption is that this is connected with multiprocessing: the cache instance is created in static code before the worker processes are forked, so all workers end up sharing the same socket. (Does that sound plausible?)


Solution 2

This had been bugging me for a while until I found this question and answer. I just want to add some things I've learned.

You can easily reproduce this problem with a local memcached instance:

from django.core.cache import cache
import os

def write_read_test():
    pid = os.getpid()
    cache.set(pid, pid)
    for x in range(5):
        value = cache.get(pid)
        if value != pid:
            print("Unexpected response {} in process {}. Attempt {}/5".format(
                    value, pid, x + 1))
    os._exit(0)

# Touching the cache before forking creates the memcached socket in
# the parent, so every child inherits (and shares) the same socket.
cache.set("access cache", "before fork")
for x in range(5):
    if os.fork() == 0:
        write_read_test()

What you can do is close the cache client, as Django does in the request_finished signal:

https://github.com/django/django/blob/master/django/core/cache/__init__.py#L128

If you call cache.close() after the fork, everything works as expected.
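
For example, in the reproduction script above, the child process can drop the inherited connection before touching the cache:

for x in range(5):
    if os.fork() == 0:
        cache.close()  # drop the socket inherited from the parent
        write_read_test()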

For Celery, you could connect to a signal that is fired after the worker process is forked and execute cache.close() there.
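
A minimal sketch of that (worker_process_init is the signal Celery dispatches in each pool process after it is forked; the handler name is just an example):

from celery.signals import worker_process_init
from django.core.cache import cache

@worker_process_init.connect
def close_inherited_cache_connection(**kwargs):
    # Drop the memcached socket inherited from the parent process;
    # this worker will open its own connection on first use.
    cache.close()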

This also affects gunicorn when --preload is active and the cache is initialized before the workers are forked.

For gunicorn, you could use post_fork in your gunicorn configuration:

def post_fork(server, worker):
    # Runs in every worker process right after the master forks it.
    from django.core.cache import cache
    cache.close()
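
This hook goes in the configuration file you pass to gunicorn, e.g. gunicorn -c gunicorn_conf.py myproject.wsgi (the file and module names here are just examples).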

OTHER TIPS

Solved it finally:

  1. Celery has a dynamic scaling feature: it can add or kill workers according to load.
  2. It adds workers by forking an existing one.
  3. Open sockets and files are copied into the forked process, so both processes share them. This leads to a race condition in which one process reads a response intended for the other: it's possible that one process reads the response meant for the second one, and vice versa.
  4. The object imported via from django.core.cache import cache stores a pre-connected memcached socket. Don't use it when your process can be dynamically forked, and don't use stored connections, pools, and the like.
  5. OR store them keyed by the current PID, and check the PID each time you access the cache (see the sketch below).
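
A minimal sketch of that PID guard (illustrative, not from the original answer; it assumes the backend reconnects lazily after close(), as Django's memcached backend does):

import os
from django.core.cache import cache

_owner_pid = None

def process_local_cache():
    # If the stored connection was created in another process (i.e.
    # inherited through a fork), drop it; the backend reconnects
    # lazily on the next get/set.
    global _owner_pid
    if _owner_pid != os.getpid():
        cache.close()
        _owner_pid = os.getpid()
    return cache

# usage: value = process_local_cache().get(key)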