Per-request cache in Django?

https://stackoverflow.com/questions/8759188

14-04-2021

Question

I would like to implement a decorator that provides per-request caching to any method, not just views. Here is an example use case.

I have a custom tag that determines if a record in a long list of records is a "favorite". In order to check if an item is a favorite, you have to query the database. Ideally, you would perform one query to get all the favorites, and then just check that cached list against each record.

One solution is to get all the favorites in the view, and then pass that set into the template, and then into each tag call.

Alternatively, the tag could perform the query itself, but only the first time it's called, and cache the results for subsequent calls. The upside is that you can use this tag from any template, on any view, without altering the view.

In the existing caching mechanism, you could just cache the result for 50ms, and assume that would correlate to the current request. I want to make that correlation reliable.

Here is an example of the tag I currently have.

@register.filter()
def is_favorite(record, request):

    if "get_favorites" in request.POST:
        favorites = request.POST["get_favorites"]
    else:

        favorites = get_favorites(request.user)

        post = request.POST.copy()
        post["get_favorites"] = favorites
        request.POST = post

    return record in favorites

Is there a way to get the current request object from Django without passing it around? From a tag, I could just pass in request, which will always exist. But I would like to use this decorator from other functions.

Is there an existing implementation of a per-request cache?

Solution

Using a custom middleware you can get a Django cache instance guaranteed to be cleared for each request.

This is what I used in a project:

from threading import current_thread
from django.core.cache.backends.locmem import LocMemCache

# One cache per worker thread, keyed by the thread object
_request_cache = {}
_installed_middleware = False

def get_request_cache():
    assert _installed_middleware, 'RequestCacheMiddleware not loaded'
    return _request_cache[current_thread()]

# LocMemCache is a threadsafe local memory cache
class RequestCache(LocMemCache):
    def __init__(self):
        name = 'locmemcache@%i' % hash(current_thread())
        params = dict()
        super(RequestCache, self).__init__(name, params)

class RequestCacheMiddleware(object):
    def __init__(self):
        global _installed_middleware
        _installed_middleware = True

    def process_request(self, request):
        # Reuse this thread's cache if it exists, then empty it for the new request
        cache = _request_cache.get(current_thread()) or RequestCache()
        _request_cache[current_thread()] = cache

        cache.clear()

To use the middleware, register it in settings.py, e.g.:

MIDDLEWARE_CLASSES = (
    ...
    'myapp.request_cache.RequestCacheMiddleware'
)

You may then use the cache as follows:

from myapp.request_cache import get_request_cache

cache = get_request_cache()

Refer to the Django low-level cache API documentation for more information:

Django Low-Level Cache API

It should be easy to modify a memoize decorator to use the request cache. Have a look at the Python Decorator Library for a good example of a memoize decorator:

Python Decorator Library

OTHER TIPS

I came up with a hack for caching things straight into the request object (instead of using the standard cache, which will be tied to memcached, file, database, etc.)

# get the request object's dictionary (rather one of its methods' dictionary)
mycache = request.get_host.__dict__

# check whether we already have our value cached and return it
if mycache.get( 'c_category', False ):
    return mycache['c_category']
else:
    # get some object from the database (a category object in this case)
    c = Category.objects.get( id = cid )

    # cache the database object into a new key in the request object
    mycache['c_category'] = c

    return c

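So, basically, I am just storing the cached value (a category object in this case) under a new key, 'c_category', in a dictionary reachable from the request. To be more precise, the key is added to the attribute dictionary of one of the request object's methods, get_host(). Be aware that this dictionary belongs to the underlying function, which every request instance shares, so a value stored there stays visible to later requests in the same process; setting an ordinary attribute on the request (for example request.c_category) keeps the value tied to that one request.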
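
It is worth checking where that dictionary actually lives. The sketch below uses a hypothetical FakeRequest class (a stand-in, not Django's HttpRequest) to compare a method's __dict__ with a plain instance attribute.

```python
class FakeRequest:
    """Stand-in for a request object (assumption, for demonstration only)."""

    def get_host(self):
        return "example.com"


first = FakeRequest()
second = FakeRequest()

# A bound method has no dictionary of its own: attribute access is forwarded
# to the underlying function, which is defined once on the class.
first.get_host.__dict__["c_category"] = "books"
leaked = second.get_host.__dict__.get("c_category")

# A plain instance attribute stays on the one object it was set on.
first.c_category = "books"
isolated = not hasattr(second, "c_category")
```

Here leaked ends up holding the value stored through the first object, while isolated is true, which suggests that a plain attribute is the safer place for per-request data.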

Georgy.

Years later, here is a super hack that caches SELECT statements within a single Django request. You need to call patch() early in the request scope, for example from a piece of middleware.

from threading import local
import itertools
from django.db.models.sql.constants import MULTI
from django.db.models.sql.compiler import SQLCompiler
from django.db.models.sql.datastructures import EmptyResultSet
from django.db.models.sql.constants import GET_ITERATOR_CHUNK_SIZE


_thread_locals = local()


def get_sql(compiler):
    ''' get a tuple of the SQL query and the arguments '''
    try:
        return compiler.as_sql()
    except EmptyResultSet:
        pass
    return ('', [])


def execute_sql_cache(self, result_type=MULTI):

    if hasattr(_thread_locals, 'query_cache'):

        sql = get_sql(self)  # ('SELECT * FROM ...', (50,)) <= sql string, args tuple
        if sql[0][:6].upper() == 'SELECT':

            # uses the tuple of sql + args as the cache key
            if sql in _thread_locals.query_cache:
                return _thread_locals.query_cache[sql]

            result = self._execute_sql(result_type)
            if hasattr(result, 'next'):

                # only cache if this is not a full first page of a chunked set
                peek = result.next()
                result = list(itertools.chain([peek], result))

                if len(peek) == GET_ITERATOR_CHUNK_SIZE:
                    return result

            _thread_locals.query_cache[sql] = result

            return result

        else:
            # the database has been updated; throw away the cache
            _thread_locals.query_cache = {}

    return self._execute_sql(result_type)


def patch():
    ''' patch the django query runner to use our own method to execute sql '''
    _thread_locals.query_cache = {}
    if not hasattr(SQLCompiler, '_execute_sql'):
        SQLCompiler._execute_sql = SQLCompiler.execute_sql
        SQLCompiler.execute_sql = execute_sql_cache

The patch() method replaces Django's internal execute_sql method with a stand-in called execute_sql_cache. That method looks at the SQL to be run, and if it's a SELECT statement, it checks a thread-local cache first. Only if the result is not found in the cache does it proceed to execute the SQL. On any other type of SQL statement, it blows away the cache. There is some logic to not cache large result sets, meaning any result whose first chunk is full (GET_ITERATOR_CHUNK_SIZE rows, i.e. 100 records or more). This is to preserve Django's lazy query set evaluation.
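
The core idea can be shown without touching Django internals. The following is a simplified sketch under stated assumptions: QueryCache and run_query are hypothetical names, the chunk-size check is left out, and the query runner is a fake that only records what it was asked to execute.

```python
class QueryCache:
    """Cache SELECT results by (sql, params); drop everything on any write."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable(sql, params) that hits the database
        self._results = {}

    def execute(self, sql, params=()):
        if sql.lstrip()[:6].upper() != "SELECT":
            # The database may have changed: throw away all cached results
            self._results.clear()
            return self._run_query(sql, params)

        key = (sql, tuple(params))
        if key not in self._results:
            self._results[key] = self._run_query(sql, params)
        return self._results[key]


executed = []


def fake_run_query(sql, params):
    """Stand-in for the real database call (assumption)."""
    executed.append(sql)
    return [("row",)]


query_cache = QueryCache(fake_run_query)
```

Running the same SELECT twice through query_cache.execute() should reach fake_run_query once; an UPDATE in between forces the next SELECT to run again.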

EDIT:

The eventual solution I came up with has been compiled into a PyPI package: https://pypi.org/project/django-request-cache/

EDIT 2016-06-15:

I discovered a significantly simpler solution to this problem, and kind of facepalmed for not realizing how easy this should have been from the start.

from django.core.cache.backends.base import BaseCache
from django.core.cache.backends.locmem import LocMemCache
from django.utils.synch import RWLock


class RequestCache(LocMemCache):
    """
    RequestCache is a customized LocMemCache which stores its data cache as an instance attribute, rather than
    a global. It's designed to live only as long as the request object that RequestCacheMiddleware attaches it to.
    """

    def __init__(self):
        # We explicitly do not call super() here, because while we want BaseCache.__init__() to run, we *don't*
        # want LocMemCache.__init__() to run, because that would store our caches in its globals.
        BaseCache.__init__(self, {})

        self._cache = {}
        self._expire_info = {}
        self._lock = RWLock()

class RequestCacheMiddleware(object):
    """
    Creates a fresh cache instance as request.cache. The cache instance lives only as long as request does.
    """

    def process_request(self, request):
        request.cache = RequestCache()

With this, you can use request.cache as a cache instance that lives only as long as the request does, and will be fully cleaned up by the garbage collector when the request is done.
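
For the favorites use case from the question, usage might look roughly like the sketch below. FakeCache, the "favorites" key and the body of get_favorites are assumptions made so the example runs on its own; in a real project request.cache would be the RequestCache attached by the middleware.

```python
from types import SimpleNamespace


class FakeCache:
    """Stand-in exposing Django-style get/set (assumption, for demonstration)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


queries = []


def get_favorites(user):
    """Hypothetical database lookup; records each call."""
    queries.append(user)
    return {"record-1", "record-3"}


def is_favorite(record, request):
    favorites = request.cache.get("favorites")
    if favorites is None:
        # First call in this request: query once and remember the result
        favorites = get_favorites(request.user)
        request.cache.set("favorites", favorites)
    return record in favorites


request = SimpleNamespace(user="alice", cache=FakeCache())
```

However many records the template checks, get_favorites should run once per request object.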

If you need access to the request object from a context where it's not normally available, you can use one of the various implementations of a so-called "global request middleware" that can be found online.
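
A "global request middleware" usually boils down to something like the following minimal sketch. It assumes the new-style middleware protocol (a callable built with get_response, as used with the MIDDLEWARE setting in Django 1.10 and later); the names GlobalRequestMiddleware and get_current_request are made up for this example.

```python
import threading

_state = threading.local()


def get_current_request():
    """Return the request being handled on this thread, or None."""
    return getattr(_state, "request", None)


class GlobalRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _state.request = request
        try:
            return self.get_response(request)
        finally:
            # Always drop the reference so the request (and request.cache)
            # can be garbage-collected once the response is done
            del _state.request
```

With this in place, code deep in the call stack could call get_current_request().cache instead of receiving the request as an argument.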

Initial answer:

A major problem that no other solution here solves is the fact that LocMemCache leaks memory when you create and destroy several of them over the life of a single process. django.core.cache.backends.locmem defines several global dictionaries that hold references to every LocMemCache instance's cache data, and those dictionaries are never emptied.

The following code solves this problem. It started as a combination of @href_'s answer and the cleaner logic used by the code linked in @squarelogic.hayden's comment, which I then refined further.

from uuid import uuid4
from threading import current_thread

from django.core.cache.backends.base import BaseCache
from django.core.cache.backends.locmem import LocMemCache
from django.utils.synch import RWLock


# Global in-memory store of cache data. Keyed by name, to provide multiple
# named local memory caches.
_caches = {}
_expire_info = {}
_locks = {}


class RequestCache(LocMemCache):
    """
    RequestCache is a customized LocMemCache with a destructor, ensuring that creating
    and destroying RequestCache objects over and over doesn't leak memory.
    """

    def __init__(self):
        # We explicitly do not call super() here, because while we want
        # BaseCache.__init__() to run, we *don't* want LocMemCache.__init__() to run.
        BaseCache.__init__(self, {})

        # Use a name that is guaranteed to be unique for each RequestCache instance.
        # This ensures that it will always be safe to call del _caches[self.name] in
        # the destructor, even when multiple threads are doing so at the same time.
        self.name = uuid4()
        self._cache = _caches.setdefault(self.name, {})
        self._expire_info = _expire_info.setdefault(self.name, {})
        self._lock = _locks.setdefault(self.name, RWLock())

    def __del__(self):
        del _caches[self.name]
        del _expire_info[self.name]
        del _locks[self.name]


class RequestCacheMiddleware(object):
    """
    Creates a cache instance that persists only for the duration of the current request.
    """

    _request_caches = {}

    def process_request(self, request):
        # The RequestCache object is keyed on the current thread because each request is
        # processed on a single thread, allowing us to retrieve the correct RequestCache
        # object in the other functions.
        self._request_caches[current_thread()] = RequestCache()

    def process_response(self, request, response):
        self.delete_cache()
        return response

    def process_exception(self, request, exception):
        self.delete_cache()

    @classmethod
    def get_cache(cls):
        """
        Retrieve the current request's cache.

        Returns None if RequestCacheMiddleware is not currently installed via 
        MIDDLEWARE_CLASSES, or if there is no active request.
        """
        return cls._request_caches.get(current_thread())

    @classmethod
    def clear_cache(cls):
        """
        Clear the current request's cache.
        """
        cache = cls.get_cache()
        if cache:
            cache.clear()

    @classmethod
    def delete_cache(cls):
        """
        Delete the current request's cache object to avoid leaking memory.
        """
        cache = cls._request_caches.pop(current_thread(), None)
        del cache

This one uses a plain Python dict as the cache (not Django's cache), and is dead simple and lightweight.

Whenever the thread is destroyed, its cache is destroyed too, automatically.
It does not require any middleware, and the content is not pickled and unpickled on every access, which is faster.
Tested and works with gevent's monkeypatching.

The same can probably be implemented with thread-local storage. I am not aware of any downsides of this approach; feel free to add them in the comments.

from threading import current_thread
import weakref

# Entries disappear automatically once their thread object is garbage-collected
_request_cache = weakref.WeakKeyDictionary()

def get_request_cache():
    return _request_cache.setdefault(current_thread(), {})
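
The thread-local variant mentioned above could look roughly like this sketch. reset_request_cache is a hypothetical helper added for the example: thread-local data lives as long as its thread, so if the server reuses worker threads across requests, something would need to call it at the start of each request.

```python
import threading

_local = threading.local()


def get_request_cache():
    """Return the dict belonging to the current thread, creating it on first use."""
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    return cache


def reset_request_cache():
    """Start over with an empty dict for the current thread."""
    _local.cache = {}
```

Each thread sees only its own dictionary, so values stored on one thread are invisible to the others.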

You can always do the caching manually.

    ...
    if "get_favorites" in request.POST:
        favorites = request.POST["get_favorites"]
    else:
        from django.core.cache import cache

        favorites = cache.get(request.user.username)
        # Compare against None so an empty favorites list is still a cache hit
        if favorites is None:
            favorites = get_favorites(request.user)
            # seconds: cache timeout, defined elsewhere
            cache.set(request.user.username, favorites, seconds)
    ...

The answer given by @href_ is great.

Just in case you want something shorter that could also potentially do the trick:

import time
from functools import lru_cache  # older Django versions shipped a backport as django.utils.lru_cache

def cached_call(func, *args, **kwargs):
    """Very basic temporary cache: results are cached
    for 1.5 sec on average and never more than 3 sec"""
    return _cached_call(int(time.time() / 3), func, *args, **kwargs)


@lru_cache(maxsize=100)
def _cached_call(time_bucket, func, *args, **kwargs):
    # time_bucket only takes part in the cache key; it changes every 3 seconds
    return func(*args, **kwargs)

Then get the favourites by calling it like this:

favourites = cached_call(get_favourites, request.user)

This method makes use of the LRU cache and combines it with a timestamp to make sure the cache doesn't hold anything for longer than a few seconds. If you need to call a costly function several times within a short period, this solves the problem.

It is not a perfect way to invalidate the cache, because occasionally it will miss on very recent data: int(..2.99.. / 3) followed by int(..3.00.. / 3). Despite this drawback, it can still be very effective for the majority of hits.

Also, as a bonus, you can use it outside request/response cycles, for example in Celery tasks or management command jobs.
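
To make the bucket boundary concrete, here is a small variant of the same idea in which the current time can be injected. The now parameter, the time_bucket helper and WINDOW_SECONDS are assumptions added for this sketch so the boundary can be reproduced deterministically.

```python
import time
from functools import lru_cache

WINDOW_SECONDS = 3


def time_bucket(now, window=WINDOW_SECONDS):
    """Map a timestamp to the index of its 3-second window."""
    return int(now / window)


@lru_cache(maxsize=100)
def _cached_call(bucket, func, *args):
    # bucket is unused here; it only makes the cache key change every window
    return func(*args)


def cached_call(func, *args, now=None):
    if now is None:
        now = time.time()
    return _cached_call(time_bucket(now), func, *args)


calls = []


def slow_double(x):
    """Hypothetical costly function; records each real invocation."""
    calls.append(x)
    return x * 2
```

Two calls at now=0.5 and now=2.99 share bucket 0 and reuse one result, while a call at now=3.00 falls into bucket 1 and recomputes even though only 0.01 seconds have passed.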

Licensed under: CC-BY-SA with attribution
