Question

Obviously, a quick search yields a million implementations and flavors of the memoization decorator in Python. However, I am interested in a flavor that I haven't been able to find. I would like to have it such that the cache of stored values can be of a fixed capacity. When new elements are added, if the capacity is reached, then the oldest value is removed and is replaced with the newest value.

My concern is that, if I use memoization to store a great many elements, then the program will crash because of a lack of memory. (I don't know how well-placed this concern may be in practice.) If the cache were of a fixed size, then a memory error would not be an issue. And many problems that I work on change as the program executes so that initial cached values would look very different from later cached values (and would be much less likely to recur later). That's why I'd like the oldest stuff to be replaced by the newest stuff.

I found the OrderedDict class and an example showing how to subclass it to specify a maximum size. I'd like to use that as my cache, rather than a normal dict. The problem is, I need the memoize decorator to take a parameter called maxlen that defaults to None. If it is None, then the cache is boundless and operates as normal. Any other value is used as the size for the cache.

I want it to work like the following:

@memoize
def some_function(spam, eggs):
    # This would use the boundless cache.
    pass

and

@memoize(200)  # or @memoize(maxlen=200)
def some_function(spam, eggs):
    # This would use the bounded cache of size 200.
    pass

Below is the code that I have so far, but I don't see how to pass the parameter into the decorator while making it work both "naked" and with a parameter.

import collections
import functools

class BoundedOrderedDict(collections.OrderedDict):
    def __init__(self, *args, **kwds):
        # maxlen=None means the cache is unbounded.
        self.maxlen = kwds.pop("maxlen", None)
        collections.OrderedDict.__init__(self, *args, **kwds)
        self._checklen()

    def __setitem__(self, key, value):
        collections.OrderedDict.__setitem__(self, key, value)
        self._checklen()

    def _checklen(self):
        if self.maxlen is not None:
            while len(self) > self.maxlen:
                self.popitem(last=False)  # evict the oldest entry

def memoize(function):
    cache = BoundedOrderedDict()  # I want this to take maxlen as an argument
    @functools.wraps(function)
    def memo_target(*args):
        lookup_value = args
        if lookup_value not in cache:
            cache[lookup_value] = function(*args)
        return cache[lookup_value]
    return memo_target

@memoize
def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

if __name__ == '__main__':
    x = fib(50)
    print(x)

Edit: Using Ben's suggestion, I created the following decorator, which I believe works the way I imagined. It's important to me to be able to use these decorated functions with multiprocessing, and that has been an issue in the past. But a quick test of this code seemed to work correctly, even when farming the jobs out to a pool of worker processes.

def memoize(func=None, maxlen=None):
    if func:
        # Used as a bare decorator: @memoize
        cache = BoundedOrderedDict(maxlen=maxlen)
        @functools.wraps(func)
        def memo_target(*args):
            lookup_value = args
            if lookup_value not in cache:
                cache[lookup_value] = func(*args)
            return cache[lookup_value]
        return memo_target
    else:
        # Used as a decorator factory: @memoize(maxlen=...)
        def memoize_factory(func):
            return memoize(func, maxlen=maxlen)
        return memoize_factory
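
As a quick usage sketch (fib_bounded is just an illustrative name), this version can be applied either bare or with the keyword argument. Note that with this variant the bounded form has to pass maxlen as a keyword, since a bare @memoize(200) would be interpreted as the func argument:

@memoize
def fib(n):
    # Unbounded cache, as before.
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

@memoize(maxlen=200)
def fib_bounded(n):
    # Cache holds at most 200 entries; the oldest entries are evicted first.
    if n < 2: return 1
    return fib_bounded(n-1) + fib_bounded(n-2)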

Solution

@memoize
def some_function(spam, eggs):
    # This would use the boundless cache.
    pass

Here memoize is used as a function that is called on a single function argument, and returns a function. memoize is a decorator.
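
In other words, the bare form is just shorthand for applying memoize to the function directly:

some_function = memoize(some_function)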

@memoize(200)  # or @memoize(maxlen=200)
def some_function(spam, eggs):
    # This would use the bounded cache of size 200.
    pass

Here memoize is used as a function that is called on a single integer argument and returns a function, and that returned function is itself used as a decorator i.e. it is called on a single function argument and returns a function. memoize is a decorator factory.
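
Spelled out without the decorator syntax, that is equivalent to:

some_function = memoize(200)(some_function)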

So to unify these two, you're going to have to write some ugly code. The way I would probably do it is to have memoize look like this:

def memoize(func=None, maxlen=None):
    if func:
        # act as decorator
    else:
        # act as decorator factory

This way, if you want to pass parameters, you always pass them as keyword arguments, leaving func (which should be a positional parameter) unset; if you just want everything to default, it will magically work as a decorator directly. This does mean @memoize(200) will give you an error; you could avoid that by instead doing some type checking to see whether func is callable, which should work well in practice but isn't really very "pythonic".
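
For what it's worth, a minimal sketch of that callable-check approach (my own illustration, reusing BoundedOrderedDict and the functools import from the question) might look like this:

def memoize(arg=None, maxlen=None):
    if not callable(arg):
        # Used as a factory: @memoize(200) or @memoize(maxlen=200).
        if arg is not None:
            maxlen = arg
        return lambda func: memoize(func, maxlen=maxlen)

    # Used as a bare decorator: @memoize, where arg is the function itself.
    cache = BoundedOrderedDict(maxlen=maxlen)
    @functools.wraps(arg)
    def memo_target(*args):
        if args not in cache:
            cache[args] = arg(*args)
        return cache[args]
    return memo_target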

An alternative would be to have two different decorators, say memoize and bounded_memoize. The unbounded memoize can have a trivial implementation by just calling bounded_memoize with maxlen set to None, so it doesn't cost you anything in implementation or maintenance.
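
A sketch of that split (again reusing BoundedOrderedDict from the question; the names are only suggestions) could be:

def bounded_memoize(maxlen):
    def decorator(func):
        cache = BoundedOrderedDict(maxlen=maxlen)
        @functools.wraps(func)
        def memo_target(*args):
            if args not in cache:
                cache[args] = func(*args)
            return cache[args]
        return memo_target
    return decorator

def memoize(func):
    # The unbounded variant is just the bounded one with no size limit.
    return bounded_memoize(maxlen=None)(func)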

Normally, as a rule of thumb, I try to avoid mangling a function to implement two only-tangentially related sets of functionality, especially when they have such different signatures. But in this case it does make the use of the decorator natural (requiring @memoize() would be quite error prone, even though it's more consistent from a theoretical perspective), and you're presumably going to implement this once and use it many times, so readability at the point of use is probably the more important concern.

Other tips

You want to write a decorator that takes an argument (the maximum length of the BoundedOrderedDict) and returns a decorator that will memoize your function with a BoundedOrderedDict of the appropriate size:

def boundedMemoize(maxCacheLen):
    def memoize(function):
        cache = BoundedOrderedDict(maxlen=maxCacheLen)
        @functools.wraps(function)
        def memo_target(*args):
            lookup_value = args
            if lookup_value not in cache:
                cache[lookup_value] = function(*args)
            return cache[lookup_value]
        return memo_target
    return memoize

You can use it like this:

@boundedMemoize(100)
def fib(n):
    if n < 2: return 1
    return fib(n - 1) + fib(n - 2)

Edit: Whoops, missed part of the question. If you want the maxlen argument to the decorator to be optional, you could do something like this:

def boundedMemoize(arg):
    if callable(arg):
        # Used as a bare decorator: arg is the function, and the cache is unbounded.
        cache = BoundedOrderedDict()
        @functools.wraps(arg)
        def memo_target(*args):
            lookup_value = args
            if lookup_value not in cache:
                cache[lookup_value] = arg(*args)
            return cache[lookup_value]
        return memo_target

    if isinstance(arg, int):
        # Used as a decorator factory: arg is the maximum cache size.
        def memoize(function):
            cache = BoundedOrderedDict(maxlen=arg)
            @functools.wraps(function)
            def memo_target(*args):
                lookup_value = args
                if lookup_value not in cache:
                    cache[lookup_value] = function(*args)
                return cache[lookup_value]
            return memo_target
        return memoize

    raise TypeError("boundedMemoize expects either a function or an int")

From http://www.python.org/dev/peps/pep-0318/

The current syntax also allows decorator declarations to call a function that returns a decorator:

@decomaker(argA, argB, ...)
def func(arg1, arg2, ...):
    pass

This is equivalent to:

func = decomaker(argA, argB, ...)(func)

Also, I'm not sure I would use OrderedDict for this; I would use a ring buffer, since they are very easy to implement.
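
For what it's worth, here is a minimal sketch of that idea (my own illustration, not from the answer): a fixed-size list of keys filled in round-robin order, backed by a dict for constant-time lookup. It supports the same in / [] operations the memoizing wrappers above rely on, so it could stand in for BoundedOrderedDict:

class RingBufferCache:
    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.slots = [None] * maxlen   # circular buffer of keys
        self.index = 0                 # next slot to overwrite
        self.data = {}                 # key -> cached value

    def __contains__(self, key):
        return key in self.data

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        old_key = self.slots[self.index]
        if old_key is not None:
            self.data.pop(old_key, None)  # evict the key being overwritten
        self.slots[self.index] = key
        self.data[key] = value
        self.index = (self.index + 1) % self.maxlen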

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow