Question

I'm writing a class that parses HTML in order to provide an interface to a profile on a webpage. It looks something like this:

class Profile(BeautifulSoup):
    def __init__(self, page_source):
        super().__init__(page_source)

    def username(self):
        return self.title.split(':')[0]

Except more complex and time consuming. Since I know that the underlying profiles aren't going to be changing during the lifetime of a Profile object, I thought this would be a good place to cache results in order to avoid recalculating values that are already known. I implemented this with a decorator, and the result looks like this:

def cached_resource(method_to_cache):
    def decorator(self, *args, **kwargs):
        method_name = method_to_cache.__name__

        try:
            return self._cache[method_name]
        except KeyError:
            self._cache[method_name] = method_to_cache(self, *args, **kwargs)
            return self._cache[method_name]

    return decorator


class Profile(BeautifulSoup):
    def __init__(self, page_source):
        super().__init__(page_source)
        self._cache = {}

    @cached_resource
    def username(self):
        return self.title.split(':')[0]

When I give this code to pylint, it complains about cached_resource having access to a protected variable of a client class.

I realize that the distinction between public and private isn't a huge deal in Python, but I'm still curious -- have I done something bad here? Is it poor style to have decorators rely on implementation details of the classes they're associated with?

EDIT: I'm unclear about how the closure in Duncan's answer works, so maybe this is a little bit kludge-y, but would this be a simpler solution?

def cached_resource(method_to_cache):
    def decorator(self, *args, **kwargs):
    method_name = method_to_cache.__name__

    try:
        return self._cache[method_name]
    except KeyError:
        self._cache[method_name] = method_to_cache(self, *args, **kwargs)
    except AttributeError:
        self._cache = {}
        self._cache[method_name] = method_to_cache(self, *args, **kwargs)
    finally:
        return self._cache[method_name]

return decorator
Was it helpful?

Solution

There is a bit of a code smell about that, I think I would agree with pylint on this one though it is quite subjective.

Your decorator looks like it is a general-purpose decorator, but it is tied into the internal implementation detail of the class. If you tried to use it from another class it won't work without the initialisation of _cache in __init__. The linkage I don't like is that the knowledge of an attribute called '_cache' is shared between both the class and the decorator.

You could move the initialisation of _cache out of __init__ and into the decorator. I don't know if that would help pacify pylint and it still requires the class to know about and avoid using the attribute. A cleaner solution here (I think) would be to pass the name of the cache attribute into the decorator. That should break the linkage cleanly:

def cached_resource(cache_attribute):
  def decorator_factory(method_to_cache):
    def decorator(self, *args, **kwargs):
        method_name = method_to_cache.__name__
        cache = getattr(self, cache_attribute)
        try:
            return cache[method_name]
        except KeyError:
            result = cache[method_name] = method_to_cache(self, *args, **kwargs)
            return result

    return decorator
  return decorator_factory


class Profile(BeautifulSoup):
    def __init__(self, page_source):
        super().__init__(page_source)
        self._cache = {}

    @cached_resource('_cache')
    def username(self):
        return self.title.split(':')[0]

And if you don't like a lot of decorator calls repeating the name of the attribute then:

class Profile(BeautifulSoup):
    def __init__(self, page_source):
        super().__init__(page_source)
        self._cache = {}

    with_cache = cached_resource('_cache')

    @with_cache
    def username(self):
        return self.title.split(':')[0]

Edit: Martineau suggests this may be overkill. It could be if you don't actually need separate access to the _cache attribute inside the class (e.g. to have a cache reset method). In that case you could manage the cache entirely within the decorator, but if you are going to do that you don't need a cache dictionary on the instance at all, as you can store the cache in the decorator and key on the Profile instance:

from weakref import WeakKeyDictionary

def cached_resource(method_to_cache):
    cache = WeakKeyDictionary()
    def decorator(self, *args, **kwargs):
        try:
            return cache[self]
        except KeyError:
            result = cache[self] = method_to_cache(self, *args, **kwargs)
        return result
    return decorator

class Profile(BeautifulSoup):
    def __init__(self, page_source):
        super().__init__(page_source)
        self._cache = {}

    @cached_resource
    def username(self):
        return self.title.split(':')[0]

OTHER TIPS

What you did looks fine to me. The error is presumably because pylint can't figure out that cached_resource is only "accessing" self._cache via its inner function, which ultimately is a method of the class (assigned by the decorator).

It might be worth raising an issue on the pylint tracker for this. It could be tough to handle with static analysis but the current behavior doesn't seem right.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top