Question

I am trying to do something similar to this:

from collections import defaultdict
import hashlib

def factory():
    key = 'aaa'
    return { 'key-md5' : hashlib.md5('%s' % (key)).hexdigest() }

a = defaultdict(factory)
print a['aaa']

(Actually, I need access to the key in the factory for other reasons, not to compute an md5; this is just an example.)

As you can see, the factory has no access to the key: I am just hard-coding it, which makes no sense whatsoever.

Is it possible to use defaultdict in a way that I can access the key in the factory?


Solution

defaultdict's __missing__ does not pass the key to the factory function. From the documentation:

If default_factory is not None, it is called without arguments to provide a default value for the given key, this value is inserted in the dictionary for the key, and returned.
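
To see this behaviour directly, here is a small sketch (Python 3 syntax; key_factory is a made-up name for illustration) in which a one-argument factory is handed to defaultdict. The lookup fails because the factory is called with no arguments:

```python
from collections import defaultdict

def key_factory(key):
    # Hypothetical factory that expects the missing key as its argument.
    return {'key': key}

d = defaultdict(key_factory)

try:
    d['aaa']
    error = None
except TypeError as exc:
    # defaultdict calls key_factory() with no arguments, so Python
    # complains about the missing positional argument.
    error = exc

print(type(error).__name__)
```

Since the factory raised before anything was stored, the key is not added to the dictionary.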

Make your own dict subclass with a custom __missing__ method:

>>> class MyDict(dict):
...     def __init__(self, factory):
...         dict.__init__(self)
...         self.factory = factory
...     def __missing__(self, key):
...         # Called by dict.__getitem__ when key is absent.
...         value = self[key] = self.factory(key)
...         return value
... 
>>> d = MyDict(lambda x: -x)
>>> d[1]
-1
>>> d
{1: -1}
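
Applied to the example from the question, the same idea might look like the sketch below. It assumes Python 3 (hence the encode() call and print as a function); KeyedDefaultDict and md5_factory are illustrative names, not part of any library:

```python
import hashlib

class KeyedDefaultDict(dict):
    """dict that builds missing values by calling factory(key)."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # Invoked by dict.__getitem__ only when key is absent.
        value = self[key] = self.factory(key)
        return value

def md5_factory(key):
    # hashlib.md5 needs bytes in Python 3, hence the encode().
    return {'key-md5': hashlib.md5(key.encode('utf-8')).hexdigest()}

a = KeyedDefaultDict(md5_factory)
print(a['aaa'])
```

The extra *args and **kwargs are only there so that initial contents can still be passed in, as with a plain dict.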

OTHER TIPS

Unfortunately not directly: the defaultdict documentation specifies that default_factory is called with no arguments:

http://docs.python.org/2/library/collections.html#collections.defaultdict

But you can subclass defaultdict to get the behavior you want:

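class CustomDefaultdict(defaultdict):
    def __missing__(self, key):
        if self.default_factory:
            # Pass the missing key to the factory and store the result.
            dict.__setitem__(self, key, self.default_factory(key))
            return self[key]
        else:
            # No factory set: fall back to the stock behavior (raises KeyError).
            return defaultdict.__missing__(self, key)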

This works for me, with factory changed to take the key as its argument:

>>> a = CustomDefaultdict(factory)
>>> a
defaultdict(<function factory at 0x7f0a70da11b8>, {})
>>> print a['aaa']
{'key-md5': '47bce5c74f589f4867dbd57e9ca9f808'}
>>> print a['bbb']
{'key-md5': '08f8e0260c64418510cefb2b06eee5cd'}
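
One thing worth keeping in mind with any __missing__-based approach: the hook only runs for subscript access, not for get() or membership tests. A minimal Python 3 sketch, using a hypothetical KeyDefaultDict subclass and str.upper as a stand-in factory:

```python
from collections import defaultdict

class KeyDefaultDict(defaultdict):
    # Hypothetical Python 3 variant of a key-aware defaultdict subclass.
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory(key)
        return value

d = KeyDefaultDict(str.upper)

print('aaa' in d)       # membership test does not create the key
print(d.get('aaa'))     # get() does not call __missing__ either
print(d['aaa'])         # subscript access calls __missing__
print('aaa' in d)       # the key is stored now
```

So code that reads through get() will see None (or its own default) rather than a value built by the factory.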

In several cases where I wanted a defaultdict with the key in the factory, I found that functools.lru_cache (available since Python 3.2) also solved my problem:

import functools

@functools.lru_cache(maxsize=None)
def use_func_as_dict(key=''):  # Or whatever type
    with open(key, 'r') as ifile:
        return ifile.readlines()

f1 = use_func_as_dict('test.txt')
f2 = use_func_as_dict('test2.txt')
# This will reuse the old value instead of re-reading the file
f3 = use_func_as_dict('test.txt')
assert f3 is f1

Conceptually this is a better fit, since what you are after is a function of the key rather than a fixed fallback value.
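
For a version that does not depend on files being present, here is a rough Python 3 sketch of the same caching idea applied to the md5 example from the question. The function name key_md5 is made up for illustration; cache_info() is the standard introspection method that lru_cache attaches to the wrapped function:

```python
import functools
import hashlib

@functools.lru_cache(maxsize=None)
def key_md5(key):
    # Computed once per distinct key; later calls return the cached dict.
    return {'key-md5': hashlib.md5(key.encode('utf-8')).hexdigest()}

first = key_md5('aaa')
second = key_md5('aaa')   # served from the cache
other = key_md5('bbb')

print(first is second)
print(key_md5.cache_info())
```

Two caveats apply: the key must be hashable, and because the cached object itself is returned, mutating it changes what later callers see. Unlike a dict, the cache also cannot be iterated to list the keys seen so far.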

Licensed under: CC-BY-SA with attribution