A forgiving dictionary

https://stackoverflow.com/questions/3358580

03-10-2020
|

Pergunta

I am wondering how to create forgiving dictionary (one that returns a default value if a KeyError is raised).

In the following code example I would get a KeyError; for example

a = {'one':1,'two':2}
print a['three']

In order not to get one I would 1. Have to catch the exception or use get.

I would like to not to have to do that with my dictionary...

Solução

import collections
a = collections.defaultdict(lambda: 3)
a.update({'one':1,'two':2})
print a['three']

emits 3 as required. You could also subclass dict yourself and override __missing__, but that doesn't make much sense when the defaultdict behavior (ignoring the exact missing key that's being looked up) suits you so well...

Edit ...unless, that is, you're worried about a growing by one entry each time you look up a missing key (which is part of defaultdict's semantics) and would rather get slower behavior but save some memory. For example, in terms of memory...:

>>> import sys
>>> a = collections.defaultdict(lambda: 'blah')
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
99 6284

...the defaultdict, originally empty, now has the 99 previously-missing keys that we looked up, and takes 6284 bytes (vs. the 140 bytes it took when it was empty).

The alternative approach...:

>>> class mydict(dict):
...   def __missing__(self, key): return 3
... 
>>> a = mydict()
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
0 140

...entirely saves this memory overhead, as you see. Of course, performance is another issue:

$ python -mtimeit -s'import collections; a=collections.defaultdict(int); r=xrange(99)' 'for i in r: _=a[i]'
100000 loops, best of 3: 14.9 usec per loop

$ python -mtimeit -s'class mydict(dict):
>   def __missing__(self, key): return 0
> ' -s'a=mydict(); r=xrange(99)' 'for i in r: _=a[i]'
10000 loops, best of 3: 92.9 usec per loop

Since defaultdict adds the (previously-missing) key on lookup, it gets much faster when such a key is next looked up, while mydict (which overrides __missing__ to avoid that addition) pays the "missing key lookup overhead" every time.

Whether you care about either issue (performance vs memory footprint) entirely depends on your specific use case, of course. It is in any case a good idea to be aware of the tradeoff!-)

Outras dicas

New in version 2.5: If a subclass of dict defines a method __missing__(), if the key key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable. For an example, see collections.defaultdict.

http://docs.python.org/library/stdtypes.html

Here is how to subclass dict as suggested by NullUserException

>>> class forgiving_dict(dict):
...     def __missing__(self, key):
...         return 3
...
>>> a = forgiving_dict()
>>> a.update({'one':1,'two':2})
>>> print a['three']
3

One big difference between this answer and Alex's is that the missing key is not added to the dictionary

>>> print a
{'two': 2, 'one': 1}

Which is quite significant if you expect a lot of misses

You'll probably want to use a defaultdict (it requires atleast python2.5 I believe)

from collections import defaultdict
def default(): return 'Default Value'
d = defaultdict(default)
print(d['?'])

The function that is passed to the constructor tells the class what to return as a default value. See the documentation for additional examples.

Sometimes what you really want is .setdefault() which is not very intuitive, but it's a method that "returns the key specified, if it doesn't exist, set that key to this value".

Here is an example of setdefault() being used to good effect:

collection = {}
for elem in mylist:
    key = key_from_elem(elem)
    collection.setdefault(key, []).append(elem)

This will allow us to create a dictionary like: {'key1':[elem1, elem3], 'key2':[elem3]} without having to have an ugly check to see if there's a key already there and creating a list for it.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow