Question

I have recently found and started using default dictionaries to replace several more bulky constructs. I have read in 'the zen of python' that one of the key points of python is "There should be one-- and preferably only one --obvious way to do it."

Based on that criterion (or, more practically, on memory usage or speed), which of the following (or something totally different) would be best? I have a hunch that the first is correct, but would like other people's opinions.

from collections import defaultdict

my_dict = defaultdict(int)
for generic in iterable:
    my_dict[generic] += 1

or:

my_dict = {}
for generic in iterable:
    if generic not in my_dict:
        my_dict[generic] = 1
    else:
        my_dict[generic] += 1

or:

my_dict = {}
for generic in iterable:
    try:
        my_dict[generic] += 1
    except KeyError:
        my_dict[generic] = 1

The same question applies to my_dict = defaultdict(list) combined with append calls. Assume that several for loops or other conditionals are involved, rather than simply counting the values of a single iterable, since that simple case would obviously have different characteristics.


Solution 2

As Paulo Almeida commented, for the example you posted the "obvious" solution is to use a collections.Counter:

from collections import Counter
my_dict = Counter(iterable)

And that's it.

As for the other snippets you posted, and assuming the my_dict[key] += 1 was just for the example and your general question is about "how to best populate a dict": collections.defaultdict is the right choice for homogeneous dicts (same type of values for all keys) where the type has a default value (numeric zero, empty string, empty list...). The most common use case I can think of is for populating a dict of lists (or sets or other containers).
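As an illustration of the dict-of-lists use case, here is a minimal sketch. The sample data and the names pairs and groups are made up for the example and are not from the question.

```python
from collections import defaultdict

# Hypothetical sample data: (category, item) pairs.
pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

groups = defaultdict(list)
for category, item in pairs:
    # A missing key is created with an empty list on first access,
    # so no membership test or try/except is needed.
    groups[category].append(item)

print(dict(groups))  # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
```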

Now, when neither collections.Counter nor collections.defaultdict solves your problem, you have three possible patterns:

  • look before
  • try/except KeyError
  • dict.setdefault(key, value)
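For comparison, the three patterns can be sketched side by side on the same task, grouping values into lists in a plain dict. The function names and the sample data are hypothetical and only serve the illustration.

```python
def group_look_before(pairs):
    """Look before: test for the key, then act."""
    groups = {}
    for key, value in pairs:
        if key not in groups:
            groups[key] = []
        groups[key].append(value)
    return groups


def group_try_except(pairs):
    """try/except KeyError: assume the key exists, recover if it doesn't."""
    groups = {}
    for key, value in pairs:
        try:
            groups[key].append(value)
        except KeyError:
            groups[key] = [value]
    return groups


def group_setdefault(pairs):
    """dict.setdefault: insert the default if missing, return the stored value."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


# Hypothetical sample data.
sample = [("a", 1), ("b", 2), ("a", 3)]
print(group_look_before(sample))  # {'a': [1, 3], 'b': [2]}
```

All three produce the same dict; they differ only in where the cost of a missing key is paid.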

The try/except solution will be faster if you expect the key to already exist most of the time: a try/except block is very cheap to set up but costly when the exception is actually raised. As far as I'm concerned, I don't recommend it unless you are very, very sure about what your data looks like now and what it will look like in the future.
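If you want to check this claim on your own data, a rough timing sketch with timeit could look like the following. The two data sets are invented to represent the extremes (almost every lookup hits, every lookup misses); actual numbers depend on the Python version and machine, so none are quoted here.

```python
import timeit


def count_try_except(items):
    counts = {}
    for item in items:
        try:
            counts[item] += 1
        except KeyError:
            counts[item] = 1
    return counts


def count_look_before(items):
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts


# Hypothetical data sets for the two extremes.
mostly_hits = [i % 10 for i in range(100_000)]  # only 10 distinct keys
all_misses = list(range(100_000))               # every key is new

for label, data in (("mostly hits", mostly_hits), ("all misses", all_misses)):
    t_try = timeit.timeit(lambda: count_try_except(data), number=5)
    t_look = timeit.timeit(lambda: count_look_before(data), number=5)
    print(f"{label}: try/except {t_try:.3f}s, look before {t_look:.3f}s")
```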

The "look before" solution has an almost constant cost, and while not free it's still quite cheap. That's really your safest bet.

The dict.setdefault() solution has about the same cost as the "look before" one, BUT you also pay the constant cost of instantiating a default object, which will often be thrown away immediately. It was a common pattern some years ago, but since collections.defaultdict appeared it's of rather marginal use, not to say mostly useless.
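The wasted default object can be made visible with a small sketch. The make_default factory and the calls list are hypothetical helpers that only record how often a default gets built.

```python
from collections import defaultdict

calls = []  # one entry per default object built


def make_default():
    """Hypothetical factory that records each call."""
    calls.append(1)
    return []


keys = ["a", "a", "a"]

plain = {}
for key in keys:
    # The argument is evaluated before setdefault() runs, hit or miss.
    plain.setdefault(key, make_default()).append(1)
setdefault_calls = len(calls)

calls.clear()
lazy = defaultdict(make_default)
for key in keys:
    # The factory runs only when the key is actually missing.
    lazy[key].append(1)
defaultdict_calls = len(calls)

print(setdefault_calls, defaultdict_calls)  # 3 1
```

Both dicts end up as {'a': [1, 1, 1]}, but setdefault() built three lists to keep one.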

OTHER TIPS

If you insist on using a dictionary or defaultdict, the first one is the best. For counting, however, there's a lovely class called Counter in collections:

>>> from collections import Counter
>>> c = Counter()
>>> for generic in iterable:
...     c[generic] += 1

Or even shorter:

>>> c = Counter(iterable)
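
Since Counter is a dict subclass, the result can be used like any other dict, with a few extras. A short sketch, using made-up sample data:

```python
from collections import Counter

words = ["a", "b", "a", "c", "a", "b"]  # hypothetical sample data
c = Counter(words)

print(c["a"])            # 3
print(c["missing"])      # 0, and the key is not inserted
print(c.most_common(2))  # [('a', 3), ('b', 2)]

# update() takes an iterable (or mapping) and adds to the existing counts.
c.update(["c", "c"])
print(c["c"])            # 3
```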
Licensed under: CC-BY-SA with attribution