Question

Which is faster? Counter()+=Counter or Counter.update(Counter)?

And why is one faster than the other?

I've tried some simple profiling, but I don't think it's enough to conclusively say that Counter+=Counter is faster than Counter.update(Counter):

from collections import Counter
import time
x = Counter(['abc','def', 'abc'])
y = Counter(['xyz', 'xyz', 'uvw', 'abc'])

start = time.time()
x.update(y)
end = time.time() - start
print x
print 'update:', end
print 

x = Counter(['abc','def', 'abc'])
start = time.time()
x+=y
end = time.time() - start
print x
print 'plus:', end

[out]:

Counter({'abc': 3, 'xyz': 2, 'def': 1, 'uvw': 1})
update: 4.48226928711e-05

Counter({'abc': 3, 'xyz': 2, 'def': 1, 'uvw': 1})
plus: 2.28881835938e-05

Solution

The Counter.update() method was designed to be faster. The __add__() method does more work because it has to eliminate non-positive counts (zero and negative values):

# heart of the update() loop in Python 2:
for elem, count in iterable.iteritems():
    self[elem] = self_get(elem, 0) + count

# heart of the __add__() loop in Python 2:
result = Counter()
for elem, count in self.items():
    newcount = count + other[elem]
    if newcount > 0:
        result[elem] = newcount
for elem, count in other.items():
    if elem not in self and count > 0:
        result[elem] = count
return result

As you can see, the __add__() method does considerably more work.
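The extra filtering is also a visible difference in behavior, not just in speed. Here is a minimal sketch, written for Python 3 and using made-up counts, of how the two operations treat zero and negative totals. The helper names merged_with_update and merged_with_add are hypothetical and exist only for this illustration:

```python
from collections import Counter

def merged_with_update(a, b):
    """Return a copy of `a` with `b` folded in via update().

    update() only adds counts, so zero and negative totals stay in the result.
    """
    result = a.copy()
    result.update(b)
    return result

def merged_with_add(a, b):
    """Return a + b, which leaves out every key whose total is not positive."""
    return a + b

# Illustrative data (not from the question): includes a negative and a zero count.
a = Counter({'abc': 2, 'def': 1})
b = Counter({'abc': -2, 'xyz': 0, 'uvw': 3})

via_update = merged_with_update(a, b)
via_add = merged_with_add(a, b)

print(sorted(via_update.items()))  # [('abc', 0), ('def', 1), ('uvw', 3), ('xyz', 0)]
print(sorted(via_add.items()))     # [('def', 1), ('uvw', 3)]
```

With counts that are all positive, as in the question, both approaches give the same result, so the choice there comes down to speed.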

There is another difference in later versions of Python 3, which have an __iadd__() method. It performs a true in-place update, which is less work than calling __add__() to create a new counter and then rebinding the name to replace the old counter:

def __iadd__(self, other):
    for elem, count in other.items():
        self[elem] += count
    return self._keep_positive()
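
A single time.time() measurement of a microsecond-scale operation is mostly noise, so a more convincing comparison repeats each operation many times. The following is a rough sketch for Python 3 using the standard timeit module. The wrapper functions (fresh_x, use_update, use_iadd, use_add) are hypothetical names for this example, and the repetition counts are arbitrary. Each wrapper also pays for building a fresh counter, so the numbers are only comparable to one another. The sketch also checks that += keeps the same object on a Python version that has __iadd__():

```python
from collections import Counter
import timeit

y = Counter(['xyz', 'xyz', 'uvw', 'abc'])

def fresh_x():
    # Rebuild x each time so that every run starts from the same state.
    return Counter(['abc', 'def', 'abc'])

def use_update():
    x = fresh_x()
    x.update(y)
    return x

def use_iadd():
    x = fresh_x()
    x += y          # in-place on Python versions that define Counter.__iadd__()
    return x

def use_add():
    x = fresh_x()
    x = x + y       # builds a new Counter, then rebinds the name
    return x

# Identity check: with an in-place __iadd__(), the alias still names the same object.
x = fresh_x()
alias = x
x += y
same_object = alias is x
print('+= kept the same object:', same_object)

# Take the best of several repeats to reduce noise from other processes.
for func in (use_update, use_iadd, use_add):
    best = min(timeit.repeat(func, number=10000, repeat=5))
    print('%-10s %.4f s per 10000 calls' % (func.__name__, best))
```

The absolute timings depend on the machine and the Python version, so it is worth running this on the interpreter you actually care about rather than relying on one published figure.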
Licensed under: CC-BY-SA with attribution