Question

See below, why does the implementation of += blow away a key in my original counter?

>>> c = Counter({'a': 0, 'b': 0, 'c': 0})
>>> c.items()
[('a', 0), ('c', 0), ('b', 0)]
>>> c += Counter('abba')
>>> c.items()
[('a', 2), ('b', 2)]

I think that's impolite, to say the least: there is quite a difference between "X was counted 0 times" and "we aren't even counting Xs". It seems like collections.Counter is not a counter at all; it's more like a multiset.

But counters are a subclass of dict and we're allowed to construct them with zero or negative values: Counter(a=0, b=-1). If it's actually a "bag of things", wouldn't this be prohibited, restricting init to accept an iterable of hashable items?

To further confuse matters, Counter implements update and subtract methods which behave differently from the + and - operators. It seems like this class is having an identity crisis!
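To make that difference concrete, here is a small sketch (the variable names are only for illustration). It contrasts the in-place methods, which keep every key, with the binary operators, which return a new Counter without the non-positive counts:

```python
from collections import Counter

# In-place methods keep every key, including zero and negative counts.
kept = Counter(a=0, b=1)
kept.update(Counter(b=1))      # b: 1 + 1 = 2, a stays at 0
kept.subtract(Counter(b=3))    # b: 2 - 3 = -1

# The binary operators build a new Counter and drop non-positive counts.
added = Counter(a=0, b=1) + Counter(b=1)
subtracted = Counter(a=0, b=1) - Counter(b=3)

print(dict(kept))        # {'a': 0, 'b': -1}
print(dict(added))       # {'b': 2}
print(dict(subtracted))  # {}
```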

Is a Counter a dict or a bag?


Solution 2

From the source:

def __add__(self, other):
    '''Add counts from two counters.

    >>> Counter('abbb') + Counter('bcc')
    Counter({'b': 4, 'c': 2, 'a': 1})

    '''
    if not isinstance(other, Counter):
        return NotImplemented
    result = Counter()
    for elem, count in self.items():
        newcount = count + other[elem]
        if newcount > 0:
            result[elem] = newcount
    for elem, count in other.items():
        if elem not in self and count > 0:
            result[elem] = count
    return result

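It seems that Counter addition is implemented so that any key whose summed count is zero or negative is left out of the result. Since the default value for a missing key is zero, and your original counter also holds zero for that key, the sum is zero and the resulting Counter doesn't contain that key.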

Maybe you can get the behavior you want with update:

a.update(b)

seems to do what you want. It is probably slower, though; a hand-made implementation of the __add__ method would be much faster.
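As a rough sketch of such a hand-made addition, assuming the goal is simply to keep zero and negative counts, something like the following might work. The name add_keeping_zeros is made up for illustration and is not part of collections; no claim is made here about its speed:

```python
from collections import Counter

def add_keeping_zeros(left, right):
    """Hypothetical helper: add counts without dropping non-positive results."""
    result = Counter(left)   # copy, so neither input is modified
    result.update(right)     # update() adds counts and keeps every key
    return result

c = Counter({'a': 0, 'b': 0, 'c': 0})
total = add_keeping_zeros(c, Counter('abba'))
print(sorted(total.items()))  # [('a', 2), ('b', 2), ('c', 0)]
```

Unlike +=, this leaves 'c' in the result with a count of 0, and the original c is untouched.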

OTHER TIPS

Counters are a kind of multiset. From the Counter() documentation:

Several mathematical operations are provided for combining Counter objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less.

Emphasis mine.
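A quick illustration of that paragraph, with made-up inputs x and y that contain zero and negative counts. All four operators accept them, but only positive results survive:

```python
from collections import Counter

x = Counter(a=3, b=0, c=-2)
y = Counter(a=1, b=0, c=5, d=-1)

print(dict(x + y))  # {'a': 4, 'c': 3}   b sums to 0 and d to -1, both dropped
print(dict(x - y))  # {'a': 2, 'd': 1}   d: 0 - (-1) = 1
print(dict(x & y))  # {'a': 1}           minimum of corresponding counts
print(dict(x | y))  # {'a': 3, 'c': 5}   maximum of corresponding counts
```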

Further on, it gives you some more detail about the multiset nature of Counters:

Note: Counters were primarily designed to work with positive integers to represent running counts; however, care was taken to not unnecessarily preclude use cases needing other types or negative values. To help with those use cases, this section documents the minimum range and type restrictions.

[...]

  • The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.

So Counter objects are both: dictionaries and bags. Standard dictionaries, however, don't support addition, but Counters do, so it's not as if Counters are breaking a precedent set by dictionaries here.

If you want to retain the zeros, use Counter.update() and pass in the result of Counter.elements() of the other object:

c.update(Counter('abba').elements())

Demo:

>>> c = Counter({'a': 0, 'b': 0, 'c': 0})
>>> c.update(Counter('abba').elements())
>>> c
Counter({'a': 2, 'b': 2, 'c': 0})
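
One thing worth knowing here: update() also accepts a mapping, so the other Counter can be passed in directly. The two spellings only differ when the other Counter holds counts below one, because elements() skips those. A small sketch with made-up values:

```python
from collections import Counter

c = Counter({'a': 0, 'b': 0, 'c': 0})
c.update(Counter('abba'))      # a Counter (mapping) is accepted directly
print(sorted(c.items()))       # [('a', 2), ('b', 2), ('c', 0)]

# elements() skips counts below one, so the negative count is ignored...
via_elements = Counter(a=2)
via_elements.update(Counter(a=-1, b=1).elements())

# ...while passing the Counter itself applies it.
direct = Counter(a=2)
direct.update(Counter(a=-1, b=1))

print(dict(via_elements))  # {'a': 2, 'b': 1}
print(dict(direct))        # {'a': 1, 'b': 1}
```
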
Licensed under: CC-BY-SA with attribution