Question

Firstly, I have two lists of strings:

['abc','abc','def','jkl']
['abc','def','def','pqr', 'pr', 'foo', 'bar']

And then I need counters of the lists that are normalized such that the sum of the values in each counter equals 1:

Counter({'abc': 0.8164965809277261, 'jkl': 0.4082482904638631, 'def': 0.4082482904638631})
Counter({'abc': 1.1498299142610595, 'def': 1.0749149571305296, 'jkl': 0.4082482904638631, 'pr': 0.3333333333333333, 'bar': 0.3333333333333333, 'pqr': 0.3333333333333333, 'foo': 0.3333333333333333})

The normalizing factor is

math.sqrt(sum(i*i for i in counter.values()))

I've tried the following, iterating through the counter keys, but is there any other way of getting the x+y Counter shown above?

>>> from collections import Counter
>>> import math
>>> x = Counter(['abc','abc','def','jkl'])
>>> denominator = 1/math.sqrt(sum(math.pow(i,2) for i in x.values()))
>>> for i in x:
...     x[i]*=denominator
... 
>>> x
Counter({'abc': 0.8164965809277261, 'jkl': 0.4082482904638631, 'def': 0.4082482904638631})
>>> y = Counter(['abc','def','def','pqr', 'pr', 'foo', 'bar'])
>>> denominator2 = 1/math.sqrt(sum(math.pow(i,2) for i in y.values()))
>>> for i in y:
...     y[i]*=denominator2
... 
>>> y
Counter({'def': 0.6666666666666666, 'pr': 0.3333333333333333, 'abc': 0.3333333333333333, 'bar': 0.3333333333333333, 'pqr': 0.3333333333333333, 'foo': 0.3333333333333333})
>>> x+y
Counter({'abc': 1.1498299142610595, 'def': 1.0749149571305296, 'jkl': 0.4082482904638631, 'pr': 0.3333333333333333, 'bar': 0.3333333333333333, 'pqr': 0.3333333333333333, 'foo': 0.3333333333333333})

Solution

You need to sum the values, then divide each count by the sum:

total = sum(x.values(), 0.0)
for key in x:
    x[key] /= total

By starting the sum with 0.0 we make sure total is a floating point value, avoiding the Python 2 floor division behaviour of / with integer operands.

Demo:

>>> from collections import Counter
>>> x = Counter(['abc','abc','def','jkl'])
>>> total = sum(x.values(), 0.0)
>>> for key in x:
...     x[key] /= total
... 
>>> x
Counter({'abc': 0.5, 'jkl': 0.25, 'def': 0.25})
>>> y = Counter(['abc','def','def','pqr', 'pr', 'foo', 'bar'])
>>> total = sum(y.values(), 0.0)
>>> for key in y:
...     y[key] /= total
... 
>>> y
Counter({'def': 0.2857142857142857, 'pr': 0.14285714285714285, 'abc': 0.14285714285714285, 'bar': 0.14285714285714285, 'pqr': 0.14285714285714285, 'foo': 0.14285714285714285})
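
The factor given in the question, math.sqrt(sum(i*i for i in counter.values())), scales a counter to unit Euclidean length rather than to a sum of 1. If that is what is actually wanted, a minimal sketch along the same lines might look like this. The helper name l2_normalized is made up for illustration, and it returns a new Counter instead of modifying its argument:

```python
from collections import Counter
import math

def l2_normalized(counter):
    """Return a new Counter scaled so the squares of its values sum to 1."""
    norm = math.sqrt(sum(v * v for v in counter.values()))
    if norm == 0:
        # Empty or all-zero counter: return an empty Counter
        # rather than dividing by zero.
        return Counter()
    return Counter({key: value / norm for key, value in counter.items()})

x = l2_normalized(Counter(['abc', 'abc', 'def', 'jkl']))
```

Here x['abc'] comes out as roughly 0.8165, matching the value shown in the question.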

If you need to sum the counters, you need to re-normalize the resulting counter separately; summing two normalized counters gives you a new counter whose values sum to 2, for example.
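
As a rough sketch of that re-normalization step, the division can be wrapped in a small function and applied again after the addition. The name normalized is a hypothetical helper, not part of collections, and it builds a new Counter rather than changing the one passed in:

```python
from collections import Counter

def normalized(counter):
    """Return a new Counter whose values sum to 1."""
    total = sum(counter.values(), 0.0)  # float start value avoids floor division
    if total == 0:
        # Nothing to normalize; avoid dividing by zero.
        return Counter()
    return Counter({key: value / total for key, value in counter.items()})

x = normalized(Counter(['abc', 'abc', 'def', 'jkl']))
y = normalized(Counter(['abc', 'def', 'def', 'pqr', 'pr', 'foo', 'bar']))

summed = x + y                 # values of the sum add up to 2
combined = normalized(summed)  # values add up to 1 again
```

Note that Counter addition drops keys whose result is zero or negative; that does not matter here because every normalized count is positive.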

OTHER TIPS

Normalizing a Counter (c1) built from a list (l1) means dividing each count by the total number of elements, which is simply the length of the list (total). Taking len(l1) is cheaper than computing the total from the counter with sum(c1.values(), 0.0), because it does not have to iterate over the counts. This only holds while c1 contains exactly the counts of l1.

For the first list in the question, this looks like:

from collections import Counter

l1 = ['abc','abc','def','jkl']
c1 = Counter(l1)
# Normalization
total = 1.0 * len(l1) # converting to float to avoid floor division in Python 2.X
for k in c1:
    c1[k] /= total
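
The shortcut relies on the list length and the sum of the counts being the same number. A small check, using the same l1 and c1 names as above, illustrates when that holds and when it stops holding:

```python
from collections import Counter

l1 = ['abc', 'abc', 'def', 'jkl']
c1 = Counter(l1)

# For a Counter built directly from the list, both totals agree.
totals_agree = len(l1) == sum(c1.values())

# Once the counter is updated independently, len(l1) is no longer
# the right total and sum(c1.values()) has to be used instead.
c1.update(['abc'])
total_from_counter = sum(c1.values())
```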
Licensed under: CC-BY-SA with attribution