Check out collections.Counter and set. Counter is very handy for creating tallies (i.e. counting), and set is great for removing duplicates from a sequence.
from collections import Counter
phrases = ['cat dog', 'cat cat', 'dog', 'cat cat cat']
all_counts = Counter()
occurrence_counts = Counter()
for phrase in phrases:
    words = phrase.split()
    distinct_words = set(words)  # each word at most once per phrase
    all_counts.update(words)  # counts every occurrence
    occurrence_counts.update(distinct_words)  # counts phrases containing the word
all_counts['cat'] # 6
occurrence_counts['cat'] # 3
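update() adds to the existing tallies based on what you pass it: each element of the iterable is counted once, so passing a set counts each distinct word once per phrase.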
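To see that update() accumulates rather than replaces, here is a minimal sketch with made-up data (the name tally is only for this illustration):

```python
from collections import Counter

tally = Counter()
tally.update(['cat', 'dog', 'cat'])  # iterable: each element adds 1
tally.update({'cat': 2})             # mapping: counts are added, not replaced
tally.update('dog'.split())          # same as update(['dog'])

print(tally['cat'])   # 4
print(tally['dog'])   # 2
print(tally['bird'])  # 0, missing keys count as zero
```

Note that looking up a word that was never counted returns 0 instead of raising KeyError, which is why the loop above needs no special handling for new words.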
Play around with set a bit by running python from the command line and you should get an idea of what is going on above:
$ python
>>> animals = [ 'bird', 'bird', 'cat' ]
>>> set(animals)
set(['bird', 'cat'])
>>> list(set(animals))
['bird', 'cat']
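One thing to keep in mind: a set is unordered, so list(set(animals)) is not guaranteed to give the items back in their original order. If order matters, one common alternative (an aside, not needed for the counting above) is dict.fromkeys, which keeps first-seen order on Python 3.7+. A small sketch with made-up data:

```python
animals = ['bird', 'bird', 'cat', 'bird', 'ant']

# Duplicates removed, but the resulting order is arbitrary.
unique_unordered = set(animals)

# Duplicates removed, first-seen order preserved (dicts keep insertion order).
unique_ordered = list(dict.fromkeys(animals))

print(sorted(unique_unordered))  # ['ant', 'bird', 'cat']
print(unique_ordered)            # ['bird', 'cat', 'ant']
```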