Word frequency count based on two words using python

https://stackoverflow.com/questions/18952894

29-06-2022

Question

There are many resources online that show how to count the frequency of single words, but I was not able to find a concrete example of counting the frequency of two-word pairs.

I have a CSV file that contains some strings.
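One way to get the text out of the CSV file is the standard `csv` module. This is a minimal sketch that assumes every cell holds free text to be counted; the helper name `read_csv_text` is hypothetical, and the file layout is an assumption.

```python
import csv


def read_csv_text(path):
    """Join every cell of every row in a CSV file into one string.

    Assumes (hypothetically) that all cells are free text worth counting.
    """
    # newline="" lets the csv module handle quoted fields containing newlines.
    with open(path, newline="", encoding="utf-8") as f:
        return " ".join(cell for row in csv.reader(f) for cell in row)
```

Quoted cells that contain commas are returned as a single cell, so the commas inside them still need to be stripped later.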

FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"

So I want the output to look like this:

wordscount =  {"I love": 2, "show makes": 2, "makes me" : 2 }

Of course I will have to strip all the commas, question marks, and other punctuation: {!, , ", ', ?, ., (,), [, ], ^, %, #, @, &, *, -, _, ;, /, \, |, }
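A sketch of the stripping step, assuming the characters to remove are those in Python's `string.punctuation` (which includes every character listed above). Each punctuation mark is replaced with a space rather than deleted, so text like `happy,I` still splits into two words. `strip_punctuation` is a hypothetical helper name.

```python
import string

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def strip_punctuation(text):
    """Replace punctuation with spaces, then split on whitespace into words."""
    return text.translate(_PUNCT_TO_SPACE).split()
```

Unlike the `\w+` regex used in the answer, this also splits on `_`. It likewise splits contractions, so "don't" becomes "don" and "t".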

I will also remove some stop words (from a list I found online) to get more meaningful data from the text.
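Stop words can be filtered out of the word list before the pairs are built. The sketch below assumes a small hypothetical `STOP_WORDS` set (substitute the list you actually use) and compares case-insensitively. One design consequence: because filtering happens first, the words on either side of a removed stop word become adjacent and form a new pair (for example, "makes me happy" yields "makes happy").

```python
import re
from collections import Counter

# Hypothetical stop-word list; replace with the list you actually use.
STOP_WORDS = {"i", "me", "also", "like"}


def bigram_counts_without_stop_words(text, stop_words=STOP_WORDS):
    """Count adjacent word pairs after dropping stop words (case-insensitive)."""
    words = [w for w in re.findall(r"\w+", text) if w.lower() not in stop_words]
    # Pair each remaining word with its successor and count the joined pairs.
    return Counter(" ".join(pair) for pair in zip(words, words[1:]))
```

If pairs that span a removed stop word are unwanted, build the pairs first and then discard any pair containing a stop word.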

How can I achieve these results using Python?

Thanks!

Solution

>>> from collections import Counter
>>> import re
>>> 
>>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
>>> words = re.findall(r'\w+', sentence)
>>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
>>> wordscount = {w:f for w, f in Counter(two_words).most_common() if f > 1}
>>> wordscount
{'show makes': 2, 'makes me': 2, 'I love': 2}
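The same `zip` trick generalizes to phrases of any length by zipping `n` staggered slices of the word list. This is a sketch; `ngram_counts` and its `min_count` parameter are hypothetical names, and the default `min_count=2` reproduces the `f > 1` filter above.

```python
import re
from collections import Counter


def ngram_counts(text, n=2, min_count=2):
    """Count runs of n consecutive words, keeping those seen at least min_count times."""
    words = re.findall(r"\w+", text)
    # zip(words, words[1:], ..., words[n-1:]) yields every window of n words.
    windows = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(w) for w in windows)
    return {phrase: c for phrase, c in counts.most_common() if c >= min_count}
```

With `n=3` the example sentence yields only "show makes me", which is the one three-word phrase that occurs twice.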

Licensed under: CC-BY-SA with attribution
