Find the 10 most frequently occurring words with morphology [closed]

https://stackoverflow.com/questions/15971578

python
text-analysis
text-mining

03-04-2022

Question

What is the best way to solve this problem?

I have a text in Russian and want to find its 10 most common words, taking morphology into account, so that different inflected forms of a word are counted together. Are there any open-source libraries for this in Python?

Solution

You can use one of the Python morphological analyzers for Russian to normalize each word to its base form before counting:

https://github.com/kmike/pymorphy2
https://github.com/kmike/pymorphy
https://github.com/irokez/Pyrus

There is also a Porter-style (Snowball) stemmer for Russian in https://github.com/nltk/nltk. You could also run http://company.yandex.ru/technologies/mystem/ from the command line.

I'd recommend pymorphy2 for your task, but I'm a bit biased :)
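
As a rough sketch of how that could look: tokenize the text, reduce each token to its normal form, and count the results. The names `top_lemmas` and `pymorphy2_normalizer` and the simple Cyrillic-only tokenizer are assumptions made for illustration. The only pymorphy2 calls used are `MorphAnalyzer().parse(word)[0].normal_form`, which takes the analyzer's top-ranked analysis and may be wrong for ambiguous words.

```python
import re
from collections import Counter

# Runs of lowercase Cyrillic letters (plus "ё"); everything else separates tokens.
TOKEN_RE = re.compile(r"[а-яё]+")


def top_lemmas(text, normalize, n=10):
    """Return the n most common normalized words as (word, count) pairs.

    `normalize` is any callable mapping a lowercase token to its base form.
    """
    tokens = TOKEN_RE.findall(text.lower())
    return Counter(normalize(token) for token in tokens).most_common(n)


def pymorphy2_normalizer():
    """Build a normalizer backed by pymorphy2 (requires `pip install pymorphy2`)."""
    import pymorphy2  # imported lazily so the counting code has no hard dependency

    morph = pymorphy2.MorphAnalyzer()
    # parse() returns candidate analyses; take the first (top-ranked) one.
    return lambda word: morph.parse(word)[0].normal_form
```

With pymorphy2 installed, `top_lemmas(text, pymorphy2_normalizer())` gives the 10 most frequent lemmas. Passing the normalizer in as an argument makes it easy to swap in another analyzer, and filtering out stop words such as prepositions and conjunctions before counting is probably worthwhile for real texts.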

Other tips

PyStemmer and NLTK are the two obvious libraries here.
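
A stemmer strips endings instead of looking words up in a dictionary, so the stems are often not real words. One possible way around that, sketched below, is to group tokens by stem and report each group's most frequent surface form. The helper names (`top_stems`, `nltk_russian_stemmer`, `pystemmer_russian_stemmer`) are made up for this example. The library calls assumed are `SnowballStemmer("russian").stem` from NLTK and `Stemmer.Stemmer("russian").stemWord` from PyStemmer.

```python
import re
from collections import Counter, defaultdict


def top_stems(text, stem, n=10):
    """Group words by stem; return (most common surface form, total count) pairs."""
    forms_by_stem = defaultdict(Counter)
    for token in re.findall(r"[а-яё]+", text.lower()):
        forms_by_stem[stem(token)][token] += 1

    # Total occurrences per stem, summed over all of its surface forms.
    totals = Counter({s: sum(forms.values()) for s, forms in forms_by_stem.items()})
    return [
        (forms_by_stem[s].most_common(1)[0][0], count)
        for s, count in totals.most_common(n)
    ]


def nltk_russian_stemmer():
    """Stemming function backed by NLTK (requires `pip install nltk`)."""
    from nltk.stem.snowball import SnowballStemmer

    return SnowballStemmer("russian").stem


def pystemmer_russian_stemmer():
    """Stemming function backed by PyStemmer (requires `pip install PyStemmer`)."""
    import Stemmer

    return Stemmer.Stemmer("russian").stemWord
```

For example, `top_stems(text, nltk_russian_stemmer())` counts with NLTK, and swapping in `pystemmer_russian_stemmer()` uses PyStemmer instead. Stemming can be cruder than dictionary-based lemmatization, since it may merge unrelated words or split forms of the same word.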

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow