Question

What is the best way to solve this problem?

I have a text in Russian and I want to find the 10 most common words, taking morphology into account (so that different forms of the same word are counted together). Are there any open-source Python libraries for this?

Solution

You can use a Python morphological analyzer for Russian to reduce each word to its dictionary form (lemma) before counting.

NLTK (https://github.com/nltk/nltk) also includes a Porter stemmer for Russian. Alternatively, you could run Yandex's Mystem (http://company.yandex.ru/technologies/mystem/) from the command line.

I'd recommend pymorphy2 for your task, but I'm a bit biased :)
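A minimal sketch of the counting step, assuming pymorphy2 is installed (`pip install pymorphy2`). Only `MorphAnalyzer`, `parse` and `normal_form` come from pymorphy2; the tokenizing regex and the helpers `tokenize`, `most_common_lemmas` and `pymorphy2_normalizer` are hypothetical names for illustration. The normalizer is passed in as a function so the counting logic does not depend on a particular analyzer.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical tokenizer: runs of Cyrillic letters, optionally hyphenated
# (e.g. "кто-то"). Good enough for a sketch, not a full Russian tokenizer.
WORD_RE = re.compile(r"[а-яё]+(?:-[а-яё]+)*")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and return its Russian word tokens."""
    return WORD_RE.findall(text.lower())


def most_common_lemmas(
    text: str, normalize: Callable[[str], str], n: int = 10
) -> List[Tuple[str, int]]:
    """Map every token through `normalize` and return the n most frequent results."""
    counts = Counter(normalize(word) for word in tokenize(text))
    return counts.most_common(n)


def pymorphy2_normalizer() -> Callable[[str], str]:
    """Build a normalizer backed by pymorphy2 (assumes the package is installed)."""
    import pymorphy2  # imported lazily so the counting code works without it

    morph = pymorphy2.MorphAnalyzer()
    cache: Dict[str, str] = {}

    def normalize(word: str) -> str:
        # parse() returns candidate analyses, most probable first;
        # we simply take the dictionary form of the first one.
        if word not in cache:
            cache[word] = morph.parse(word)[0].normal_form
        return cache[word]

    return normalize
```

Calling `most_common_lemmas(text, pymorphy2_normalizer())` should return up to ten `(lemma, count)` pairs, with forms such as «кошка», «кошку» and «кошки» meant to be counted under one lemma. Taking the first parse ignores ambiguity, which is usually an acceptable simplification for a frequency list.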

Other suggestions

PyStemmer and NLTK are the two obvious libraries here.
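For the stemming route, a comparable sketch, assuming NLTK is installed: `SnowballStemmer("russian")` is NLTK's stemmer for Russian, while `top_stems`, `snowball_russian_stem` and the regex tokenizer are hypothetical helpers for illustration. PyStemmer's `Stemmer.Stemmer("russian").stemWord` should also work as the `stem` argument.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical tokenizer for the sketch: runs of lower-case Cyrillic letters.
WORD_RE = re.compile(r"[а-яё]+")


def snowball_russian_stem() -> Callable[[str], str]:
    """Return NLTK's Russian Snowball stemmer as a plain function (needs nltk)."""
    from nltk.stem.snowball import SnowballStemmer  # lazy: only needed here

    return SnowballStemmer("russian").stem


def top_stems(
    text: str, stem: Optional[Callable[[str], str]] = None, n: int = 10
) -> List[Tuple[str, int]]:
    """Return the n most frequent stems in `text` as (stem, count) pairs."""
    if stem is None:
        stem = snowball_russian_stem()
    words = WORD_RE.findall(text.lower())
    return Counter(stem(word) for word in words).most_common(n)
```

Stems are truncated forms rather than dictionary words, so the resulting list can be harder to read than lemma counts; in exchange, a rule-based stemmer is simpler and needs no dictionary.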

Licensed under: CC-BY-SA with attribution