Question

I am using the Gensim Python toolkit to build a tf-idf model for documents, so I need to create a dictionary for all documents first. However, I found that Gensim does not apply stemming before creating the dictionary and corpus. Am I right?

Solution

You are correct. Gensim does no preprocessing such as stemming on its own; it only converts the tokens you give it into its various models.

Here is the relevant quote, followed by the title of the page it comes from:

The ways to process documents are so varied and application- and language-dependent that I decided to not constrain them by any interface. Instead, a document is represented by the features extracted from it, not by its “surface” string form: how you get to the features is up to you.

From Strings to Vectors

Other tips

I was struggling with the same problem. To get around it, I first stemmed the documents using NLTK and then processed them with Gensim. This is probably an easy and handy way to do what you need.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow