I want to cluster words based on their semantic similarity. Currently I have a list of documents with detected noun phrases in them. I want to make cluster out of these obtained nouns within the documents and unsupervisedly cluster them semantically?

I have looked at wordnet and gensim libraries. Any suggestions as to which can really help in getting the required cluster of words based on their semantic similarity?

有帮助吗?

解决方案

For similarity based on phrase co-occurrence (phrases appearing more often together in documents will be more similar), you can use gensim.

Check out the Latent Semantic Analysis and Latent Dirichlet Allocation there: http://radimrehurek.com/gensim/tut2.html#available-transformations

Depending on what exactly you want your clusters to do, you can either use the LSI/LDA topics directly as clusters. Or cluster the obtained latent phrase vectors etc.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top