Question

I want to cluster words based on their semantic similarity. I currently have a list of documents with the noun phrases detected in them. I want to cluster these noun phrases semantically, in an unsupervised way.

I have looked at the WordNet and gensim libraries. Any suggestions as to which of them can help in getting the required clusters of words based on their semantic similarity?

Solution

For similarity based on phrase co-occurrence (phrases appearing more often together in documents will be more similar), you can use gensim.
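To make the co-occurrence idea concrete, here is a minimal plain-Python sketch (it does not use gensim). It assumes each document is already reduced to a list of its noun phrases, and it scores two phrases by the cosine similarity of their document-occurrence sets. The toy `docs` data and the helper name `cosine` are made up for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: each document is the list of noun phrases found in it.
docs = [
    ["neural network", "deep learning"],
    ["neural network", "deep learning", "training data"],
    ["interest rate", "central bank"],
    ["interest rate", "central bank", "training data"],
]

# Map each phrase to the set of document indices in which it occurs.
occurrences = defaultdict(set)
for doc_id, phrases in enumerate(docs):
    for phrase in phrases:
        occurrences[phrase].add(doc_id)


def cosine(a, b):
    """Cosine similarity between the binary document-occurrence vectors of two phrases."""
    shared = len(occurrences[a] & occurrences[b])
    return shared / sqrt(len(occurrences[a]) * len(occurrences[b]))


# Phrases that share documents score higher than phrases that never co-occur.
for pair in [("neural network", "deep learning"),
             ("neural network", "training data"),
             ("neural network", "interest rate")]:
    print(pair, round(cosine(*pair), 2))
```

A pairwise score like this can be fed to any clustering method that accepts a similarity matrix. The latent-topic models mentioned next are a way to get smoother similarities when the raw counts are sparse.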

Check out Latent Semantic Analysis (LSA, which gensim calls LSI) and Latent Dirichlet Allocation (LDA) there: http://radimrehurek.com/gensim/tut2.html#available-transformations

Depending on what exactly you want your clusters to do, you can either use the LSI/LDA topics directly as clusters, or run a separate clustering algorithm on the latent phrase vectors that the model produces.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow