Question

I have processed text with the TF-IDF method and, as output, got a vector of values normalized to [0, 1] for each document, such as the following:

Document 1
word1:1.0, word2:0.9, ..., word_n:0

Document 2
word2:1.0, word1:0.4, ..., word_n:0
...
etc

The above is basically a list of key-value pairs for each document, where the key is a term and the value is its TF-IDF weight. A value of 1 means that the term matches the document the most compared to the other terms in the set.

My question is: into what form should I transform these vectors in order to use fuzzy c-means clustering on them properly? I feel like it should be a 2D matrix of some kind, but I can't figure it out.
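
For illustration, one possible 2D form is a dense matrix with one row per document and one column per term, where a term missing from a document gets 0. The sketch below assumes the vectors are held as Python dictionaries; the names `docs` and `to_matrix` are hypothetical, and the data is made up to mirror the listing above.

```python
import numpy as np

# Hypothetical example data shaped like the listing above:
# one dict per document, mapping term -> TF-IDF weight.
docs = [
    {"word1": 1.0, "word2": 0.9},
    {"word2": 1.0, "word1": 0.4},
]

def to_matrix(docs):
    """Return (X, vocab): X has one row per document, one column per term."""
    # Fix a global, sorted term order so every row uses the same columns.
    vocab = sorted({term for doc in docs for term in doc})
    column = {term: j for j, term in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for term, weight in doc.items():
            X[i, column[term]] = weight
    return X, vocab

X, vocab = to_matrix(docs)
print(vocab)
print(X)
```

Any new input would have to be mapped onto the same `vocab` column order before it can be compared with the training rows.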

In the end, I would like to have a trained model that, for a given input, could say which documents it most likely belongs with, based on the membership values.
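
To make the goal concrete, here is a minimal NumPy sketch of fuzzy c-means on a documents-by-terms matrix, followed by membership values for a new input. It is not taken from any particular library: the function names `fit_fcm` and `memberships`, the fuzzifier `m=2.0`, the Euclidean distance, and the toy data are all assumptions made for illustration. For TF-IDF vectors a cosine-based distance may be a better fit.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Membership of each row of X in each cluster (rows sum to 1)."""
    # Euclidean distance from every sample to every center: (n_samples, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero when a sample sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fit_fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate center and membership updates until memberships stabilize."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        # Each center is the membership-weighted mean of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Made-up data: rows are documents, columns are terms in a fixed order.
X = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.1, 1.0],
    [0.0, 0.0, 0.9],
])
centers, U = fit_fcm(X, n_clusters=2)

# A new input, expressed over the same term columns as X.
new_doc = np.array([[0.8, 1.0, 0.0]])
u_new = memberships(new_doc, centers)
print(U.round(3))
print(u_new.round(3))
```

In this sketch the "trained model" is just the array of cluster centers. The memberships describe how strongly an input belongs to each cluster, so the documents it most likely belongs with are those whose highest membership is in the same cluster.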

No correct solution

Licensed under: CC-BY-SA with attribution