How to use vectors produced by TF-IDF as an input for fuzzy c-means?
01-11-2019
Question
I have processed my text with the TF-IDF method and, as output, got a list of vectors normalized to [0, 1], one per document, such as the following:
Document 1
word1:1.0, word2:0.9, ..., word_n:0
Document 2
word2:1.0, word1:0.4, ..., word_n:0
...
etc
Each vector is basically a list of key-value pairs, where the key is a term and the value is its TF-IDF weight; a value of 1 means that the term characterizes the document more strongly than any other term in the set.
My question is: into what form should I transform these vectors in order to use fuzzy c-means clustering on them properly? I feel it should be some kind of 2D matrix, but I can't figure it out.
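For illustration, one common arrangement is a document-term matrix: one row per document, one column per term, with 0 wherever a term does not occur in a document. The following is only a minimal sketch under assumptions; the `docs` list, the extra term `word3`, and the variable names are hypothetical and not taken from the actual data.

```python
import numpy as np

# Hypothetical TF-IDF output: one {term: weight} dict per document.
docs = [
    {"word1": 1.0, "word2": 0.9},
    {"word2": 1.0, "word1": 0.4, "word3": 0.2},
]

# Fixed, sorted vocabulary so every document uses the same column order.
vocab = sorted({term for doc in docs for term in doc})
col = {term: j for j, term in enumerate(vocab)}

# Rows are documents, columns are terms; terms absent from a document stay 0.
X = np.zeros((len(docs), len(vocab)))
for i, doc in enumerate(docs):
    for term, weight in doc.items():
        X[i, col[term]] = weight

print(vocab)
print(X)
```

Each row of `X` is then a fixed-length vector that a clustering routine can consume. For a large vocabulary a sparse representation would probably be preferable to a dense array.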
In the end, I would like to have a trained model that, for a given input, can say which documents it most likely belongs to, based on the membership values.
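To make the goal concrete, here is a rough NumPy sketch of fuzzy c-means on a small document-term matrix, followed by membership values for an unseen vector. Everything in it is an assumption for illustration: the toy matrix `X`, the function names `memberships` and `fuzzy_c_means`, the fuzzifier `m=2.0`, and the use of Euclidean distance (cosine distance is often preferred for TF-IDF vectors). Note that it yields memberships in clusters of documents, not in individual documents.

```python
import numpy as np

def memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy membership of each row of X in each cluster (rows sum to 1)."""
    # Euclidean distance from every row to every center, shape (n, c).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # avoid division by zero when a row sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate center and membership updates until memberships stabilize."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Each center is the membership-weighted mean of the rows.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy document-term matrix: 4 documents (rows) x 3 terms (columns).
X = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
])

centers, U = fuzzy_c_means(X, n_clusters=2)

# A new input must use the same term columns as the training matrix.
new_doc = np.array([[0.8, 1.0, 0.0]])
u_new = memberships(new_doc, centers)
print(u_new)  # one membership value per cluster
```

The "trained model" here is just the array of cluster centers: memberships for a new input are computed from its distances to those centers, using the same formula as during training.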
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange