Latent semantic analysis
describes this relation well.
It also explains how one uses
first the full document-term matrix, then the reduced one,
to map term vectors to near-matching documents -- i.e., why reduce at all.
See also
making-sense-of-PCA-eigenvectors-eigenvalues.
(The many different answers there suggest that no single one is intuitive for everybody.)
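As a minimal sketch of that idea (the matrix and query values below are toy numbers, not from any real corpus): truncated SVD reduces the term-document matrix, and a query's term vector can be folded into the reduced space and compared against documents there:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents
A = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 2.0, 1.0],
])

# Full SVD, then keep only the top-k singular values (the "reduction")
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Documents in the reduced space: columns of diag(sk) @ Vtk
docs_k = np.diag(sk) @ Vtk            # shape (k, n_docs)

# Fold a query (a vector of terms) into the same space
q = np.array([1.0, 0.0, 1.0, 0.0])    # query mentioning terms 0 and 2
q_k = Uk.T @ q

# Cosine similarity of the query against each document in the reduced space
sims = (docs_k.T @ q_k) / (np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k))
print(sims)
```

The reduced comparison is what makes near-matches possible: documents that share no literal term with the query can still score well if they use co-occurring terms.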
What is the significance of covariance matrix constructed through term document matrix in PCA?
28-11-2021
Question
I'm working on neural networks, and to reduce the dimensions of the term-document matrix (built from the documents and the terms in them, holding tf-idf values), I need to apply PCA. Something like this:
             Term 1   Term 2   Term 3   Term 4   ...
Document 1
Document 2      (tf-idf values of terms per document)
Document 3
...
PCA works by computing the mean of the data, subtracting it, and then forming the covariance matrix with the following formula.
Let M be the (mean-centered) term-document matrix of dimension NxN.
The covariance matrix becomes
(M x transpose(M)) / (N - 1)
We then calculate the eigenvalues and eigenvectors to feed as feature vectors into the neural network. What I'm not able to comprehend is the importance of the covariance matrix, and which dimensions it is finding the covariance of.
With simple 2-dimensional data (X, Y) it can be understood. What dimensions are being correlated here?
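A small numeric sketch may make the question concrete (toy tf-idf values; it assumes the usual convention that rows are observations, i.e. documents, and columns are variables, i.e. terms): each entry of the covariance matrix then relates a pair of terms across documents, just like cov(X, Y) relates two axes in 2-D data:

```python
import numpy as np

# Toy tf-idf matrix: rows = documents, columns = terms
M = np.array([
    [0.5, 0.0, 0.2],
    [0.1, 0.4, 0.0],
    [0.3, 0.2, 0.6],
    [0.0, 0.5, 0.1],
])
N = M.shape[0]                       # number of documents (observations)

Mc = M - M.mean(axis=0)              # subtract the per-term mean
C = (Mc.T @ Mc) / (N - 1)            # term-term covariance, shape (n_terms, n_terms)

# C[i, j] is the covariance of term i and term j across documents:
# every term plays the role of one "dimension" (like X or Y in 2-D).
assert np.allclose(C, np.cov(M, rowvar=False))

eigvals, eigvecs = np.linalg.eigh(C)  # principal axes in term space
print(C.shape, eigvals)
```

In this orientation each eigenvector is a weighted combination of terms, which is what gets kept when the dimensionality is reduced.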
Thank you
Solution