Question

i'm trying to do factor analysis for a co-occurrence matrix(C) , which is computed from the term-document matrix(TD) as follows: C=TD*TD'

In theory C should be positive semi-definite , but it isn't and the factor analysis algorithm can't work with it because of this.I can't change the algo because of speed reasons).

I look it up and it might be a numerical stability issue : A simple algorithm for generating positive-semidefinite matrices - answer 2.

What's a good way to proceed here ?

Was it helpful?

Solution

I would do an eigendecomposition of the matrix:

C=Q D Q^-1

If your matrix really is positive semidefinite, then all of the eigenvalues (the entries on the diagonal of D) should be non-negative. (This is probably the test that your factor analysis algorithm is doing as well to see if the matrix is positive semidefinite.)

If you're suffering numeric problems, some of the eigenvalues will probably be barely smaller than zero. Try setting these entries to zero, compute Q D Q^-1 to get a new, corrected C, then submit that to your factor analysis algorithm.

On the other hand, if you find that your matrix C has truly negative eigenvalues, then you know that you're going wrong somewhere in the construction of C.

OTHER TIPS

Not being able to comment, I will have to echo SplittingField's comment, with the nitpick that forming C=TD*TD' squares the condition number of TD, not double it. Equivalent and much more stable to finding the eigendecomposition of C would be to perform singular value decomposition (SVD) on TD. You get the condition number as a bonus: the ratio of the largest singular value to the smallest singular value is the condition number of the matrix, and the base-10 logarithm of that is an estimate on how much decimal digits you stand to lose had you gone through with using C in your calculations (of course, if the tiniest singular value is 0, your problem is singular!)

First of all, there are techniques for repairing "pathologies" of matrices with negative eigenvalues -- recall, the matrix originates through a series of computations and its commonly these steps which result in the pathologies in the first place. It's not really appropriate to accept the fact that because a matrix has small, near-zero negative eigenvalues, it's a "bad" matrix. Rather, do some work on repairing the pathologies. With regard to SVD, I agree that it is one of the better approaches -- however, I don't commonly use it in work so much because it's so expensive computationally. However, if you have one or more matrix elements that are zero, i.e. the matrix is sparse, then you must use SVD since it will be one of the only methods that will work.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top