Question

I have calculated the TF-IDF (term frequency, inverse document frequency) matrix, and I have seen that after this step it is necessary to reduce the dimension of the matrix using methods such as LSI or the chi-square test.

I have no idea how to implement a chi-square test in Java for dimensionality reduction of the TF-IDF matrix. If there is a library that does this, or a tutorial that explains how to do it, please tell me.

Solution

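I don't think you want chi-square here. It is a supervised feature-selection test that needs class labels to score each term, not a technique for projecting a matrix into fewer dimensions.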

What you want to do is SVD, or singular value decomposition. That is the technique used in LSI/LSA for dimensionality reduction.
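To make the idea concrete, here is a minimal sketch in plain Java, not the implementation any particular library uses. It assumes the TF-IDF matrix is a dense `double[][]` with one row per document and one column per term. The class name `TruncatedSvdSketch` and its methods are hypothetical. Instead of a full SVD it uses power iteration with deflation to approximate the top `k` right singular vectors, then projects every document onto them. For a real corpus (large and sparse) a tested library is the better choice.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class for illustration; not part of any library.
class TruncatedSvdSketch {

    /** Projects each row (document) of a onto the top-k right singular vectors. */
    static double[][] reduce(double[][] a, int k, int iterations) {
        int cols = a[0].length;
        double[][] v = new double[k][cols];   // v[c] = c-th right singular vector
        Random rng = new Random(42);          // fixed seed so runs are repeatable
        for (int c = 0; c < k; c++) {
            double[] x = new double[cols];
            for (int j = 0; j < cols; j++) x[j] = rng.nextDouble() - 0.5;
            for (int it = 0; it < iterations; it++) {
                orthonormalize(x, v, c);              // deflate against vectors found so far
                x = timesTransposed(a, times(a, x));  // x <- A^T (A x)
            }
            orthonormalize(x, v, c);
            v[c] = x;
        }
        // Reduced representation: one row per document, k columns.
        double[][] reduced = new double[a.length][k];
        for (int i = 0; i < a.length; i++)
            for (int c = 0; c < k; c++)
                reduced[i][c] = dot(a[i], v[c]);
        return reduced;
    }

    // Removes the components of x along v[0..found-1], then scales x to unit length.
    static void orthonormalize(double[] x, double[][] v, int found) {
        for (int c = 0; c < found; c++) {
            double d = dot(x, v[c]);
            for (int j = 0; j < x.length; j++) x[j] -= d * v[c][j];
        }
        double norm = Math.sqrt(dot(x, x));
        if (norm > 0) for (int j = 0; j < x.length; j++) x[j] /= norm;
    }

    static double[] times(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) y[i] = dot(a[i], x);
        return y;
    }

    static double[] timesTransposed(double[][] a, double[] y) {
        double[] x = new double[a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < x.length; j++) x[j] += a[i][j] * y[i];
        return x;
    }

    static double dot(double[] p, double[] q) {
        double s = 0;
        for (int j = 0; j < p.length; j++) s += p[j] * q[j];
        return s;
    }

    public static void main(String[] args) {
        // Made-up TF-IDF values: 4 documents, 5 terms.
        double[][] tfidf = {
            {0.9, 0.8, 0.0, 0.0, 0.1},
            {0.7, 0.9, 0.1, 0.0, 0.0},
            {0.0, 0.1, 0.8, 0.9, 0.0},
            {0.0, 0.0, 0.9, 0.7, 0.2},
        };
        for (double[] row : reduce(tfidf, 2, 200)) System.out.println(Arrays.toString(row));
    }
}
```

Each document goes from 5 term weights to 2 coordinates. The sign of each coordinate column is arbitrary, so only relative positions of the documents are meaningful.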

Wikipedia suggests using a library called the 'S-Space Package' for LSA in Java. I haven't used it myself, but you may want to look into it.

http://code.google.com/p/airhead-research/

OTHER TIPS

Use the gensim library (Python) for LSA or LDA. It can perform LSA on practically any large dataset, because it reads the corpus lazily instead of loading it all into memory at once.

Licensed under: CC-BY-SA with attribution