Question

I am studying some machine learning and I have come across, in several places, the claim that Latent Semantic Indexing may be used for feature selection. Can someone please provide a brief, simplified explanation of how this is done, ideally both theoretically and in commented code? How does it differ from Principal Component Analysis?

It doesn't really matter to me which language it is written in, just that I can understand both the code and the theory.


Solution

LSA is conceptually similar to PCA, but is used in different settings.

The goal of PCA is to transform data into a new, typically lower-dimensional space. For example, if you wanted to recognize faces and used 640x480 pixel images (i.e. vectors in a 307200-dimensional space), you would probably try to reduce that space to something more manageable, both to make computation simpler and to make the data less noisy. PCA does exactly this: it "rotates" the axes of your high-dimensional space and assigns a "weight" to each of the new axes, so that you can throw away the least important of them.
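Since the question asks for commented code, here is a minimal sketch of that idea using scikit-learn's PCA. The data, dimensions, and number of components are made up purely for illustration; a real face-recognition pipeline would of course use actual images.

```python
# Minimal PCA sketch (illustrative data only).
import numpy as np
from sklearn.decomposition import PCA

# Pretend each row is a flattened grayscale image; here we just use
# random vectors of a small size so the sketch stays runnable.
rng = np.random.default_rng(0)
X = rng.random((100, 300))        # 100 samples, 300 original features

# Keep only the 50 directions (principal components) with the largest variance.
pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X)  # shape: (100, 50)

# explained_variance_ratio_ is the "weight" of each new axis:
# how much of the original variance that axis captures.
print(X_reduced.shape)
print(pca.explained_variance_ratio_[:5])
```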

LSA, on the other hand, is used to analyze the semantic similarity of words. It isn't meant for images, bank data, or other arbitrary datasets; it is designed specifically for text processing and works specifically with term-document matrices. Such matrices, however, are often too large, so they are reduced to lower-rank matrices in a way very similar to PCA (both use SVD). What happens here is not feature selection, though, but feature vector transformation. SVD gives you a transformation matrix (call it S) which, multiplied by an input vector x, yields a new vector x' in a smaller space with a more important basis. That new basis provides your new features. They are not selected from the old features, but obtained by transforming the old, larger basis.
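To make that concrete, here is a minimal sketch of LSA on a term-document matrix using scikit-learn's TfidfVectorizer and TruncatedSVD. The tiny corpus and the choice of 2 components are made up for illustration; TruncatedSVD plays the role of the SVD-based transformation described above.

```python
# Minimal LSA sketch on a toy corpus (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock prices fell sharply today",
    "the market rallied after the report",
]

# Build the term-document matrix (documents as rows, terms as columns).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)    # shape: (4 documents, n_terms)

# Reduce to a 2-dimensional "semantic" space via truncated SVD.
svd = TruncatedSVD(n_components=2, random_state=0)
X_lsa = svd.fit_transform(X)          # shape: (4, 2)

# Each document is now a 2-dimensional vector. These coordinates are the
# transformed features, not a selected subset of the original term features.
print(X_lsa)
```

Documents about similar topics end up close together in the reduced space, which is what makes LSA useful for measuring semantic similarity.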

For more details on LSA, as well as implementation tips, see this article.
