Question

I want to use Latent Semantic Analysis for a small app I'm building, but I don't want to build up the matrices myself. (Partly because the documents I have wouldn't make a very good training collection, since they're kinda short and heterogeneous, and partly because I just got a new computer and I'm finding it a pain to install the linear algebra libraries I would need.)

Are there any "default"/pre-built LSA implementations available? For example, things I'm looking for include:

  • Default U,S,V matrices (i.e., if D is a term-document matrix from some training set, then D = U S V^T is the singular value decomposition), so that given any query vector q, I can use these matrices to compute the LSA projection of q myself.
  • Some black-box LSA algorithm that, given a query vector q, returns the LSA projection of q.
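To make the first option concrete, here is a minimal sketch of the projection I have in mind. It assumes NumPy is available; the toy matrix `D`, the rank `k`, and the helper `lsa_project` are made up for illustration, and pre-built U and S matrices would simply replace the SVD step.

```python
import numpy as np

# Toy term-document matrix D (rows = terms, columns = documents).
# The counts are invented purely for illustration.
D = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 0.0, 1.0],
])

# Thin SVD: D = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Keep only the k largest singular values (the "latent" dimensions).
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]


def lsa_project(q, U_k, s_k):
    """Fold a term-space vector q into the k-dim LSA space: S_k^{-1} U_k^T q."""
    return (U_k.T @ q) / s_k


# Projecting a training document reproduces its column of Vt_k.
q = D[:, 0]
q_hat = lsa_project(q, U_k, s_k)
```

With this convention, a query lands in the same space as the columns of V^T, so it can be compared to the training documents with, for example, cosine similarity.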

Solution

You'd probably be interested in the Gensim framework for Python; notably, its documentation includes an example of building the appropriate matrices from the English Wikipedia.

Licensed under: CC-BY-SA with attribution