In Latent Dirichlet Allocation (LDA), is it reasonable to reconstruct the original bag-of-words using the document and word representations?

datascience.stackexchange https://datascience.stackexchange.com/questions/10743

  •  16-10-2019

Question

In Latent Dirichlet Allocation (LDA), is it reasonable to reconstruct the original bag-of-words using the document-by-topic and topic-word inferred matrices?

I understand that reconstructing the original matrix will not give me the word frequencies, but are the non-zero entries of the reconstruction valid?

Solution

It is possible to generate a corpus from the learned LDA parameters ($\theta$ and $\phi$) by following LDA's generative model, but it is not realistic to expect that this would recreate the original documents (in bag-of-words form). The generative process draws every topic and every word at random, so it is possible, but highly improbable, that the generated bag-of-words documents would match the input corpus.
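
As a rough illustration of the point above, here is a minimal sketch of sampling bag-of-words documents from $\theta$ and $\phi$ with NumPy. The function name `generate_bow`, the toy parameter values, and the document lengths are all assumptions made up for this example; they are not fitted to any corpus and do not come from a particular LDA library.

```python
import numpy as np

def generate_bow(theta, phi, doc_lengths, rng):
    """Sample one bag-of-words count vector per document from LDA's generative model.

    theta: (D, K) document-topic proportions, each row sums to 1
    phi:   (K, V) topic-word distributions, each row sums to 1
    """
    D, K = theta.shape
    V = phi.shape[1]
    bow = np.zeros((D, V), dtype=int)
    for d in range(D):
        # Draw a topic for every token in the document ...
        topics = rng.choice(K, size=doc_lengths[d], p=theta[d])
        for z in topics:
            # ... then draw a word from that topic's word distribution.
            w = rng.choice(V, p=phi[z])
            bow[d, w] += 1
    return bow

rng = np.random.default_rng(0)

# Toy parameters (hypothetical): 2 documents, 2 topics, 4 vocabulary words.
theta = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
phi = np.array([[0.50, 0.30, 0.10, 0.10],
                [0.05, 0.05, 0.40, 0.50]])
doc_lengths = [20, 20]

# Two independent draws from the same parameters.
sample_a = generate_bow(theta, phi, doc_lengths, rng)
sample_b = generate_bow(theta, phi, doc_lengths, rng)

# Per-document word probabilities implied by the model (not counts).
word_probs = theta @ phi
```

Each call returns a fresh random draw, so `sample_a` and `sample_b` will generally differ from each other and from whatever corpus the parameters were learned on. The product `theta @ phi` gives per-document word probabilities rather than counts.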

Licensed under: CC-BY-SA with attribution