In Latent Dirichlet Allocation (LDA), is it reasonable to reconstruct the original bag-of-words using the document and word representations?
16-10-2019
Question
In Latent Dirichlet Allocation (LDA), is it reasonable to reconstruct the original bag-of-words using the document-by-topic and topic-word inferred matrices?
I understand that I will not recover the original frequencies by reconstructing the matrix, but are the non-zero entries after reconstruction valid?
Solution
It is possible to generate a corpus from the learned LDA parameters ($\theta$ and $\phi$) by following LDA's generative model, but it is not realistic to expect that this will recreate the original documents (in bag-of-words form). More specifically, it is possible, but highly improbable, that the generated bag-of-words documents would match the input corpus.
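To make the generative step concrete, here is a minimal sketch in Python with NumPy. It assumes $\theta$ is a document-by-topic matrix and $\phi$ a topic-by-word matrix, each with rows summing to 1. The function name `sample_corpus`, the toy values of `theta` and `phi`, and the document lengths are all made up for illustration and are not fitted to any corpus.

```python
import numpy as np

def sample_corpus(theta, phi, doc_lengths, seed=0):
    """Sample bag-of-words counts by following the LDA generative model.

    theta: (n_docs, n_topics) document-topic proportions, rows sum to 1.
    phi:   (n_topics, n_words) topic-word distributions, rows sum to 1.
    doc_lengths: number of tokens to draw for each document.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_topics = theta.shape
    n_words = phi.shape[1]
    counts = np.zeros((n_docs, n_words), dtype=int)
    for d in range(n_docs):
        # Draw one topic per token from the document's topic proportions.
        topics = rng.choice(n_topics, size=doc_lengths[d], p=theta[d])
        for k in topics:
            # Draw the token's word from the chosen topic's word distribution.
            w = rng.choice(n_words, p=phi[k])
            counts[d, w] += 1
    return counts

# Toy parameters (hypothetical, for illustration only).
theta = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
phi = np.array([[0.5, 0.4, 0.1, 0.0],
                [0.0, 0.1, 0.4, 0.5]])

original_lengths = [20, 20]
a = sample_corpus(theta, phi, original_lengths, seed=1)
b = sample_corpus(theta, phi, original_lengths, seed=2)
print(a)            # one sampled bag-of-words matrix
print(b)            # another sample drawn from the same parameters
print(theta @ phi)  # per-document word probabilities, not counts
```

Each call draws a fresh random corpus, so the counts depend on the seed, and nothing ties a sample to the documents the model was trained on. The product `theta @ phi` gives the probability of each word in each document under the model. It is a matrix of probabilities rather than a reconstruction of the original counts.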
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange