Question

From the Apache Mahout website (https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html) I can see the procedure to fit an LDA model and output the computed topics in the form of P("word"|"topic number"). However, there is no information on how the trained model can be applied to test data to predict the topic distribution. Or should we write our own program that uses the output conditional probabilities to find the topics over a test data set?


Solution

Please have a look at the 2009 paper by Wallach et al. titled 'Evaluation Methods for Topic Models'. Section 4 describes three methods to calculate P(z|w): one based on importance sampling, and two others called the 'Chib-style estimator' and the 'left-to-right estimator'.

Mallet has an implementation of the left-to-right estimator method.
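
If you do end up writing your own program on top of the P(word|topic) output, one simple option is a "fold-in" step: keep the trained word-topic probabilities fixed and fit only the topic proportions of each test document. The sketch below does this with a few EM iterations in plain Python. It is not Mahout's or Mallet's API and it is not one of the estimators from the paper; the function name, the `phi` layout (`phi[k][w]` = P(word w | topic k), with words mapped to integer ids) and the smoothing value `alpha` are all assumptions made for illustration.

```python
def infer_topic_distribution(doc_word_ids, phi, alpha=0.1, iterations=50):
    """Estimate P(topic | document) for one unseen document.

    phi[k][w] is assumed to hold the trained P(word w | topic k) and is
    never updated here; only the document's topic proportions are fitted.
    """
    num_topics = len(phi)
    theta = [1.0 / num_topics] * num_topics  # start from a uniform guess
    if not doc_word_ids:
        return theta

    for _ in range(iterations):
        counts = [0.0] * num_topics
        for w in doc_word_ids:
            # E-step: how responsible is each topic for this word token?
            weights = [theta[k] * phi[k][w] for k in range(num_topics)]
            total = sum(weights)
            if total == 0.0:
                continue  # word has zero probability under every topic
            for k in range(num_topics):
                counts[k] += weights[k] / total
        # M-step: re-estimate topic proportions, smoothed by alpha
        norm = sum(counts) + alpha * num_topics
        theta = [(counts[k] + alpha) / norm for k in range(num_topics)]
    return theta


# Toy model (hypothetical numbers): 2 topics over a 4-word vocabulary.
phi = [
    [0.50, 0.40, 0.05, 0.05],  # topic 0 favours words 0 and 1
    [0.05, 0.05, 0.40, 0.50],  # topic 1 favours words 2 and 3
]
theta = infer_topic_distribution([0, 1, 0, 1, 1], phi)
print(theta)  # most of the mass should land on topic 0
```

To use it with real output you would first have to parse the trained P(word|topic) table into `phi` and map each test document's tokens to the same word ids, dropping words that were not in the training vocabulary.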

Licensed under: CC-BY-SA with attribution