Mahout LDA: how to predict the topics of a test data set?

https://stackoverflow.com/questions/12525104

03-07-2021

Question

On the Apache Mahout website (https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html) I can see the procedure to fit an LDA model and output the computed topics in the form of P("word"|"topic number"). However, there is no information on how the trained model can be applied to test data to predict its topic distribution. Or should we write our own program that uses the output conditional probabilities to find the topics of a test data set?

Solution

Please have a look at the 2009 paper by Wallach et al. titled 'Evaluation Methods for Topic Models'. Section 4 describes three methods to calculate P(z|w): one based on importance sampling, and two others called the 'Chib-style estimator' and the 'left-to-right estimator'.

MALLET has an implementation of the left-to-right estimator.
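If you do end up writing your own program on top of the P(word|topic) output, a minimal sketch of one simple approach could look like the following. It is a plain EM "fold-in" that keeps the trained topics fixed and estimates the topic proportions of a single unseen document. It is not one of the estimators from the paper and it does not use any Mahout or MALLET API; the function name `infer_topic_mixture`, the `phi` layout (one dict per topic) and the toy probabilities are all assumptions made for illustration.

```python
def infer_topic_mixture(doc_words, phi, n_iter=50):
    """Estimate P(topic | document) for one unseen document.

    doc_words: list of word tokens in the test document.
    phi: list of dicts, phi[k][word] = P(word | topic k) taken from the
         trained model (hypothetical layout, adapt to your own output).
    """
    n_topics = len(phi)
    # Drop tokens that no trained topic knows about.
    words = [w for w in doc_words if any(w in topic for topic in phi)]
    if not words:
        # Nothing to learn from: fall back to a uniform mixture.
        return [1.0 / n_topics] * n_topics

    theta = [1.0 / n_topics] * n_topics
    for _ in range(n_iter):
        counts = [0.0] * n_topics
        for w in words:
            # E-step: share of this token attributed to each topic.
            weights = [theta[k] * phi[k].get(w, 0.0) for k in range(n_topics)]
            total = sum(weights)
            if total == 0.0:
                continue
            for k in range(n_topics):
                counts[k] += weights[k] / total
        norm = sum(counts)
        if norm == 0.0:
            break
        # M-step: renormalise the expected counts into proportions.
        theta = [c / norm for c in counts]
    return theta


# Toy model with two topics (made-up probabilities).
phi = [
    {"goal": 0.5, "team": 0.4, "vote": 0.1},
    {"goal": 0.1, "team": 0.1, "vote": 0.8},
]
theta = infer_topic_mixture(["goal", "team", "goal"], phi)
```

Here `theta[k]` is the estimated weight of topic k in the test document. This is a maximum-likelihood estimate without the Dirichlet prior, so for proper held-out evaluation the methods in the paper are the better choice.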

Licensed under: CC-BY-SA with attribution
