Question

When I run the command

bin\mallet train-topics --input input.tutorial.mallet --num-topics 40 --num-iterations 100 --optimize-interval 50 --optimize-burn-in 200 --output-state input.gz --output-topic-keys inputkeys.txt --output-doc-topics input-proportion.txt

I get different results every time.

Output:

0 AJAY_DASARI 19 0.062051649928263994 39 0.03263988522238164 35 0.03263988522238164 33 0.03263988522238164 32 0.03263988522238164 23 0.03263988522238164 ...
1 BALVINDERSINGH 21 0.06297779395704405 36 0.04805242082271569 22 0.04805242082271569 35 0.03312704768838733 32 0.03312704768838733 31 0.03312704768838733 30 0.03312704768838733 26 0.03312704768838733 24 0.03312704768838733 15 0.03312704768838733 13 ...

How can I get the same result every time the command is run?

Solution

When you train the model, use the option --random-seed INTEGER (with a value other than 0, otherwise the clock is used) to fix the random seed. This should give you consistent results over multiple runs.
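For example, adding a fixed seed to the command from the question (the value 42 is arbitrary; any nonzero integer works):

bin\mallet train-topics --input input.tutorial.mallet --num-topics 40 --num-iterations 100 --optimize-interval 50 --optimize-burn-in 200 --output-state input.gz --output-topic-keys inputkeys.txt --output-doc-topics input-proportion.txt --random-seed 42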

There was a bug with this feature, which is now fixed in the development release.
See MALLET's download page to build the most current version.

OTHER TIPS

Topic modeling is a probabilistic/statistical approach based on sampling, so you should not expect the same scores and the same words per row each time you run the command. Also, the number of iterations looks a little small; try setting it to 1000, as in the command below.
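For instance, the same command with --num-iterations raised to 1000 (combine it with --random-seed from the accepted answer if you also want reproducibility):

bin\mallet train-topics --input input.tutorial.mallet --num-topics 40 --num-iterations 1000 --optimize-interval 50 --optimize-burn-in 200 --output-state input.gz --output-topic-keys inputkeys.txt --output-doc-topics input-proportion.txt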

Hope it helps.

The only way to get the same answer every time would be to seed the random number generator identically.

MALLET uses Gibbs sampling to infer the properties of the topic model: this is a Markov chain Monte Carlo method which uses a random number generator to iteratively resample each parameter of the model conditioned on the current values of all the others. In some cases you can average a quantity of interest over iterations to make it more stable; the topics themselves, however, cannot be averaged across iterations because of an issue called identifiability (nothing stops topic labels from swapping between iterations or runs). See the Griffiths and Steyvers paper "Finding scientific topics" (PNAS, 2004), particularly the footnote on p. 5230.
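To make the seeding point concrete, here is a toy collapsed Gibbs sampler for LDA in Java. This is a minimal sketch, not MALLET's actual code: the tiny corpus, the number of topics, the alpha/beta priors, the seed 42, and the class name ToyLdaGibbs are all invented for illustration. With the fixed seed, every run prints identical topic assignments; change or remove the seed and the assignments change from run to run.

import java.util.Arrays;
import java.util.Random;

// Toy collapsed Gibbs sampler for LDA (illustration only, not MALLET's code).
public class ToyLdaGibbs {
    public static void main(String[] args) {
        int[][] docs = { {0, 1, 2, 1}, {2, 3, 3, 0} }; // word ids per document (made up)
        int K = 2, V = 4;                              // topics, vocabulary size
        double alpha = 0.1, beta = 0.01;               // symmetric priors (made up)
        Random rng = new Random(42);                   // fixed seed => reproducible run

        int D = docs.length;
        int[][] z = new int[D][];     // topic assignment of every token
        int[][] ndk = new int[D][K];  // doc-topic counts
        int[][] nkw = new int[K][V];  // topic-word counts
        int[] nk = new int[K];        // total tokens per topic

        // Random initialization of all topic assignments.
        for (int d = 0; d < D; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int k = rng.nextInt(K);
                z[d][i] = k;
                ndk[d][k]++; nkw[k][docs[d][i]]++; nk[k]++;
            }
        }

        // Gibbs sweeps: resample each token's topic conditioned on all others.
        for (int iter = 0; iter < 100; iter++) {
            for (int d = 0; d < D; d++) {
                for (int i = 0; i < docs[d].length; i++) {
                    int w = docs[d][i];
                    int old = z[d][i];
                    ndk[d][old]--; nkw[old][w]--; nk[old]--; // remove this token

                    // Unnormalized conditional p(z = t | all other assignments).
                    double[] p = new double[K];
                    double sum = 0;
                    for (int t = 0; t < K; t++) {
                        p[t] = (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta);
                        sum += p[t];
                    }

                    // Draw from the conditional by inverting the CDF.
                    double u = rng.nextDouble() * sum;
                    int t = 0;
                    double acc = p[0];
                    while (acc < u && t < K - 1) { acc += p[++t]; }

                    z[d][i] = t;
                    ndk[d][t]++; nkw[t][w]++; nk[t]++;       // add the token back
                }
            }
        }
        // Same seed => same output on every run.
        System.out.println(Arrays.deepToString(z));
    }
}

The resampling step is exactly where the random number generator enters: both the draw from the conditional distribution and the initial assignments consume values from the generator, so two runs agree only if the generator is seeded identically, which is what MALLET's --random-seed option arranges.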

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow