Why do the same LDA parameters and corpus generate different topics every time?
Because LDA uses randomness in both training and inference steps.
And how do I stabilize the topic generation?
By resetting the numpy.random seed to the same value every time a model is trained or inference is performed, with numpy.random.seed:
import numpy as np

SOME_FIXED_SEED = 42
# before training/inference:
np.random.seed(SOME_FIXED_SEED)
(This is ugly, and it makes Gensim results hard to reproduce; consider submitting a patch. I've already opened an issue.)
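To see why the reset has to happen before every call and not just once at startup, here is a minimal sketch that uses only NumPy. The names run_with_fixed_seed and fake_training_step are hypothetical: the latter is a stand-in for any training or inference call that draws from numpy.random, not a Gensim function.

```python
import numpy as np

SOME_FIXED_SEED = 42

def run_with_fixed_seed(fn, seed=SOME_FIXED_SEED):
    """Reset NumPy's global random state, then call fn()."""
    np.random.seed(seed)
    return fn()

def fake_training_step():
    # Hypothetical stand-in for a training/inference call that
    # consumes random numbers from NumPy's global generator.
    return np.random.rand(3)

# Seeding before each call gives identical draws.
first = run_with_fixed_seed(fake_training_step)
second = run_with_fixed_seed(fake_training_step)

# Calling again without reseeding continues the random stream,
# so the result differs from the seeded runs.
unseeded = fake_training_step()

print(np.array_equal(first, second))
print(np.array_equal(second, unseeded))
```

The two seeded runs match, while the third call, made without a reset, continues from wherever the generator left off. That is why the seed is reset before each training or inference step.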