Getting uniform distribution over topics from gensim's LDA?
16-10-2019
Question
I am trying to learn the topic distribution for each document in a corpus.
I have a term-document matrix (a sparse matrix of shape num_terms x num_docs) as input to the LDA model (with num_topics=100). When I infer the topic vector for each document, I get a uniform distribution over the topics. This is highly unlikely, since the documents cover different topics.
The relevant code snippet is:
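import gensim

# Input: term_doc, a scipy sparse term-document matrix (num_terms x num_docs)
corpus = gensim.matutils.Sparse2Corpus(term_doc)  # columns are treated as documents by default
lda = gensim.models.LdaModel(corpus, num_topics=100)
vecs = list(lda[corpus])  # one list of (topic_id, probability) pairs per document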
Now every vector in vecs has the same probability for each topic.
Can anyone point out where I am going wrong?
Solution
I solved this issue. gensim's LDA has a minimum-probability parameter (minimum_probability), which is set to 0.01 by default, so topics with probability below 0.01 are pruned from the output.
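To make the pruning effect concrete, here is a rough sketch in plain Python. The helper `prune_topics` and the toy distribution are made up for illustration and only mimic the thresholding idea; this is not gensim's own code.

```python
# Hypothetical helper that mimics the thresholding step; not gensim's own code.
def prune_topics(distribution, minimum_probability=0.01):
    """Keep only (topic_id, probability) pairs at or above the threshold."""
    return [
        (topic_id, prob)
        for topic_id, prob in enumerate(distribution)
        if prob >= minimum_probability
    ]

# Made-up distribution over 100 topics: two dominant topics, 98 tiny ones.
num_topics = 100
tail = 0.3 / (num_topics - 2)  # about 0.003 each, below the 0.01 default
distribution = [0.4, 0.3] + [tail] * (num_topics - 2)

pruned = prune_topics(distribution)      # default threshold of 0.01
full = prune_topics(distribution, 0.0)   # threshold lowered to zero

print(len(pruned), len(full))  # prints: 2 100
```

With 100 topics, most per-topic probabilities can easily fall under 0.01, so the default threshold may hide nearly all of them.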
Once I set the minimum probability to a very low value, the results included all topics and their corresponding probabilities.