Getting a uniform distribution over topics from gensim's LDA?

https://datascience.stackexchange.com/questions/13907

16-10-2019

Question

I am trying to learn the topic distribution for each document in a corpus.

I have a term-document matrix (a sparse matrix of shape num_terms * num_docs) as input to the LDA model (with num_topics=100). When I try to infer the topic vector for each document, I get a uniform distribution over the topics. This is highly unlikely, since the documents cover different topics.

The relevant code snippet is:

import gensim

# Input: scipy sparse term-document matrix of shape (num_terms, num_docs)
corpus = gensim.matutils.Sparse2Corpus(term_doc)  # columns are documents (the default)
lda = gensim.models.LdaModel(corpus, num_topics=100)
# Infer the topic distribution of every document
vecs = list(lda[corpus])

Now every vector in vecs shows the same probability for each topic.

Can anyone point out where I am going wrong?

Solution

I solved this issue. gensim's LdaModel has a minimum_probability parameter, which is set to 0.01 by default, so topics with probability below 0.01 are pruned from the output.
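
To see why a 0.01 cutoff matters with 100 topics, here is a small pure-Python illustration of threshold pruning. It is only a sketch of the idea, not gensim's code: the helper prune_topics and the example distribution are made up for illustration.

```python
def prune_topics(topic_probs, minimum_probability=0.01):
    """Keep only (topic_id, prob) pairs whose prob reaches the threshold.

    Hypothetical helper that mimics the effect of a minimum-probability
    cutoff; it is not part of gensim.
    """
    return [(i, p) for i, p in enumerate(topic_probs) if p >= minimum_probability]


num_topics = 100

# Made-up document: one dominant topic, the remaining mass spread evenly
# over the other 99 topics (0.495 / 99 = 0.005 each, below the cutoff).
dominant = 0.505
rest = (1.0 - dominant) / (num_topics - 1)
doc = [dominant] + [rest] * (num_topics - 1)

kept_default = prune_topics(doc)        # cutoff 0.01
kept_all = prune_topics(doc, 1e-8)      # very low cutoff

print(len(kept_default), len(kept_all))
```

With many topics, most per-topic probabilities fall under 0.01, so the default cutoff can drop nearly all of them from the output.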

Once I set minimum_probability to a very low value, the results included all topics with their corresponding probabilities.

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange