Question

The lda.show_topics method in the following code only prints the distribution of the top 10 words for each topic. How do I print the full distribution over all the words in the corpus?

from gensim import corpora, models

documents = ["Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management system",
"System and human system engineering testing of EPS",
"Relation of user perceived response time to error measurement",
"The generation of random binary unordered trees",
"The intersection graph of paths in trees",
"Graph minors IV Widths of trees and well quasi ordering",
"Graph minors A survey"]

stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

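# corpus_tfidf was not defined in the original snippet; a TF-IDF
# transform of the bag-of-words corpus is assumed here.
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

lda = models.ldamodel.LdaModel(corpus_tfidf, id2word=dictionary, num_topics=2)

for i in lda.show_topics():
    print(i)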

Solution

show_topics() takes a parameter called topn that sets how many of the top words are returned from each topic's word distribution (more recent gensim releases name this parameter num_words). See http://radimrehurek.com/gensim/models/ldamodel.html

So instead of calling lda.show_topics() with its defaults, pass len(dictionary) to get the full word distribution for each topic:

# In more recent gensim releases the keyword is num_words rather than topn.
for i in lda.show_topics(topn=len(dictionary)):
    print(i)

OTHER TIPS

show_topics() has two parameters, num_topics and num_words: for num_topics topics, it returns the num_words most significant words (10 words per topic by default). See http://radimrehurek.com/gensim/models/ldamodel.html#gensim.models.ldamodel.LdaModel.show_topics

So you can pass len(lda.id2word) to get the full word distribution for each topic, and lda.num_topics to get every topic in your LDA model.

for i in lda.show_topics(formatted=False, num_topics=lda.num_topics, num_words=len(lda.id2word)):
    print(i)
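With formatted=False, recent gensim versions return a list of (topic_id, [(word, probability), ...]) tuples, which is awkward to query directly. The sketch below shows one way to turn that shape into per-topic dictionaries. The helper name topics_to_dicts and the sample words and probabilities are made up for illustration, so the block runs without a trained model.

```python
# Hypothetical sample in the shape that lda.show_topics(formatted=False, ...)
# returns in recent gensim versions. The words and numbers are invented.
sample_topics = [
    (0, [("system", 0.5), ("user", 0.3), ("trees", 0.2)]),
    (1, [("graph", 0.6), ("trees", 0.3), ("system", 0.1)]),
]


def topics_to_dicts(shown_topics):
    """Map each topic id to a {word: probability} dictionary."""
    return {topic_id: dict(word_probs) for topic_id, word_probs in shown_topics}


topic_dicts = topics_to_dicts(sample_topics)

for topic_id, dist in sorted(topic_dicts.items()):
    # A full distribution over the vocabulary should sum to roughly 1.
    print(topic_id, round(sum(dist.values()), 6), dist)
```

With a real model, the same helper could be given the result of lda.show_topics(formatted=False, num_topics=lda.num_topics, num_words=len(lda.id2word)).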

The code below prints each word together with its probability. It prints the top 10 words per topic; change num_words=10 to print more.

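# Assumes a gensim version where show_topics(formatted=False) returns
# (topic_id, [(word, probability), ...]) tuples with the words as strings,
# so no lookup through the dictionary is needed.
for topic_id, word_probs in lda.show_topics(formatted=False, num_words=10):
    print(topic_id)
    print("******************************")
    for word, prob in word_probs:
        print("(", word, ",", prob, ")", end="")
    print("")
    print("******************************")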
Licensed under: CC-BY-SA with attribution