Problem

How do I sum up the frequencies of words using fd.items() from FreqDist?

>>> fd = FreqDist(text)
>>> most_freq_w = fd.keys()[:10]  # gives me the 10 most frequent words in the text
>>> # here I should sum up how many times each of these 10 frequent words appears in the text

e.g. if each word in most_freq_w appears 10 times, the result should be 100

Note: I don't need the count for all words in the text, just for the 10 most frequent.


Solution

I'm not familiar with nltk, but since FreqDist derives from dict, the following should work:

v = sorted(fd.values())  # sort a copy; in Python 3, values() returns a view and has no sort()
count = sum(v[-10:])     # add up the ten largest frequencies
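
If you are on a recent NLTK, where FreqDist subclasses collections.Counter, the standard Counter API gives a more direct route. A minimal sketch, assuming that version:

# Assumes a modern NLTK in which FreqDist is a collections.Counter subclass,
# so most_common(10) returns (word, frequency) pairs in decreasing order.
count = sum(freq for word, freq in fd.most_common(10))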

Other tips

To find the number of times a word appears in the corpus (your piece of text):

import nltk
from nltk import FreqDist

raw = "<your file>"  # the raw text of your file
tokens = nltk.word_tokenize(raw)
fd = FreqDist(tokens)
print(fd['<your word here>'])
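
As a quick self-contained check, here is a sketch with a made-up sample string standing in for your file; the text and the word 'hello' are illustrative, not from the original post:

import nltk
from nltk import FreqDist

raw = "hello world hello nltk hello again"  # stand-in for your file's contents
tokens = nltk.word_tokenize(raw)            # needs the 'punkt' tokenizer data downloaded
fd = FreqDist(tokens)
print(fd['hello'])                          # prints 3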

FreqDist also has a pretty-print feature:

    fd.pprint() 

will do it.

If FreqDist is a mapping of words to their frequencies:

sum(map(fd.get, most_freq_w))
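
One caveat: dict.get returns None for a word missing from the distribution, which would break the sum. Since indexing a FreqDist returns 0 for unseen words (Counter behavior), a slightly safer sketch, assuming most_freq_w is built from most_common:

most_freq_w = [word for word, freq in fd.most_common(10)]  # the top-10 words only
total = sum(fd[word] for word in most_freq_w)              # fd[word] is 0 for unseen words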
License: CC-BY-SA with attribution. Not affiliated with Stack Overflow.