Summing word frequency counts with FreqDist in Python

https://stackoverflow.com/questions/4206979

25-09-2019

Question

How can I sum up the frequency counts using FreqDist's fd.items()?

>>> fd = FreqDist(text) 
>>> most_freq_w = fd.keys()[:10] #gives me the most 10 frequent words in the text
>>> #here I should sum up numbers of each of these 10 freq words appear in the text

For example, if each word in most_freq_w appears 10 times, the result should be 100.

Note: I don't need all the words in the text, only the 10 most frequent ones.

Solution

I'm not familiar with nltk, but since FreqDist derives from dict, the following should work:

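v = sorted(fd.values())  # values() is a view on Python 3, so sort a copy instead of calling .sort()
count = sum(v[-10:])     # add up the ten largest counts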
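The same idea can be sketched without sorting every count. In NLTK 3, FreqDist subclasses collections.Counter, so most_common is available on it. The example below uses a plain Counter as a stand-in so that it runs without NLTK installed; the sample tokens and the helper name top_n_total are assumptions made for illustration.

```python
from collections import Counter

def top_n_total(fd, n=10):
    """Sum the counts of the n most frequent items in a Counter-like object."""
    return sum(count for _, count in fd.most_common(n))

# Hypothetical sample tokens; a real FreqDist would be built from tokenized text.
tokens = "a a a b b c c d e f".split()
fd = Counter(tokens)

total_top_two = top_n_total(fd, 2)   # counts of the two most frequent tokens
total_top_ten = top_n_total(fd, 10)  # fewer than ten distinct tokens, so all are summed
print(total_top_two, total_top_ten)
```

If fewer than n distinct words exist, most_common simply returns all of them, so the helper needs no special case.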

Other tips

To find how many times a word appears in a corpus (your text):

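import nltk
from nltk import FreqDist
raw = "<your file>"
tokens = nltk.word_tokenize(raw)  # split the raw text into word tokens
fd = FreqDist(tokens)
print(fd['<your word here>'])     # count for a single word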
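As a small illustration of the lookup, here is a sketch that uses collections.Counter (the base class of FreqDist in NLTK 3) and a plain whitespace split instead of nltk.word_tokenize, so it runs without NLTK. The sample sentence is made up for the example.

```python
from collections import Counter

raw = "the quick brown fox jumps over the lazy dog"  # hypothetical text
tokens = raw.split()  # simple whitespace split, not nltk.word_tokenize
fd = Counter(tokens)

count_the = fd["the"]  # word that occurs in the text
count_cat = fd["cat"]  # absent word: a Counter returns 0 instead of raising KeyError
print(count_the, count_cat)
```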

FreqDist also has a pretty-print method. Calling

    fd.pprint()

will do.

If FreqDist is a mapping from words to their frequencies:

sum(map(fd.get, most_freq_w))
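This one-liner assumes most_freq_w already holds the most frequent words. On Python 3, slicing fd.keys() directly raises a TypeError because dictionary views are not subscriptable, so the sketch below builds the list from most_common instead. It uses collections.Counter as a stand-in for FreqDist, with made-up tokens and a top 2 rather than a top 10 to keep the data small.

```python
from collections import Counter

# Hypothetical tokens; Counter stands in for FreqDist here.
tokens = ["the", "the", "the", "cat", "cat", "sat", "on", "mat"]
fd = Counter(tokens)

# Keep only the words, ordered from most to least frequent.
most_freq_w = [word for word, _ in fd.most_common(2)]

# Look up each word's count and add the counts together.
total = sum(map(fd.get, most_freq_w))
print(most_freq_w, total)
```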

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow