I have a sequence of words and I want to remove all the stop words from it using nltk. Here is my code:

# tokensgenerated has the sequence of words
for word in tokensgenerated:
    if word not in nltk.corpus.stopwords.words('english'):
        pass  # do something with the word

However, I am getting a runtime error:

" except LookupError: raise e "

I have imported nltk.

Am I missing anything?


Solution

First make sure the stopwords corpus has been downloaded to your machine; the LookupError means nltk could not find it. See http://www.nltk.org/data:

>>> import nltk
>>> from nltk.corpus import stopwords
>>> packages = ['stopwords']
>>> nltk.download(packages)
>>>
>>> stop = stopwords.words('english')
>>> sent = 'this is a foobar sentence'.split()
>>> [word for word in sent if word not in stop]
['foobar', 'sentence']
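
If you want the script to recover on its own when the corpus is missing, something like the following might work. This is only a sketch: the helper names load_english_stop_words and remove_stop_words are made up for illustration, and lowercasing each token before the lookup is my assumption about what you want (the NLTK English list is all lowercase). The stop words are put in a set so each membership test is cheap, instead of rebuilding the list on every loop iteration.

```python
def load_english_stop_words():
    """Return NLTK's English stop words as a set, downloading the corpus if needed."""
    # Imported here so the filtering helper below can be used without NLTK.
    import nltk
    from nltk.corpus import stopwords

    try:
        words = stopwords.words('english')
    except LookupError:
        # The corpus is not installed locally: fetch it once, then retry.
        nltk.download('stopwords')
        words = stopwords.words('english')
    return set(words)


def remove_stop_words(tokens, stop_words):
    """Keep the tokens whose lowercase form is not in stop_words."""
    return [token for token in tokens if token.lower() not in stop_words]
```

You would call load_english_stop_words() once at startup and then pass the result to remove_stop_words(tokensgenerated, stop) for each token sequence.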
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow