Question

I have a sequence of words and I want to remove all the stop words from it using NLTK. The code snippet is given below:

# tokensgenerated holds the sequence of words
for word in tokensgenerated:
    if word not in nltk.corpus.stopwords.words('english'):
        pass  # do something with the word

However, I am getting a runtime error:

" except LookupError: raise e "

I have imported nltk.

Anything I am missing?


Solution

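The LookupError means NLTK could not find the stopwords corpus in its data directories. Importing nltk does not install any corpora, so first make sure the stopwords corpus has been downloaded (see http://www.nltk.org/data):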

>>> import nltk
>>> from nltk.corpus import stopwords
>>> packages = ['stopwords']
>>> nltk.download(packages)  # download log output omitted
>>>
>>> stop = stopwords.words('english')
>>> sent = 'this is a foobar sentence'.split()
>>> [word for word in sent if word not in stop]
['foobar', 'sentence']
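
If you would rather not run the download by hand, the same steps can be wrapped in a small helper. This is only a sketch: the function names load_english_stop_words and remove_stop_words are made up for illustration. Lowercasing each token before the lookup is my own assumption (NLTK's English stop words are lowercase). The stop words are also put in a set once, instead of calling stopwords.words('english') on every loop iteration as the question's code does.

```python
def load_english_stop_words():
    """Return NLTK's English stop words as a set (hypothetical helper).

    Downloads the 'stopwords' corpus on the first LookupError and retries.
    """
    import nltk
    from nltk.corpus import stopwords

    try:
        words = stopwords.words('english')
    except LookupError:
        # Corpus is not installed yet: fetch it, then try again.
        nltk.download('stopwords')
        words = stopwords.words('english')
    return set(words)


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens whose lowercase form is not in stop_words.

    Assumption: matching is case-insensitive, which the original
    snippet did not do.
    """
    return [token for token in tokens if token.lower() not in stop_words]
```

With these in place, something like remove_stop_words(tokensgenerated, load_english_stop_words()) should give the filtered list. Drop the .lower() call if you need the exact case-sensitive behaviour of the original loop.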
License: CC-BY-SA with attribution
Not affiliated with StackOverflow