Question

I have a sequence of words and I want to remove all the stop words from it using NLTK. Here is the code:

# tokensgenerated holds the sequence of words
for word in tokensgenerated:
    if word not in nltk.corpus.stopwords.words('english'):
        pass  # do something with the word

However, I am getting a runtime error:

" except LookupError: raise e "

I have imported nltk.

Is there anything I am missing?


Solution

A LookupError here usually means the stopwords corpus has not been downloaded yet. Download it once (see http://www.nltk.org/data), then load the list:

>>> import nltk
>>> nltk.download('stopwords')  # one-time download; progress output omitted
>>> from nltk.corpus import stopwords
>>>
>>> stop = stopwords.words('english')
>>> sent = 'this is a foobar sentence'.split()
>>> [word for word in sent if word not in stop]
['foobar', 'sentence']
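
The same idea can be wrapped so the download only happens when the corpus is actually missing. This is a minimal sketch, not the only way to do it: the helper names load_english_stop_words and remove_stop_words are hypothetical, and lowercasing each token before the lookup is an assumption (the NLTK English list is lowercase). The list is also converted to a set once, so it is not rebuilt on every loop iteration as in the question's code.

```python
def load_english_stop_words():
    """Return NLTK's English stop words as a set, downloading them if needed.

    Hypothetical helper: nltk is imported here so the rest of the module
    can be used without it.
    """
    import nltk
    from nltk.corpus import stopwords

    try:
        # Raises LookupError when the corpus is not installed locally.
        nltk.data.find('corpora/stopwords')
    except LookupError:
        nltk.download('stopwords')
    return set(stopwords.words('english'))


def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not stop words, preserving order.

    Assumption: tokens are lowercased for the lookup only; the original
    casing is kept in the result.
    """
    stop_set = set(stop_words)  # set membership is O(1) per token
    return [token for token in tokens if token.lower() not in stop_set]


# Usage with the question's variable name:
# filtered = remove_stop_words(tokensgenerated, load_english_stop_words())
```

Passing the stop words in as an argument keeps the filtering step independent of NLTK, so a custom list can be used in place of the downloaded one.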
Licensed under: CC-BY-SA with attribution