Question

I'm trying to retrieve documents in multiple directories and classify them. The NLTK book shows the example for categorizing files in two folders within the movie_reviews corpus, 'pos' and 'neg':

from nltk.corpus import movie_reviews
documents = [(list(movie_reviews.words(fileid)), category)
              for category in movie_reviews.categories()
              for fileid in movie_reviews.fileids(category)]

I attempted to do something similar for a couple of folders within the same directory:

reviews= "C:\Users\Alpine\Documents\Reviews" #Folders: Good, Bad
documents = [(list(reviews.words(fileid)), category)
              for category in reviews.categories()
              for fileid in reviews.fileids(category)]

However, I get AttributeError: 'str' object has no attribute 'categories' at the line for category in reviews.categories().

Is this approach limited to the corpora bundled with NLTK? Is there an alternative?

Solution

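The problem is that movie_reviews and reviews are two different kinds of object.

movie_reviews is imported from nltk.corpus; it is a corpus reader object that provides the methods words, categories, and fileids.

reviews is a variable to which you have assigned a string. A string has none of those methods, which is what the error message is telling you: the first one called, categories, does not exist on a 'str' object.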
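
As for an alternative: the list of (words, category) pairs can be built directly from the folder layout. NLTK also provides CategorizedPlaintextCorpusReader for wrapping your own directory in a corpus reader, but the sketch below deliberately uses only the standard library. It is a minimal illustration, not the NLTK book's method, and it assumes things the question does not state: that each subfolder of the root (e.g. Good, Bad) is one category, that the reviews are UTF-8 .txt files, and that a simple regex split is an acceptable stand-in for NLTK's tokenization. The name load_documents is hypothetical.

```python
import re
from pathlib import Path


def load_documents(root):
    """Return [(words, category), ...] for each .txt file in root/<category>/.

    Assumption: every immediate subfolder of `root` is a category and
    every review is a UTF-8 encoded .txt file.
    """
    documents = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # skip stray files sitting next to the category folders
        for path in sorted(category_dir.glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            # Rough tokenization: runs of word characters, or runs of punctuation.
            words = re.findall(r"\w+|[^\w\s]+", text)
            documents.append((words, category_dir.name))
    return documents
```

It would be called as documents = load_documents(r"C:\Users\Alpine\Documents\Reviews"). The r prefix matters on Python 3, where the \U in a plain string literal is read as the start of a Unicode escape and raises a SyntaxError.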

Licensed under: CC-BY-SA with attribution