If you're reading your own textfile, then there's nothing much to do with NLTK
, you can simply use file.readlines()
:
good_reviews = """This is great!
Wow, it amazes me...
An hour of show, a lifetime of enlightment
"""
bad_reviews = """Comme si, Comme sa.
I just wasted my foo bar on this.
An hour of s**t, ****.
"""
with open('/tmp/good_reviews.txt', 'w') as fout:
fout.write(good_reviews)
with open('/tmp/bad_reviews.txt', 'w') as fout:
fout.write(bad_reviews)
reviews = []
with open('/tmp/good_reviews.txt', 'r') as fingood, open('/tmp/bad_reviews.txt', 'r') as finbad:
reviews = ([(review, 'good_review') for review in fingood.readlines()] + [(review, 'bad_review') for review in finbad.readlines()])
print reviews
[out]:
[('This is great!\n', 'good_review'), ('Wow, it amazes me...\n', 'good_review'), ('An hour of show, a lifetime of enlightment\n', 'good_review'), ('Comme si, Comme sa.\n', 'bad_review'), ('I just wasted my foo bar on this.\n', 'bad_review'), ('An hour of s**t, ****.\n', 'bad_review')]
If you're going to use the NLTK movie review corpus, see Classification using movie review corpus in NLTK/Python