Your data file is not a CSV: the words are separated by whitespace, not commas. So you don't need the csv module for this. Instead, just read each line from the file and use

    row = line.split()

to split the line on whitespace.
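As a quick illustration of why `split()` with no arguments is enough here, the snippet below uses a made-up sample line (not taken from your file) and compares it with splitting on commas:

```python
line = "the  quick\tbrown fox\n"  # made-up sample line

# With no arguments, split() treats any run of whitespace as one separator
# and ignores leading/trailing whitespace, including the newline.
row = line.split()
print(row)  # ['the', 'quick', 'brown', 'fox']

# Splitting on commas leaves this line as a single field.
print(line.split(","))  # ['the  quick\tbrown fox\n']
```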
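    from nltk.corpus import stopwords

    def remove_stopwords(filename):
        new_text_list = []
        cachedStopWords = set(stopwords.words("english"))
        # "rU" is deprecated and rejected by Python 3.11+; plain "r" already
        # handles universal newlines in Python 3.
        with open(filename, "r") as f:
            next(f)  # skip one line
            for line in f:
                row = line.split()  # split on any whitespace
                text = ' '.join([word for word in row
                                 if word not in cachedStopWords])
                print(text)
                new_text_list.append(text)
        return new_text_list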
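If you want to try the idea without installing NLTK, here is a minimal self-contained sketch. The names `SAMPLE_STOPWORDS` and `remove_stopwords_from_file` and the file contents are hypothetical; the small set stands in for `set(stopwords.words("english"))`. It also lowercases each word for the lookup, which is an assumption on my part rather than something the function above does (the NLTK list is lowercase, so "The" would otherwise be kept).

```python
import os
import tempfile

# Hypothetical stand-in for set(stopwords.words("english")),
# so the sketch runs without NLTK.
SAMPLE_STOPWORDS = {"the", "is", "a", "on", "of"}

def remove_stopwords_from_file(filename, stop_words):
    """Return one cleaned string per data line, skipping the first line."""
    cleaned = []
    with open(filename) as f:
        next(f, None)  # skip one line; the default avoids StopIteration on an empty file
        for line in f:
            # Lowercase only for the lookup, so "The" is dropped as well.
            kept = [word for word in line.split()
                    if word.lower() not in stop_words]
            cleaned.append(" ".join(kept))
    return cleaned

# Write a small made-up data file and run the function on it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header line\nThe cat is on the mat\na bag of words\n")
    path = tmp.name

result = remove_stopwords_from_file(path, SAMPLE_STOPWORDS)
os.remove(path)
print(result)  # ['cat mat', 'bag words']
```

Passing the stop word set in as a parameter means it is built once by the caller rather than on every call.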
By the way, checking membership in a set is an O(1) operation on average, while checking membership in a list is an O(n) operation. So it's advantageous to make cachedStopWords a set:

    cachedStopWords = set(stopwords.words("english"))
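To see the difference for yourself, a rough timing sketch along these lines should work. The word list is made up and its size is arbitrary, and the actual numbers depend on your machine, so treat it as an illustration rather than a benchmark:

```python
import timeit

# Made-up word list standing in for a stop word list; the size is arbitrary.
words_list = [f"word{i}" for i in range(10_000)]
words_set = set(words_list)

target = "word9999"  # last element: the worst case for a list scan

# Both containers give the same answer...
assert (target in words_list) == (target in words_set)

# ...but the list is scanned element by element, while the set does a hash lookup.
list_time = timeit.timeit(lambda: target in words_list, number=1_000)
set_time = timeit.timeit(lambda: target in words_set, number=1_000)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

The gap grows with the size of the collection, and since the lookup runs once per word in your file, it adds up quickly on large inputs.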