Question

I started out with a CSV file with one column and many rows, where each row contains a sentence. I wrote some Python to remove the stopwords and generate a new CSV file in the same format (one column, many rows of sentences, but with the stopwords removed). The only part of my code that is not working is writing to the new CSV.

Instead of writing one sentence per row in a single column, I got multiple columns where each cell contains one character of the sentence.

Here is an example of my new_text_list:

['"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"', 
'"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"']

Here is an example of the output csv:

col1 col2
"      W
W      e
"      W
W      e
l
l

What am I doing wrong?

Here is my code:

def remove_stopwords(filename):
  new_text_list=[]
  cachedStopWords = set(stopwords.words("english"))
  with open(filename,"rU") as f:
    next(f)
    for line in f:
      row = line.split()
      text = ' '.join([word for word in row
                             if word not in cachedStopWords])
      # print text
      new_text_list.append(text)
  print new_text_list

  with open("output.csv",'wb') as g:
    writer=csv.writer(g)
    for val in new_text_list:
      writer.writerows([val])

Solution

with open("output.csv", 'wb') as g:
    writer = csv.writer(g)
    for item in new_text_list:
        writer.writerow([item])  # writerow (singular), not writerows (plural)

or

with open("output.csv", 'wb') as g:
    writer = csv.writer(g)
    writer.writerows([[item] for item in new_text_list])

When you use writerows, the argument should be an iterable of rows, where each row is an iterable of field values. Here, the field value is item, so a row can be the list [item]. Thus, writerows can take a list of lists as its argument.

writer.writerows([val])

did not work because [val] is just a list containing a string, not a list of lists.
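To make the expected shape concrete, here is a minimal sketch in Python 3 (an assumption on my part; the code in the question is Python 2). It writes to an in-memory buffer instead of a file, and sentences is a hypothetical stand-in for new_text_list:

```python
import csv
import io

# Hypothetical sample data standing in for new_text_list.
sentences = [
    "online site asset business, still essential",
    "Try Now FREE 30 Days!",
]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
# Each row is a one-element list, so each sentence becomes a single field.
writer.writerows([[sentence] for sentence in sentences])

# Parse the text back to confirm the shape: one field per row.
rows = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
print(rows)
```

Every parsed row should contain exactly one field, even for the sentence that includes a comma, because the writer quotes that field.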

Now strings are also sequences -- a sequence of characters:

In [164]: list('abc')
Out[164]: ['a', 'b', 'c']

So writerows took [val] to be a list containing one row, val, and treated each character of val as a separate field value. That is why the characters in your string got splattered across columns. For example,

import csv
with open('/tmp/out', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(['Hi, there'])

yields

H,i,",", ,t,h,e,r,e
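If you want to compare the two shapes side by side, the following sketch may help. It assumes Python 3 and uses an in-memory buffer; render is a hypothetical helper, not part of the csv module:

```python
import csv
import io


def render(rows):
    """Return the CSV text that csv.writer produces for the given rows."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# A list of strings: each string is iterated character by character as a row.
splattered = render(["Hi, there"])
# A list of lists: the whole string is one field in one row.
intact = render([["Hi, there"]])

print(repr(splattered))
print(repr(intact))
```

The first value reproduces the splattered line shown above (plus the line terminator), while the second keeps the sentence together as a single quoted field.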

OTHER TIPS

Using the official Python documentation on csv, I managed to write and read your sample data as below:

import csv

l = ['"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"',
     '"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"']

with open('output.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for i in l:
        writer.writerow([i])  # wrap the sentence in a list so it is written as one field

I then read the file as below:

with open('output.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in reader:
        print ''.join(row)

and got this output:

"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that's Best Business Today! Try Now FREE 30 Days! Track sales expenses �"

"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that's Best Business Today! Try Now FREE 30 Days! Track sales expenses �"

I hope this helps...
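For anyone on Python 3, the same write-then-read check might look roughly like the sketch below. It is not the code from this answer: it assumes the default comma dialect, opens the file in text mode with newline="" as Python 3 expects, writes to a temporary directory, and uses a short hypothetical sentences list in place of the sample data.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for new_text_list.
sentences = ["Choose plan that's Best Business Today!", "Track sales, expenses"]

# Write into a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    for sentence in sentences:
        writer.writerow([sentence])  # one-element list: one field per row

# Read the file back and take the single field from each row.
with open(path, newline="", encoding="utf-8") as csvfile:
    recovered = [row[0] for row in csv.reader(csvfile)]

print(recovered == sentences)
```

If the round trip preserves the data, recovered matches sentences exactly, including the sentence that contains a comma.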

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow