I'm having an issue using `DictWriter` to write dicts to a CSV file. I specify the headers and write the data to the CSV, but certain columns are not being populated: specifically `productID`, `userID`, and `helpfulness`. The other issue is that rows are duplicated a couple of times before the next entry is written.
I can confirm that the missing data is in the dicts by simply printing them, but it is lost (and other data is duplicated) during the write.
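One way to see what `csv.DictWriter` does with the keys is a tiny in-memory test. The Python 3 sketch below uses `io.StringIO` in place of a real file, and the record is made up for illustration. It shows that field names are matched exactly (case included): with `extrasaction='ignore'`, keys that are not in the header are dropped silently and header columns with no matching key are filled with `restval` (an empty string by default), while the default `extrasaction='raise'` turns the same mismatch into a `ValueError`.

```python
import csv
import io

# Made-up record whose keys are spelled the way the parsing step produces them.
record = {"productId": "B000EXAMPLE", "userId": "AEXAMPLE", "profileName": "dll pa"}
fieldnames = ["productID", "userID", "profileName"]  # capital "ID", as in the question

# extrasaction="ignore": keys not in fieldnames are dropped without complaint,
# and fieldnames missing from the dict are filled with restval ("" by default).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
writer.writeheader()
writer.writerow(record)
print(repr(buf.getvalue()))  # → 'productID,userID,profileName\r\n,,dll pa\r\n'

# Default extrasaction="raise": the same row fails loudly and names the unexpected keys.
strict = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
try:
    strict.writerow(record)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Temporarily dropping `extrasaction='ignore'` is therefore a quick way to find out which dict keys do not line up with the header.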
My code is below; I'm using the dataset from here: http://snap.stanford.edu/data/web-FineFoods.html
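Each record in that dataset is a block of `prefix/key: value` lines (for example `product/productId`, `review/userId`, `review/score`) followed by a blank line. The Python 3 sketch below assumes that layout and applies the same `split(': ', 1)` / `split('/')[1]` logic as the question's code to a single record. The `productId`, `userId` and `score` values are placeholders; the other values are taken from the output sample in the question. The point is to make the resulting key spelling visible.

```python
# Assumed layout of one record; PLACEHOLDER_* values are invented stand-ins.
SAMPLE_RECORD = """\
product/productId: PLACEHOLDER_PRODUCT
review/userId: PLACEHOLDER_USER
review/profileName: dll pa
review/helpfulness: 0/0
review/score: PLACEHOLDER_SCORE
review/time: 1182627213
review/summary: Not as Advertised
review/text: Product arrived labeled as Jumbo Salted Peanuts...
"""


def parse_record(block):
    """Turn one block of 'prefix/key: value' lines into a dict keyed by the part after '/'."""
    record = {}
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        # Same splitting as the question: first on ': ', then on '/'.
        longkey, value = line.split(": ", 1)
        shortkey = longkey.split("/")[1]
        record[shortkey] = value
    return record


parsed = parse_record(SAMPLE_RECORD)
print(sorted(parsed))
# → ['helpfulness', 'productId', 'profileName', 'score', 'summary', 'text', 'time', 'userId']
```

Under that assumption the keys come out as `productId` and `userId` (lowercase `d`), and `review/score` becomes `score`, so there is also no key named `review`.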
```python
import csv

list_of_dicts = []
dict_of_data = {}
filename = open('file.txt')
lines = filename.readlines()
cleanlines = [ line.strip() for line in lines ]
list_of_lists = []
group = []
print "cleaning the spaces"
for line in cleanlines:
    if line != '':
        group.append(line)
    else:
        list_of_lists.append(group)
        group = []
list_of_dicts = []
print "done cleaning spaces...making a dict for each group"
print "Also splitting each entry by ':' and '/'"
for group in list_of_lists:
    try:
        # Create a new dict for each group.
        group_dict = {}
        for line in group:
            # Split by ': ' then by '/'
            longkey, value = line.split(': ', 1)
            # get second half
            shortkey = longkey.split('/')[1]
            group_dict[shortkey] = value
            list_of_dicts.append(group_dict)
            #print list_of_dicts
    except ValueError:
        # There could be inconsistent data
        pass
print "Finished! Setting the header for the CSV"
writer = csv.DictWriter(open('parsed.csv', 'w'),
                        ['productID', 'userID', 'profileName', 'helpfulness', 'review', 'time', 'summary', 'text'],
                        delimiter=',',
                        extrasaction='ignore')
writer.writeheader()
for review in list_of_dicts:
    writer.writerow(review)
```
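The flattened indentation makes it hard to tell where `list_of_dicts.append(group_dict)` sits. If it is inside the `for line in group:` loop, the same dict object is appended once per line of the record. Every entry is a reference to that one dict, so by the time the rows are written they all show the fully populated record, which would produce identical repeated rows. The Python 3 illustration below uses a made-up two-line group to compare the two placements.

```python
group = ["review/profileName: dll pa", "review/time: 1182627213"]  # made-up two-line group

# Append inside the inner loop: one append per line, all pointing at the same dict.
inside = []
group_dict = {}
for line in group:
    longkey, value = line.split(": ", 1)
    group_dict[longkey.split("/")[1]] = value
    inside.append(group_dict)

# Append after the inner loop: exactly one entry per group.
after = []
group_dict = {}
for line in group:
    longkey, value = line.split(": ", 1)
    group_dict[longkey.split("/")[1]] = value
after.append(group_dict)

print(len(inside), inside[0] is inside[1])  # → 2 True
print(len(after))  # → 1
```

With eight lines per record, the inside-the-loop version would write each review eight times.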
Here is a sample of what I get. As you can see, data is also being duplicated:
```
productID,userID,profileName,helpfulness,review,time,summary,text
,,dll pa,0/0,,1182627213,Not as Advertised,"Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as ""Jumbo""."
,,dll pa,0/0,,1182627213,Not as Advertised,"Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as ""Jumbo""."
```
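A Python 3 sketch of the whole conversion that avoids both symptoms (empty columns and repeated rows) might look like the following. It assumes the `prefix/key: value` record layout of the SNAP file, uses header names spelled exactly like the parsed keys (`productId`, `userId`, `score`), yields one dict per record, and also emits the final record when the file does not end with a blank line. The helper names `iter_records` and `records_to_csv` are hypothetical, and skipping malformed lines (rather than the whole record, as the original `try/except` does) is a design choice of this sketch.

```python
import csv
import io

# Header names spelled exactly like the keys produced by splitting on '/'.
FIELDNAMES = ["productId", "userId", "profileName", "helpfulness",
              "score", "time", "summary", "text"]


def iter_records(lines):
    """Yield one dict per blank-line-separated block of 'prefix/key: value' lines."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current record.
            if record:
                yield record
            record = {}
            continue
        longkey, sep, value = line.partition(":")
        if not sep or "/" not in longkey:
            continue  # skip a malformed line instead of dropping the whole record
        record[longkey.split("/", 1)[1]] = value.strip()
    if record:
        # The last record may not be followed by a blank line.
        yield record


def records_to_csv(lines, out):
    """Write each parsed record to `out` as exactly one CSV row; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in iter_records(lines):
        writer.writerow(record)
        count += 1
    return count


# Small in-memory demo; the values are placeholders, not real dataset entries.
demo_input = io.StringIO(
    "product/productId: P1\n"
    "review/userId: U1\n"
    "review/profileName: dll pa\n"
    "\n"
    "product/productId: P2\n"
    "review/userId: U2\n"
)
demo_output = io.StringIO()
rows_written = records_to_csv(demo_input, demo_output)
print(demo_output.getvalue())
```

For the real files it could be called as `with open('file.txt') as src, open('parsed.csv', 'w', newline='') as dst: records_to_csv(src, dst)`. The `newline=''` argument is what the Python 3 `csv` documentation recommends for files handed to a writer; in Python 2 the equivalent was opening the output file in `'wb'` mode.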