Question

I'm having trouble using DictWriter to write dicts to a CSV file. I specify the headers and write the data, but certain columns are not being populated: specifically productID, userID, and helpfulness. The other issue is that rows are duplicated several times before the output moves on to the next entry.

I can confirm that the missing data is in the dicts by printing them, but it is lost (and other data is duplicated) during the write.

My code is below; I am using a dataset from here: http://snap.stanford.edu/data/web-FineFoods.html

import csv
list_of_dicts = []
dict_of_data = {}

filename = open('file.txt')
lines = filename.readlines()

cleanlines = [ line.strip() for line in lines ]

list_of_lists = []
group = []


print "cleaning the spaces"
for line in cleanlines:
    if line != '':       
        group.append(line)
    else:     
        list_of_lists.append(group)
        group = []

list_of_dicts = []

print "done cleaning spaces...making a dict for each group"
print "Also splitting each entry by ':' and '/'"
for group in list_of_lists:
    try:
        # Create a new dict for each group.
        group_dict = {}
        for line in group:
            # Split by ':' then by '/'
            longkey, value = line.split(': ', 1)
            # get second half
            shortkey = longkey.split('/')[1]
            group_dict[shortkey] = value
            list_of_dicts.append(group_dict)
           #print list_of_dicts
    except ValueError:
        #There could be inconsistent data
        pass
print "Finished! Setting the header for the CSV"
writer = csv.DictWriter(open('parsed.csv', 'w'),
                        ['productID','userID', 'profileName', 'helpfulness', 'review', 'time', 'summary', 'text'],
                        delimiter=',',
                        extrasaction='ignore')

writer.writeheader()
for review in list_of_dicts:
    writer.writerow(review)

This is a sample of what I get. As you can see, data is also being duplicated:

productID,userID,profileName,helpfulness,review,time,summary,text
,,dll pa,0/0,,1182627213,Not as Advertised,"Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as ""Jumbo""."
,,dll pa,0/0,,1182627213,Not as Advertised,"Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as ""Jumbo""."


Solution

The duplicated lines in the CSV are due to an indentation error:

for group in list_of_lists:
    group_dict = {}
    for line in group:
        ...
        group_dict[shortkey] = value            
        list_of_dicts.append(group_dict)   #1

should be

for group in list_of_lists:
    group_dict = {}
    for line in group:
        ...
        group_dict[shortkey] = value            
    list_of_dicts.append(group_dict)  #2

  1. Appends group_dict to list_of_dicts once for each line in group, so the same dict ends up in the list several times.
  2. Appends group_dict to list_of_dicts once for each group in list_of_lists.
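
For reference, here is a minimal, self-contained sketch of the corrected grouping followed by the DictWriter step. It is written for Python 3 and writes to an in-memory buffer instead of parsed.csv. The sample text, its key names, and the helper names parse_groups and to_csv are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample in the "prefix/key: value" layout that the question
# parses; the key names and values are invented for this sketch.
SAMPLE = """\
product/productID: P1
review/userID: U1
review/summary: Not as Advertised

product/productID: P2
review/userID: U2
review/summary: Great

"""


def parse_groups(text):
    """Return one dict per blank-line-separated group of lines."""
    records = []
    group = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            # "review/summary: Great" -> key "summary", value "Great".
            # A line without ': ' raises ValueError, as in the question.
            longkey, value = line.split(': ', 1)
            group[longkey.split('/')[1]] = value
        elif group:
            records.append(group)  # once per group, not once per line
            group = {}
    if group:  # keep a final group that has no trailing blank line
        records.append(group)
    return records


def to_csv(records, fieldnames):
    """Write the records with csv.DictWriter and return the CSV text."""
    buf = io.StringIO(newline='')
    writer = csv.DictWriter(buf, fieldnames, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = parse_groups(SAMPLE)
csv_text = to_csv(records, ['productID', 'userID', 'summary'])

# A fieldname with different capitalization matches no key, so that column
# is written empty and the real key is silently ignored.
mismatched = to_csv(records, ['productId', 'summary'])
```

The last call hints at a possible cause of the empty columns, which is a separate matter from the duplication. With extrasaction='ignore', DictWriter drops any dict key that is not listed in the fieldnames and writes an empty string for any fieldname that has no matching key. If the keys in the file are spelled differently from the header names (for example productId rather than productID, which is an assumption about the dataset and not something shown in the question), those columns come out blank. Switching temporarily to extrasaction='raise' makes such mismatches show up as a ValueError.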
License: CC-BY-SA with attribution
Not affiliated with StackOverflow