Consider not using CSV
First of all, your overall approach to the data problem is probably not optimal. The less tabular your data looks, the less sense it makes to keep it in a CSV file (though your needs aren't too far out of that realm).
For example, it would be really easy to solve this problem using json:
import json

# First the data
data = dict(dict1=dict(key1="value1", key2="value2"),
            dict2=dict(key3="value3", key4="value4"))

# Convert and write
with open("data.json", "w") as f:
    json.dump(data, f)

# Now read back
with open("data.json") as f:
    data = json.load(f)
print(data)
Answering the question as written
However, if you are really set on this strategy, you can do something along the lines suggested by jonrsharpe. You can't just use the csv module to do all the work for you; you actually have to go through the file and filter out (and split by) the "//" lines.
import re

def header_matcher(line):
    "Returns something truthy if the line looks like a dict separator"
    return re.match("//", line)

# Create some containers we can populate as we iterate
data = []
d = {}

# Open the file and ...
with open("data.csv") as f:
    for line in f:
        if not header_matcher(line):
            # We have a non-header row, so we make a new entry in our draft dictionary
            key, val = line.strip().split(',')
            d[key] = val
        else:
            # We've hit a new header, so we should throw our draft dictionary in our data list
            if d:
                # ... but only if we actually have had data since the last header
                data.append(d)
                d = {}

# The very last chunk will need to be captured as well
if d:
    data.append(d)

# And we're done...
print(data)
This is quite a bit messier, and if there is any chance of needing to escape commas, it will get messier still. If you needed to, you could probably find a clever way of chunking up the file into generators that you read with CSV readers, but it won't be particularly clean/easy (I started an approach like this but it looked like pain...). This is all a testament to your approach likely being the wrong way to store this data.
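For what it's worth, here is a rough sketch of what that chunking idea could look like, using itertools.groupby to split the lines on the "//" separators and handing each chunk to a csv.reader so that quoted commas are handled. The sample input and the is_header helper are my own assumptions about what your file looks like, not something taken from your question, and it reads from an in-memory string rather than a real file:

```python
import csv
import io
from itertools import groupby

# Assumed sample input: lines starting with "//" separate the dictionaries.
sample = io.StringIO(
    '// dict1\n'
    'key1,value1\n'
    'key2,"value, with comma"\n'
    '// dict2\n'
    'key3,value3\n'
)

def is_header(line):
    # Hypothetical helper: True for separator lines
    return line.startswith("//")

data = []
# groupby yields runs of consecutive lines that share the same is_header result
for header, lines in groupby(sample, key=is_header):
    if header:
        continue  # skip the separator lines themselves
    # Each run of non-header lines is parsed as its own small CSV
    data.append({key: val for key, val in csv.reader(lines)})

print(data)
```

This would still fall over on blank lines or on rows that don't have exactly two fields, so treat it as a starting point rather than a finished parser.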
An alternative if you're set on CSV
Another way to go if you really want CSV but aren't stuck on the exact data format you specify: Add a column in the CSV file corresponding to the dictionary the data should go into. Imagine a file (data2.csv) that looks like this:
dict1,key1,value1
dict1,key2,value2
dict2,key3,value3
dict2,key4,value4
Now we can do something cleaner, like the following:
import csv

data = dict()
with open('data2.csv', newline='') as f:
    for chunk, key, val in csv.reader(f):
        try:
            # If we already have a dict for the given chunk id, this should add the key/value pair
            data[chunk][key] = val
        except KeyError:
            # Otherwise, we catch the exception and add a fresh dictionary with the key/value pair
            data[chunk] = {key: val}
print(data)
Much nicer...
The only good argument for doing something closer to what you have in mind over this is if there is LOTS of data, and space is a concern. But that is not very likely to be the case in most situations.
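As a small variation on the three-column loop, collections.defaultdict can stand in for the try/except bookkeeping. This is only a sketch: it reads the same four sample rows from an in-memory string (my stand-in for data2.csv) so that it runs on its own.

```python
import csv
import io
from collections import defaultdict

# Stand-in for data2.csv: the same four sample rows, held in memory
rows = io.StringIO(
    "dict1,key1,value1\n"
    "dict1,key2,value2\n"
    "dict2,key3,value3\n"
    "dict2,key4,value4\n"
)

# Missing chunk ids get a fresh empty dict automatically
data = defaultdict(dict)
for chunk, key, val in csv.reader(rows):
    data[chunk][key] = val

# Convert back to a plain dict once loading is done
data = dict(data)
print(data)
```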
And pandas
Oh yes... one more possible solution is pandas. I haven't used it much yet, so I'm not as much help, but it provides a groupby method, which would let you group by the first column if you end up structuring the data as in the 3-column CSV approach.
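A minimal sketch of how that might look, assuming pandas is installed and the data is in the 3-column layout. The column names chunk, key and val are labels I made up for illustration, and the rows are read from an in-memory string rather than from data2.csv:

```python
import io
import pandas as pd

# Stand-in for data2.csv: the same four sample rows, held in memory
csv_text = (
    "dict1,key1,value1\n"
    "dict1,key2,value2\n"
    "dict2,key3,value3\n"
    "dict2,key4,value4\n"
)

# The file has no header row, so we supply (made-up) column names ourselves
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["chunk", "key", "val"])

# Group by the first column and build one dictionary per group
data = {
    chunk: dict(zip(group["key"], group["val"]))
    for chunk, group in df.groupby("chunk")
}
print(data)
```

Note that read_csv will try to infer column types, so numeric-looking values would come back as numbers rather than strings unless you tell it otherwise.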