Question

I have a dictionary of dictionaries, and I'm trying to output the information within them in a certain way so that it will be usable for downstream analysis. Note: all the keys in dict are also in list.

for item in list:
    for key, value in dict[item].items():
        print item, key, value

This is the closest I've gotten to what I want, but it's still a long way off. Ideally what I want is:

     item1  item2  item3  item4
key1 value  value  value  value
key2 value  value  value  value
key3 value  value  value  value

Is this even possible?


Solution

First, if I understand your structure, the list is just a way of ordering the keys for the outer dictionary, and a lot of your complexity is trying to use these two together to simulate an ordered dictionary. If so, there's a much easier way to do that: use collections.OrderedDict. I'll come back to that at the end.


To begin, you need to get all of the keys of your sub-dictionaries, because those are the rows of your output.

From comments, it sounds like all of the sub-dictionaries in dct have the same keys, so you can just pull the keys out of any arbitrary one of them:

keys = dct.values()[0].keys()

If each sub-dictionary can have a different subset of keys, you'll need to instead do a first pass over dct to get all the keys:

keys = reduce(set.union, map(set, dct.values()))

Some people find reduce hard to understand, even when you're really just using it as "sum with a different operator". For them, here's how to do the same thing explicitly:

keys = set()
for subdct in dct.values():
    keys |= set(subdct)
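
The snippets above are written for Python 2. On Python 3, `dict.values()` returns a view that cannot be indexed, and `reduce` lives in `functools`, so both approaches need small adjustments. A minimal sketch, assuming a small made-up `dct` (not the asker's data):

```python
from functools import reduce

# Hypothetical sample data, invented for illustration.
dct = {
    'item1': {'key1': 1, 'key2': 2},
    'item2': {'key1': 3, 'key3': 4},
}

# Same keys everywhere: take them from any one sub-dictionary.
first_keys = list(next(iter(dct.values())))

# Different keys per sub-dictionary: union them all.
all_keys = reduce(set.union, map(set, dct.values()))

# A set has no defined order, so sort it if the row order matters.
row_keys = sorted(all_keys)

print(first_keys)
print(row_keys)
```

Sorting is only one way to pin down the row order; any deterministic ordering would do.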

Now, for each key's row, we need to get a column for each sub-dictionary (that is, each value in the outer dictionary), in the order specified by using the elements of the list as keys into the outer dictionary.

So, for each column item, we want to get the outer-dictionary value corresponding to the key in item, and then in the resulting sub-dictionary, get the value corresponding to the row's key. That's hard to say in English, but in Python, it's just:

dct[item][key]

If you don't actually have all the same keys in all of the sub-dictionaries, it's only slightly more complicated:

dct[item].get(key, '')

So, if you didn't want any headers, it would look like this:

import csv

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    for key in keys:
        w.writerow(dct[item].get(key, '') for item in lst)

To add a header column, just prepend the header (in this case, key) to each of those rows:

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    for key in keys:
        w.writerow([key] + [dct[item].get(key, '') for item in lst])

Notice that I turned the genexp into a list comprehension so I could use list concatenation to prepend the key. It's conceptually cleaner to leave it as an iterator, and prepend with itertools.chain, but in trivial cases like this with tiny iterables, I think that's just making the code harder to read:

from itertools import chain

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    for key in keys:
        w.writerow(chain([key], (dct[item].get(key, '') for item in lst)))

You also want a header row. That's even easier; it's just the items in the list, with a blank column prepended for the header column:

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    w.writerow([''] + lst)
    for key in keys:
        w.writerow([key] + [dct[item].get(key, '') for item in lst])
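
To see the whole thing end to end, here is a sketch in Python 3 syntax (the snippets in this answer are Python 2). The sample `dct` and `lst` are invented for illustration, and the table is written to an in-memory buffer instead of `output.csv` so the result can be inspected. On Python 3 a real file would be opened with `open('output.csv', 'w', newline='')` rather than `'wb'`.

```python
import csv
import io

# Hypothetical sample data; lst fixes the column order.
lst = ['item1', 'item2', 'item3']
dct = {
    'item1': {'key1': 'a', 'key2': 'b'},
    'item2': {'key1': 'c'},
    'item3': {'key1': 'd', 'key2': 'e'},
}

# Sorted union of the inner keys gives a stable row order.
keys = sorted(set().union(*dct.values()))


def write_table(f):
    w = csv.writer(f, delimiter='\t', lineterminator='\n')
    w.writerow([''] + lst)  # header row with a blank corner cell
    for key in keys:
        # header column, then one cell per item ('' where the key is missing)
        w.writerow([key] + [dct[item].get(key, '') for item in lst])


buf = io.StringIO()
write_table(buf)
table = buf.getvalue()
print(table)
```

Passing the file object into a function keeps the row-building logic the same whether the target is a buffer or a file on disk.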

However, there are two ways to make things even simpler.

First, you can use an OrderedDict, so you don't need the separate key list. If you're stuck with the separate list and dict, you can still build an OrderedDict on the fly to make your code easier to read. For example:

import collections

od = collections.OrderedDict((item, dct[item]) for item in lst)

And now:

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    w.writerow([''] + od.keys())
    for key in keys:
        w.writerow([key] + [subdct.get(key, '') for subdct in od.values()])

Second, you could just build the transposed structure:

transposed = {key_b: {key_a: dct[key_a].get(key_b, '') for key_a in dct} 
              for key_b in keys}

And then iterate over it in the obvious order (or use a DictWriter to handle the ordering of the columns for you, and use its writerows method to deal with the rows, so the whole thing becomes a one-liner).
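
The DictWriter idea might look roughly like the following in Python 3. This is only a sketch: the sample data is made up, `'key'` is an arbitrary name chosen here for the label column, and the output goes to an in-memory buffer rather than a file.

```python
import csv
import io

# Hypothetical sample data; lst fixes the column order.
lst = ['item1', 'item2']
dct = {
    'item1': {'key1': 'a', 'key2': 'b'},
    'item2': {'key1': 'c'},
}
keys = sorted(set().union(*dct.values()))

# Transpose: inner keys become the rows, outer keys the columns.
transposed = {key_b: {key_a: dct[key_a].get(key_b, '') for key_a in dct}
              for key_b in keys}

buf = io.StringIO()
# fieldnames tells DictWriter the column order; 'key' is an assumed label.
w = csv.DictWriter(buf, fieldnames=['key'] + lst, delimiter='\t',
                   lineterminator='\n')
w.writeheader()
# The single writerows call: add the row label to each transposed row.
w.writerows(dict(row, key=key) for key, row in transposed.items())

result = buf.getvalue()
print(result)
```

Unlike the earlier versions, the corner cell of the header row holds the label column's name instead of being blank.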

OTHER TIPS

To store objects in Python so that you can reuse them later, you can use the shelve module. It lets you write objects to a shelf file, then reopen the file and retrieve them later. The file format depends on the operating system, though, so a shelf created on a Mac may not open on a Windows machine.

import shelve

shelf = shelve.open("filename", flag='c')
# flag='c' opens the shelf for reading and writing, creating it if needed;
# existing entries are kept, so delete the old shelf (or use flag='n') to start over

dict1 = {}  # something
dict2 = {}  # something

shelf['key1'] = dict1
shelf['key2'] = dict2

shelf.close()

To read objects from a shelf:

shelf_reader = shelve.open("filename", flag='r')
for k in shelf_reader.keys():
    retrieved = shelf_reader[k]
    print(retrieved) #prints the retrieved dictionary

shelf_reader.close()

It may be a matter of opinion, but I think one of the best (and by far the easiest) ways to serialize a (nested) dictionary is to use the JSON format:

{ "key1" : { "subkey1" : "value1",
             "subkey2" : "value2" },
  "key2" : {"subkey3" : "value3"} }

The best part is that this can be done (both encoding your values and decoding them) in a single line using the built-in json module!

Say your dictionary is in the dico variable:

import json
with open('save_file', 'w') as save_file:
    save_file.write(json.dumps(dico))

Et voilà :-) !
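
Decoding is just as short. Here is a sketch of the full round trip in Python 3, assuming the nested dictionary shown above and a temporary directory picked purely for illustration:

```python
import json
import os
import tempfile

dico = {'key1': {'subkey1': 'value1', 'subkey2': 'value2'},
        'key2': {'subkey3': 'value3'}}

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'save_file')

# Encode: write the dictionary out as JSON text.
with open(path, 'w') as save_file:
    json.dump(dico, save_file)

# Decode: read the JSON text back into a dictionary.
with open(path) as save_file:
    restored = json.load(save_file)

print(restored == dico)
```

One caveat: JSON object keys are always strings, so a dictionary with, say, integer keys will come back with string keys.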

If the data is guaranteed to be loaded back into Python, I'd suggest simply using pickle instead of worrying about the format. If it's going to be loaded into another standard language, then consider using json instead - there are libraries for most languages to parse JSON format data.
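
For the pickle route, a minimal sketch might look like this (Python 3; the sample data and the file name `data.pickle` are made up for illustration):

```python
import os
import pickle
import tempfile

# Hypothetical data; unlike JSON, non-string keys and tuples survive intact.
dct = {'item1': {'key1': 1, 2: 'two'}, 'item2': {'key1': (3, 4)}}

path = os.path.join(tempfile.mkdtemp(), 'data.pickle')

# Pickle files are binary, so open them in 'wb' / 'rb' mode.
with open(path, 'wb') as f:
    pickle.dump(dct, f)

with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == dct)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.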

That said, if you really need to invent your own format, you could do something like this to store all keys from all sub-dictionaries in CSV format:

import csv
dict_keys = sorted(dict.keys())
with open("output.csv", "wb") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Key"] + dict_keys)
    all_keys = reduce(set.union, (set(d) for d in dict.values()))
    for key in sorted(all_keys):
        writer.writerow([key] + [dict[k].get(key, "") for k in dict_keys])
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow