Question

Consider a list of dicts:

items = [
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 1, 'b': 5, 'c': 4},
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 2, 'b': 7, 'c': 9},
    {'a': 3, 'b': 8, 'c': 2}
]

Is there a Pythonic way to extract and group these items by their 'a' field, so that:

result = {
    1: [{'b': 9, 'c': 8}, {'b': 5, 'c': 4}],
    2: [{'b': 3, 'c': 1}, {'b': 7, 'c': 9}],
    3: [{'b': 8, 'c': 2}]
}

References to any similar Pythonic constructs are appreciated.


Solution

Use itertools.groupby (this works directly here because items is already sorted by 'a'):

>>> from itertools import groupby
>>> from operator import itemgetter
>>> {k: list(g) for k, g in groupby(items, itemgetter('a'))}
{1: [{'a': 1, 'c': 8, 'b': 9},
     {'a': 1, 'c': 4, 'b': 5}],
 2: [{'a': 2, 'c': 1, 'b': 3},
     {'a': 2, 'c': 9, 'b': 7}],
 3: [{'a': 3, 'c': 2, 'b': 8}]}
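Note that groupby only merges runs of consecutive items with equal keys; it does not collect matching items from across the whole list. A minimal sketch illustrating the pitfall, using a hypothetical unsorted list named shuffled:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: records whose 'a' values are not contiguous.
shuffled = [
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 1, 'b': 5, 'c': 4},
]

# groupby starts a new group whenever the key changes, so key 1 shows up twice.
runs = [(k, list(g)) for k, g in groupby(shuffled, itemgetter('a'))]
print([k for k, _ in runs])  # [1, 2, 1]

# In a dict comprehension the second run for key 1 silently overwrites the first.
grouped = {k: list(g) for k, g in groupby(shuffled, itemgetter('a'))}
print(len(grouped[1]))  # 1, not 2

# Sorting by the same key first makes equal keys adjacent.
key = itemgetter('a')
fixed = {k: list(g) for k, g in groupby(sorted(shuffled, key=key), key=key)}
print(len(fixed[1]))  # 2
```

Because sorted is stable, items that share a key keep their original relative order after sorting.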

If the items are not in sorted order, you can either sort them first and then use groupby, or build the groups in a single O(N) pass with collections.OrderedDict (if the order of the keys matters) or collections.defaultdict:

>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> for item in items:
...     d.setdefault(item['a'], []).append(item)
...
>>> dict(d)  # d itself is the OrderedDict; dict() here only simplifies the display
{1: [{'a': 1, 'c': 8, 'b': 9},
     {'a': 1, 'c': 4, 'b': 5}],
 2: [{'a': 2, 'c': 1, 'b': 3},
     {'a': 2, 'c': 9, 'b': 7}],
 3: [{'a': 3, 'c': 2, 'b': 8}]}
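The defaultdict variant mentioned above is not shown in the answer; a minimal sketch of it might look like this. defaultdict(list) creates an empty list the first time a key is seen, which replaces the setdefault call:

```python
from collections import defaultdict

items = [
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 1, 'b': 5, 'c': 4},
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 2, 'b': 7, 'c': 9},
    {'a': 3, 'b': 8, 'c': 2},
]

# One pass over the data; input order of the 'a' values does not matter.
groups = defaultdict(list)
for item in items:
    groups[item['a']].append(item)

result = dict(groups)  # convert back to a plain dict
print(result[2])  # [{'a': 2, 'b': 3, 'c': 1}, {'a': 2, 'b': 7, 'c': 9}]
```

On Python 3.7 and later a plain dict preserves insertion order, so keys come out in first-seen order even without OrderedDict.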

Update:

I see that you only want the keys that were not used for grouping to be returned. For that, you'll need to do something like this:

>>> group_keys = {'a'}
>>> {k: [{f: d[f] for f in d.keys() - group_keys} for d in g]
...  for k, g in groupby(items, itemgetter(*group_keys))}
{1: [{'c': 8, 'b': 9},
     {'c': 4, 'b': 5}],
 2: [{'c': 1, 'b': 3},
     {'c': 9, 'b': 7}],
 3: [{'c': 2, 'b': 8}]}
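The same idea can be wrapped in a reusable helper. The sketch below is an assumption, not part of the original answer: the function name group_without_keys is hypothetical, it takes the grouping fields as a sequence (a set has no defined order, which matters once there is more than one field), and it sorts first so unsorted input also works, which requires the key values to be mutually comparable.

```python
from itertools import groupby
from operator import itemgetter


def group_without_keys(records, group_keys):
    """Hypothetical helper: group dicts by group_keys and drop those fields.

    With one field the group key is a scalar; with several it is a tuple,
    because that is what itemgetter returns for multiple fields.
    """
    key = itemgetter(*group_keys)
    drop = set(group_keys)
    ordered = sorted(records, key=key)  # groupby needs equal keys to be adjacent
    return {
        k: [{f: v for f, v in rec.items() if f not in drop} for rec in grp]
        for k, grp in groupby(ordered, key=key)
    }


items = [
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 1, 'b': 5, 'c': 4},
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 2, 'b': 7, 'c': 9},
    {'a': 3, 'b': 8, 'c': 2},
]

print(group_without_keys(items, ['a']))
print(group_without_keys(items, ['a', 'b'])[(2, 3)])  # [{'c': 1}]
```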

OTHER TIPS

Note: This code assumes the data is already sorted by 'a'. If it is not, you have to sort it first.

from itertools import groupby
print({key: list(grp) for key, grp in groupby(items, key=lambda x: x["a"])})

Output

{1: [{'a': 1, 'b': 9, 'c': 8}, {'a': 1, 'b': 5, 'c': 4}],
 2: [{'a': 2, 'b': 3, 'c': 1}, {'a': 2, 'b': 7, 'c': 9}],
 3: [{'a': 3, 'b': 8, 'c': 2}]}

To get the result in the format you asked for:

from itertools import groupby
from operator import itemgetter
a_getter, getter, keys = itemgetter("a"), itemgetter("b", "c"), ("b", "c")

def recon_dicts(item):
    # Rebuild a dict that holds only the "b" and "c" fields of one item
    return dict(zip(keys, getter(item)))

{key: list(map(recon_dicts, grp)) for key, grp in groupby(items, key=a_getter)}

Output

{1: [{'c': 8, 'b': 9}, {'c': 4, 'b': 5}],
 2: [{'c': 1, 'b': 3}, {'c': 9, 'b': 7}],
 3: [{'c': 2, 'b': 8}]}

If the data is not already sorted, you can either use the defaultdict approach mentioned in the answer above, or sort by 'a' with the sorted function before grouping, like this:

{key: list(map(recon_dicts, grp))
   for key, grp in groupby(sorted(items, key=a_getter), key=a_getter)}
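For comparison, the defaultdict route can also produce the 'b'/'c'-only format in a single pass with no sorting. A minimal sketch, assuming the records always carry exactly the 'b' and 'c' fields from the question (they are hard-coded here):

```python
from collections import defaultdict

items = [
    {'a': 2, 'b': 3, 'c': 1},  # deliberately not sorted by 'a'
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 3, 'b': 8, 'c': 2},
    {'a': 1, 'b': 5, 'c': 4},
    {'a': 2, 'b': 7, 'c': 9},
]

result = defaultdict(list)
for item in items:
    # Keep only the non-grouping fields of each record.
    result[item['a']].append({'b': item['b'], 'c': item['c']})

result = dict(result)
print(result[1])  # [{'b': 9, 'c': 8}, {'b': 5, 'c': 4}]
```

Unlike the sorted-then-groupby version, the keys here come out in first-seen order (2, 1, 3) rather than sorted order.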

References:

  1. operator.itemgetter

  2. itertools.groupby

  3. zip, map, dict, sorted

Licensed under: CC-BY-SA with attribution