Question

If I use the groupby function, e.g. Data.groupby(['id','company']).size(), it gives a result like:

id   company 
1    a        2
     b        3
     c        6
2    d        1
     e        5

How can I extract the numbers [2, 1], i.e. the first element of each group in the zeroth index level, with each group sorted by the first index level?

Solution

First, let:

agg_df = Data.groupby(['id','company']).size()

If you want the first entry of each group of elements that share a value in the zeroth level of the MultiIndex, with each group sorted by the first index level (after the updated comment, this appears to be the desired output):

unique_zeroth_level = dict(agg_df.index.values).keys()
group_first_vals = [
    # Series.select was removed in pandas 1.0; .loc on the outer level picks the same rows
    agg_df.loc[idx_val].iloc[0]
    for idx_val in unique_zeroth_level]
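
A more compact route to the same numbers is to group the counts again by the outer index level and keep the first entry of each group. The following is a minimal sketch, not the asker's actual setup: the `Data` frame is made-up data chosen to reproduce the counts shown in the question, and `first_per_id` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical data that reproduces the counts in the question.
pairs = ([(1, 'a')] * 2 + [(1, 'b')] * 3 + [(1, 'c')] * 6
         + [(2, 'd')] * 1 + [(2, 'e')] * 5)
Data = pd.DataFrame(pairs, columns=['id', 'company'])

agg_df = Data.groupby(['id', 'company']).size()

# Group the counts by the outer index level ('id') and keep the
# first row of each group; groupby sorts the group keys by default.
first_per_id = agg_df.groupby(level=0).first()
group_first_vals = first_per_id.tolist()
print(group_first_vals)
```

This keeps the result as a Series indexed by `id` until the final `tolist()`, which can be dropped if the index is still needed.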

If you are asking for the unique elements of the zeroth level of the resulting MultiIndex:

In this particular case, since the returned result is a Series, you can make use of a trick using unstack:

agg_df.unstack(level=0).columns.values

or use a dict constructor

dict(agg_df.index.values).keys()
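
The unique outer-level values can also be read directly from the index with `get_level_values`. This is a sketch on made-up data (the `Data` frame and the name `unique_ids` are assumptions introduced for illustration):

```python
import pandas as pd

# Hypothetical data that reproduces the counts in the question.
pairs = ([(1, 'a')] * 2 + [(1, 'b')] * 3 + [(1, 'c')] * 6
         + [(2, 'd')] * 1 + [(2, 'e')] * 5)
Data = pd.DataFrame(pairs, columns=['id', 'company'])

agg_df = Data.groupby(['id', 'company']).size()

# Take the outer level of the MultiIndex and drop repeated labels.
unique_ids = agg_df.index.get_level_values(0).unique().tolist()
print(unique_ids)
```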

If you want the results for (1, 'a') and (2, 'd') in particular, and you want to access them by their index values (not merely because they happen to be the lexicographically first entries in their respective groups):

agg_df.loc[[(1, 'a'), (2, 'd')]]  # .ix was removed in pandas 1.0; .loc does the label-based lookup
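
As a sketch of label-based lookup by full index tuples with `.loc`, again on made-up data that reproduces the counts in the question (`Data`, `selected`, and `single` are names assumed here for illustration):

```python
import pandas as pd

# Hypothetical data that reproduces the counts in the question.
pairs = ([(1, 'a')] * 2 + [(1, 'b')] * 3 + [(1, 'c')] * 6
         + [(2, 'd')] * 1 + [(2, 'e')] * 5)
Data = pd.DataFrame(pairs, columns=['id', 'company'])

agg_df = Data.groupby(['id', 'company']).size()

# A list of tuples selects several (id, company) entries at once.
selected = agg_df.loc[[(1, 'a'), (2, 'd')]]

# A single tuple returns the scalar count for that entry.
single = agg_df.loc[(1, 'a')]
print(selected.tolist(), single)
```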
Licensed under: CC-BY-SA with attribution