In pandas, how do I sort one level of a MultiIndex by the values of a column while keeping the grouping of the other level?

StackOverflow https://stackoverflow.com/questions/20413313

Question

I'm taking a Data Mining course at university right now, but I'm a wee bit stuck on a multi-index sorting problem.

The actual data involves about 1 million movie reviews, which I'm trying to analyze by American zip code. To test my approach, I've been using a much smaller data set of 250 randomly generated ratings for 10 movies, with age groups instead of zip codes.

This is what I have right now: a pandas DataFrame with a two-level MultiIndex, 'group' and 'title':

                        rating
group       title   
            Alien       4.000000
            Argo        2.166667
Adults      Ben-Hur     3.666667
            Gandhi      3.200000
            ...         ...

            Alien       3.000000
            Argo        3.750000
Coeds       Ben-Hur     3.000000
            Gandhi      2.833333
            ...         ...

            Alien       2.500000
            Argo        2.750000
Kids        Ben-Hur     3.000000
            Gandhi      3.200000
            ...         ...

What I'm aiming for is to sort the titles by rating within each group (and show only the 5 or so most popular titles in each group).

So something like this (but I'm only going to show two titles in each group):

                        rating
group       title   
            Alien       4.000000
Adults      Ben-Hur     3.666667

            Argo        3.750000
Coeds       Alien       3.000000

            Gandhi      3.200000
Kids        Ben-Hur     3.000000

Does anyone know how to do this? I've tried sort_order, sort_index, etc., as well as swapping the levels, but they all mix up the groups. The result then looks like this:

                          rating
group         title 
Adults        Alien      4.000000
Coeds         Argo       3.750000
Adults        Ben-Hur    3.666667
Kids          Gandhi     3.200000
Coeds         Alien      3.000000
Kids          Ben-Hur    3.000000

I'm looking for something like the question "Multi-Index Sorting in Pandas", but instead of sorting based on another level, I want to sort based on the values, as if that person had wanted to sort by his sales column.

Thanks!


Solution

You're looking for sort:

In [11]: s = pd.Series([3, 1, 2], [[1, 1, 2], [1, 3, 1]])

In [12]: s.sort()

In [13]: s
Out[13]: 
1  3    1
2  1    2
1  1    3
dtype: int64

Note: this works in place (i.e. it modifies s); to return a sorted copy instead, use order:

In [14]: s.order()
Out[14]: 
1  3    1
2  1    2
1  1    3
dtype: int64
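
The transcripts above use the pandas API of the time. Series.sort and Series.order were later removed, and in current pandas releases both behaviours are covered by sort_values. A minimal sketch, assuming a recent pandas and reusing the same Series (the names sorted_copy and in_place are only illustrative):

```python
import pandas as pd

# Same Series as in the answer: values 3, 1, 2 under a two-level index.
s = pd.Series([3, 1, 2], index=[[1, 1, 2], [1, 3, 1]])

# sort_values returns a sorted copy and leaves s untouched (the old order()).
sorted_copy = s.sort_values()

# inplace=True reorders the object itself (the old sort()).
in_place = s.copy()
in_place.sort_values(inplace=True)

print(sorted_copy)
```

Like the original sort, this orders by value across the whole Series, so the outer index level is not kept together.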

Update: I realised what you were actually asking. I think this ought to be an option in sortlevel, but for now I think you have to reset_index, groupby and apply:

In [21]: s.reset_index(name='s').groupby('level_0').apply(lambda s: s.sort('s')).set_index(['level_0', 'level_1'])['s']
Out[21]: 
level_0  level_1
1        3          1
         1          3
2        1          2
Name: s, dtype: int64
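
For readers on a current pandas, where DataFrame.sort no longer exists, one possible way to get the same within-group ordering is sketched below. It swaps the groupby/apply step for a single sort_values call on two keys, so it is an alternative technique rather than a port of the line above. The variable name within_groups is only illustrative.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=[[1, 1, 2], [1, 3, 1]])

within_groups = (
    s.reset_index(name="s")          # unnamed levels become level_0 / level_1
    .sort_values(["level_0", "s"])   # outer level first, then the values
    .set_index(["level_0", "level_1"])["s"]
)

print(within_groups)
```

Sorting on the outer level first keeps each group contiguous, and the second key orders the values inside it.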

Note: you can set the level names to [None, None] afterwards.
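
Applied to the ratings frame from the question, the same idea might look like the sketch below. It assumes a recent pandas; the frame is rebuilt by hand from the four titles per group shown in the question, and the names ratings, top_n and top are hypothetical. Ratings are sorted in descending order within each group, and head keeps the first top_n rows of every group.

```python
import pandas as pd

# Hand-typed subset of the question's data (four titles per age group).
ratings = pd.DataFrame(
    {
        "group": ["Adults"] * 4 + ["Coeds"] * 4 + ["Kids"] * 4,
        "title": ["Alien", "Argo", "Ben-Hur", "Gandhi"] * 3,
        "rating": [
            4.000000, 2.166667, 3.666667, 3.200000,
            3.000000, 3.750000, 3.000000, 2.833333,
            2.500000, 2.750000, 3.000000, 3.200000,
        ],
    }
).set_index(["group", "title"])

top_n = 2  # the question mentions 5 or so; 2 keeps the output short

top = (
    ratings.reset_index()
    # Groups stay together (ascending), ratings run high to low inside each.
    .sort_values(["group", "rating"], ascending=[True, False])
    # head() on a groupby keeps the first top_n rows of every group.
    .groupby("group")
    .head(top_n)
    .set_index(["group", "title"])
)

print(top)
```

Because the level names are restored by set_index here, there is no need to reset them to [None, None] in this case.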

Licensed under: CC-BY-SA with attribution