d.xs(1).iloc[0:3]
           0          1
0  -0.716206   0.119265
1  -0.782315   0.097844
2   2.042751  -1.116453
pandas: slice a MultiIndex DataFrame by range of secondary index
29-11-2021
Question
It has been posted elsewhere that a multi-indexed pandas Series can be sliced on its second index level like this:
import numpy as np
import pandas as pd
buckets = np.repeat(range(3), [3,5,7])
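# In Python 3, map() and zip() return lazy iterators, so materialize them with list()
sequence = np.hstack(list(map(range, [3,5,7])))
s = pd.Series(np.random.randn(len(sequence)),
              index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence))))
print(s)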
0 0   0.021362
  1   0.917947
  2  -0.956313
1 0  -0.242659
  1   0.398657
  2   0.455909
  3   0.200061
  4  -1.273537
2 0   0.747849
  1  -0.012899
  2   1.026659
  3  -0.256648
  4   0.799381
  5   0.064147
  6   0.491336
Then, to get the first three rows where the first index level is 1, you can write:
s.loc[1].iloc[:3]
0  -0.242659
1   0.398657
2   0.455909
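A minimal Python 3 sketch of the same Series selection, assuming hypothetical deterministic stand-in values (0.0 to 14.0) instead of random numbers so the result can be checked. It also contrasts positional and label-based selection inside bucket 1; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same index layout as the question; values are hypothetical stand-ins (0.0..14.0).
buckets = np.repeat(range(3), [3, 5, 7])
sequence = np.hstack([np.arange(n) for n in [3, 5, 7]])
s = pd.Series(np.arange(len(sequence), dtype=float),
              index=pd.MultiIndex.from_arrays([buckets, sequence]))

bucket_one = s.loc[1]              # rows whose first level == 1; that level is dropped
by_position = bucket_one.iloc[:3]  # first three rows by position
by_label = bucket_one.loc[0:2]     # second-level labels 0..2 (label slices are inclusive)

print(by_position)
```

Because the second-level labels in each bucket start at 0, the positional and label-based selections coincide here; .iloc[:3] is the one that means "first three rows" in general.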
This works fine for 1-dimensional Series, but not for DataFrames:
buckets = np.repeat(range(3), [3,5,7])
sequence = np.hstack(list(map(range, [3,5,7])))
d = pd.DataFrame(np.random.randn(len(sequence),2),
                 index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence))))
print(d)
             0          1
0 0   1.217659   0.312286
  1   0.559782   0.686448
  2  -0.143116   1.146196
1 0  -0.195582   0.298426
  1   1.504944  -0.205834
  2   0.018644  -0.979848
  3  -0.387756   0.739513
  4   0.719952  -0.996502
2 0   0.065863   0.481190
  1  -1.309163   0.881319
  2   0.545382   2.048734
  3   0.506498   0.451335
  4   0.872743  -0.070985
  5  -1.160473   1.082550
  6   0.331796  -0.366597
d[1].iloc[:3]
0 0   0.312286
  1   0.686448
  2   1.146196
Name: 1
Instead of selecting rows, d[1] returns the column labeled 1, and the slice then takes the first three rows of that column regardless of the first index level. How can you get the first three rows where the first index level equals 1 in a multi-indexed DataFrame?
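The pitfall can be reproduced with deterministic data. This sketch uses hypothetical values (column 0 holds 0.0 to 14.0, column 1 holds 100.0 to 114.0) and only shows that d[1] is a column lookup, not a row lookup on the first index level.

```python
import numpy as np
import pandas as pd

buckets = np.repeat(range(3), [3, 5, 7])
sequence = np.hstack([np.arange(n) for n in [3, 5, 7]])
# Hypothetical deterministic values: column 0 holds 0..14, column 1 holds 100..114.
d = pd.DataFrame({0: np.arange(15.0), 1: np.arange(100.0, 115.0)},
                 index=pd.MultiIndex.from_arrays([buckets, sequence]))

# For a DataFrame, [] with a scalar looks up a *column* label,
# so d[1] is the column named 1 across all 15 rows.
column_one = d[1]

# Its first three rows therefore all come from bucket 0.
print(column_one.iloc[:3])
```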
Solution
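A minimal sketch of the cross-section approach shown at the top of this page (d.xs(1)), on the same kind of hypothetical deterministic data, next to the equivalent .loc lookup. Both select the rows whose first index level is 1 and drop that level; the names via_xs and via_loc are illustrative.

```python
import numpy as np
import pandas as pd

buckets = np.repeat(range(3), [3, 5, 7])
sequence = np.hstack([np.arange(n) for n in [3, 5, 7]])
# Hypothetical deterministic values: column 0 holds 0..14, column 1 holds 100..114.
d = pd.DataFrame({0: np.arange(15.0), 1: np.arange(100.0, 115.0)},
                 index=pd.MultiIndex.from_arrays([buckets, sequence]))

# xs(1, level=0): cross-section where the first index level equals 1,
# then .iloc[:3] keeps the first three rows of that bucket by position.
via_xs = d.xs(1, level=0).iloc[:3]

# A scalar in .loc on the first level selects the same rows.
via_loc = d.loc[1].iloc[:3]

print(via_xs)
```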
OTHER TIPS
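.loc is more efficient because it resolves both index levels in a single indexing call instead of chaining two lookups.
s.loc[pd.IndexSlice[1, 0:2]] returns the entries where the 0th level is 1 and the 1st level runs from 0 to 2. Label slices in .loc include both endpoints, so 0:2 (not :3) gives three entries. For the DataFrame, d.loc[pd.IndexSlice[1, 0:2], :] selects the same rows across all columns.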
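A short sketch of the single-call .loc selection with pd.IndexSlice, again on hypothetical deterministic data, including what happens when the upper bound is 3 instead of 2. The helper name idx_slice is only an alias chosen for illustration.

```python
import numpy as np
import pandas as pd

buckets = np.repeat(range(3), [3, 5, 7])
sequence = np.hstack([np.arange(n) for n in [3, 5, 7]])
idx = pd.MultiIndex.from_arrays([buckets, sequence])
# Hypothetical deterministic values.
s = pd.Series(np.arange(15.0), index=idx)
d = pd.DataFrame({0: np.arange(15.0), 1: np.arange(100.0, 115.0)}, index=idx)

idx_slice = pd.IndexSlice
# One .loc call: first level == 1, second-level labels 0 through 2 (inclusive).
s_rows = s.loc[idx_slice[1, 0:2]]

# For a DataFrame the row selector comes first; ":" keeps every column.
d_rows = d.loc[idx_slice[1, 0:2], :]

# An upper bound of 3 also includes label 3, giving four rows.
four_rows = s.loc[idx_slice[1, 0:3]]

print(d_rows)
```

Slicing on the second level expects a lexsorted MultiIndex; the index built here already is, and d.sort_index() is the usual remedy when it is not.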
Licensed under: CC-BY-SA with attribution