Question

After calling dropna on a DataFrame with a MultiIndex, the levels metadata on the index does not appear to be updated. Is this a bug?

In [1]: import pandas

In [2]: print pandas.__version__
0.10.1

In [3]: df_multi = pandas.DataFrame(index=[[1, 2],['a', 'b',]], 
                                    data=[[float('nan'), 5], [6, 7]])

In [4]: print df_multi
      0  1
1 a NaN  5
2 b   6  7

In [5]: df_multi = df_multi.dropna(axis=0, how='any')

In [6]: print df_multi
     0  1
2 b  6  7

In [7]: print df_multi.index
MultiIndex
[(2, b)]

In [8]: print df_multi.index.levels
[Int64Index([1, 2], dtype=int64), Index([a, b], dtype=object)]

Note that the MultiIndex contains only (2, b), yet index.levels still reports 1 and 'a'.

The workaround I have is to reindex with a "clean" MultiIndex as follows:

In [10]: c_clean = pandas.MultiIndex.from_tuples(df_multi.index)

In [11]: df_multi = df_multi.reindex(c_clean)

In [12]: print df_multi
     0  1
2 b  6  7

In [13]: print df_multi.index.levels
[Int64Index([2], dtype=int64), Index([b], dtype=object)]

Edit:

This problem also occurs when slicing with .ix, and probably with other indexing operations as well.
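A minimal sketch of the slicing case, assuming a recent pandas under Python 3 (where .ix is no longer available, so positional .iloc is swapped in here as a stand-in). The variable names are illustrative only:

```python
import pandas as pd

# Same shape of index as in the question, built explicitly as a MultiIndex.
df = pd.DataFrame(
    [[1.0, 5], [6, 7]],
    index=pd.MultiIndex.from_arrays([[1, 2], ["a", "b"]]),
)

# Keep only the second row by position (.iloc stands in for the old .ix).
sliced = df.iloc[1:]

# The remaining index entries versus the level metadata still attached.
remaining = list(sliced.index)
levels_after_slice = [list(level) for level in sliced.index.levels]

print(remaining)
print(levels_after_slice)
```

Here the sliced frame holds a single row, while the level metadata still lists the labels of the row that was sliced away.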


Solution

This is a known issue, tracked here: https://github.com/pydata/pandas/issues/2655

How to deal with it is still under discussion.

My workaround is to use index.get_level_values(level). A dropna(how='all') may remove only some of the entries along an axis rather than all of them, and I may then need every remaining value of one level of the MultiIndex.

For some reason, index.get_level_values(level) returns the correct values, while index.levels has not been updated (perhaps because updating it would be too costly?).
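The difference between the two attributes can be sketched as follows, assuming a recent pandas under Python 3 rather than the 0.10.1 session in the question. The last step uses MultiIndex.remove_unused_levels(), which was added in later pandas releases (0.20 and up) and is not part of the original answer:

```python
import pandas as pd

# Rebuild the frame from the question with an explicit MultiIndex.
df = pd.DataFrame(
    [[float("nan"), 5], [6, 7]],
    index=pd.MultiIndex.from_arrays([[1, 2], ["a", "b"]]),
)
df = df.dropna(axis=0, how="any")

# index.levels is metadata and still lists the dropped label.
stale = list(df.index.levels[0])

# get_level_values reads the labels actually present in the index.
current = list(df.index.get_level_values(0).unique())

# In newer pandas, unused levels can be trimmed explicitly.
trimmed = df.index.remove_unused_levels()
trimmed_levels = [list(level) for level in trimmed.levels]

print(stale, current, trimmed_levels)
```

Calling .unique() on the result of get_level_values gives the distinct labels that remain, which is usually what is wanted when index.levels is being read for that purpose.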

Licensed under: CC-BY-SA with attribution