Question

I would like to remove the leading and trailing zeros from each event (level 1) but not the zeros surrounded by non-zero numbers.

The following finds and removes all zeros, including the ones I want to keep:

df = events[event_no][events[event_no] != 0]

I have the following hierarchical series:

   1    2/09/2010   0
        3/09/2010   1.5
        4/09/2010   4.3
        5/09/2010   5.1
        6/09/2010   0
   2    1/05/2007   53.2
        2/05/2007   0
        3/05/2007   21.5
        4/05/2007   2.5
        5/05/2007   0

and want:

   1    3/09/2010   1.5
        4/09/2010   4.3
        5/09/2010   5.1
   2    1/05/2007   53.2
        2/05/2007   0
        3/05/2007   21.5
        4/05/2007   2.5

I have read "Deleting DataFrame row in Pandas based on column value" and "Filter columns of only zeros from a Pandas data frame", but have been unsuccessful in solving this problem.

Solution

What does your DataFrame look like? In any case it shouldn't make much difference; simple Boolean indexing should do it:

In [101]: print(df)

Out [101]:
                   c1
first second         
1     2/09/2010   0.0
      3/09/2010   1.5
      4/09/2010   4.3
      5/09/2010   5.1
      6/09/2010   0.0
2     1/05/2007  53.2
      2/05/2007   0.0
      3/05/2007  21.5
      4/05/2007   2.5
      5/05/2007   0.0


In [102]:

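import numpy as np

# positions where the "first" label changes, i.e. the first row of each event
is_edge = np.argwhere(np.hstack((0, np.diff([item[0] for item in df.index.tolist()]))) != 0).flatten()
# add the last row of each preceding event, plus the first and last rows overall
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(df) - 1))
# keep zeros that are not on an edge, plus every non-zero row
g_idx = np.hstack(([item for item in np.argwhere((df['c1'] == 0).values).flatten() if item not in is_edge],
                   np.argwhere((df['c1'] != 0).values).flatten())).astype(int)
print(df.iloc[sorted(g_idx)])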



Out[102]:
                   c1
first second         
1     3/09/2010   1.5
      4/09/2010   4.3
      5/09/2010   5.1
2     1/05/2007  53.2
      2/05/2007   0.0
      3/05/2007  21.5
      4/05/2007   2.5
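
To make the edge detection easier to follow, here is a small sketch of just that step on the ten example rows. The names `labels` and `starts` are mine, introduced only for illustration. Note two things it makes visible: the level labels have to be numeric for `np.diff` to work, and only the exact first and last row of each event is treated as an edge, so a run of several leading or trailing zeros would not be fully removed.

```python
import numpy as np

# "first" level labels of the ten example rows
labels = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

# A non-zero difference marks the first row of a new event
starts = np.argwhere(np.hstack((0, np.diff(labels))) != 0).flatten()

# Add the row before each start (the last row of the previous event),
# plus the very first and very last rows of the whole table
is_edge = np.hstack((starts, starts - 1, 0, len(labels) - 1))

print(starts.tolist())   # [5]
print(is_edge.tolist())  # [5, 4, 0, 9]
```

Zeros at positions 0, 4 and 9 are on an edge and get dropped; the zero at position 6 is not, so it stays.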

If you have a series instead of a dataframe, say the series is s, you can either:

Convert it to a dataframe:

df = s.to_frame(name='c1')

Or:

In [113]:
import numpy as np

is_edge = np.argwhere(np.hstack((0, np.diff([item[0] for item in s.index.tolist()]))) != 0).flatten()
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(s) - 1))
g_idx = np.hstack(([item for item in np.argwhere(s.values == 0).flatten() if item not in is_edge],
                   np.argwhere(s.values != 0).flatten())).astype(int)
s.iloc[sorted(g_idx)]
Out[113]:
first  second   
1      3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
dtype: float64

BTW, I generate the series by:

In [116]:
import pandas as pd

tuples=[(1, '2/09/2010'),
(1, '3/09/2010'),
(1, '4/09/2010'),
(1, '5/09/2010'),
(1, '6/09/2010'),
(2, '1/05/2007'),
(2, '2/05/2007'),
(2, '3/05/2007'),
(2, '4/05/2007'),
(2, '5/05/2007')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.], index=index)
s
Out[116]:
first  second   
1      2/09/2010     0.0
       3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
       6/09/2010     0.0
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
       5/05/2007     0.0
dtype: float64

Is this the same structure as yours?
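
If an event can start or end with more than one zero, a different technique may be easier: a per-group cumulative count of non-zero values, taken once forwards and once backwards. This is only a sketch, not part of the code above; the function name `trim_zeros_per_event` is made up for illustration, and it assumes the event number is index level 0 and the rows of each event are already in date order.

```python
import pandas as pd

def trim_zeros_per_event(s, level=0):
    """Drop leading/trailing zeros within each group of `level`; keep interior zeros."""
    nonzero = s.ne(0).astype(int)
    # > 0 from the first non-zero value of each event onwards
    seen_from_start = nonzero.groupby(level=level).cumsum()
    # > 0 up to and including the last non-zero value of each event
    seen_from_end = nonzero[::-1].groupby(level=level).cumsum()[::-1]
    # Combine positionally so duplicate index entries cannot misalign the mask
    keep = (seen_from_start.values > 0) & (seen_from_end.values > 0)
    return s[keep]

tuples = [(1, '2/09/2010'), (1, '3/09/2010'), (1, '4/09/2010'),
          (1, '5/09/2010'), (1, '6/09/2010'),
          (2, '1/05/2007'), (2, '2/05/2007'), (2, '3/05/2007'),
          (2, '4/05/2007'), (2, '5/05/2007')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.], index=index)

trimmed = trim_zeros_per_event(s)
print(trimmed)
```

On the example series this keeps the same seven rows as the output shown earlier, including the interior zero on 2/05/2007. An event that contains only zeros would be dropped entirely.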

Licensed under: CC-BY-SA with attribution