Question

I have a pandas DataFrame with a 2-level MultiIndex. Both levels of the MultiIndex are identical date ranges, spaced daily. I want to resample the DataFrame on a weekly basis, for both levels of the MultiIndex, but I'm having trouble. Please see below.

For the sake of example, let's make each index go back 2 weeks:

from datetime import date, timedelta
import numpy as np
import pandas as pd
d0 = date.today() - timedelta(days=14)
dates = pd.date_range(d0, date.today())
date_index = pd.MultiIndex.from_product([dates, dates], names=['cohort_date', 'event_date'])
df = pd.DataFrame(np.random.randint(0, 100, 225), index=date_index)

If I resample df directly, I get the following TypeError:

df.resample('W', how='sum')
[...]
TypeError: Only valid with DatetimeIndex or PeriodIndex
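A minimal sketch reproducing this on a recent pandas (assumed 2.x, not checked against 0.14.x). Newer versions replace `how='sum'` with a chained `.sum()`, but the failure is the same: `resample()` looks at the row index as a whole, and a MultiIndex is not datetime-like. The sketch also shows the `level=` argument of `DataFrame.resample`, which resamples a single datetime level and sums over the other one. That covers one dimension, not both.

```python
from datetime import date, timedelta

import numpy as np
import pandas as pd

# Same two-level daily index as in the question
d0 = date.today() - timedelta(days=14)
dates = pd.date_range(d0, date.today())
idx = pd.MultiIndex.from_product([dates, dates], names=["cohort_date", "event_date"])
df = pd.DataFrame(np.random.randint(0, 100, len(idx)), index=idx)

# resample() needs a datetime-like row index; a MultiIndex is rejected
raised = False
try:
    df.resample("W").sum()
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# level= picks one datetime level to resample; event_date is summed away
by_cohort = df.resample("W", level="cohort_date").sum()
print(by_cohort)
```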

Fair enough, so I unstack and resample on the first level, which gives me half of the answer:

df2 = df.unstack().resample('W', how='sum').T
print(df2)

cohort_date   2014-07-20  2014-07-27  2014-08-03
  event_date                                    
0 2014-07-16         177         424         115
  2014-07-17         408         392         197
  2014-07-18         174         435         222
  2014-07-19         180         392         141
  2014-07-20         304         252         155
  2014-07-21         242         236         228
  2014-07-22         139         159          77
  2014-07-23         117         293          68
  2014-07-24         308         353         246
  2014-07-25         254         471         160
  2014-07-26         258         240         144
  2014-07-27         297         360         148
  2014-07-28         284         303         202
  2014-07-29         218         399         144
  2014-07-30         227         286         160

Now, if I attempt to resample the second axis (which is also indexed by date, in theory), I get the same error:

df2.unstack().resample('W', how='sum')
[...]
TypeError: Only valid with DatetimeIndex or PeriodIndex
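The second attempt fails because `df2.unstack()` moves `event_date` into the columns, which leaves only the integer outer level (the original column label `0`) as the row index, so `resample()` again sees a non-datetime index. A minimal sketch of finishing the unstack route instead, assuming a recent pandas (2.x) with `.resample(...).sum()` in place of `how='sum'`: drop the constant outer level so `event_date` itself becomes the row index, then resample it.

```python
from datetime import date, timedelta

import numpy as np
import pandas as pd

# Same setup as in the question
d0 = date.today() - timedelta(days=14)
dates = pd.date_range(d0, date.today())
idx = pd.MultiIndex.from_product([dates, dates], names=["cohort_date", "event_date"])
df = pd.DataFrame(np.random.randint(0, 100, len(idx)), index=idx)

# First pass: cohort_date is the (datetime) row index after unstack()
df2 = df.unstack().resample("W").sum().T

# df2's row index is (0, event_date); dropping the constant outer level
# leaves the datetime event_date level as the row index
by_event = df2.droplevel(0)

# Second pass: rows become event weeks, columns are already cohort weeks
weekly = by_event.resample("W").sum()
print(weekly)
```

The result is a 2-D table (event weeks by cohort weeks) rather than a long MultiIndexed frame.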

I'm at a loss right now and would appreciate any help with resampling by week on each dimension.


Solution

This requires pandas 0.14.1 (it might also work in 0.14.0).

Note: I think there is a slight issue here, as this should also work by specifying the level (rather than resetting the index and using the level as a column).
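In recent pandas releases (assumed 2.x; not verified on 0.14.x), the level-based form appears to work: `pd.Grouper` accepts `level=` together with `freq=`, so no `reset_index()` is needed. A minimal sketch on the question's data, where `"W"` means weeks ending on Sunday (`"W-SUN"`):

```python
from datetime import date, timedelta

import numpy as np
import pandas as pd

d0 = date.today() - timedelta(days=14)
dates = pd.date_range(d0, date.today())
idx = pd.MultiIndex.from_product([dates, dates], names=["cohort_date", "event_date"])
df = pd.DataFrame(np.random.randint(0, 100, len(idx)), index=idx)

# Group directly on each MultiIndex level, binning both into weekly buckets
weekly = df.groupby(
    [
        pd.Grouper(level="cohort_date", freq="W"),
        pd.Grouper(level="event_date", freq="W"),
    ]
).sum()
print(weekly)
```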

Docs are here

In [22]: df.reset_index().groupby([pd.Grouper(key='cohort_date',freq='W'),pd.Grouper(key='event_date',freq='W')]).sum()
Out[22]: 
                           0
cohort_date event_date      
2014-07-20  2014-07-20  1292
            2014-07-27  1665
            2014-08-03   764
2014-07-27  2014-07-20  1521
            2014-07-27  2317
            2014-08-03  1071
2014-08-03  2014-07-20   871
            2014-07-27  1006
            2014-08-03   306
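For reference, a self-contained sketch of the same `reset_index()` plus `pd.Grouper(key=..., freq=...)` call, written for Python 3 and an assumed recent pandas rather than the IPython session above. The sums will differ because the data is random; the layout (3 cohort weeks by 3 event weeks) should match.

```python
from datetime import date, timedelta

import numpy as np
import pandas as pd

d0 = date.today() - timedelta(days=14)
dates = pd.date_range(d0, date.today())
idx = pd.MultiIndex.from_product([dates, dates], names=["cohort_date", "event_date"])
df = pd.DataFrame(np.random.randint(0, 100, len(idx)), index=idx)

# Turn both index levels into ordinary columns so Grouper can find them by key
flat = df.reset_index()

# Bin each date column into weeks ending on Sunday and sum the value column
weekly = flat.groupby(
    [
        pd.Grouper(key="cohort_date", freq="W"),
        pd.Grouper(key="event_date", freq="W"),
    ]
).sum()
print(weekly)
```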
Licensed under: CC-BY-SA with attribution