Question

I'm seeing rather strange behavior from the resampling function on a pandas time series (Python). I'm using the latest version of pandas (0.12.0).

Take the following time series:

from datetime import datetime
import numpy as np
from pandas import Series

dates = [datetime(2011, 1, 2, 1), datetime(2011, 1, 2, 2), datetime(2011, 1, 2, 3),
          datetime(2011, 1, 2, 4), datetime(2011, 1, 2, 5), datetime(2011, 1, 2, 6)]
ts = Series(np.arange(6.), index=dates)

Then try resampling to 66 minutes and to 65 minutes. This is the result I get:

In [45]: ts.resample('66min')
Out[45]:
2011-01-02 01:00:00    0.5
2011-01-02 02:06:00    2.0
2011-01-02 03:12:00    3.0
2011-01-02 04:18:00    4.0
2011-01-02 05:24:00    5.0
Freq: 66T, dtype: float64

In [46]: ts.resample('65min')
Out[46]:
2011-01-02 01:00:00     0
2011-01-02 02:05:00   NaN
2011-01-02 03:10:00   NaN
2011-01-02 04:15:00   NaN
2011-01-02 05:20:00   NaN
2011-01-02 06:25:00   NaN
Freq: 65T, dtype: float64

I understand the behavior when resampling to 66 minutes: it takes the mean (the default) of all the values in each interval. I do not understand the behavior for 65 minutes, and I don't know how to influence it.

This is a simplified problem. The background is a more complex data correction process, involving resampling.

Any ideas?

Solution

Perhaps you want to interpolate instead of resample. Here's one way:

In [53]: index = pd.date_range(freq='66T', start=ts.first_valid_index(), periods=5)

In [54]: ts.reindex(set(ts.index).union(index)).sort_index().interpolate('time').ix[index]
Out[54]: 
2011-01-02 01:00:00    0.0
2011-01-02 02:06:00    1.1
2011-01-02 03:12:00    2.2
2011-01-02 04:18:00    3.3
2011-01-02 05:24:00    4.4
Freq: 66T, dtype: float64

In [55]: index = pd.date_range(freq='65T', start=ts.first_valid_index(), periods=5)

In [56]: ts.reindex(set(ts.index).union(index)).sort_index().interpolate('time').ix[index]
Out[56]: 
2011-01-02 01:00:00    0.000000
2011-01-02 02:05:00    1.083333
2011-01-02 03:10:00    2.166667
2011-01-02 04:15:00    3.250000
2011-01-02 05:20:00    4.333333
Freq: 65T, dtype: float64
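For readers on a newer pandas, the same idea can be sketched without `.ix` (removed in pandas 1.0) and without the `'T'` alias (deprecated in favor of `'min'`). This is only a minimal sketch: the helper name `interpolate_onto` is hypothetical, introduced here for illustration, and it assumes the series has a sorted `DatetimeIndex`.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same hourly series as in the question.
dates = [datetime(2011, 1, 2, h) for h in range(1, 7)]
ts = pd.Series(np.arange(6.), index=dates)


def interpolate_onto(series, freq, periods):
    """Hypothetical helper: time-weighted interpolation onto a regular grid."""
    # Target grid starting at the first valid timestamp of the series.
    target = pd.date_range(start=series.first_valid_index(), freq=freq, periods=periods)
    # Index.union returns the sorted union, so no separate sort_index() is needed.
    combined = series.reindex(series.index.union(target))
    # Fill the new timestamps linearly in time, then keep only the target grid.
    return combined.interpolate(method="time").loc[target]


result_66 = interpolate_onto(ts, "66min", 5)
result_65 = interpolate_onto(ts, "65min", 5)
print(result_66)
print(result_65)
```

The results should match the values shown in the transcripts above; only the spelling of the calls differs.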

That said, it seems like resample could be improved. At first glance, the behavior you've demonstrated is mysterious and, I agree, unhelpful. Worth discussing.
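If binned means are what you actually want, it may be worth re-checking on a newer pandas. The sketch below assumes pandas 1.1 or later, where `resample` returns a resampler object that needs an explicit aggregation such as `.mean()` and accepts an `origin` argument. Passing `origin="start"` is my assumption about the intent here: it anchors the first bin at the first timestamp of the series, as in the output in the question, rather than at midnight.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same hourly series as in the question.
dates = [datetime(2011, 1, 2, h) for h in range(1, 7)]
ts = pd.Series(np.arange(6.), index=dates)

# Bins start at the first timestamp (01:00); each bin holds the mean of its points.
means_66 = ts.resample("66min", origin="start").mean()
means_65 = ts.resample("65min", origin="start").mean()
print(means_66)
print(means_65)
```

With bins anchored this way, both frequencies put 01:00 and 02:00 in the first bin and one point in each later bin, so I would expect the two results to carry the same values.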

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow