Question

In my field (gas markets), a season is a period spanning two quarters. April to September (both included) is what we call summer, and the rest of the year (October to March) is winter.
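A minimal sketch of that definition, using a hypothetical helper named season_start (not a pandas function): each date maps to the first day of the season that contains it. Note that January to March belong to the winter that started the previous October, which is why the season containing 01-Jan-2014 starts on 01-Oct-2013.

```python
from datetime import date


def season_start(d: date) -> date:
    """Return the first day of the gas season containing d (hypothetical helper).

    Summer: 1 April to 30 September.
    Winter: 1 October to 31 March of the following year.
    """
    if 4 <= d.month <= 9:
        return date(d.year, 4, 1)       # summer starts in April of the same year
    if d.month >= 10:
        return date(d.year, 10, 1)      # winter starting this October
    return date(d.year - 1, 10, 1)      # Jan-Mar belongs to the previous October's winter


for d in [date(2014, 1, 1), date(2014, 4, 1), date(2014, 9, 30), date(2014, 10, 1)]:
    print(d, '->', season_start(d))

# 2014-01-01 -> 2013-10-01
# 2014-04-01 -> 2014-04-01
# 2014-09-30 -> 2014-04-01
# 2014-10-01 -> 2014-10-01
```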

Using pandas, I am trying to resample daily data into seasons, but the result depends on where the daily index starts. If the index starts in Q2 or Q4, resample works as expected; if it starts in Q1 or Q3, it does not. The end date does not have the same effect: resample seems to behave correctly there.

Here is some sample code:

import pandas as pd
import numpy as np


# Daily index starting on 1 April 2014 (the start of a summer season)
april_start_dates = pd.date_range(start='2014-04-01', end='2015-01-01', freq='D')

good_case = pd.DataFrame(np.random.randn(april_start_dates.size), index=april_start_dates)

# '2QS-APR': two-quarter bins labelled by quarter start, anchored on April.
# .mean() matches the old default aggregation of resample().
for d in good_case.resample('2QS-APR').mean().index:
    print(d.strftime('%d-%b-%Y'))

'''
Correct output
01-Apr-2014
01-Oct-2014
'''

# Daily index starting on 1 January 2014 (the middle of a winter season)
jan_start_dates = pd.date_range(start='2014-01-01', end='2015-01-01', freq='D')

bad_case = pd.DataFrame(np.random.randn(jan_start_dates.size), index=jan_start_dates)

for d in bad_case.resample('2QS-APR').mean().index:
    print(d.strftime('%d-%b-%Y'))

'''
Wrong output?       Expected
01-Jan-2014         01-Oct-2013
01-Jul-2014         01-Apr-2014
01-Jan-2015         01-Oct-2014
'''

good_case has the correct dates, one in April and the other in October:

Correct output
01-Apr-2014
01-Oct-2014

That is not the case for bad_case: its dates don't fall in April or October, as one would expect from the anchored offset '2QS-APR'. What I'd expect to see for bad_case is the following (the first date is 01-Oct-2013 because it is the start of the season containing 01-Jan-2014):

Expected
01-Oct-2013
01-Apr-2014
01-Oct-2014

Note that the averaging is wrong too, because the bins themselves don't line up with the seasons, so shifting the labels using loffset doesn't seem like a good enough option.

Am I missing something? What can I do differently to get what I want?

Thanks.


Solution

That looks like it might be a bug to me. I filed an issue.

What's going on is that pandas treats 1 January as being on the offset. I don't think that should be true if the n in the offset is supposed to work the way you expect.

 In [18]: from pandas.tseries.offsets import QuarterBegin

 In [19]: ts = pd.Timestamp('2014-1-1')

 In [20]: offset = QuarterBegin(2, startingMonth=4)

 In [21]: offset.is_on_offset(ts)   # spelled offset.onOffset(ts) in older pandas
 Out[21]: True
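To see that n plays no part in that check, here is a small sketch (assuming pandas 1.0 or later, where the method is spelled is_on_offset) comparing a one-quarter and a two-quarter offset with the same April anchor. It also shows that rollback leaves 1 January unchanged; my assumption is that resample snaps its first bin edge back in a similar way, which would explain why the bins start on 1 January rather than 1 October 2013.

```python
import pandas as pd
from pandas.tseries.offsets import QuarterBegin

ts = pd.Timestamp('2014-01-01')

# Same anchor (startingMonth=4), different multiples n
one_quarter = QuarterBegin(1, startingMonth=4)
two_quarters = QuarterBegin(2, startingMonth=4)

# is_on_offset only checks the month/day anchor (quarters starting Jan/Apr/Jul/Oct),
# so n makes no difference: 1 January counts as "on offset" in both cases.
print(one_quarter.is_on_offset(ts))   # True
print(two_quarters.is_on_offset(ts))  # True

# Because ts is already on the offset, rollback returns it unchanged
# instead of moving back to 1 October 2013.
print(two_quarters.rollback(ts))      # 2014-01-01 00:00:00
```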

You can get your expected output by doing this, but it's a hack, and I wouldn't expect it to work in the future. I'm not sure n is working as it should (or maybe we both misunderstand how it should work).

 bad_case.resample('2Q-APR').mean().shift(-1, freq='2QS-APR')
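A different route, not from the answer above and not relying on how resample anchors its bins: compute each day's season start explicitly and group by it. The helper season_start is hypothetical (my own naming), and the sketch assumes the April to September / October to March split described in the question. Because every group is exactly one season, both the labels and the averages line up with the season boundaries.

```python
import numpy as np
import pandas as pd


def season_start(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """First day of the gas season containing each timestamp (hypothetical helper).

    Apr-Sep -> 1 April of the same year (summer).
    Oct-Dec -> 1 October of the same year; Jan-Mar -> 1 October of the previous year (winter).
    """
    month = index.month.to_numpy()
    year = index.year.to_numpy()
    start_month = np.where((month >= 4) & (month <= 9), 4, 10)
    start_year = np.where(month <= 3, year - 1, year)
    # Assemble dates from year/month/day columns
    parts = pd.DataFrame({'year': start_year, 'month': start_month, 'day': 1})
    return pd.DatetimeIndex(pd.to_datetime(parts))


jan_start_dates = pd.date_range(start='2014-01-01', end='2015-01-01', freq='D')
bad_case = pd.DataFrame(np.random.randn(jan_start_dates.size), index=jan_start_dates)

# Group every day by the start of its season, then average within each season
seasonal = bad_case.groupby(season_start(bad_case.index)).mean()

for d in seasonal.index:
    print(d.strftime('%d-%b-%Y'))

# 01-Oct-2013
# 01-Apr-2014
# 01-Oct-2014
```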
Licensed under: CC-BY-SA with attribution