Using resample to align multiple timeseries in pandas
26-07-2021
Question
Here's the setup code:
import pandas
from datetime import datetime
a_values = [1728, 1635, 1733]
a_index = [datetime(2011, 10, 31), datetime(2012, 1, 31), datetime(2012, 4, 30)]
a = pandas.Series(data=a_values, index=a_index)
aa_values = [6419, 5989, 6006]
aa_index = [datetime(2011, 9, 30), datetime(2011, 12, 31), datetime(2012, 3, 31)]
aa = pandas.Series(data=aa_values, index=aa_index)
apol_values = [1100, 1179, 969]
apol_index = [datetime(2011, 8, 31), datetime(2011, 11, 30), datetime(2012, 2, 29)]
apol = pandas.Series(data=apol_values, index=apol_index)
Here's what the data looks like in a table (3rd value for APOL isn't shown):
The goal is to align the data to calendar-quarter markers so the three data sets can be compared. Glancing at the dates, Mar 2012, Dec 2011, and Sep 2011 seem like reasonable markers for alignment.
Here's the output with fill_method='ffill':
In [6]: a.resample('Q', fill_method='ffill')
Out[6]:
2011-12-31 1728
2012-03-31 1635
2012-06-30 1733
Freq: Q-DEC
In [7]: aa.resample('Q', fill_method='ffill')
Out[7]:
2011-09-30 6419
2011-12-31 5989
2012-03-31 6006
Freq: Q-DEC
In [8]: apol.resample('Q', fill_method='ffill')
Out[8]:
2011-09-30 1100
2011-12-31 1179
2012-03-31 969
Freq: Q-DEC
Which looks like this:
Notice how the most recent numbers in each series don't line up.
And here's the output with fill_method='bfill':
In [9]: a.resample('Q', fill_method='bfill')
Out[9]:
2011-12-31 1635
2012-03-31 1733
2012-06-30 NaN
Freq: Q-DEC
In [10]: aa.resample('Q', fill_method='bfill')
Out[10]:
2011-09-30 6419
2011-12-31 5989
2012-03-31 6006
Freq: Q-DEC
In [11]: apol.resample('Q', fill_method='bfill')
Out[11]:
2011-09-30 1179
2011-12-31 969
2012-03-31 NaN
Freq: Q-DEC
Which looks like this:
Again, the most recent numbers in the series don't line up.
Is this the expected output of resample() in this scenario?
What can I do to get results where the most recent 3 numbers above are aligned and everything else follows appropriately?
EDIT: Here's what the desired output looks like:
Solution
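To see why a plain 'Q' rule cannot line the series up, it helps to check which quarter the latest date of each series falls in. The snippet below is only a sketch of that check; the variable names (dates, calendar, april) are made up for illustration and are not part of the question.

```python
import pandas as pd

# Latest observation date of each series in the question: a, aa, apol
dates = pd.to_datetime(['2012-04-30', '2012-03-31', '2012-02-29'])

# Calendar quarters (year ending in December): the dates land in different quarters
calendar = [str(d.to_period('Q-DEC')) for d in dates]

# Quarters of a year ending in April (quarter ends Jan/Apr/Jul/Oct):
# all three dates land in the same quarter
april = [str(d.to_period('Q-APR')) for d in dates]

print(calendar)
print(april)
```

With calendar quarters, Apr 30 belongs to a later quarter than Mar 31 and Feb 29, which is why a ends up one label ahead of aa and apol. With quarters that end on Apr 30, the three dates share one bin.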
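from pandas import DataFrame

# One single-column frame per series, stacked and sorted by date
df1 = DataFrame({'a': a})
df2 = DataFrame({'aa': aa})
df3 = DataFrame({'apol': apol})
df = df1.append([df2, df3]).sort_index()
# Quarters ending Jan/Apr/Jul/Oct; loffset moves each label back one month
# (legacy pandas API: append and loffset were removed in pandas 2.0)
print(df.resample('Q-APR', loffset='-1m').T)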
Output:
      2011-09-30  2011-12-31  2012-03-31
a           1728        1635        1733
aa          6419        5989        6006
apol        1100        1179         969
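The answer relies on DataFrame.append and the loffset argument, which were both removed in pandas 2.0. A rough equivalent for current pandas could look like the sketch below. It makes a few assumptions: pd.concat stands in for append, the mean is taken explicitly (older resample did this by default), QuarterEnd(startingMonth=4) is used as the offset object behind the 'Q-APR' alias, and the label shift is done by subtracting MonthEnd(1) from the index. The names quarter_end and aligned are illustrative.

```python
from datetime import datetime
import pandas as pd

# Same data as in the question
a = pd.Series([1728, 1635, 1733],
              index=[datetime(2011, 10, 31), datetime(2012, 1, 31), datetime(2012, 4, 30)])
aa = pd.Series([6419, 5989, 6006],
               index=[datetime(2011, 9, 30), datetime(2011, 12, 31), datetime(2012, 3, 31)])
apol = pd.Series([1100, 1179, 969],
                 index=[datetime(2011, 8, 31), datetime(2011, 11, 30), datetime(2012, 2, 29)])

# One column per series; rows are the union of all dates (NaN where a series has no value)
df = pd.concat({'a': a, 'aa': aa, 'apol': apol}, axis=1).sort_index()

# Quarters ending in Jan/Apr/Jul/Oct, the same bins as the 'Q-APR' alias
quarter_end = pd.offsets.QuarterEnd(startingMonth=4)
aligned = df.resample(quarter_end).mean()  # mean() skips the NaN cells

# Stand-in for loffset='-1m': move each label back to the previous month end
aligned.index = aligned.index - pd.offsets.MonthEnd(1)

print(aligned.T)
```

Each bin holds exactly one value per series, so the mean simply returns that value (as a float), and the transposed frame has the same three columns as the output above.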