Calculating a cumulative deviation from mean monthly value in pandas series

https://stackoverflow.com/questions/20799732

21-09-2022

Question

How would I use pandas to calculate a cumulative deviation from a mean monthly rainfall value?

I am given daily rainfall data (e.g. s, below) which I can convert to a pd.Series and resample into monthly totals (sum; e.g. sm, below). But I then want to calculate the difference between each monthly value and the long-term mean for that calendar month. I have added a synthetic example:

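import numpy as np
import pandas as pd
rng = pd.period_range('2001-01-01', '2013-12-31', freq='D')
s = pd.Series(np.random.normal(2.5, 2, size=len(rng)), index=rng)
sm = s.resample('M').sum()  # monthly totals; the old how='sum' keyword no longer exists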

For example, for January 2010 I would like to calculate the difference between the value for that month and the average monthly rainfall for January (over a long period). Then I want a cumulative sum of that difference.
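To make the target concrete, here is a tiny hand-checkable case using hypothetical numbers (not rainfall data): two Januaries of 10 and 30 and two Februaries of 4 and 8. The January mean is 20 and the February mean is 6, so the deviations are -10, -2, 10, 2 and their running sum is -10, -12, -2, 0. The sketch below only spells that arithmetic out in pandas, with the per-month means written by hand as a dict.

```python
import pandas as pd

# Hypothetical monthly totals: two Januaries and two Februaries
idx = pd.PeriodIndex(["2001-01", "2001-02", "2002-01", "2002-02"], freq="M")
sm = pd.Series([10.0, 4.0, 30.0, 8.0], index=idx)

# Long-term mean per calendar month, computed by hand:
# January (10 + 30) / 2 = 20, February (4 + 8) / 2 = 6
month_means = {1: 20.0, 2: 6.0}

# Subtract the matching calendar-month mean from each value
deviation = sm - [month_means[p.month] for p in sm.index]
cumulative = deviation.cumsum()

print(deviation.tolist())   # [-10.0, -2.0, 10.0, 2.0]
print(cumulative.tolist())  # [-10.0, -12.0, -2.0, 0.0]
```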

I have tried to use the groupby function:

sm.groupby(lambda x: x.month).mean()

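This did not work as I hoped: it gives only the 12 monthly means, not a value for every month in 'sm'. I want the average of all same-named months (all Januaries, all Februaries, and so on) subtracted from each monthly value in 'sm', and then a cumulative sum of that series. Perhaps this can be done in one step.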
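The groupby call above does compute the right numbers; it just returns one mean per calendar month (12 rows) instead of one per observation, so the means must be lined up with the original index before subtracting. One way to do that, offered as an alternative sketch rather than the accepted answer's method, is to reindex the 12 means by each row's month number. The data are an assumed, seeded, scaled-down stand-in for the question's `sm` (three years on a month-start DatetimeIndex, `'MS'`, instead of monthly periods).

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's `sm`: 3 years of random monthly totals
gen = np.random.default_rng(0)
idx = pd.date_range("2001-01-01", periods=36, freq="MS")
sm = pd.Series(gen.normal(75, 20, size=len(idx)), index=idx)

# One mean per calendar month: a 12-row Series indexed 1..12
month_means = sm.groupby(sm.index.month).mean()

# Look up each row's month mean; .to_numpy() drops the 1..12 labels so the
# subtraction is positional instead of aligning on the index
aligned_means = month_means.reindex(sm.index.month).to_numpy()
deviation = sm - aligned_means
cumulative = deviation.cumsum()
```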

How could I achieve this efficiently?
Thanks

Solution

This is closely related to an example in the docs. This is untested code, but you want something like this:

# daily_rainfall is the daily series (s in the question); 'M' (not 'D') sums it into months
monthly_rainfall = daily_rainfall.resample('M').sum()

To group all Januarys over all the years together (and so on for each month):

grouped = monthly_rainfall.groupby(lambda x: x.month)

Then subtract each group's long-term mean from its members and take the running total:

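# subtract each calendar month's long-term mean from every value in that group
deviation = grouped.transform(lambda x: x - x.mean())
# running total of the deviations, in chronological order
cumulative_deviation = deviation.cumsum()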
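The steps above can be combined into one script. It rebuilds the question's synthetic daily data with a fixed seed, resamples to monthly totals, groups by calendar month and applies the transform. Two choices here are assumptions that differ from the snippets: the index is a DatetimeIndex resampled with the month-start alias `'MS'` (rather than the question's PeriodIndex with `'M'`), and the deviation is also computed with the built-in `transform("mean")`, which avoids calling a Python lambda once per group.

```python
import numpy as np
import pandas as pd

# Synthetic daily rainfall, as in the question but seeded for reproducibility
gen = np.random.default_rng(42)
days = pd.date_range("2001-01-01", "2013-12-31", freq="D")
daily_rainfall = pd.Series(gen.normal(2.5, 2, size=len(days)), index=days)

# Daily values -> monthly totals ('MS' labels each month by its first day)
monthly_rainfall = daily_rainfall.resample("MS").sum()

# Group all Januaries together, all Februaries together, and so on
grouped = monthly_rainfall.groupby(monthly_rainfall.index.month)

# The answer's version: subtract each group's own mean inside the transform
deviation = grouped.transform(lambda x: x - x.mean())

# Equivalent: broadcast each group's mean back to every row, then subtract
deviation_fast = monthly_rainfall - grouped.transform("mean")

cumulative_deviation = deviation.cumsum()
print(cumulative_deviation.tail())
```

Because the deviations within each calendar month sum to zero by construction, the cumulative series always ends at (numerically) zero over the full record; the useful information is in its path, for example sustained rises during wetter-than-usual spells.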

Licensed under: CC-BY-SA with attribution
