pandas count values for last 7 days from each date
Question
I have two DataFrames. The first looks like this:
print df1
   id        date    month  is_buy
0  17  2015-01-16  2015-01       1
1  17  2015-01-26  2015-01       1
2  17  2015-01-27  2015-01       1
3  17  2015-02-11  2015-02       1
4  17  2015-03-14  2015-03       1
5  18  2015-01-28  2015-01       1
6  18  2015-02-12  2015-02       1
7  18  2015-02-25  2015-02       1
8  18  2015-03-04  2015-03       1
The second DataFrame holds data from the first one, aggregated by month:
df2 = df1[df1['is_buy'] == 1].groupby(['id', 'month']).agg({'is_buy': np.sum})
print df2
   id    month  buys
0  17  2015-01     3
1  17  2015-02     1
2  17  2015-03     1
3  18  2015-01     1
4  18  2015-02     2
5  18  2015-03     1
I'm trying to add a new column to df2 named 'last_week_buys' that holds the number of buys in the 7 days before the first day of each month in df1['month']. In other words, I want to get this:
   id    month  buys  last_week_buys
0  17  2015-01     3             NaN
1  17  2015-02     1               2
2  17  2015-03     1               0
3  18  2015-01     1             NaN
4  18  2015-02     2               1
5  18  2015-03     1               1
Any ideas on how to get this column?
Solution
The main obstacle is figuring out whether a date is within the last 7 days of the month. I'd recommend something hacky like the following:
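from datetime import datetime, timedelta

def last7(datestr):
    # parse a 'YYYY-MM-DD' string into a datetime
    orig = datetime.strptime(datestr, '%Y-%m-%d')
    # if adding 7 days lands in a different month,
    # the date is within the last 7 days of its own month
    plus7 = orig + timedelta(7)
    return plus7.month != orig.month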
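The same test can also be run on a whole column at once. The following is only a sketch: the helper name `in_last7` and the sample dates are made up for illustration, and it assumes the column can be parsed by `pd.to_datetime`. It compares the day of the month with the month length instead of adding 7 days, which should give the same answer.

```python
import pandas as pd

def in_last7(dates: pd.Series) -> pd.Series:
    """Boolean mask: True where a date is in the last 7 days of its month."""
    dates = pd.to_datetime(dates)
    # e.g. January has 31 days, so days 25..31 satisfy day > 31 - 7
    return dates.dt.day > dates.dt.days_in_month - 7

# Hypothetical dates on either side of the boundary
sample = pd.Series(["2015-01-24", "2015-01-25", "2015-02-21", "2015-02-22"])
mask = in_last7(sample)
```

A mask like this can be combined with other conditions using `&`.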
Once you have that, it's relatively simple to adapt your previous code:
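# combine boolean masks with '&' (Python has no '&&'); apply() calls last7 on each date string
df3 = df1[(df1['is_buy'] == 1) & df1['date'].apply(last7)].groupby(['id', 'month']).agg({'is_buy': 'sum'})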
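Now we just join df2 and df3 together and we're done. Note that df3 is keyed by the month in which the buys happened, so shift each month forward by one before joining to line the counts up with the 'last_week_buys' column you want.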
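Putting the steps together, here is one possible end-to-end sketch on the sample data from the question. Several things in it are assumptions rather than part of the original answer: the dates are parsed with `pd.to_datetime`, the counts are keyed by the following month so they line up with df2, missing matches are treated as 0, and the earliest month is set to NaN because there is no earlier data to count.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "id": [17, 17, 17, 17, 17, 18, 18, 18, 18],
    "date": ["2015-01-16", "2015-01-26", "2015-01-27", "2015-02-11",
             "2015-03-14", "2015-01-28", "2015-02-12", "2015-02-25",
             "2015-03-04"],
    "is_buy": [1] * 9,
})
df1["date"] = pd.to_datetime(df1["date"])
df1["month"] = df1["date"].dt.strftime("%Y-%m")

buys = df1[df1["is_buy"] == 1]

# Monthly totals (df2 in the question)
df2 = (buys.groupby(["id", "month"])["is_buy"].sum()
           .rename("buys").reset_index())

# True where the date falls in the last 7 days of its month
in_last7 = (buys["date"] + pd.Timedelta(days=7)).dt.month != buys["date"].dt.month

# Key each late-month buy by the FOLLOWING month, then count per (id, month)
late = buys[in_last7].copy()
late["month"] = (late["date"].dt.to_period("M") + 1).astype(str)
df3 = (late.groupby(["id", "month"])["is_buy"].sum()
           .rename("last_week_buys").reset_index())

result = df2.merge(df3, on=["id", "month"], how="left")

# Assumption: no match means zero buys, except in the earliest month,
# where the previous month is not in the data at all
result["last_week_buys"] = result["last_week_buys"].fillna(0)
result.loc[result["month"] == df1["month"].min(), "last_week_buys"] = float("nan")
```

On the sample data, `result` should match the table requested in the question, with 'last_week_buys' stored as floats because of the NaN entries.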
OTHER TIPS
You can also do something like this:
patterns = df[['Total', 'Date']]
patterns = patterns.set_index('Date')  # 'Date' must be a datetime column
# daily totals; older pandas versions spelled this resample('D', how=sum)
resample = patterns.resample('D').sum()
# extract the last 7 daily rows
last_7 = resample.iloc[-7:]
# and get the total
last_7_total = last_7.sum()
A reference for data slicing is here: http://chris.friedline.net/2015-12-15-rutgers/lessons/python2/02-index-slice-subset.html
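One caveat with the resample approach: slicing the last 7 rows gives the 7 days ending at the last date present in the data, not at the end of a month. The sketch below uses hypothetical purchase data (only the column names 'Date' and 'Total' come from the snippet above) and shows both the positional slice and a window anchored to an explicit end date.

```python
import pandas as pd

# Hypothetical purchases for illustration only
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-20", "2015-01-26", "2015-01-26",
                            "2015-01-31", "2015-02-03"]),
    "Total": [10, 5, 7, 3, 4],
})

# One row per calendar day between the first and last purchase; empty days sum to 0
daily = df.set_index("Date")["Total"].resample("D").sum()

# Last 7 days relative to the final date in the data (2015-01-28 .. 2015-02-03)
last_7_total = daily.iloc[-7:].sum()

# Last 7 days of January, selected by label; both endpoints are included
end = pd.Timestamp("2015-01-31")
start = end - pd.Timedelta(days=6)
january_last_7_total = daily.loc[start:end].sum()
```

Anchoring the window with `.loc` is closer to what the question asks for, since the window there ends at a month boundary rather than at the latest record.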