Pandas groupby count returns wrong count

https://stackoverflow.com/questions/23433115

14-07-2023

Question

I'm trying to plot a roll up of incidents in each month from a simple file in the following format.

4/7/13  1
4/15/13 1
4/16/13 1
4/17/13 1
4/20/13 1
5/2/13  1
5/3/13  1
5/3/13  1
5/6/13  1
5/9/13  1
5/12/13 1
5/16/13 1
5/16/13 1
5/16/13 1
5/26/13 1
5/29/13 1
6/5/13  1
6/7/13  1
6/14/13 1
6/24/13 1
6/25/13 1
6/26/13 1
6/26/13 1
6/28/13 1
6/30/13 1

So, I'd like a roll-up like this:

4/30/13     5
5/31/13     11
6/30/13     8

I tried the following code:

import pandas as pd
import datetime
import numpy as np

grouper = pd.TimeGrouper('1M')
# set index of dataframe to date
a1 = df.set_index('Date')
# create a Series with just the column I want to roll up
seriesO = a1['Outlier ']
grouped1 = seriesO.groupby(grouper).aggregate(np.size)
grouped1

The result is:

2013-04-30     0
2013-05-31    48
2013-06-30     9

Any ideas?

Answer

Doing this is not recommended in pandas <= 0.13.1 (although it works properly in master/0.14), because it requires making sure that the data is sorted, and that requirement is not documented anywhere.

In [13]: s.groupby(pd.TimeGrouper('1M')).agg(np.size)
Out[13]: 
0
2013-04-30     5
2013-05-31    11
2013-06-30     9
Freq: M, dtype: int64
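
Since the point above is that the data must be sorted, here is a minimal sketch of sorting explicitly before grouping. It is an illustration under assumptions, not the original poster's code: the dates are retyped from the question, the names s_unsorted, s_sorted and monthly are made up for the example, and pd.Grouper stands in for pd.TimeGrouper, which newer pandas releases no longer provide. The "MS" (month start) alias is used only because the month-end alias is spelled differently across pandas versions, so the bins are labelled by the first day of each month instead of the last.

```python
import pandas as pd

# Dates retyped from the question, one incident per row.
dates = [
    "4/7/13", "4/15/13", "4/16/13", "4/17/13", "4/20/13",
    "5/2/13", "5/3/13", "5/3/13", "5/6/13", "5/9/13", "5/12/13",
    "5/16/13", "5/16/13", "5/16/13", "5/26/13", "5/29/13",
    "6/5/13", "6/7/13", "6/14/13", "6/24/13", "6/25/13",
    "6/26/13", "6/26/13", "6/28/13", "6/30/13",
]
index = pd.to_datetime(dates, format="%m/%d/%y")
s = pd.Series(1, index=index, name="Outlier")

# Reverse the rows to mimic a file that is not in date order,
# then sort the index explicitly before grouping.
s_unsorted = s.iloc[::-1]
s_sorted = s_unsorted.sort_index()

# Count the rows that fall in each calendar month.
monthly = s_sorted.groupby(pd.Grouper(freq="MS")).size()
print(monthly)
```

Sorting first costs little on a file this size and removes any dependence on the order in which the rows were read.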

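The preferred method is the following (it worked in every version available at the time; pandas 0.18 and later replaced the how= argument with method chaining):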

In [14]: s.resample('1M',how='count')
Out[14]: 
0
2013-04-30     5
2013-05-31    11
2013-06-30     9
Freq: M, dtype: int64
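
For readers on a recent pandas, a hedged sketch of the same monthly count using the method-chaining form of resample follows. The data is rebuilt from the question's dates; the names days and counts are hypothetical, and "MS" (month start) is chosen here only so the example does not depend on how a given pandas version spells the month-end alias.

```python
import pandas as pd

# Days of each 2013 month taken from the question's sample file.
days = {
    4: [7, 15, 16, 17, 20],
    5: [2, 3, 3, 6, 9, 12, 16, 16, 16, 26, 29],
    6: [5, 7, 14, 24, 25, 26, 26, 28, 30],
}
index = pd.DatetimeIndex(
    [pd.Timestamp(2013, month, day) for month, ds in days.items() for day in ds]
)
s = pd.Series(1, index=index)

# Resample into calendar months and count the rows in each bin.
counts = s.resample("MS").count()
print(counts)
```

Each bin is labelled with the first day of its month, so the April count appears under 2013-04-01 rather than 2013-04-30.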

License: CC-BY-SA with attribution

Not affiliated with StackOverflow