Question

I have this dataframe:

                      startTime     endTime  emails_received
index
2014-01-24 14:00:00  1390568400  1390569600    684
2014-01-24 14:00:00  1390568400  1390569300    700
2014-01-24 14:05:00  1390568700  1390569300    438
2014-01-24 14:05:00  1390568700  1390569900    586
2014-01-24 16:00:00  1390575600  1390576500    752
2014-01-24 16:00:00  1390575600  1390576500    743
2014-01-24 16:00:00  1390575600  1390576500    672
2014-01-24 16:00:00  1390575600  1390576200    712
2014-01-24 16:00:00  1390575600  1390576800    708

I run resample("10min",how="median").dropna() and I get:

                  startTime     endTime  emails_received
start                                             
2014-01-24 14:00:00  1390568550  1390569450    635
2014-01-24 16:00:00  1390575600  1390576500    712

which is correct. Is there any way I can also get the standard deviation from the mean easily via pandas?

Solution

You just need to call .std(), either on the DataFrame itself or as the aggregation when resampling. Here is an illustrative example.

Creating a DatetimeIndex

In [38]: index = pd.date_range(start='2000-1-1', freq='1min', periods=1000)

Creating a DataFrame with 2 columns

In [45]: df = pd.DataFrame({'a':range(1000), 'b':range(1000,3000,2)}, index=index)

Head, Std and Mean of the DataFrame

In [47]: df.head()
Out[47]: 
                     a     b
2000-01-01 00:00:00  0  1000
2000-01-01 00:01:00  1  1002
2000-01-01 00:02:00  2  1004
2000-01-01 00:03:00  3  1006
2000-01-01 00:04:00  4  1008

In [48]: df.std()
Out[48]: 
a    288.819436
b    577.638872
dtype: float64

In [49]: df.mean()
Out[49]: 
a     499.5
b    1999.0
dtype: float64

Downsample and calculate the same statistics

In [54]: df = df.resample(rule="10T",how="median")

In [55]: df
Out[55]: 

DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100  non-null values
b    100  non-null values
dtypes: float64(1), int64(1)

In [56]: df.head()
Out[56]: 
                        a     b
2000-01-01 00:00:00   4.5  1009
2000-01-01 00:10:00  14.5  1029
2000-01-01 00:20:00  24.5  1049
2000-01-01 00:30:00  34.5  1069
2000-01-01 00:40:00  44.5  1089

In [57]: df.std()
Out[57]: 
a    290.11492
b    580.22984
dtype: float64

In [58]: df.mean()
Out[58]: 
a     499.5
b    1999.0
dtype: float64
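
The session above uses the how= keyword, which was deprecated in pandas 0.18 and removed in later releases. A minimal sketch of the same median downsampling with the method-chaining form, assuming a recent pandas; the name medians is illustrative and not part of the original answer:

```python
import pandas as pd

# Same synthetic data as in the session above: 1000 one-minute samples.
index = pd.date_range(start="2000-01-01", freq="1min", periods=1000)
df = pd.DataFrame({"a": range(1000), "b": range(1000, 3000, 2)}, index=index)

# Downsample to 10-minute bins, keeping the median of each bin.
# Assigning to a new name leaves the original 1-minute df untouched.
medians = df.resample("10min").median()

print(medians.head())
print(medians.std())   # spread of the 100 bin medians, not the spread within a bin
print(medians.mean())
```

Note that medians.std() measures how much the bin medians vary across the whole frame, which is why it differs slightly from the std of the raw data.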

Downsampling by std() (run on the original 1-minute df, not on the medians assigned to df in In [54])

In [62]: df2 = df.resample(rule="10T", how=np.std)

In [63]: df2
Out[63]: 

DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100  non-null values
b    100  non-null values
dtypes: float64(2)

In [64]: df2.head()
Out[64]: 
                           a         b
2000-01-01 00:00:00  3.02765  6.055301
2000-01-01 00:10:00  3.02765  6.055301
2000-01-01 00:20:00  3.02765  6.055301
2000-01-01 00:30:00  3.02765  6.055301
2000-01-01 00:40:00  3.02765  6.055301
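
For the original question (median and standard deviation per bin), a hedged sketch using the current resample API, assuming a recent pandas. The names per_bin_std and summary are illustrative; passing a list to agg yields one column per (original column, statistic) pair:

```python
import pandas as pd

# Same synthetic data as in the session above.
index = pd.date_range(start="2000-01-01", freq="1min", periods=1000)
df = pd.DataFrame({"a": range(1000), "b": range(1000, 3000, 2)}, index=index)

# Standard deviation within each 10-minute bin.
per_bin_std = df.resample("10min").std()

# Median and standard deviation side by side, one row per bin.
summary = df.resample("10min").agg(["median", "std"])

print(per_bin_std.head())
print(summary.head())
```

Chaining .dropna() onto either result, as in the question, drops bins that contain no data.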

The docstring for the .std() method reads as follows.

Return standard deviation over requested axis.
NA/null values are excluded

Parameters
----------
axis : {0, 1}
    0 for row-wise, 1 for column-wise
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA
level : int, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a DataFrame

Returns
-------
std : Series (or DataFrame if level specified)

        Normalized by N-1 (unbiased estimator).
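
The N-1 normalization mentioned in the docstring can be checked directly. A small sketch, assuming a recent pandas and NumPy; the variable names are illustrative. pandas defaults to ddof=1 (divide by N-1), while NumPy's np.std defaults to ddof=0 (divide by N):

```python
import numpy as np
import pandas as pd

# Ten values 0..9, the same as column 'a' in the first 10-minute bin above.
s = pd.Series(range(10))

sample_std = s.std()                  # pandas default: ddof=1, divides by N-1
population_std = s.std(ddof=0)        # divides by N instead
numpy_default = np.std(s.to_numpy())  # NumPy default: ddof=0

print(sample_std, population_std, numpy_default)
```

The per-bin value 3.02765 shown in the session above matches the N-1 form.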
Licensed under: CC-BY-SA with attribution