Question

I'm trying to remove all "old" values from a pandas TimeSeries, e.g. all values which are more than 1 day old (relative to the newest value).

Naively, I tried something like this:

from datetime import timedelta
def trim(series):
    return series[series.index.max() - series.index < timedelta(days=1)]

This raises an error:

TypeError: ufunc 'subtract' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

Clearly, the problem is with this expression: series.index.max() - series.index

I then found this works:

def trim(series):
    return series[series.index > series.index.max() - timedelta(days=1)]

Can somebody please explain why the latter works while the former raises an error?

EDIT: I am using pandas version 0.12.0


Solution

Here's an example in 0.13 (to_timedelta is not available in 0.12, so there you would have to use np.timedelta64(4,'D') instead):

In [12]: rng = pd.date_range('1/1/2011', periods=10, freq='D')

In [13]: ts = pd.Series(randn(len(rng)), index=rng)

In [14]: ts
Out[14]: 
2011-01-01   -0.348362
2011-01-02    1.782487
2011-01-03    1.146537
2011-01-04   -0.176308
2011-01-05   -0.185240
2011-01-06    1.767135
2011-01-07    0.615911
2011-01-08    2.459799
2011-01-09    0.718081
2011-01-10   -0.520741
Freq: D, dtype: float64

In [15]: x = ts.index.to_series().max()-ts.index.to_series()

In [16]: x
Out[16]: 
2011-01-01   9 days
2011-01-02   8 days
2011-01-03   7 days
2011-01-04   6 days
2011-01-05   5 days
2011-01-06   4 days
2011-01-07   3 days
2011-01-08   2 days
2011-01-09   1 days
2011-01-10   0 days
Freq: D, dtype: timedelta64[ns]

In [17]: x[x>pd.to_timedelta('4 days')]
Out[17]: 
2011-01-01   9 days
2011-01-02   8 days
2011-01-03   7 days
2011-01-04   6 days
2011-01-05   5 days
Freq: D, dtype: timedelta64[ns]
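Putting the session above together, the trim function from the question could be sketched roughly like this. It assumes a reasonably recent pandas (pd.Timedelta and Series.to_numpy do not exist in 0.12), and the max_age parameter name is made up for illustration:

```python
import numpy as np
import pandas as pd


def trim(series, max_age=pd.Timedelta(days=1)):
    """Keep only values younger than max_age, relative to the newest index entry."""
    # Convert the index to a Series first, as in the session above, so the
    # subtraction yields a timedelta64 Series (the "age" of each row).
    age = series.index.max() - series.index.to_series()
    # Use a plain boolean array as the mask so no index alignment is involved.
    return series[(age < max_age).to_numpy()]


# Same date range as the example above, with predictable values.
rng = pd.date_range("2011-01-01", periods=10, freq="D")
ts = pd.Series(np.arange(10.0), index=rng)

trimmed = trim(ts, max_age=pd.Timedelta(days=4))
print(trimmed)
```

With max_age of 4 days this keeps the rows aged 0 to 3 days, i.e. 2011-01-07 through 2011-01-10.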

OTHER TIPS

You could use Truncating and Fancy Indexing as follows:

ts.truncate(before='Some Date')

Example:

import datetime as dt
import pandas as pd
from numpy.random import randn

rng = pd.date_range('1/1/2011', periods=72, freq='D')
ts = pd.Series(randn(len(rng)), index=rng)

ts.truncate(before=(ts.index.max() - dt.timedelta(days=1)).strftime('%m-%d-%Y'))

This should drop everything before the cutoff date. You can also pass an after argument to narrow the range further if you want.
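As a rough sketch (assuming a recent pandas, where pd.Timedelta is available), the cutoff can also be passed to truncate as a Timestamp without the strftime round trip. One thing to be aware of: truncate keeps the row that sits exactly on the before date, whereas the strict > comparison from the question drops it. The variable names here are only for illustration:

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=72, freq="D")
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# Cutoff is one day before the newest entry in the (sorted) index.
cutoff = ts.index.max() - pd.Timedelta(days=1)

# truncate is inclusive: the row exactly at the cutoff is kept.
recent = ts.truncate(before=cutoff)

# The boolean mask from the question is strict: the cutoff row is dropped.
strict = ts[ts.index > cutoff]

print(len(recent), len(strict))
```

Which of the two you want depends on whether a value that is exactly one day old counts as "old".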

Licensed under: CC-BY-SA with attribution