Question

I have a dataset that looks like this (1D python list):

[0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]

I'm trying to find cutoff points for variations, based on the previous window.

I'm looking for an output of:

[4, 9, 19, 23]

Assuming my window needs to be at least 3 elements wide, that a variation must last for at least 3 consecutive elements, and that there is some noise in the data, I came up with:

  • Fill up window with at least 2 elements
  • Calculate standard deviation, add all subsequent points that are within stddev to that window. Recalculate every time you add a new point.
  • When a point falls outside the stddev (for example here, the first occurrence of 4), make sure the next point is also outside the stddev (the first occurrence of 5); if so, append a new index for the first deviant point (4 here). If not, keep adding to the current window.
  • The new 'deviant' values become the window to compare against, repeat.
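The steps above can be sketched like this (a rough sketch with illustrative names; the zero-stddev fallback is my own choice, and note that a literal implementation of these rules fires on extra points inside the peaks rather than producing exactly [4, 9, 19, 23]):

```python
import statistics

def find_cutoffs(data, min_window=2):
    # Rough sketch of the rolling-window idea described above.
    cutoffs = []
    window = list(data[:min_window])
    i = min_window
    while i < len(data):
        mean = statistics.mean(window)
        sd = statistics.pstdev(window) or 1.0   # fallback for flat windows
        if abs(data[i] - mean) <= sd:
            window.append(data[i])              # still within the window's spread
            i += 1
        elif i + 1 < len(data) and abs(data[i + 1] - mean) > sd:
            cutoffs.append(i)                   # two consecutive deviants: cut here
            window = [data[i], data[i + 1]]     # deviants become the new window
            i += 2
        else:
            window.append(data[i])              # lone outlier, treat as noise
            i += 1
    return cutoffs

x = [0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]
print(find_cutoffs(x))
```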

Is there a better way to do this, or a built-in numpy function to help out?

Thanks.

Edit

The proposed solution by @qwwqwwq works well, but I have another small constraint: I realized that my list values don't all have the same weight. Assuming this new dataset:

[(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6), (15, 6), (10, 4), (5, 0), (5, 0), (20, 0), (10, 0), (8, 0), (5, 0), (10, 2), (5, 0), (5, 0), (5, 0), (10, 6), (5, 4), (5, 5), (10, 6), (10, 0), (10, 0), (10, 0), (10, 0), (10, 0)]
  • Where pos 0 is a time duration in seconds
  • pos 1 is my value
  • minimum time to consider the peak is 30 seconds

How could I replace widths = np.array([2]) with my minimum time?

I'm aware I could take each slope_down_begin_points entry, find the closest slope_up_begin_points entry, and check whether the sum of the points' durations between the two is > the minimum time. I'm not very familiar with scipy.signal; hopefully there's something better?
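That duration check can at least be written down directly. In this sketch, `pairs` is an illustrative list of (begin, end) index pairs such as the spline approach might produce (not verified output from it), and `duration` is a name I made up:

```python
data = [(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6), (15, 6),
        (10, 4), (5, 0), (5, 0), (20, 0), (10, 0), (8, 0), (5, 0), (10, 2),
        (5, 0), (5, 0), (5, 0), (10, 6), (5, 4), (5, 5), (10, 6), (10, 0),
        (10, 0), (10, 0), (10, 0), (10, 0)]

def duration(data, start, end):
    # total seconds covered by the samples in [start, end)
    return sum(d for d, _ in data[start:end])

min_time = 30
# illustrative (begin, end) pairs; the middle one is the short blip at index 15
pairs = [(4, 9), (15, 16), (19, 23)]
long_enough = [(a, b) for a, b in pairs if duration(data, a, b) >= min_time]
print(long_enough)
```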

Edit 2

Another, simpler and more naive way of doing this is to group the >0 values together and take the [0] and [-1] values of each group as the edges.

from itertools import groupby

for is_zero, g in groupby(x, key=lambda v: v[1] == 0):
    group = list(g)
    print(is_zero, group)
    # only consider non-zero runs that are long enough
    if not is_zero and sum(z[0] for z in group) > some_minimum_time:
        # do stuff
        pass
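Carrying that idea through while tracking each group's start index gives the cutoff list directly. One assumption on my part: I treat the 30-second minimum as inclusive (>=), so the final 30-second peak still counts:

```python
from itertools import groupby

data = [(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6), (15, 6),
        (10, 4), (5, 0), (5, 0), (20, 0), (10, 0), (8, 0), (5, 0), (10, 2),
        (5, 0), (5, 0), (5, 0), (10, 6), (5, 4), (5, 5), (10, 6), (10, 0),
        (10, 0), (10, 0), (10, 0), (10, 0)]

min_time = 30
cutoffs = []
i = 0
for is_zero, g in groupby(data, key=lambda v: v[1] == 0):
    group = list(g)
    # keep non-zero runs whose total duration reaches the minimum time
    if not is_zero and sum(d for d, _ in group) >= min_time:
        cutoffs.extend([i, i + len(group)])
    i += len(group)

print(cutoffs)  # [4, 9, 19, 23]
```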

Solution

The best approach I can think of for this problem is to fit a spline to the array, take the derivative, and then find all local maxima. These local maxima should represent the boundaries of peaks, which I think is what you are after. My approach:

from scipy import signal
from scipy import interpolate
import numpy as np
from numpy import linspace

x = [0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]
s = interpolate.UnivariateSpline(linspace(0, len(x) - 1, len(x)), np.array(x))
ds = s.derivative()

slope_down_begin_points = [
    p for p in signal.find_peaks_cwt(
        vector=[-ds(v) for v in range(len(x))],
        widths=np.array([2]),
    )
    if x[p - 1] >= 1
]

slope_up_begin_points = [
    p for p in signal.find_peaks_cwt(
        vector=[ds(v) for v in range(len(x))],
        widths=np.array([2]),
    )
    if x[p + 1] >= 1
]

sorted(slope_up_begin_points + slope_down_begin_points)
>> [4, 9, 16, 19, 23]

16 is included in this approach because it is a little micro-peak of its own; if you fiddle with the find_peaks_cwt/UnivariateSpline parameters you should be able to filter it out.
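As an aside, if your SciPy is recent enough (1.1+), signal.find_peaks has a prominence filter that suppresses the micro-peak at index 15 on the raw signal, with no spline needed. Note it returns peak positions rather than the slope begin points above, so it is a sketch of a related technique, not a drop-in replacement:

```python
import numpy as np
from scipy import signal

x = np.array([0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0])

# the lone 2 at index 15 only has prominence 2, so prominence=3 drops it
peaks, props = signal.find_peaks(x, prominence=3)
print(peaks)
```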

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow