You can vectorize the whole thing with numpy. I have built this random dataset of approximately 12,000 unique indices between 0 and 199,999,999, and an equally long array of random floats between 0 and 1:
import numpy as np

indices = np.unique(np.random.randint(int(2e8), size=12000))
values = np.random.rand(len(indices))
Then I construct an array of indices covering a window of total size 2*win+1 around each of the original indices, and a corresponding array of how much each point contributes to the moving average:
win = 10
avg_idx = np.arange(-win, win+1) + indices[:, None]
avg_val = np.tile(values[:, None]/(2*win+1), (1, 2*win+1))
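To see what the broadcasting produces, here is a tiny worked example (a sketch with made-up inputs and win = 1):

```python
import numpy as np

win = 1
idx = np.array([5, 9])        # two sparse sample positions
val = np.array([0.3, 0.6])    # their values

# Each row holds the window of indices around one sample: shape (2, 3)
avg_idx = np.arange(-win, win + 1) + idx[:, None]
# avg_idx is [[4, 5, 6], [8, 9, 10]]

# Each sample spreads val/(2*win+1) evenly over its window
avg_val = np.tile(val[:, None] / (2 * win + 1), (1, 2 * win + 1))
# avg_val is [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]] (up to float rounding)
```

Note that the windows of the two samples do not overlap here; overlaps are exactly what the next step resolves.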
All that is left is to collapse repeated indices, summing together the contributions that land on the same position:
unique_idx, inv = np.unique(avg_idx, return_inverse=True)
# .ravel() keeps bincount happy whether return_inverse comes back flat
# or input-shaped (the latter on newer numpy)
mov_avg = np.bincount(inv.ravel(), weights=avg_val.ravel())
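The unique/bincount pair is a general numpy idiom for grouping-and-summing by key; a minimal sketch with toy numbers:

```python
import numpy as np

keys = np.array([4, 5, 6, 5, 6, 7])               # indices, with overlaps
contrib = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.2])

# return_inverse maps each key to its position in the sorted unique keys
uniq, inv = np.unique(keys, return_inverse=True)
# uniq is [4, 5, 6, 7]; inv is [0, 1, 2, 1, 2, 3]

# bincount sums the weights that share the same inverse index,
# i.e. the contributions belonging to each unique key
totals = np.bincount(inv.ravel(), weights=contrib)
# totals is [0.1, 0.3, 0.3, 0.2] (up to float rounding)
```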
You can now get, e.g., the indices at which the moving average exceeds 0.5:
unique_idx[mov_avg > 0.5]
As for performance, first turn the above code into a function:
def sparse_mov_avg(idx, val, win):
    avg_idx = np.arange(-win, win + 1) + idx[:, None]
    avg_val = np.tile(val[:, None] / (2 * win + 1), (1, 2 * win + 1))
    unique_idx, inv = np.unique(avg_idx, return_inverse=True)
    mov_avg = np.bincount(inv.ravel(), weights=avg_val.ravel())
    return unique_idx, mov_avg
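As a quick sanity check (not part of the original answer), the sparse result can be compared against a dense moving average built with np.convolve on a small zero-filled array; the sizes and seed below are arbitrary:

```python
import numpy as np

def sparse_mov_avg(idx, val, win):
    # Same function as above, repeated so this block is self-contained
    avg_idx = np.arange(-win, win + 1) + idx[:, None]
    avg_val = np.tile(val[:, None] / (2 * win + 1), (1, 2 * win + 1))
    unique_idx, inv = np.unique(avg_idx, return_inverse=True)
    mov_avg = np.bincount(inv.ravel(), weights=avg_val.ravel())
    return unique_idx, mov_avg

rng = np.random.default_rng(0)
idx = np.unique(rng.integers(0, 1000, size=50))
val = rng.random(len(idx))
win = 10

u, m = sparse_mov_avg(idx, val, win)

# Dense reference: scatter the values into a zero array padded by win on each
# side (windows can reach below index 0), then convolve with a box kernel.
dense = np.zeros(1000 + 2 * win)
dense[idx + win] = val
kernel = np.ones(2 * win + 1) / (2 * win + 1)
ref = np.convolve(dense, kernel, mode='same')

# The sparse averages agree with the dense ones at the affected positions
assert np.allclose(m, ref[u + win])
```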
and here are some timings for several window sizes, for the test data described at the beginning:
In [2]: %timeit sparse_mov_avg(indices, values, 10)
10 loops, best of 3: 33.7 ms per loop
In [3]: %timeit sparse_mov_avg(indices, values, 100)
1 loops, best of 3: 378 ms per loop
In [4]: %timeit sparse_mov_avg(indices, values, 1000)
1 loops, best of 3: 4.33 s per loop