This will give you an array with the length of all consecutive chunks:
np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) + ([len(a)-1],)))
Some tests:
>>> a = [1, 2, 3, 4, 5, 6, 9, 10, 11, 14, 17, 18, 19, 20, 21]
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
([len(a)-1],)))
array([6, 3, 1, 5], dtype=int64)
>>> a = [0, 1, 2, 4, 5, 6]
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
([len(a)-1],)))
array([3, 3], dtype=int64)
To check if any is at least 4 items long, sinply wrap the above code in np.any(... >= 4)
.
To see how this works, lets work out the results from the inside out for my first example:
>>> a = [1, 2, 3, 4, 5, 6, 9, 10, 11, 14, 17, 18, 19, 20, 21]
First, we figure out the deltas between consecutive items:
>>> np.diff(a)
array([1, 1, 1, 1, 1, 3, 1, 1, 3, 3, 1, 1, 1, 1])
Then, we identify positions where the delta is not 1
, that is, positions where a chunk of consecutive items begins or ends:
>>> np.diff(a) != 1
array([False, False, False, False, False, True, False, False, True,
True, False, False, False, False], dtype=bool)
And we extract the positions of the True
s:
>>> np.nonzero(np.diff(a) != 1)
(array([5, 8, 9], dtype=int64),)
The above index marks last items in a consecutive streak. Python slices are defined as start
to last+1
, so we could increment that array by one, add a zero at the beginning and the length of the array at the end and have all starting and ending indices of consecutive sequences, i.e.:
>>> np.concatenate(([0], np.nonzero(np.diff(a) != 1)[0] + 1, [len(a)]))
array([ 0, 6, 9, 10, 15], dtype=int64)
Taking the differences from consecutive indices will give us the desired length of each consecutive chunk. Because all we care for is the differences, rather than adding one to the indices, in my original answer I chose to to prepend -1
and append len(a)-1
:
>>> np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) + ([len(a)-1],))
array([-1, 5, 8, 9, 14], dtype=int64)
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
([len(a)-1],)))
array([6, 3, 1, 5], dtype=int64)
Say that in this array you identify that you want the indices of the chunk of 5
items, the one in position 3
of this array. To recover the start and stop indices of that chunk you can simply do:
>>> np.concatenate(([0], np.nonzero(np.diff(a) != 1)[0] + 1, [len(a)]))[3:3+2]
array([10, 15], dtype=int64)
>>> a[10:15]
[17, 18, 19, 20, 21]