Say we have a list aa = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999].
aa = sorted(aa)
aa
[84,86,87,89,90,997,999,1000,1002,2014]
To compute the differences between neighbors (with numpy imported as np): ad = np.ediff1d(aa).
ad
array([2, 1, 2, 1, 907, 2, 1, 2, 1012])
To find the indices at which to cut, i.e. where the gap to the previous number exceeds the range, use np.where(ad > rng)[0] + 1, where rng = 7 is the largest difference allowed between neighbors in the same group:
np.where(ad > 7)[0] + 1
array([5, 9])
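The + 1 may be easier to see step by step. The following is a minimal sketch using the same data; the names gap_pos and starts are only illustrative and are not part of the answer's code.

```python
import numpy as np

aa = np.array([84, 86, 87, 89, 90, 997, 999, 1000, 1002, 2014])
rng = 7  # largest gap allowed inside one group

ad = np.ediff1d(aa)               # ad[i] == aa[i + 1] - aa[i]
gap_pos = np.where(ad > rng)[0]   # positions in ad where the gap is too large
starts = gap_pos + 1              # positions in aa where a new group begins

print(gap_pos.tolist())     # -> [4, 8]
print(starts.tolist())      # -> [5, 9]
print(aa[starts].tolist())  # -> [997, 2014], the first element of each new group
```

Since ad[i] is the difference between aa[i] and aa[i + 1], a large gap at position i in ad means the next group starts at position i + 1 in aa.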
To split the array at those indices into the required subarrays, use np.split(aa, np.where(ad > rng)[0] + 1). So, the function is:
def splitarr(aa, rng):
    ad = np.ediff1d(aa)
    return np.split(aa, np.where(ad > rng)[0] + 1)
splitarr(aa, 7)
[array([84, 86, 87, 89, 90]), array([ 997, 999, 1000, 1002]), array([2014])]
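Note that the function above relies on aa being sorted already. As a hedged sketch, the sorting could be moved inside the function so that it also accepts unsorted input; the name split_by_gap is hypothetical, and np.diff is used here in place of np.ediff1d (for 1-D input they give the same differences).

```python
import numpy as np

def split_by_gap(values, rng):
    """Hypothetical helper: sort, then split wherever the gap exceeds rng."""
    arr = np.sort(np.asarray(values))
    # A gap at position i in the differences means a new group starts at i + 1.
    breaks = np.where(np.diff(arr) > rng)[0] + 1
    return np.split(arr, breaks)

groups = split_by_gap([87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999], 7)
print([g.tolist() for g in groups])
# -> [[84, 86, 87, 89, 90], [997, 999, 1000, 1002], [2014]]
```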
The minimum allowable length of a group can be enforced with the filter np.where(np.fromiter(map(len, splarr), dtype=int) >= lim)[0], where splarr = np.split(aa, np.where(ad > rng)[0] + 1).
That returns the indices of the arrays of allowed length. map(len, splarr) returns the lengths of all arrays in the list splarr; np.fromiter() converts the map object to a NumPy array so that the comparison >= lim works elementwise. lim is the threshold for array length: only arrays of this length or longer are kept. So, the final function is:
def splitarr(aa, rng, lim):
    splarr = np.split(aa, np.where(np.ediff1d(aa) > rng)[0] + 1)
    return [splarr[i] for i in np.where(np.fromiter(map(len, splarr), dtype=int) >= lim)[0]]
splitarr(aa, 7, 2)
[array([84, 86, 87, 89, 90]), array([ 997, 999, 1000, 1002])]
splitarr(aa,7,1)
[array([84, 86, 87, 89, 90]), array([ 997, 999, 1000, 1002]), array([2014])]
splitarr(aa, 1, 1)
[array([84]), array([86, 87]), array([89, 90]), array([997]), array([ 999, 1000]), array([1002]), array([2014])]
splitarr(aa,1,2)
[array([86, 87]), array([89, 90]), array([ 999, 1000])]
splitarr(aa,2,2)
[array([84, 86, 87, 89, 90]), array([ 997, 999, 1000, 1002])]
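The length filter could also be written as a plain list comprehension, without np.fromiter and np.where. This is only an alternative sketch, not the function from the answer; the name splitarr_simple is hypothetical, and the input is still assumed to be sorted.

```python
import numpy as np

def splitarr_simple(aa, rng, lim):
    """Hypothetical variant: same splitting, length filter as a comprehension."""
    aa = np.asarray(aa)  # assumed to be sorted already
    splarr = np.split(aa, np.where(np.ediff1d(aa) > rng)[0] + 1)
    # Keep only the groups that contain at least lim elements.
    return [part for part in splarr if len(part) >= lim]

aa = sorted([87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999])
print([p.tolist() for p in splitarr_simple(aa, 7, 2)])
# -> [[84, 86, 87, 89, 90], [997, 999, 1000, 1002]]
print([p.tolist() for p in splitarr_simple(aa, 1, 2)])
# -> [[86, 87], [89, 90], [999, 1000]]
```

For the calls shown above it returns the same groups as the version with np.fromiter.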
I am posting this as an answer instead of a comment under the 'best answer': that answer works only with a range of 7 and is incorrect for any other value.