finding max number of consecutive elements using vectorization

Question 1

This will give you an array with the length of all consecutive chunks:

np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) + ([len(a)-1],)))

Some tests:

>>> a = [1, 2, 3, 4, 5, 6, 9, 10, 11, 14, 17, 18, 19, 20, 21]
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
                           ([len(a)-1],)))
array([6, 3, 1, 5], dtype=int64)

>>> a = [0, 1, 2, 4, 5, 6]
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
                           ([len(a)-1],)))
array([3, 3], dtype=int64)

To check if any is at least 4 items long, sinply wrap the above code in np.any(... >= 4).

To see how this works, lets work out the results from the inside out for my first example:

>>> a = [1, 2, 3, 4, 5, 6, 9, 10, 11, 14, 17, 18, 19, 20, 21]

First, we figure out the deltas between consecutive items:

>>> np.diff(a)
array([1, 1, 1, 1, 1, 3, 1, 1, 3, 3, 1, 1, 1, 1])

Then, we identify positions where the delta is not 1, that is, positions where a chunk of consecutive items begins or ends:

>>> np.diff(a) != 1
array([False, False, False, False, False,  True, False, False,  True,
        True, False, False, False, False], dtype=bool)

And we extract the positions of the Trues:

>>> np.nonzero(np.diff(a) != 1)
(array([5, 8, 9], dtype=int64),)

The above index marks last items in a consecutive streak. Python slices are defined as start to last+1, so we could increment that array by one, add a zero at the beginning and the length of the array at the end and have all starting and ending indices of consecutive sequences, i.e.:

>>> np.concatenate(([0], np.nonzero(np.diff(a) != 1)[0] + 1, [len(a)]))
array([ 0,  6,  9, 10, 15], dtype=int64)

Taking the differences from consecutive indices will give us the desired length of each consecutive chunk. Because all we care for is the differences, rather than adding one to the indices, in my original answer I chose to to prepend -1 and append len(a)-1:

>>> np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) + ([len(a)-1],))
array([-1,  5,  8,  9, 14], dtype=int64)
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
                           ([len(a)-1],)))
array([6, 3, 1, 5], dtype=int64)

Say that in this array you identify that you want the indices of the chunk of 5 items, the one in position 3 of this array. To recover the start and stop indices of that chunk you can simply do:

>>> np.concatenate(([0], np.nonzero(np.diff(a) != 1)[0] + 1, [len(a)]))[3:3+2]
array([10, 15], dtype=int64)
>>> a[10:15]
[17, 18, 19, 20, 21]

Question 2

How about the following recursive solution? (only works for 1dim arrays, of course)

I find it sickingly elegant. I'm not saying you should use it, but I had a good time coming up with it.

import numpy as np

def is_consecutive(arr, n):
    if n <= len(arr) <= 1:
        return True
    if len(arr) < n:
        return False
    diffs1idx = np.where(np.diff(arr) == 1)[0]
    return is_consecutive(diffs1idx, n-1)

print is_consecutive([1,2], 3)  # False
print is_consecutive([1,2,3], 3)  # True
print is_consecutive([5,1,2,3], 3)  # True
print is_consecutive([4,9,1,5,7], 3)  # False
print is_consecutive([4,9,1,2,3, 7, 9], 3)  # True
print is_consecutive(np.arange(100), 100)  # True
print is_consecutive(np.append([666], np.arange(100)), 100)  # True
print is_consecutive(np.append([666], np.arange(100)), 101)  # False

(Please don't ask me how it works... I don't understand recursion...)

Question 3

Yes. You started correctly:

from numpy import array, diff, where

numbers = array([0, 1, 3, 4, 5, 5])
differences = diff(numbers)

You're interested in consecutive numbers:

consecutives = differences == 1

And you want cases where two are in a row. You can compare the array with its offset:

(consecutives[1:] & consecutives[:-1]).any()
#>>> True

To get the number of occurances, use .sum() instead of .any().

And if you want the indexes, just use numpy.where:

[offset_indexes] = where(consecutives[1:] & consecutives[:-1])
offset_indexes
#>>> array([2])

EDIT: It seems you've edited the wanted length from 3 to 4. This invalidates my code, but you only need to set

consecutives[1:] & consecutives[:-1]

to

consecutives[2:] & consecutives[1:-1] & consecutives[:-2]

Here's a pointless generic version:

from numpy import arange, array, diff, where

def doubling_step_shifts(shifts):
    """
    When you apply a mask of some kind of all rotations,
    often the size of the last prints will allow shifts
    larger than 1. This is a helper for that.

    A mask is assumed to exist before invocation, as
    this is typically called repeatedly on the mask or
    a copy.
    """
    # Total shift
    subtotal = 1
    step = 1

    # While the shifts won't overflow
    while subtotal + step < shifts:
        yield step
        subtotal += step
        step *= 2

    # Make up the remainder
    if shifts - subtotal > 0:
        yield shifts - subtotal

def consecutive_indexes_of_length(numbers, length):
    # Constructing "consecutives" creates a
    # minimum mask of 1, whereas this would need
    # a mask of 0, so we special-case these
    if length <= 1:
        return arange(numbers.size)

    # Mask of consecutive numbers
    consecutives = diff(numbers) == 1
    consecutives.resize(numbers.size)

    # Recursively reapply mask to cover lengths too short
    for i in doubling_step_shifts(length-1):
        consecutives[:-i] &= consecutives[i:]

    # Reextend those lengths
    for i in doubling_step_shifts(length):
        consecutives[i:] = consecutives[i:] | consecutives[:-i]

    # Give the indexes
    return where(consecutives)[0]

EDIT: Made a lot faster (numpy.roll is slow).

And some tests:

numbers = array([1, 2, 3, 4, 5, 6, 9, 10, 11, 14, 17, 18, 19, 20, 21])

consecutive_indexes_of_length(numbers, 1)
#>>> array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
consecutive_indexes_of_length(numbers, 2)
#>>> array([ 0,  1,  2,  3,  4,  5,  6,  7,  8, 10, 11, 12, 13, 14])
consecutive_indexes_of_length(numbers, 3)
#>>> array([ 0,  1,  2,  3,  4,  5,  6,  7,  8, 10, 11, 12, 13, 14])
consecutive_indexes_of_length(numbers, 4)
#>>> array([ 0,  1,  2,  3,  4,  5, 10, 11, 12, 13, 14])
consecutive_indexes_of_length(numbers, 5)
#>>> array([ 0,  1,  2,  3,  4,  5, 10, 11, 12, 13, 14])
consecutive_indexes_of_length(numbers, 6)
#>>> array([0, 1, 2, 3, 4, 5])
consecutive_indexes_of_length(numbers, 7)
#>>> array([], dtype=int64)

Why? No reason. It's O(n log k) where n is the number of elements in the list and k is GROUPSIZE~~, so don't use it for massively large GROUPSIZE. However,~~ it should be reasonably fast for pretty much all group sizes.

EDIT: It's now quite fast. I bet Cython'll be faster, but this is fine as it is.

This implementation has the advantage of being relatively simple, extendable and using very primitive operations. This might not be faster than a Cython loop except for very small inputs.