Question

I have a list of numbers in Python. It looks like this:

a = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999]

I want to keep all the numbers which are within + or - 7 of each other and discard the rest. Is there a simple way to do this in Python? I have read about the list filter method but am not sure how to get what I want working with it.

I am new to Python.

Update

Output would ideally be [84, 86, 87, 89, 90] and another list [997, 999, 1000, 1002]. I want to retrieve the sequences and not the outliers. Hope this makes sense.

Was it helpful?

Solution

This is algorithm problem, try this:

def get_blocks(values):
    mi, ma = 0, 0
    result = []
    temp = []
    for v in sorted(values):
        if not temp:
            mi = ma = v
            temp.append(v)
        else:
            if abs(v - mi) < 7 and abs(v - ma) < 7:
                temp.append(v)
                if v < mi:
                    mi = v
                elif v > ma:
                    ma = v
            else:
                if len(temp) > 1:
                    result.append(temp)
                mi = ma = v
                temp = [v]
    return result

a = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999]
print get_blocks(a)

Output:

[[84, 86, 87, 89, 90], [997, 999, 1000, 1002]]

OTHER TIPS

For any problem like this my first port of call is the Python itertools module. The pairwise function from that link that I use in the code is available in the more-itertools module.

from more_itertools import pairwise
results = []
chunk = []
a = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999]
a.sort()
for v1, v2 in pairwise(a):
    if v2 - v1 <= 7:
        chunk.append(v1)
    elif chunk:
        chunk.append(v1)
        results.append(chunk)
        chunk = []
print(results)
[[84, 86, 87, 89, 90], [997, 999, 1000, 1002]]

I haven't tested for edge cases, so it's buyer beware :)

If your problems allows transitive relations, i.e. x is in the group as long as it's at most 7 away from any element in the group, then this seems to me like a graph theory problem. To be more specific, you need to find all connected components.

The problem itself is pretty easy to solve with a recursive algorithms. You would first create a dictionary in which every key would be one of the elements and every value would be a list of elements which are at most 7 apart from that element. For your example, you would have something like this:

for element in elements:
    connections[element] = []
    visited[element] = False
    for another in elements:
        if abs(element - another) <= limit:
            connections[element].append(another)

Which would give you something like this

{
    84: [86, 87, 89, 90],
    86: [84, 87, 89, 90],
    ...
    997: [999, 1000, 1002]
    ...
    2014: []
}

Now you need to write a recursive function which will take as input an element and a list, and it will keep adding elements in a list as long as it can find an element which is at most 7 apart from the current element.

def group_elements(element, group):
    if visited[element]:
        return
    visited[element] = True
    group.append(element)
    for another in connections[element]:
        group_elements(another, group)

Somewhere in the code you also need to remember which elements you have already visited to make sure that you don't get into an infinite loop.

visited = {}

You need to call that function for every element in your list.

groups = []
for element in elements:
    if not visited[element]:
        group = []
        group_elements(element, group)
        groups.append(group)
print group

This code should give the following output for your input:

[[87, 84, 86, 89, 90], [2014], [1000, 1002, 997, 999]]
a = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999]
temp=a[0]
result=[]
temp1=[]
counter =len(a)

for i in a:
    if i in range(temp-7,temp+7):
        temp1.append(i)
        if counter==1:
            result.append(temp1)
    else:
        if temp1:
            result.append(sorted(temp1))
        temp1=[]
        temp=i
    counter=counter-1
print result

Say, we have a list aa = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999].

aa = sorted(aa)
aa
[84,86,87,89,90,997,999,1000,1002,2014]

To define differences among neighbors: ad = np.ediff1d(aa).

ad
array([2, 1, 2, 1, 907, 2, 1, 2, 1012])

To cut off indices of numbers beyond the range: np.where(ad > rng)[0] + 1, where rng = 7 is range within which the numbers are kept:

np.where(ad > 7)[0] + 1
array([5, 9])

To split the array by indices into required subarrays: np.split(aa, np.where(ad > rng)[0] + 1). So, the function is:

def splitarr(aa,rng):
    ad = np.ediff1d(aa)
    return np.split(aa, np.where(ad > rng)[0] + 1)

splitarr(aa, 7)    
[array([84, 86, 87, 89, 90]), array([ 997,  999, 1000, 1002]), array([2014])]

The length of allowable sequence could be set by the filter:

np.where(np.fromiter(map(len, splarr), dtype=int) >= lim)[0]

where splarr = np.split(aa, np.where(ad > rng)[0] + 1). That returns indices of arrays of allowed length. map(len, splarr) returns lengths of all arrays inside list splarr; np.fromiter() converts map to numpy to enable >= lim. lim is the threshold for array length, only this length and above is OK. So, the final function is:

def splitarr(aa,rng,lim):
    splarr = np.split(aa, np.where(np.ediff1d(aa) > rng)[0] + 1)
    return [splarr[i] for i in np.where(np.fromiter(map(len, splarr), dtype=int) >= lim)[0]]

splitarr(aa, 7, 2)
[array([84, 86, 87, 89, 90]), array([ 997,  999, 1000, 1002])]
splitarr(aa,7,1)
[array([84, 86, 87, 89, 90]), array([ 997,  999, 1000, 1002]), array([2014])] 
splitarr(aa, 1, 1)
[array([84]), array([86, 87]), array([89, 90]), array([997]), array([ 999, 1000]), array([1002]), array([2014])]
splitarr(aa,1,2)
[array([86, 87]), array([89, 90]), array([ 999, 1000])]
splitarr(aa,2,2)
[array([84, 86, 87, 89, 90]), array([ 997,  999, 1000, 1002])]

Instead of a comment under the 'best answer': it works only with the range 7, and is incorrect for any other value.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top