Question

I currently have the code below, which works perfectly: it takes two strings and identifies the regions that match, as shown in the third line of the output.

In the next part of the code, I want to remove from s2 everything from index 0 to wherever the first matching region ends; for the given example, that means removing indices 0 to 9. However, this should only happen if the first match starts at 0. I am unsure how to work with nested lists, so an explanation of what your code does would be great.

from collections import defaultdict
from itertools import groupby

def class_chars(chrs):
    if 'N' in chrs:
        return 'unknown'
    elif chrs[0] == chrs[1]:
        return 'match'
    else:
        return 'not_match'

s1 = 'aaaaaaaaaaN123bbbbbbbbbbQccc'
s2 = 'aaaaaaaaaaN456bbbbbbbbbbPccc'
n = 0
consec_matches = []
chars = defaultdict(int)

for k, group in groupby(zip(s1, s2), class_chars):
    elems = len(list(group))
    chars[k] += elems
    if k == 'match':
        consec_matches.append((n, n+elems-1))
    n += elems

print chars
print consec_matches
print [x for x in consec_matches if x[1]-x[0] >= 9]

Output:

defaultdict(<type 'int'>, {'not_match': 4, 'unknown': 1, 'match': 23})
[(0, 9), (14, 23), (25, 27)]
[(0, 9), (14, 23)]

Solution

I'm not sure I fully understand what you want, but you can tell me whether the following is heading in the right direction:

In [12]: l=[(0, 9), (14, 23), (25, 27)]

In [13]: flatten_l= [x for y in l for x in y]

In [14]: flatten_l
Out[14]: [0, 9, 14, 23, 25, 27]

# take the second element of each tuple whose first element is 0
In [15]: get_num_after = [y[1] for y in l if y[0] == 0]
In [16]: get_num_after
Out[16]: [9]
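
To go one step further and actually trim s2, here is a minimal sketch. It assumes consec_matches is in left-to-right order (as the groupby loop in the question produces it) and that each tuple is (start, end) with an inclusive end. The helper name strip_leading_match is made up for illustration. On nested lists: consec_matches[0] is the first tuple, and consec_matches[0][0] is that tuple's start index.

```python
def strip_leading_match(s, matches):
    """Drop the first matched region from s, but only if it starts at index 0.

    strip_leading_match is a hypothetical helper name, not part of the
    original code. matches is a list of (start, end) tuples, end inclusive.
    """
    if matches and matches[0][0] == 0:
        end = matches[0][1]   # inclusive end index of the first region
        return s[end + 1:]    # slices exclude the stop index, so add 1
    return s                  # no match at index 0: leave s unchanged


s2 = 'aaaaaaaaaaN456bbbbbbbbbbPccc'
consec_matches = [(0, 9), (14, 23), (25, 27)]
trimmed = strip_leading_match(s2, consec_matches)
```

With the example data, trimmed is 'N456bbbbbbbbbbPccc'. If the first region started anywhere other than 0, or if there were no matches at all, s2 would come back unchanged.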
Licensed under: CC-BY-SA with attribution