Question

I have a sequence and I'm trying to use re.search to find the region before the occurrence of a query string and the region after it.

This is what I have

sequence = 'abcdefghijklmnopqrstuvwxyz'
query = 'jklmnop'

This is what I want to end up with

before = 'abcdefghi'
after = 'qrstuvwxyz'

I tried it for the before region, but it's not working. I thought this would split the sequence into 3 groups:

sequence = 'abcdefghijklmnopqrstuvwxyz'
query = 'jklmnop'
print re.search('\w+(query)\w+',sequence).group(0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'

Solution

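Writing query inside a string doesn't store the value of the variable query in that string; it simply creates a string containing the literal text 'query'. The pattern therefore never matches the sequence, re.search returns None, and calling .group on None raises the AttributeError.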

>>> print '\w+(query)\w+'
\w+(query)\w+
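
To make the failure concrete, here is a minimal sketch (written for Python 3, so print is a function, unlike the Python 2 sessions in this thread). It shows that the pattern looks for the literal letters "query" and that re.search returns None when nothing matches. The names literal_pattern and match, and the test string 'xxqueryxx', are only illustrative.

```python
import re

sequence = 'abcdefghijklmnopqrstuvwxyz'
query = 'jklmnop'

# Inside the quotes, "query" is just five literal characters,
# not the value of the variable named query.
literal_pattern = r'\w+(query)\w+'
match = re.search(literal_pattern, sequence)

# The alphabet never contains the text "query", so there is no match.
print(match)  # None

# The same pattern does match a string that contains the word itself.
print(re.search(literal_pattern, 'xxqueryxx') is not None)  # True
```

Checking the result against None before calling .group avoids the AttributeError from the question.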

Use string formatting to build the pattern from the value of query:

>>> import re
>>> sequence = 'abcdefghijklmnopqrstuvwxyz'
>>> query = 'jklmnop'
>>> r'(\w+)({})(\w+)'.format(query)
'(\\w+)(jklmnop)(\\w+)'

>>> re.search(r'(\w+)({})(\w+)'.format(query),sequence).group(1)
'abcdefghi'
>>> re.search(r'(\w+)({})(\w+)'.format(query),sequence).group(3)
'qrstuvwxyz'
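
If the query might contain characters that are special in a regex (such as . or +), formatting it straight into the pattern can change what is matched. One possible way around that, sketched here for Python 3, is to escape the query with re.escape and slice the sequence with the match's start() and end() positions. The helper name split_around is hypothetical, not something from the thread.

```python
import re

def split_around(sequence, query):
    """Return (before, after) around the first occurrence of query, or None."""
    # re.escape makes characters such as '.' or '+' in query match literally.
    match = re.search(re.escape(query), sequence)
    if match is None:
        return None
    # Slice on the match boundaries instead of using capture groups.
    return sequence[:match.start()], sequence[match.end():]

result = split_around('abcdefghijklmnopqrstuvwxyz', 'jklmnop')
print(result)  # ('abcdefghi', 'qrstuvwxyz')
```

Unlike the (\w+) groups, this version also accepts an empty before or after region, which may or may not be what you want.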

For this kind of task re.split is simpler, because it returns the text on either side of the match directly:

>>> import re
>>> strs = 'abcdefghijklmnopqrstuvwxyz'
>>> before, after = re.split('jklmnop',strs)
>>> before
'abcdefghi'
>>> after
'qrstuvwxyz'
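
One caveat: unpacking into exactly two names fails with a ValueError if the query is missing or occurs more than once. A hedged sketch of two ways to handle that in Python 3: pass maxsplit=1 to re.split, or, since the query here is a fixed substring, use str.partition and skip the regex entirely. The names parts, head, sep and tail are illustrative.

```python
import re

strs = 'abcdefghijklmnopqrstuvwxyz'
query = 'jklmnop'

# maxsplit=1 stops after the first occurrence, so at most two pieces come back.
parts = re.split(re.escape(query), strs, maxsplit=1)
if len(parts) == 2:
    before, after = parts
    print(before, after)  # abcdefghi qrstuvwxyz

# For a fixed substring, str.partition does the same job without a regex.
head, sep, tail = strs.partition(query)
if sep:  # sep is the empty string when query was not found
    print(head, tail)  # abcdefghi qrstuvwxyz
```

str.partition always returns three items, so the unpacking itself cannot fail; only the sep check is needed.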
Licensed under: CC-BY-SA with attribution