Question

I'm working on a problem where I have to find various string repeats after encountering an initial string. Say the initial string is ACTGAC, and the data file has sequences that look like:

AAACTGACACCATCGATCAGAACCTGA

In that string, once we find ACTGAC, I need to analyze the next 10 characters for string repeats that follow some rules. I have the rules coded, but can anyone show me how, once I find the string I need, I can make a substring of the next ten characters to analyze? I know that the str.partition function can split the string once the match is found, and then [1:10] can get the next ten characters.

Thanks!

Solution

You almost have it already (but note that indexes start counting from zero in Python).

The partition method splits a string into head, separator, and tail, based on the first occurrence of the separator.

So you just need to take a slice of the first ten characters of the tail:

>>> data = 'AAACTGACACCATCGATCAGAACCTGA'
>>> head, sep, tail = data.partition('ACTGAC')
>>> tail[:10]
'ACCATCGATC'

Python allows you to leave out the start-index in slices (it defaults to zero, the start of the string), and also the end-index (it defaults to the length of the string).
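
As a small sketch of those defaults, using the tail from the example above: a slice that runs past the end of the string is simply truncated rather than raising an error, which matters when fewer than ten characters follow the match. The variable names here are only illustrative.

```python
tail = 'ACCATCGATCAGAACCTGA'  # the tail from the example above

# Omitting the start index is the same as starting at 0.
first_ten = tail[:10]
same_first_ten = tail[0:10]

# Omitting the end index runs to the end of the string.
rest = tail[10:]

# A slice that overruns the string is truncated, not an error.
short = 'ACG'[:10]

print(first_ten, rest, short)
```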

Note that you could also do the whole operation in one line, like this:

>>> data.partition('ACTGAC')[2][:10]
'ACCATCGATC'
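
One thing to watch for: if the separator is not found, partition returns the whole string as head and empty strings for separator and tail, so the slice silently yields ''. A minimal sketch of a helper that makes this case explicit could look like the following; the name window_after and its width parameter are hypothetical, not part of the original answer.

```python
def window_after(data, marker, width=10):
    """Return up to `width` characters after the first `marker`, or None if absent."""
    head, sep, tail = data.partition(marker)
    if not sep:
        # partition returns an empty separator when the marker is not found
        return None
    return tail[:width]


print(window_after('AAACTGACACCATCGATCAGAACCTGA', 'ACTGAC'))
print(window_after('AAAA', 'ACTGAC'))
```

Returning None instead of '' lets the calling code tell "marker missing" apart from "marker found at the very end of the sequence".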

OTHER TIPS

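Based on marcog's answer in "Find all occurrences of a substring in Python", here is a way to get the next ten characters after every occurrence of the marker, including overlapping ones (the zero-width lookahead (?=...) matches without consuming characters):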

>>> import re
>>> data = 'AAACTGACACCATCGATCAGAACCTGAACTGACTGACAAA'
>>> sep = 'ACTGAC'
>>> [data[m.start()+len(sep):][:10] for m in re.finditer('(?=%s)'%sep, data)]
['ACCATCGATC', 'TGACAAA', 'AAA']
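
If you would rather avoid a regular expression, the same overlapping search can be sketched with str.find in a loop. This is only one possible approach; the generator name windows_after_all and its width parameter are hypothetical.

```python
def windows_after_all(data, marker, width=10):
    """Yield up to `width` characters after each (possibly overlapping) `marker`."""
    start = data.find(marker)
    while start != -1:
        end = start + len(marker)
        yield data[end:end + width]
        # Resume one character later so overlapping occurrences are found too.
        start = data.find(marker, start + 1)


data = 'AAACTGACACCATCGATCAGAACCTGAACTGACTGACAAA'
result = list(windows_after_all(data, 'ACTGAC'))
print(result)
```

Because it is a generator, each window can be analyzed as it is found instead of building the whole list first.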
Licensed under: CC-BY-SA with attribution