
I'm working on a problem where I have to find various string repeats that follow an initial string, say ACTGAC. The data file has sequences that look like:


Once ACTGAC is found in such a string, I need to analyze the next 10 characters for string repeats, which follow some rules. I have the rules coded, but can anyone show me how, once I find the string I need, to make a substring of the next ten characters to analyze? I know that the str.partition function can split the string once the marker is found, and I think [1:10] can then get the next ten characters.




You almost have it already (but note that indexes start counting from zero in Python, so [1:10] would skip the first character and return only nine).

The partition method splits a string into head, separator, and tail, based on the first occurrence of the separator.

So you just need to take a slice of the first ten characters of the tail:

>>> head, sep, tail = data.partition('ACTGAC')
>>> tail[:10]

Python allows you to leave out the start-index in slices (it defaults to zero, the start of the string), and also the end-index (it defaults to the length of the string).
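
As a rough sketch of the partition approach above, the lookup can be wrapped in a small helper that also handles the case where the marker is missing (partition then returns the whole string as head, with an empty separator and tail). The function name `next_chars_after` and the sample sequence are made up for illustration; the original data file is not shown.

```python
# Hypothetical sample sequence; the real data file is not shown in the question.
data = "GGTTACTGACAAACCCGGGTTTACGT"


def next_chars_after(data, marker, count=10):
    """Return up to `count` characters after the first `marker`, or None if absent."""
    head, sep, tail = data.partition(marker)
    if not sep:
        # partition returns (data, '', '') when the marker is not found
        return None
    # Slicing never raises if the tail is shorter than `count`;
    # it simply returns whatever characters are left.
    return tail[:count]


window = next_chars_after(data, "ACTGAC")
print(window)
```

Returning None for a missing marker is a design choice here, so that "marker not found" can be told apart from "marker found at the very end of the string" (which yields an empty string).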

Note that you could also do the whole operation in one line, like this:

>>> data.partition('ACTGAC')[2][:10]


To get the ten characters after every occurrence (including overlapping ones), based on marcog's answer in "Find all occurrences of a substring in Python", I propose:

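>>> import re
>>> sep = 'ACTGAC'
>>> n = len(sep)
>>> # the lookahead (?=...) matches without consuming, so overlapping hits are found
>>> [data[m.start()+n:m.start()+n+10] for m in re.finditer('(?=%s)' % re.escape(sep), data)]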
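
If you would rather avoid regular expressions, the same result can probably be had with a plain str.find loop. This is only a sketch of an alternative, not the method from the linked answer; the name `windows_after_all` and the sample sequence are invented for illustration.

```python
def windows_after_all(data, marker, count=10):
    """Collect up to `count` characters after every occurrence of `marker`,
    including overlapping occurrences."""
    windows = []
    start = data.find(marker)
    while start != -1:
        end = start + len(marker)
        windows.append(data[end:end + count])
        # Resume one character later (not at `end`) so overlapping hits are kept.
        start = data.find(marker, start + 1)
    return windows


# Hypothetical sample: ACTGAC occurs at index 0 and again, overlapping, at index 4.
data = "ACTGACTGACAAAATTTTCCCCGGGG"
result = windows_after_all(data, "ACTGAC")
print(result)
```

If overlapping matches should not count, resuming the search at `end` instead of `start + 1` would skip them.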
Licensed under: CC-BY-SA with attribution