Question

From a string, say dna = 'ATAGGGATAGGGAGAGAGCGATCGAGCTAG', I got substrings, say dna.format = 'ATAGGGATAG', 'GGGAGAGAG'. I only want to print the substrings whose length is divisible by 3. How do I do that? I'm using modulo but it's not working!

import re
if mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
print re.findall("ATA"(.*?)"AGA" , mydna)
if len(mydna)%3 == 0
   print mydna

Corrected code:

import re
mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
re.findall("ATA"(.*?)"AGA" , mydna.format)
if len(mydna.format)%3 == 0:
   print mydna.format

This still doesn't give me the substrings with length divisible by three. Any idea what's wrong?

I'm expecting only the substrings whose length is divisible by three to be printed.


Solution

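You can also do this with the regular expression itself:

re.findall('ATA((?:...)*?)AGA', mydna)

The inner parentheses match three letters at a time, so the text between ATA and AGA always has a length divisible by 3. The inner group is written as non-capturing, (?:...), because with two capturing groups findall would return a tuple per match instead of a plain string.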
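
To make the effect of the grouping concrete, here is a minimal sketch that runs both spellings of the pattern on the string from the question. The variable names between and pairs are only illustrative.

```python
import re

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'

# Non-capturing inner group: findall returns only the outer group,
# i.e. the text between ATA and AGA, consumed three letters at a time.
between = re.findall('ATA((?:...)*?)AGA', mydna)
print(between)  # ['GGGATAGGG']

# Capturing inner group: findall returns one tuple per match, holding the
# outer group and the last three letters matched by the inner group.
pairs = re.findall('ATA((...)*?)AGA', mydna)
print(pairs)  # [('GGGATAGGG', 'GGG')]
```

Because findall scans for non-overlapping matches, only the first ATA...AGA stretch is reported here; the second ATA lies inside that match.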

Other tips

To include overlapping substrings, I have the following lengthier version. The idea is to find all start and end markers and calculate the distance between them.

import re
mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
[mydna[start.start():end.start()+3] for start in re.finditer('(?=ATA)',mydna) for end in re.finditer('(?=AGA)',mydna) if end.start()>start.start() and (end.start()-start.start())%3 == 0]
['ATAGGGATAGGGAGA', 'ATAGGGAGA']

Show all substrings, including overlapping ones:

[mydna[start.start():end.start()+3] for start in re.finditer('(?=ATA)',mydna) for end in re.finditer('(?=AGA)',mydna) if end.start()>start.start()]
['ATAGGGATAGGGAGA', 'ATAGGGATAGGGAGAGA', 'ATAGGGATAGGGAGAGAGCAGA', 'ATAGGGAGA', 'ATAGGGAGAGA', 'ATAGGGAGAGAGCAGA']
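
The same idea can be spelled out as a small function, which may be easier to read than the one-line comprehension. This is only a sketch: marked_substrings and its parameters are hypothetical names, and it assumes the slice should include the end marker, as in the expressions above.

```python
import re

def marked_substrings(seq, start_mark='ATA', end_mark='AGA', multiple_of=3):
    """Return every slice from a start marker through a later end marker
    whose total length is a multiple of `multiple_of` (hypothetical helper)."""
    # Lookaheads match with zero width, so overlapping markers are all found.
    starts = [m.start() for m in re.finditer('(?=%s)' % re.escape(start_mark), seq)]
    ends = [m.start() for m in re.finditer('(?=%s)' % re.escape(end_mark), seq)]
    found = []
    for s in starts:
        for e in ends:
            stop = e + len(end_mark)  # slice end, including the end marker
            if e > s and (stop - s) % multiple_of == 0:
                found.append(seq[s:stop])
    return found

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
print(marked_substrings(mydna))  # ['ATAGGGATAGGGAGA', 'ATAGGGAGA']
```

Passing multiple_of=1 turns the length filter off and lists every start/end pairing.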

Using modulo is the correct procedure. If it's not working, you're doing it wrong. Please provide an example of your code in order to debug it.

re.findall() returns a list of the matching strings. You need to iterate over that list and apply the modulo test to the length of each string to get what you want.
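
A minimal sketch of that approach, using the pattern from the question. The names candidates and divisible are only illustrative.

```python
import re

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'

# findall returns a list with one captured string per match.
candidates = re.findall('ATA(.*?)AGA', mydna)

# Test the length of each match, not the length of the whole sequence.
divisible = [s for s in candidates if len(s) % 3 == 0]

for s in divisible:
    print(s)  # GGGATAGGG
```

Note that this filters after matching: if the nearest AGA yields a length that is not a multiple of 3, that candidate is dropped rather than extended to a later AGA.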

License: CC-BY-SA with attribution
Not affiliated with StackOverflow