If you want your regex to stop matching at the first TAA|TAG|TGA, but still only succeed if there are at least nine three-letter chunks, the following may help:
>>> import re
>>> regexp = r'ATG(?:(?!TAA|TAG|TGA)...){9,}?(?:TAA|TAG|TGA)'
>>> re.findall(regexp, 'ATGAAAAAAAAAAAAAAAAAAAAAAAAAAATAG')
['ATGAAAAAAAAAAAAAAAAAAAAAAAAAAATAG']
>>> re.findall(regexp, 'ATGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAG')
['ATGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAG']
>>> re.findall(regexp, 'ATGAAATAGAAAAAAAAAAAAAAAAAAAAATAG')
[]
This uses a negative lookahead, (?!TAA|TAG|TGA), to check that the next three-character chunk is not TAA, TAG, or TGA before that chunk is consumed.
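To make the pieces easier to see, here is the same pattern written with re.VERBOSE, which allows whitespace and comments inside the regex. This is only a sketch for readability; the name ORF is made up for this example, and the pattern is meant to behave the same as regexp above.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part is labelled.
# ORF is a hypothetical name chosen for this illustration.
ORF = re.compile(r"""
    ATG                      # start codon
    (?:                      # one three-letter chunk:
        (?!TAA|TAG|TGA)      #   refuse to start the chunk on a stop codon
        ...                  #   then consume any three characters
    ){9,}?                   # at least nine such chunks
    (?:TAA|TAG|TGA)          # the stop codon that ends the match
""", re.VERBOSE)

# Nine AAA chunks between ATG and TAG: long enough to match.
long_enough = "ATG" + "AAA" * 9 + "TAG"
print(ORF.findall(long_enough) == [long_enough])

# A TAG in the second chunk position ends the run after one chunk: no match.
early_stop = "ATG" + "AAA" + "TAG" + "AAA" * 7 + "TAG"
print(ORF.findall(early_stop))
```

Because the lookahead already prevents the repeated group from running past a stop codon, the lazy ? on {9,} should not change which text is matched here.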
Note, though, that a TAA|TAG|TGA that does not fall on a three-character boundary (counted from the ATG) is treated as ordinary chunk content and does not end the match, so the following still matches:
>>> re.findall(regexp, 'ATGAAAATAGAAAAAAAAAAAAAAAAAAAATAG')
['ATGAAAATAGAAAAAAAAAAAAAAAAAAAATAG']
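If the frame behaviour is easier to follow without a regex, the same idea can be sketched as an explicit codon-by-codon scan. This is an alternative illustration, not a replacement for the regex above; first_orf, STOP_CODONS, and min_codons are names invented for this example, and the function returns only the first hit rather than all matches.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_orf(seq, min_codons=9):
    """Return the first ATG...stop stretch with at least min_codons
    codons in between, or None if there is none."""
    start = seq.find("ATG")
    while start != -1:
        codons = 0
        # Step through the sequence three characters at a time,
        # in the reading frame set by this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if codons >= min_codons:
                    return seq[start:i + 3]
                break  # in-frame stop came too early; try the next ATG
            codons += 1
        start = seq.find("ATG", start + 1)
    return None

# TAG sits across a chunk boundary (AAA ATA GAA ...), so it is not seen as a stop.
off_frame = "ATGAAAATAG" + "A" * 20 + "TAG"
print(first_orf(off_frame) == off_frame)

# TAG sits exactly on a chunk boundary after one codon, so the scan gives up.
in_frame = "ATG" + "AAA" + "TAG" + "AAA" * 7 + "TAG"
print(first_orf(in_frame))
```

Only codons read in step with the ATG are compared against the stop set, which is the same reason the regex skips over an off-frame TAG.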