You can use split together with startswith in a list comprehension, as follows:
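contig = "CCCCAAAACCCCAAAACCCCAAAACCCCTAcGAaTCCCcTCATAATTGAAAGACTTAAACTTTAAAACCCTAGAAT"
splitbase = "CCCCAAAA"
halfBase = "CCCC"
# Splitting on the full repeat removes every occurrence of it from the pieces.
splittedContig = contig.split(splitbase)
# n occurrences of the separator produce n + 1 pieces.
cnt = len(splittedContig) - 1
# Add 0.5 for each remaining piece that starts with the half repeat.
print(cnt + sum([0.5 for e in splittedContig if e.startswith(halfBase)]))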
Output:
3.5
- Split the string on CCCCAAAA. This gives a list of the pieces between the matches; the CCCCAAAA occurrences themselves are removed.
- The length of the split list minus 1 gives the number of occurrences of CCCCAAAA.
- Among the split elements, look for those that start with CCCC. For each one found, add 0.5 to the count.
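
The steps above can also be wrapped in a small reusable function. This is only a sketch: the name count_repeats and its parameter names are made up for illustration and are not part of the original snippet. Note that split is case-sensitive and counts non-overlapping matches only.

```python
def count_repeats(contig, full_repeat, half_repeat):
    """Count full repeats, plus 0.5 for each leftover piece that starts
    with the half repeat. (Hypothetical helper, for illustration only.)"""
    # n non-overlapping occurrences of full_repeat yield n + 1 pieces.
    pieces = contig.split(full_repeat)
    full_count = len(pieces) - 1
    # Empty pieces never match, so adjacent full repeats add nothing here.
    half_count = sum(1 for piece in pieces if piece.startswith(half_repeat))
    return full_count + 0.5 * half_count


contig = "CCCCAAAACCCCAAAACCCCAAAACCCCTAcGAaTCCCcTCATAATTGAAAGACTTAAACTTTAAAACCCTAGAAT"
print(count_repeats(contig, "CCCCAAAA", "CCCC"))  # 3.5
```

Counting the half matches as integers and multiplying by 0.5 once at the end gives the same result as summing 0.5 per match.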