Question

How can I calculate the number of gaps in a set of sequences?

For example:

s1='G _ A A T T C A G T T A'
s2='G G _ A _ T C _ G _ _ A'
s3='G A A T T C A G T _ T _'

Here the number of '_' is 8.

I tried the following:

def count():
    gap=0
    for i in range(0, len(s1), 3):
        for x,y,z in zip(s1,s2,s3):
            if (x=='_') or (y=='_')or (z=='_') :
                gap=gap+1
        return gap

It gives 6, not 8.

Solution 2

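Your code returns 7, which is the total count of all the underscores (8) minus the extra underscore in the third-to-last position, where two sequences have a gap in the same column. You can fix that by replacing the or-test (which short-circuits as soon as one match is found, so each column is counted at most once) with separate if-statements.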

Also note that the outer loop with a stride of three is unnecessary, and strictly speaking so is zipping the three strings together.

Here is a cleaned-up version of your original code:

def count():
    gap=0
    for x,y,z in zip(s1,s2,s3):
        if (x == '_'):               # these if-stmts don't short-circuit
            gap += 1
        if (y == '_'):
            gap += 1
        if (z == '_'):
            gap += 1
    return gap

There are faster ways to do this (e.g. the str.count method), but I wanted to show you how to repair and clean up your original logic. That ought to put you on the right track when you do other analytics.

Other tips

Strings have a count() method:

s1.count('_') + s2.count('_') + s3.count('_')

The two _'s in the 10th position only get counted once. You should get 7, rather than 6.
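
To check this, the following minimal sketch lists the alignment columns that contain at least one gap, which is what the or-test in the question effectively counts. The helper name gap_columns is made up for illustration; the strings are the ones from the question, and the separating spaces are dropped with split().

```python
s1 = 'G _ A A T T C A G T T A'
s2 = 'G G _ A _ T C _ G _ _ A'
s3 = 'G A A T T C A G T _ T _'


def gap_columns(*seqs):
    """Return the 1-based columns in which at least one sequence has a gap."""
    rows = [seq.split() for seq in seqs]  # drop the separating spaces
    return [i for i, col in enumerate(zip(*rows), start=1) if '_' in col]


columns = gap_columns(s1, s2, s3)
print(columns)       # [2, 3, 5, 8, 10, 11, 12]
print(len(columns))  # 7
```

Column 10 appears only once in that list even though both s2 and s3 have a gap there, which accounts for the difference between 7 and 8.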

The simple solution is sum([item.count('_') for item in [s1,s2,s3]])
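
The same idea can be wrapped in a small function that accepts any number of sequences. This is only a sketch: the name count_gaps and its gap parameter are illustrative assumptions, not something from the answers above. Passing a generator expression to sum avoids building the intermediate list.

```python
s1 = 'G _ A A T T C A G T T A'
s2 = 'G G _ A _ T C _ G _ _ A'
s3 = 'G A A T T C A G T _ T _'


def count_gaps(*seqs, gap='_'):
    """Return the total number of gap characters across all sequences."""
    return sum(seq.count(gap) for seq in seqs)


print(count_gaps(s1, s2, s3))  # 8
```

Because str.count works on each string independently, the sequences do not need to have the same length.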

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow