Question

I am working on a bash script that compares a number of positions against given start/end positions. I have two files of different sizes:

  • File 1: start and end position (tab-separated)
  • File 2: single position

Bash is really slow at processing for loops, so I had the idea of using Python for this.

python - << EOF


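# Convert to int so the comparisons are numeric rather than string-based
posList = [int(x) for x in posString.split()]
endList = [int(x) for x in endString.split()]
startList = [int(x) for x in startString.split()]

for j, val2 in enumerate(posList):
    for i, val1 in enumerate(startList):
        # val1 is the start, endList[i] the matching end, val2 the position
        if val1 <= val2 and val2 <= endList[i]:
            print "true", val2
        else:
            print "false", val2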

EOF

I have three strings as input (position, start, end) and split them into lists. With the two nested loops I iterate over the larger position file and then over the start/end file. If my condition is fulfilled (pos > start and pos < end), I would like to print something.

My input files are strings of whitespace-separated numbers.

Maybe I'm completely on the wrong track, I hope not, but this approach takes too long to be usable.

Thanks a lot for your help.

Solution

If you start by sorting the positions and the ranges, you can save a lot of time:

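# Assumes start_list, end_list and pos_list already hold integers
range_sorted_list = sorted(zip(start_list, end_list))
range_sorted_iter = iter(range_sorted_list)
pos_sorted_list = sorted(pos_list)

start, end = next(range_sorted_iter)

try:
    for pos in pos_sorted_list:
        # Skip ranges that end at or before the current position
        while pos >= end:
            start, end = next(range_sorted_iter)
        if start <= pos < end:
            print "True", pos
        elif pos < start:
            print "False", pos
except StopIteration:
    # No ranges left; positions beyond the last range are not reported
    pass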

This lets you go over each list only once after sorting, instead of scanning all the ranges once for every position.
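As a rough sketch of the same sorted sweep, the variant below returns a result for every position, including those that lie beyond the last range. The function name classify_positions and the sample strings are made up for illustration. It assumes the ranges do not overlap and are half-open (start <= pos < end), as in the code above. It avoids print, so it should run under either Python 2 or Python 3.

```python
def classify_positions(starts, ends, positions):
    """Return (position, inside) pairs in ascending order of position.

    Assumes non-overlapping, half-open ranges: start <= pos < end.
    """
    ranges = sorted(zip(starts, ends))
    results = []
    idx = 0  # index of the first range that may still contain a position
    for pos in sorted(positions):
        # Ranges ending at or before pos cannot match this or any later position.
        while idx < len(ranges) and ranges[idx][1] <= pos:
            idx += 1
        inside = idx < len(ranges) and ranges[idx][0] <= pos
        results.append((pos, inside))
    return results


# Hypothetical input strings in the whitespace-separated format from the question.
start_string = "10 30 50"
end_string = "20 40 60"
pos_string = "5 15 20 35 70"

starts = [int(x) for x in start_string.split()]
ends = [int(x) for x in end_string.split()]
positions = [int(x) for x in pos_string.split()]

result = classify_positions(starts, ends, positions)
```

Because idx only ever moves forward, the loop body runs at most once per position plus once per range, so the sorting step dominates the running time.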

OTHER TIPS

Itertools may help here. The product function is implemented in C and replaces the two nested loops with a single one, although it still visits every combination of position and range.

from itertools import product

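# Convert to int so the comparisons are numeric rather than string-based
posList = [int(x) for x in posString.split()]
endList = [int(x) for x in endString.split()]
startList = [int(x) for x in startString.split()]

# product yields every (position, start) pair, like the two nested loops
for (j, val2), (i, val1) in product(enumerate(posList), enumerate(startList)):
    if val1 <= val2 and val2 <= endList[i]:
        print "true", val2
    else:
        print "false", val2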
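
As far as the number of comparisons goes, product generates the same pairs as the nested loops, so it is unlikely to change how the running time grows with input size. The small sketch below uses made-up sample values to illustrate this; the variable names are illustrative only.

```python
from itertools import product

positions = [5, 15, 35]  # hypothetical sample positions
starts = [10, 30]        # hypothetical range starts
ends = [20, 40]          # matching range ends
ranges = list(zip(starts, ends))

# One pair per (position, range) combination, exactly as with nested loops.
pairs = list(product(positions, ranges))
pair_count = len(pairs)

# Keep only the positions that fall inside some half-open range.
hits = sorted(set(pos for pos, (start, end) in pairs if start <= pos < end))
```

With 3 positions and 2 ranges this builds 3 * 2 pairs, so the work still grows with the product of the two input sizes.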
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow