Problem

I am working on a bash script that checks whether several positions fall within given start/end ranges. I have two files of different sizes:

  • File 1: start and end positions (tab separated)
  • File 2: single position
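If both files can be read from Python directly, there is no need to pass their contents through bash strings. Here is a minimal sketch. It assumes File 1 holds one `start<TAB>end` pair per line and File 2 holds one position per line. The function names `read_ranges` and `read_positions` are hypothetical:

```python
from typing import List, Tuple


def read_ranges(path: str) -> List[Tuple[int, int]]:
    """Read File 1: one tab-separated "start<TAB>end" pair per line."""
    ranges = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            start, end = line.split("\t")[:2]  # extra columns, if any, are ignored
            ranges.append((int(start), int(end)))
    return ranges


def read_positions(path: str) -> List[int]:
    """Read File 2: one position per line (int() tolerates the trailing newline)."""
    with open(path) as fh:
        return [int(line) for line in fh if line.strip()]
```

Usage would look like `ranges = read_ranges("file1.tsv")` and `positions = read_positions("file2.txt")`, where both paths are placeholders.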

Bash is really slow at processing for loops, so I had the idea of using Python instead.

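python3 - << EOF
# The unquoted EOF makes bash expand the shell variables below before Python runs;
# triple quotes keep any newlines in their contents valid Python.
posList = [int(x) for x in """$posString""".split()]
startList = [int(x) for x in """$startString""".split()]
endList = [int(x) for x in """$endString""".split()]

for val2 in posList:
    for i, val1 in enumerate(startList):
        # val1 is a range start and endList[i] the matching range end
        if val1 <= val2 <= endList[i]:
            print("true", val2)
        else:
            print("false", val2)

EOF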

I have three strings as input (positions, starts, ends) and split them into lists. With the two nested loops I iterate over the larger position file and, for each position, over the start/end file. If my condition is fulfilled (start <= position <= end), I would like to print something.

My inputs are strings of whitespace-separated numbers.
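Because `split()` returns strings, a comparison like `val1 >= val2` is lexicographic rather than numeric. It gives wrong answers as soon as the numbers have different digit counts. Here is a short illustration with made-up sample input:

```python
pos_string = "9 10 100"  # hypothetical sample input

as_strings = pos_string.split()
as_ints = [int(x) for x in as_strings]

# Strings compare character by character: "1" < "9", so "10" < "9"
print("10" < "9")          # True
print(sorted(as_strings))  # ['10', '100', '9']
print(sorted(as_ints))     # [9, 10, 100]
```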

Maybe I'm on entirely the wrong track (I hope not), but with this approach it takes far too long.

Thanks a lot for your help.


Solution

If you start by sorting the positions and the ranges, you can save a lot of time:

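# Convert the whitespace-separated strings to integers so comparisons are numeric
pos_list = [int(x) for x in posString.split()]
start_list = [int(x) for x in startString.split()]
end_list = [int(x) for x in endString.split()]

# Sort ranges by start, and positions ascending, so both are swept only once
range_sorted_iter = iter(sorted(zip(start_list, end_list)))
current = next(range_sorted_iter, None)  # None once the ranges run out

for pos in sorted(pos_list):
    # Skip ranges that end at or before pos; no later (larger) position can fall in them
    while current is not None and pos >= current[1]:
        current = next(range_sorted_iter, None)
    if current is not None and current[0] <= pos < current[1]:
        print("True", pos)
    else:
        print("False", pos)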

After the sorting step (O(n log n + m log m)), this walks through each list only once, instead of scanning every range for every position.
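The same sweep can be wrapped in a function that returns results instead of printing them. That makes it easy to cross-check against the straightforward nested-loop version. Here is a minimal sketch that follows the answer's half-open convention `start <= pos < end`. The function names and sample data are hypothetical:

```python
from typing import Iterable, List, Tuple


def classify_positions(
    ranges: Iterable[Tuple[int, int]], positions: Iterable[int]
) -> List[Tuple[int, bool]]:
    """Return (pos, inside) for every position, in ascending position order."""
    range_iter = iter(sorted(ranges))
    current = next(range_iter, None)
    result = []
    for pos in sorted(positions):
        # Drop ranges that end at or before pos; later positions are larger,
        # so those ranges can never match again.
        while current is not None and pos >= current[1]:
            current = next(range_iter, None)
        inside = current is not None and current[0] <= pos < current[1]
        result.append((pos, inside))
    return result


def classify_brute_force(ranges, positions):
    """Reference O(n*m) check: test every position against every range."""
    ranges = list(ranges)
    return [(p, any(s <= p < e for s, e in ranges)) for p in sorted(positions)]


ranges = [(10, 20), (15, 30), (50, 60)]
positions = [5, 12, 25, 30, 55, 70]
print(classify_positions(ranges, positions))
# [(5, False), (12, True), (25, True), (30, False), (55, True), (70, False)]
```

The sweep stays correct when ranges overlap. Any range it skips ends at or before the current position, so that range cannot contain any later, larger position either. Any range it has not reached yet starts no earlier than the current one.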

Other tips

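Itertools is one option: `itertools.product` replaces the two nested loops with a single loop over every (position, range) pair. It does not vectorize anything, though. It still generates all len(posList) × len(startList) pairs, so for large inputs the sorting approach above scales better.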

from itertools import product

posList = [int(x) for x in posString.split()]
startList = [int(x) for x in startString.split()]
endList = [int(x) for x in endString.split()]

# product() yields every (position, (start, end)) pair, like the two nested loops
for pos, (start, end) in product(posList, zip(startList, endList)):
    if start <= pos <= end:
        print("true", pos)
    else:
        print("false", pos)
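To check what `product` actually does, compare it with the equivalent nested loop on some made-up data. It yields exactly the same pairs in the same order, so the total work is still one check per (position, range) pair:

```python
from itertools import product

positions = [5, 12, 25]                          # hypothetical sample data
ranges = [(10, 20), (15, 30), (50, 60), (0, 3)]

nested = [(p, r) for p in positions for r in ranges]
flat = list(product(positions, ranges))

# Same pairs, same order: product() is the nested loop, not a shortcut
print(nested == flat)  # True
print(len(flat))       # 12, i.e. len(positions) * len(ranges)
```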
License: CC-BY-SA with attribution
Not affiliated with StackOverflow