Question

I have a huge BLAST output file in tabular format. I want to sort my data by protein name, to see which sequences align to each particular protein. Let's say I have:

con19 sp|Q24K02|IDE_BOVIN 3
con19 sp|P35559|IDE_RAT   2
con15 sp|Q24K02|IDE_BOVIN 8
con15 sp|P14735|IDE_HUMAN 30
con16 sp|Q24K02|IDE_BOVIN 45
con16 sp|P35559|IDE_RAT   23

I want to get output in either of the following formats; both are OK:

sp|Q24K02|IDE_BOVIN con19 3            sp|Q24K02|IDE_BOVIN con19 3
                    con15 8            sp|Q24K02|IDE_BOVIN con15 8
                    con16 45           sp|Q24K02|IDE_BOVIN con16 45
sp|P35559|IDE_RAT   con19 2            sp|P35559|IDE_RAT   con19 2          
                    con16 23           sp|P35559|IDE_RAT   con16 23
sp|P14735|IDE_HUMAN con15 30           sp|P14735|IDE_HUMAN con15 30



f1 = open('file.txt','r')
lines=f1.readlines()
for line in lines:
    a=sorted(lines)
    r=open('file.txt','w')
    r.writelines(a)
f1.close       

Solution

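The problem is that you are calling sorted once for each line (i.e. inside the loop) rather than once for the entire set of lines. In addition, without a key it compares whole lines, so the result is ordered by the first column. Sort once, using the protein accession (the text between the '|' characters) as the key: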

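# Read all lines first, so the file is closed before it is reopened for writing.
with open('file.txt', 'r') as f1:
    lines = f1.readlines()
# Sort by the accession between the '|' characters, e.g. 'Q24K02'.
a = sorted(lines, key=lambda l: l.split('|')[1])
with open('file.txt', 'w') as r:
    r.writelines(a)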
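
To see what the key function does without touching any file, here is a minimal sketch that applies the same sort to the six sample lines from the question. The names sample and accession are made up for illustration. Note that this orders the proteins alphabetically by accession (P14735, P35559, Q24K02), which is not the order shown in the desired output, and it assumes every line contains at least one '|'.

```python
# Sample rows copied from the question (query, subject, score).
sample = [
    "con19 sp|Q24K02|IDE_BOVIN 3\n",
    "con19 sp|P35559|IDE_RAT   2\n",
    "con15 sp|Q24K02|IDE_BOVIN 8\n",
    "con15 sp|P14735|IDE_HUMAN 30\n",
    "con16 sp|Q24K02|IDE_BOVIN 45\n",
    "con16 sp|P35559|IDE_RAT   23\n",
]


def accession(line):
    # 'con19 sp|Q24K02|IDE_BOVIN 3' -> 'Q24K02'
    # Raises IndexError on a line without '|' (assumed not to occur here).
    return line.split('|')[1]


# sorted() is stable, so rows that share an accession keep their input order.
by_accession = sorted(sample, key=accession)
print(''.join(by_accession), end='')
```

Because the sort is stable, the con19/con15/con16 order within each protein is the same as in the input file.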

Other tips

You need to sort on the middle element. Just sorting the lines themselves does an alphabetical sort on the whole line, i.e. effectively on the first element. Try this instead:

with open('infile.txt') as f_in, open('outfile.txt', 'w') as f_out:
    f_out.write(''.join(sorted(f_in, key=lambda x: x.split()[1:2])))
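
If the goal is the second output format from the question (protein first, proteins in the order they first appear in the file), grouping may fit better than sorting. The following is only a sketch under some assumptions: group_by_protein is a hypothetical helper name, the first three whitespace-separated fields are taken to be query, protein and score as in the sample (a real BLAST tabular file has more columns), and it relies on dicts keeping insertion order (Python 3.7+).

```python
def group_by_protein(lines):
    """Return lines rewritten as 'protein query score', grouped by protein.

    Proteins come out in order of first appearance, not alphabetically.
    """
    groups = {}  # protein -> list of (query, score)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        query, protein, score = fields[:3]
        groups.setdefault(protein, []).append((query, score))

    out = []
    for protein, hits in groups.items():
        for query, score in hits:
            out.append(f"{protein} {query} {score}\n")
    return out


rows = [
    "con19 sp|Q24K02|IDE_BOVIN 3\n",
    "con19 sp|P35559|IDE_RAT   2\n",
    "con15 sp|Q24K02|IDE_BOVIN 8\n",
    "con15 sp|P14735|IDE_HUMAN 30\n",
    "con16 sp|Q24K02|IDE_BOVIN 45\n",
    "con16 sp|P35559|IDE_RAT   23\n",
]
print(''.join(group_by_protein(rows)), end='')
```

For a file, the same function could be called as group_by_protein(f_in) inside the with block above and the result passed to f_out.writelines. Columns are joined with single spaces here rather than padded as in the question.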
Licensed under: CC-BY-SA with attribution