Question

I have built a dictionary of contigs and their lengths from file1. I also have file2, which is BLAST output in tabular format. It contains alignments for some (but not all) of those contigs, plus additional information such as where each match starts and ends. To calculate query and subject coverage, I need to attach the lengths from file1 to the corresponding contigs in file2. How can I do that? Thanks.

Solution

Assuming file1 is:

contig1 134
contig2 354
contig3 345

Your script could look like this:

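import re

# Build a name -> length lookup from file1 (one "name length" pair per line).
contigDict = {}
with open('file1') as c1:
    for line in c1:
        key, value = line.split()
        contigDict[key] = value

with open('file2') as c2:
    scrambled_text = c2.read()

# Collect every contig name mentioned anywhere in the BLAST output.
contigs = re.findall(r'contig\d+', scrambled_text)
output = {}
for contig in contigs:
    # Skip names that have no entry in file1 instead of raising KeyError.
    if contig in contigDict:
        output[contig] = contigDict[contig]

# Write each contig and its length as a tab-separated line.
# The mode must be the string 'w'; the bare name w is undefined here.
with open('file3', 'w') as out:
    for key in output:
        out.write(key + '\t' + output[key] + '\n')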
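
The regex approach above finds contig names anywhere in the file, so it loses track of which alignment row each name came from. As a rough alternative sketch, the lookup can be done per row instead. This assumes that file2 is tab-separated with the query ID in column 1 and the subject ID in column 2 (as in BLAST's default tabular layout); the helper names `load_lengths` and `attach_lengths` are made up for illustration.

```python
def load_lengths(lines):
    """Parse 'name length' lines into a dict of name -> int length."""
    lengths = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, length = parts
        lengths[name] = int(length)
    return lengths


def attach_lengths(blast_lines, lengths):
    """Yield each BLAST row with query and subject lengths appended.

    Assumption: column 1 is the query ID and column 2 is the subject ID.
    IDs that are missing from `lengths` get None instead of raising KeyError.
    """
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip empty lines and comment headers
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a usable alignment row
        yield fields + [lengths.get(fields[0]), lengths.get(fields[1])]


# In-memory stand-ins for file1 and file2.
lengths = load_lengths(["contig1 134\n", "contig2 354\n", "contig3 345\n"])
rows = list(attach_lengths(["contig1\tcontig3\t98.5\t100\n"], lengths))
```

With real files, the same functions can be fed open file handles, for example `attach_lengths(open('file2'), load_lengths(open('file1')))`, since both only iterate over lines.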

Other tips

This version works for me:

import re

# Build a name -> length lookup from file1.txt.
subjectDict = {}
with open('file1.txt') as c1:
    for line in c1:
        key, value = line.split()
        subjectDict[key] = value

# Opening both files in one with-statement ensures result.txt is closed.
with open('file2.txt') as c2, open('result.txt', 'w') as r:
    for line in c2:
        # The first tab-separated column holds the contig name.
        new_list = re.split(r'\t+', line)
        s_name = new_list[0]
        subjects = re.findall(r'contig\d+', s_name)
        for subject in subjects:
            # Write one length per matching row, in file2 order.
            r.write(subjectDict[subject] + '\n')
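
Once the lengths are available, the coverage the question asks about can be computed per alignment row. The following is only a sketch: it assumes the default 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and it computes coverage for a single HSP without merging overlapping hits for the same pair. The function names `hsp_coverage` and `row_coverage` are hypothetical.

```python
def hsp_coverage(start, end, total_length):
    """Fraction of a sequence covered by one aligned interval.

    Coordinates are 1-based and inclusive; start may be greater than end
    when the match is on the reverse strand, hence the abs().
    """
    return (abs(end - start) + 1) / total_length


def row_coverage(fields, lengths):
    """Return (query_coverage, subject_coverage) for one BLAST row.

    Assumption: default tabular columns, so qstart/qend are at index 6/7
    and sstart/send are at index 8/9. `lengths` maps sequence ID -> length.
    """
    qlen = lengths[fields[0]]
    slen = lengths[fields[1]]
    qcov = hsp_coverage(int(fields[6]), int(fields[7]), qlen)
    scov = hsp_coverage(int(fields[8]), int(fields[9]), slen)
    return qcov, scov


# Made-up example row: contig1 (134 bp) against contig3 (345 bp).
lengths = {"contig1": 134, "contig3": 345}
line = "contig1\tcontig3\t100.0\t67\t0\t0\t1\t67\t345\t279\t1e-30\t120"
qcov, scov = row_coverage(line.split("\t"), lengths)
```

Here the query interval 1..67 spans 67 of 134 bases, and the subject interval 345..279 spans 67 of 345 bases on the reverse strand.
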
License: CC-BY-SA with attribution
Not affiliated with StackOverflow