Since your first text file contains all of the "fields" for the output, we can reduce the logic and the number of steps slightly.
First we open the two input files and read them into lists:
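with open('file1.txt', 'r') as a, open('file2.txt', 'r') as b:
    # Strip the newline, split on tabs, keep columns 1-4 of each file1 row
    fileA = [l.rstrip('\n').split('\t')[1:5] for l in a.readlines()]
    # Same split for file2, but keep every column after the first
    fileB = [l.rstrip('\n').split('\t')[1:] for l in b.readlines()]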
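The rstrip('\n') call matters for the comparison later on: without it, the last field of every row keeps its trailing newline and no longer equals the same ID elsewhere. A small sketch on a hypothetical tab-delimited line, using io.StringIO in place of a real file (the row content is made up for illustration):

```python
import io

# Hypothetical file2.txt content whose EMT ID sits in the last column.
fake_file = io.StringIO("row1\tGO:0003674\tEMT15298\n")

line = fake_file.readline()
raw_fields = line.split('\t')[1:]                  # newline left attached
clean_fields = line.rstrip('\n').split('\t')[1:]   # newline stripped first

print(raw_fields)     # ['GO:0003674', 'EMT15298\n']
print(clean_fields)   # ['GO:0003674', 'EMT15298']
print('EMT15298' in raw_fields, 'EMT15298' in clean_fields)  # False True
```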
So now we have two lists, fileA and fileB. You'll notice the slice notation on both of them. Since fileA has all of the values you want for the output, it is ready as-is; it just needs to be filtered against the second list. I've also dropped the first column from every row in both lists so we can use the EMT... values for comparison.
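For reference, this is what the two slices keep on a single split row. The row below is hypothetical (a leading column that gets discarded, then an ID and further fields); the real column layout of the input files isn't shown here.

```python
# Hypothetical row after rstrip('\n').split('\t'); index 0 is the column being dropped.
row = ['1', 'EMT15298', 'GO:0003674', 'molecular_function', 'PF08268', 'extra']

first_four = row[1:5]  # indices 1-4 (the stop index 5 is excluded), as used for fileA
all_after = row[1:]    # index 1 through the end, as used for fileB

print(first_four)  # ['EMT15298', 'GO:0003674', 'molecular_function', 'PF08268']
print(all_after)   # ['EMT15298', 'GO:0003674', 'molecular_function', 'PF08268', 'extra']
```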
Now we can check whether any row of fileB contains the ID from each row of fileA (the ID only has to match one field of a fileB row, not the whole row) and write the matches to the results file:
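with open('results.txt', 'w') as o:
    for line in fileA:
        # Keep the row if its EMT ID (line[0]) equals a field in any fileB row
        if any(line[0] in l for l in fileB):
            # Write the row back out tab-delimited, one match per line
            o.write('%s\n' % '\t'.join(line))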
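Because each l in fileB is a list, line[0] in l compares whole fields for equality; it is not a substring search. A quick check with hypothetical rows (the helper name id_in_rows is made up for illustration):

```python
# Hypothetical, already-parsed fileB rows.
fileB = [['GO:0003674', 'EMT15298'], ['EMT20601', 'PF08268']]

def id_in_rows(emt_id, rows):
    """Hypothetical helper: True if emt_id equals some field of some row."""
    return any(emt_id in row for row in rows)

print(id_in_rows('EMT15298', fileB))  # True: exact field match
print(id_in_rows('EMT1529', fileB))   # False: a partial ID does not match
```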
results.txt is once again tab-delimited, with the corresponding matches:
EMT15298 GO:0003674 molecular_function PF08268
EMT20601 GO:0005515 protein binding PF08268
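The any(...) test rescans all of fileB for every row of fileA, which is fine for small files. For larger inputs, one alternative (a swap, not what the code above does) is to gather every fileB field into a set once: an ID appears in some fileB row exactly when it is in that set, so the matches are the same, but each lookup is constant time on average. A sketch on hypothetical in-memory rows:

```python
# Hypothetical parsed rows, shaped like fileA and fileB above.
fileA = [
    ['EMT15298', 'GO:0003674', 'molecular_function', 'PF08268'],
    ['EMT99999', 'GO:0008150', 'biological_process', 'PF00001'],
]
fileB = [['x', 'EMT15298'], ['EMT20601', 'y']]

# Build the lookup set once: every field from every fileB row.
fileB_fields = {field for row in fileB for field in row}

# Same filter as any(line[0] in l for l in fileB), with one set lookup per row.
matches = [line for line in fileA if line[0] in fileB_fields]

for line in matches:
    print('\t'.join(line))
```

To produce results.txt instead of printing, the print call can be swapped for the same o.write('%s\n' % '\t'.join(line)) used in the loop above.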