Question

I have a BLAST output file in XML format. It is 22 query sequences with 50 hits reported from each sequence. And I want to extract all the 50x22 hits. This is the code I currently have, but it only extracts the 50 hits from the first query.

from Bio.Blast import NCBIXM
blast_records = NCBIXML.parse(result_handle)
blast_record = blast_records.next()

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
            save_file.write('>%s\n' % (alignment.title,))
save_file.close()

Somebody have any suggestions as to extract all the hits? I guess I have to use something else than alignments. Hope this was clear. Thanks!

Jon

Was it helpful?

Solution

This should get all records. The novelty compared with the original is the

for blast_record in blast_records

which is a python idiom to iterate through items in a "list-like" object, such as the blast_records (checking the CBIXML module documentation showed that parse() indeed returns an iterator)

from Bio.Blast import NCBIXM
blast_records = NCBIXML.parse(result_handle)

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

for blast_record in blast_records:
  for alignment in blast_record.alignments:
      for hsp in alignment.hsps:
            save_file.write('>%s\n' % (alignment.title,))
  #here possibly to output something to file, between each blast_record
save_file.close()

OTHER TIPS

I used this code for extract all the results

from Bio.Blast import NCBIXML
for record in NCBIXML.parse(open("rpoD.xml")) :
    print "QUERY: %s" % record.query
    for align in record.alignments :
        print " MATCH: %s..." % align.title[:60]
        for hsp in align.hsps :
            print " HSP, e=%f, from position %i to %i" \
                % (hsp.expect, hsp.query_start, hsp.query_end)
            if hsp.align_length < 60 :
                 print "  Query: %s" % hsp.query
                 print "  Match: %s" % hsp.match
                 print "  Sbjct: %s" % hsp.sbjct
            else :
                 print "  Query: %s..." % hsp.query[:57]
                 print "  Match: %s..." % hsp.match[:57]
                 print "  Sbjct: %s..." % hsp.sbjct[:57]


print "Done"

or for less details

from Bio.Blast import NCBIXML
for record in NCBIXML.parse(open("NC_003197.xml")) :
    #We want to ignore any queries with no search results:
    if record.alignments :
        print "QUERY: %s..." % record.query[:60]
        for align in record.alignments :
            for hsp in align.hsps :
                print " %s HSP, e=%f, from position %i to %i" \
                % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end)
print "Done"

I used this site

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/rpsblast/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top