Biopython: extracting sequence IDs from a BLAST output file

https://stackoverflow.com/questions/1684470

18-09-2019

Question

I have a BLAST output file in XML format. It contains 22 query sequences, with 50 hits reported for each sequence, and I want to extract all 50 x 22 hits. This is the code I currently have, but it only extracts the 50 hits from the first query.

from Bio.Blast import NCBIXML

blast_records = NCBIXML.parse(result_handle)
blast_record = next(blast_records)  # reads only the first query's record

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        save_file.write('>%s\n' % (alignment.title,))
save_file.close()

Does anyone have any suggestions for extracting all the hits? I guess I have to use something other than alignments. I hope this was clear. Thanks!

Jon

Solution

This should get all the records. The change from the original is the line

for blast_record in blast_records:

which is the Python idiom for iterating over the items of a list-like object such as blast_records (the NCBIXML module documentation confirms that parse() does return an iterator).
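The difference between calling next() once and looping over the iterator can be illustrated without Biopython. This is only a minimal sketch: fake_parse is a hypothetical stand-in for NCBIXML.parse (it is not part of Biopython) that yields one list of hit names per query, using the 22 queries and 50 hits from the question.

```python
def fake_parse(n_queries, hits_per_query):
    """Hypothetical stand-in for NCBIXML.parse: yields one 'record' per query."""
    for q in range(n_queries):
        yield ["query%d_hit%d" % (q, h) for h in range(hits_per_query)]


# Calling next() once consumes only the first record.
first_record = next(fake_parse(22, 50))
hits_from_next = len(first_record)

# A for loop keeps pulling records until the iterator is exhausted.
hits_from_loop = 0
for record in fake_parse(22, 50):
    hits_from_loop += len(record)

print(hits_from_next, hits_from_loop)  # prints: 50 1100
```

The same reasoning should apply to the real parser: a single next() call stops after the first query, while the for loop visits every query in the file.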

from Bio.Blast import NCBIXML

blast_records = NCBIXML.parse(result_handle)

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            save_file.write('>%s\n' % (alignment.title,))
    # Anything to be written between blast_records could go here.
save_file.close()
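Note that the inner "for hsp in alignment.hsps" loop writes the title once per HSP, so a hit with several HSPs would appear several times in the output. If one line per hit is what is wanted, the HSP loop can probably be dropped. A sketch under that assumption: write_hit_titles is a hypothetical helper name, and the SimpleNamespace objects only mimic the alignments and title attributes used above so that the example runs without a BLAST file.

```python
import io
from types import SimpleNamespace


def write_hit_titles(blast_records, out_handle):
    """Write one FASTA-style header line per alignment; return the count."""
    written = 0
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            out_handle.write(">%s\n" % alignment.title)
            written += 1
    return written


# Stand-in records (not real BLAST output): two queries with two hits each.
demo_records = [
    SimpleNamespace(alignments=[SimpleNamespace(title="hit_A"),
                                SimpleNamespace(title="hit_B")]),
    SimpleNamespace(alignments=[SimpleNamespace(title="hit_C"),
                                SimpleNamespace(title="hit_D")]),
]

buffer = io.StringIO()
n_written = write_hit_titles(demo_records, buffer)
demo_output = buffer.getvalue()
```

With real data, the same function should accept the iterator returned by NCBIXML.parse(result_handle) together with a file opened in a "with open(..., 'w') as save_file:" block, which closes the file even if parsing fails partway through.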

Other suggestions

I used this code to extract all the results:

from Bio.Blast import NCBIXML

for record in NCBIXML.parse(open("rpoD.xml")):
    print("QUERY: %s" % record.query)
    for align in record.alignments:
        print(" MATCH: %s..." % align.title[:60])
        for hsp in align.hsps:
            print(" HSP, e=%f, from position %i to %i"
                  % (hsp.expect, hsp.query_start, hsp.query_end))
            # Show short alignments in full and truncate long ones.
            if hsp.align_length < 60:
                print("  Query: %s" % hsp.query)
                print("  Match: %s" % hsp.match)
                print("  Sbjct: %s" % hsp.sbjct)
            else:
                print("  Query: %s..." % hsp.query[:57])
                print("  Match: %s..." % hsp.match[:57])
                print("  Sbjct: %s..." % hsp.sbjct[:57])

print("Done")

or, with less detail:

from Bio.Blast import NCBIXML

for record in NCBIXML.parse(open("NC_003197.xml")):
    # We want to ignore any queries with no search results:
    if record.alignments:
        print("QUERY: %s..." % record.query[:60])
        for align in record.alignments:
            for hsp in align.hsps:
                print(" %s HSP, e=%f, from position %i to %i"
                      % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end))
print("Done")
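To check that every query contributed its hits (22 queries with 50 hits each in the question), the number of alignments per query can be tallied. A minimal sketch: count_hits_per_query is a hypothetical helper, and the stand-in records only mimic the query and alignments attributes used above.

```python
from types import SimpleNamespace


def count_hits_per_query(records):
    """Map each record's query name to its number of alignments."""
    counts = {}
    for record in records:
        # Assumes query names are unique; a repeated name would be overwritten.
        counts[record.query] = len(record.alignments)
    return counts


# Stand-in records; real ones would come from NCBIXML.parse(handle).
demo_records = [
    SimpleNamespace(query="query_1", alignments=["hit"] * 50),
    SimpleNamespace(query="query_2", alignments=[]),  # query with no hits
]

hit_counts = count_hits_per_query(demo_records)
total_hits = sum(hit_counts.values())
print(hit_counts, total_hits)
```

Queries with a count of zero are the ones the "if record.alignments" test above skips.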

I used this site:

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/rpsblast/

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow