Question

I am processing a large BLAST file together with a large FASTA file, and I need to load several lines of FASTA for one block of BLAST (let's say it is one line).

I would expect that in the second loop (line) of BLAST, it will continue on the next line from the last-processed FASTA line, but it is loading all the same FASTA lines. Why? And how can I load the next line? Is it really necessary to add some indexing?

with open(fastaname,'r') as fastafile:
  with open(blastfilename,'r') as blastfile:
    for line in blastfile:
      while True:  
        fastaline = fastafile.readline()[:-1]  
            if fastaline[0]=='>':  
                break  
            fastaseq += fastaline
      somefunction(line,fastaseq)

FASTA has the typical format:

>name_of_seqence\n  
ACGATCATCGTAGCTGCATGACTGCA\n  
GATCGATCTGATCGATGCAGTCAGTA\n
>name_of_seqence\n  
GCACGCGACCACGATCATTGACTGCA\n  
CAAAAGATCTGATCGATGCAGTCAGT\n
CAGTCGATGCTAGTCGATGCTCGATA\n
etc.

I need each sequence for every line of the next BLAST sequence.

Was it helpful?

Solution

I don't see anything that would ever clear the lines you're appended to fastaseq after finishing a line from your blast file. Try this:

with open(fastaname,'r') as fastafile:
    with open(blastfilename,'r') as blastfile:
        for line in blastfile:
            fastaseq = '' # or whatever data type this is 
            while True:  
                fastaline = fastafile.readline()[:-1]  
                if fastaline[0]=='>':  
                    break  
            fastaseq += fastaline
        somefunction(line,fastaseq)

As it is your first blast line would get the first set of fasta lines, the second blast line would get the first and second sets, etc etc.

OTHER TIPS

An alternative to parsing a FASTA file yourself is to use a preexisting parser, like Biopython's SeqIO.

You could also use zip() to iterate over two iterables side-by-side.

Using Biopython, your code would look like this:

from Bio import SeqIO

with open(fastaname, 'r') as fastafile, open(blastname, 'r') as blastfile:
    for record, line in zip(SeqIO.parse(fastafile, 'fasta'), blastfile):
        somefunction(line, record.seq)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top