Question

I am currently writing a library that uses the -outfmt 10 option of BLAST, which gives you CSV instead of the pretty human-readable format.

For example:

tblastn -db dmel_a -query somequery.faa -outfmt 10

The problem is that I want to access the database's source file so I can extract some sequences after processing. The only way I know to do this is to remove -outfmt 10 and run BLAST a second time, then parse the human-readable output for the line that says:

Database: Source.fas

But that only works if -title was not specified when creating the database with makeblastdb. The stitle field of -outfmt 10 seems to be just the FASTA header line anyway. I cannot simply look for the database name plus .fna, .fas, or .faa, because the database can be named differently from the source file.

Is there another way to get the FASTA source file from the BLAST database name? I do not see one in the list of -outfmt options. Or am I blind today?


Solution

I found a solution that worked, based on a Biostar question and a blasted bioinformatics blog post. It requires BLAST+ 2.2.28 if your FASTA headers don't follow NCBI naming exactly.

When you create the BLAST database, use the -parse_seqids flag. Then, with blastdbcmd, you can extract a range of a sequence directly from the database, without needing the source file:

blastdbcmd -db t/blastTest/dmel -range 1-10 -entry some_seq_id
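
Since the hits arrive as CSV, the extraction step can be driven from the parsed output. The following is a minimal sketch rather than a definitive implementation. It assumes the default -outfmt 10 column order (no custom field list), that the database was built with -parse_seqids so the sseqid values are accepted by -entry, and that minus-strand hits are reported with sstart greater than send. The function name blastdbcmd_args and the example row are hypothetical.

```python
import csv
import io

# Default column order of `-outfmt 10` when no custom fields are requested.
DEFAULT_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastdbcmd_args(csv_text, db):
    """Return one blastdbcmd argument list per hit in the CSV text."""
    commands = []
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=DEFAULT_FIELDS)
    for row in reader:
        start, end = int(row["sstart"]), int(row["send"])
        # Assumption: a minus-strand hit is reported with sstart > send.
        strand = "plus" if start <= end else "minus"
        low, high = sorted((start, end))
        commands.append([
            "blastdbcmd", "-db", db,
            "-entry", row["sseqid"],
            "-range", f"{low}-{high}",
            "-strand", strand,
        ])
    return commands


# Made-up hit line, for illustration only.
example = "q1,some_seq_id,98.5,40,1,0,1,40,130,11,1e-20,85.1\n"
for args in blastdbcmd_args(example, "t/blastTest/dmel"):
    print(" ".join(args))
```

Each argument list can be handed to subprocess.run(args, check=True, capture_output=True, text=True) to get the FASTA record for that hit. Building the lists separately from running them keeps the parsing logic testable on machines where BLAST+ is not installed.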
License: CC-BY-SA with attribution
Not affiliated with StackOverflow