Question

I'm just getting started with Python and BioPython and don't have too much programming experience. I'd appreciate any help you guys could give me.

I'm trying to extract CDS and/or rRNA sequences from GenBank. It's important that I only get the open reading frame, which is why I'm not just pulling the whole sequence. When I run the code below, it raises an error saying:

no records found in handle

for the line of code that reads: record = SeqIO.read(handle, "genbank"). I'm not sure how to correct this issue. I've included the code I'm using below.

Also, if there is an easier way of doing this, or published code, I'd appreciate if you guys let me know.

Thanks!

# search sequences by a combination of keywords
# need to find (number of) results to set 'retmax' value
handle = Entrez.esearch(db = searchdb, term = searchterm)
records = Entrez.read(handle)
handle.close()
# repeat search with appropriate 'retmax' value
all_handle = Entrez.esearch(db = searchdb, term = searchterm, retmax = records['Count'])
records = Entrez.read(all_handle)

print " "
print "Number of sequences found:", records['Count'] #printing to make sure that code is working thus far. 
print " "

locations = [] # store locations of target sequences
sequences = [] # store target sequences

for i in range(0,int(records['Count'])) :
    handle = Entrez.efetch(db = searchdb, id = records['IdList'][i], rettype = "gb", retmode = "xml") 
    record = SeqIO.read(handle, "genbank")
    for feature in record.features:
        if feature.type==searchfeaturetype: #searches features for proper feature type
            if searchgeneproduct in feature.qualifiers['product'][0]: #searches features for proper gene product
                if str(feature.location) not in locations: # no repeat location entries (compare locations, not qualifiers)
                    locations.append(str(feature.location)) # appends location entry
                    sequences.append(feature.extract(record.seq)) # append sequence

Solution

You are requesting XML from GenBank (retmode = "xml"), but SeqIO.read with the "genbank" format expects the GenBank flat-file text format, so the parser finds no records in the handle. Try changing your efetch line to this:

handle = Entrez.efetch(db = searchdb, id = records['IdList'][i], rettype = "gb", retmode = "text")
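
For reference, here is a minimal sketch of how the corrected fetch and the feature filtering from the question could fit together. It is written for Python 3, whereas the question's code uses Python 2 print statements. The function names fetch_genbank_record and matching_features and the email parameter are made up for illustration and are not part of Biopython. The sketch assumes Biopython is installed and that retmode="text" is the value NCBI documents for flat-file output.

```python
def fetch_genbank_record(db, seq_id, email):
    """Fetch one record as GenBank flat-file text and parse it.

    Hypothetical helper: the name and the email parameter are illustrative.
    """
    # Imported here so the filtering helper below can be tried
    # without Biopython or network access.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks callers to identify themselves
    # retmode="text" returns the flat file that the "genbank" parser expects
    handle = Entrez.efetch(db=db, id=seq_id, rettype="gb", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()


def matching_features(record, feature_type, product_text):
    """Return (location, sequence) pairs for matching features, without repeats."""
    seen = set()
    hits = []
    for feature in record.features:
        if feature.type != feature_type:
            continue
        # .get() avoids a KeyError for features with no 'product' qualifier
        products = feature.qualifiers.get("product", [""])
        if product_text not in products[0]:
            continue
        location = str(feature.location)
        if location in seen:  # skip repeated locations
            continue
        seen.add(location)
        hits.append((location, feature.extract(record.seq)))
    return hits
```

With these two helpers, the loop in the question would reduce to calling fetch_genbank_record for each ID in records['IdList'] and passing the result to matching_features together with searchfeaturetype and searchgeneproduct.
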
Licensed under: CC-BY-SA with attribution