How to get a specific protein sequence using Entrez.efetch?

Question

Good point, database entries in XML do vary between proteins submitted by various authors.

I have made an algorithm to "hunt" for protein sequences from the XML tree:

from Bio import Entrez

gi           = '1293613'        # example protein GI number
Entrez.email = "you@email.com"  # Always tell NCBI who you are
protina      = Entrez.efetch(db="protein", id=gi, retmode="xml") # fetch the xml
protinaXML   = Entrez.read(protina)[0]
protina.close()                 # release the network handle

seqs = []           # store candidate protein seqs
def seqScan(xml):   # recursively collect protein seqs
    # The parser's ListElement, DictionaryElement and StringElement classes
    # subclass list, dict and str, so isinstance() covers all of them.
    if isinstance(xml, list):
        for ele in xml:
            seqScan(ele)
    elif isinstance(xml, dict):
        for value in xml.values():
            seqScan(value)
    elif isinstance(xml, str):
        #   v___THIS IS THE KEYWORD FILTER_____v
        if xml.startswith('M') and len(xml) > 10: # 1) most full-length proteins start with M (methionine)
            seqs.append(xml)                      # 2) length check drops short strings such as most author names

seqScan(protinaXML) # run the recursive sequence collection
print(seqs)         # print the goods!

Note: in rare cases (depending on the "keyword filter") it may humorously grab unwanted strings, such as an author's name that starts with 'M' and is longer than 10 characters in its abbreviated form.
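If those false hits get annoying, one option is a stricter filter that only accepts strings made up entirely of amino-acid letters, and that also remembers where in the tree each hit was found. Below is a rough sketch of that idea, not a tested fix for every record. It runs on plain nested lists and dicts, so it needs no network access. The names `walk`, `looks_like_protein` and `find_sequences`, the keys in `sample`, and the choice of allowed letters are all made up for illustration. The case-insensitive check assumes that some fields may store the sequence in lowercase.

```python
# One-letter codes for the 20 standard amino acids plus X (unknown).
# Assumption: extend this set if your records use other symbols.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYX")

def walk(node, path=()):
    """Yield (path, string) for every string leaf in nested lists/dicts."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (index,))
    elif isinstance(node, str):
        yield path, node

def looks_like_protein(text, min_length=10):
    """Heuristic: longer than min_length, starts with M, only amino-acid letters."""
    candidate = text.upper()  # assumption: sequences may be stored in lowercase
    return (len(candidate) > min_length
            and candidate.startswith("M")
            and set(candidate) <= AMINO_ACIDS)

def find_sequences(record):
    """Return a list of (path, string) pairs that pass the heuristic."""
    return [(path, text) for path, text in walk(record) if looks_like_protein(text)]

# Hypothetical record: the keys are invented and do not mirror a real NCBI schema.
sample = {
    "authors": ["Mueller-Hartmann,K.", "Smith,J."],
    "definition": "Made-up example protein",
    "sequence": "mkvlaagivalllaagcss",
    "features": [{"translation": "MKVLAAGIVALLLAAGCSS"}],
}

hits = find_sequences(sample)
for path, text in hits:
    print(path, text)
```

The author entry is rejected because of its punctuation and the letter U, while both sequence strings pass. A name that happens to consist only of amino-acid letters would still slip through, so the returned path is worth checking before trusting a hit.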

Hope that helps!