Optimize Python key searching in a hierarchical dictionary

Question

You cannot use a defaultdict because your __init__ methods require arguments.
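To see why, note that defaultdict calls its factory with no arguments whenever a key is missing. Here is a minimal sketch, assuming a Gene class whose __init__ takes a name like the one further down; the key "geneA" is made up for illustration:

```python
from collections import defaultdict


class Gene:
    # Stand-in for the Gene class discussed here: __init__ requires a name.
    def __init__(self, name):
        self.name = name
        self.proteins = {}


genes = defaultdict(Gene)

try:
    genes["geneA"]  # missing key: defaultdict calls Gene() with no arguments
    error = None
except TypeError as exc:
    error = exc  # Gene() was called without its required 'name' argument

print(type(error).__name__)  # TypeError
```

The failed lookup does not insert the key, so the dictionary stays empty after the exception.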

This is probably one of your bottlenecks:

def updateNgenes(self):
    # Updating the number of genes
    self.ngenes = len(self.genes.keys())

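In Python 2, len(self.genes.keys()) creates a list of all keys before calculating the length. This means that every time you add a gene, you build a list and throw it away, and building that list gets more expensive as the number of genes grows. To avoid the intermediate list, just call len(self.genes). (In Python 3, keys() returns a view rather than a list, so the cost is much smaller, but len(self.genes) is still the simpler spelling.)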
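If you want to measure the difference yourself, here is a rough sketch using timeit. The dictionary size and key names are arbitrary assumptions, and list(genes) is used to mimic the key-list copy so that the comparison also works on Python 3:

```python
import timeit

# Hypothetical data: 100,000 made-up gene names.
genes = {"gene%d" % i: None for i in range(100000)}

# list(genes) copies every key into a new list, like Python 2's keys() did.
with_copy = timeit.timeit(lambda: len(list(genes)), number=20)
# len() on the dict itself reads the stored size without copying anything.
direct = timeit.timeit(lambda: len(genes), number=20)

# Both approaches return the same count; only the cost differs.
assert len(list(genes)) == len(genes)
print("with key list: %.6f s, direct: %.6f s" % (with_copy, direct))
```

The exact timings depend on your machine, so treat the numbers as indicative only.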

Better yet would be to make ngenes a property so it is only calculated when you need it.

@property
def ngenes(self):
    return len(self.genes)

The same can be done with nproteins in the Gene class.

Here is your code refactored:

class Species:
    '''This structure contains all the information needed for all genes.
    One species has several genes, and one gene has several proteins.'''

    def __init__(self, name):
        self.name = name  # name of the species
        self.genes = {}

    def addProtein(self, gene, protname, length):
        # Converting a line from the input file into a protein and/or an exon
        if gene not in self.genes:
            self.genes[gene] = Gene(gene)
        self.genes[gene].proteins[protname] = Protein(protname, length)

    @property
    def ngenes(self):
        return len(self.genes)

class Protein:
    # The Protein class contains information about the length of the protein and a list of its exons (each with its own attributes)
    def __init__(self, name, length):
        self.name = name
        self.len = length

class Gene:
    # The Gene class contains information about the gene and a dict of its proteins (each with its own attributes)
    def __init__(self, name):
        self.name = name
        self.proteins = {}

    @property
    def nproteins(self):
        return len(self.proteins)
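
A short usage sketch of the refactored classes follows. The classes are repeated in condensed form so the snippet runs on its own, with the length parameter spelled `length` to avoid shadowing the built-in len. The rows are invented stand-ins for parsed input lines, since the real file format is not shown here:

```python
class Protein:
    def __init__(self, name, length):
        self.name = name
        self.len = length


class Gene:
    def __init__(self, name):
        self.name = name
        self.proteins = {}

    @property
    def nproteins(self):
        return len(self.proteins)


class Species:
    def __init__(self, name):
        self.name = name
        self.genes = {}

    def addProtein(self, gene, protname, length):
        # Create the gene on first sight, then attach the protein to it.
        if gene not in self.genes:
            self.genes[gene] = Gene(gene)
        self.genes[gene].proteins[protname] = Protein(protname, length)

    @property
    def ngenes(self):
        return len(self.genes)


# Hypothetical rows standing in for parsed input lines: (gene, protein, length).
rows = [("geneA", "protA1", 120), ("geneA", "protA2", 95), ("geneB", "protB1", 300)]

species = Species("example species")
for gene, protname, length in rows:
    species.addProtein(gene, protname, length)

print(species.ngenes)                    # 2
print(species.genes["geneA"].nproteins)  # 2
```

Because ngenes and nproteins are properties, the counts are always current and there is no update method to remember to call.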