Question

I'm attempting to write a genetic algorithm framework in Python, and am running into issues with shallow/deep copying. My background is mainly C/C++, and I'm struggling to understand how these connections are persisting.

What I am seeing is an explosion in the length of a list attribute within a class. My code is below, and I'll point out the problems.

This is the class for a single gene. Essentially, it should have a name, value, and boolean flag. Instances of Gene populate a list within my Individual class.

import random

# gene class
class Gene():
  # constructor
  def __init__(self, name, is_float):
    self.name_     = name
    self.is_float_ = is_float
    self.value_    = self.randomize_gene()


  # create a random gene
  def randomize_gene(self):
    return random.random()

This is my Individual class. Each generation, a population of these is created (I'll show the creation code after the class declaration) and has the typical genetic algorithm operations applied. Of note is the print len(self.Genes_) call, whose output grows each time this class is instantiated.

# individual class
class Individual():
  # genome definition
  Genes_      = []    # genes list
  evaluated_  = False # prevent re-evaluation
  fitness_    = 0.0   # fitness value (from evaluation)
  trace_      = ""    # path to trace file
  generation_ = 0     # generation to which this individual belonged
  indiv_      = 0     # identify this individual by number

  # constructor
  def __init__(self, gen, indv):
    # assign indices
    self.generation_ = gen
    self.indiv_      = indv
    self.fitness_    = random.random()

    # populate genome
    for lp in cfg.params_:
      g = Gene(lp[0], lp[1])
      self.Genes_.append(g)

    print len(self.Genes_)

> python ga.py
> 24
> 48
> 72
> 96
> 120
> 144
......

As you can see, each Individual should have 24 genes, but the gene list grows by 24 with every new instance. I create an initial population of new Individuals like this:

# create a randomized initial population
def createPopulation(self, gen):
  loc_population = []
  for i in range(0, cfg.population_size_):
    indv = Individual(gen, i)
    loc_population.append(indv)
  return loc_population

and later on, my main loop (apologies for the whole dump, but I felt it was necessary; if my secondary calls (mutation/crossover) are needed, please let me know):

for i in range(0, cfg.generations_):
      # evaluate current population
      self.evaluate(i)

      # sort population on fitness
      loc_pop = sorted(self.population_, key=operator.attrgetter('fitness_'), reverse=True)

      # create next population & preserve elite individual
      next_population = []
      elitist = copy.deepcopy(loc_pop[0])
      elitist.generation_ = i
      next_population.append(elitist)

      # perform selection
      selection_pool = []
      selection_pool = self.selection(elitist)

      # perform crossover on selection
      new_children = []
      new_children = self.crossover(selection_pool, i)

      # perform mutation on selection
      muties = []
      muties = self.mutation(selection_pool, i)

      # add members to next population
      next_population = next_population + new_children + muties

      # fill out the rest with random
      for j in xrange(len(next_population)-1, cfg.population_size_ - 1):
        next_population.append(Individual(i, j))

      # copy next population over old population
      self.population_ = copy.deepcopy(next_population)

      # clear old lists
      selection_pool[:]  = []
      new_children[:]    = []
      muties[:]          = []
      next_population[:] = []

Solution

I'm not completely sure that I understand your question, but I suspect the problem is that the Genes_ variable in your Individual() class is declared in the class namespace. An attribute defined there belongs to the class itself and is shared by every instance. In other words, each instance of Individual() appends to the same Genes_ list.

Consider the following two snippets:

class Individual():
  # genome definition
  genes = []
  def __init__(self):
    for i in xrange(10):
      self.genes.append(i)

ind_1 = Individual()
print ind_1.genes
ind_2 = Individual()
print ind_1.genes
print ind_2.genes

and

class Individual():
  # genome definition
  def __init__(self):
    self.genes = []
    for i in xrange(10):
      self.genes.append(i)

ind_1 = Individual()
print ind_1.genes
ind_2 = Individual()
print ind_1.genes
print ind_2.genes

The first snippet outputs

>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

while the second snippet outputs

>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In the first scenario, the genes list is created once, when the class is defined. By the time the second Individual() is instantiated, that list already exists and already holds the first individual's genes, so the second individual's genes are appended to the same list.
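The sharing can be checked directly with an identity test. This is a minimal sketch of the first snippet, shortened to three genes and written so it also runs on Python 3 (range and print() instead of xrange and the print statement); the variable name shared is only for illustration.

```python
class Individual(object):
    genes = []  # class attribute: created once, when the class body runs

    def __init__(self):
        for i in range(3):
            # self.genes is not found on the instance, so the lookup falls
            # back to Individual.genes and the shared list is mutated in place
            self.genes.append(i)


ind_1 = Individual()
ind_2 = Individual()

# Both instances and the class refer to one and the same list object
shared = ind_1.genes is ind_2.genes is Individual.genes
print(shared)            # True
print(len(ind_1.genes))  # 6
print(vars(ind_1))       # {} (the instance itself stores no genes attribute)
```

The empty vars(ind_1) shows that nothing named genes was ever stored on the instance; every append went to the list held by the class.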

Rather than creating the Individual() class like this,

# individual class
class Individual():
  # genome definition
  Genes_      = []    # genes list

  # constructor
  def __init__(self, gen, indv):
    # assign indices
    self.generation_ = gen
    self.indiv_      = indv
    self.fitness_    = random.random()

you should consider creating the Genes_ list in __init__ so that each Individual() instance gets its own gene set:

# individual class
class Individual():

  # constructor
  def __init__(self, gen, indv):
    # genome definition
    self.Genes_      = []    # genes list
    # assign indices
    self.generation_ = gen
    self.indiv_      = indv
    self.fitness_    = random.random()
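Putting the pieces together, a corrected version of the question's classes might look like the sketch below. PARAMS is a hypothetical stand-in for cfg.params_ (assumed here to be a list of (name, is_float) pairs, 24 of them to match the question's output), and the code is written to run on Python 3 as well as Python 2.

```python
import random

# Hypothetical stand-in for cfg.params_: 24 (name, is_float) pairs
PARAMS = [("gene_%d" % n, True) for n in range(24)]


class Gene(object):
    def __init__(self, name, is_float):
        self.name_ = name
        self.is_float_ = is_float
        self.value_ = random.random()


class Individual(object):
    def __init__(self, gen, indv):
        self.Genes_ = []  # a new list for every instance
        self.generation_ = gen
        self.indiv_ = indv
        self.fitness_ = random.random()
        # populate this instance's own genome
        for name, is_float in PARAMS:
            self.Genes_.append(Gene(name, is_float))


population = [Individual(0, i) for i in range(5)]
lengths = [len(ind.Genes_) for ind in population]
print(lengths)  # [24, 24, 24, 24, 24]
```

Each individual now reports 24 genes no matter how many individuals were created before it.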

OTHER TIPS

When you create a class, you are really creating exactly one 'class object'. Class objects are objects just like any other in Python; everything in Python is an object, and what you can do with an object is determined by the attributes and methods it provides rather than by its declared type. That is the magic of duck typing. In Python you can even create new classes dynamically at run time.
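As a small illustration of classes being ordinary objects, the built-in type() can build one at run time from a name, a tuple of base classes, and a namespace dict. The names Dynamic and describe below are made up for this sketch.

```python
def describe(self):
    # type(self).__name__ is the name given to the class when it was built
    return "I am a %s" % type(self).__name__


# type(name, bases, namespace) creates a new class object at run time
Dynamic = type("Dynamic", (object,), {"describe": describe})

obj = Dynamic()
print(isinstance(Dynamic, type))  # True: the class is itself an object
print(obj.describe())             # I am a Dynamic
```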

Anyway, you are adding exactly one list object to the "Genes_" attribute of the one and only "Individual" class object. The upshot is that every instance of "Individual" accesses the same "Genes_" list object.
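This is also why copy.deepcopy in the question's main loop does not separate the gene lists: deepcopy duplicates the state stored on the instance, and a class attribute is not part of that state. A minimal sketch, using a one-element stand-in for the genome:

```python
import copy


class Individual(object):
    Genes_ = []  # class attribute, as in the question

    def __init__(self):
        self.Genes_.append("gene")  # mutates the list held by the class


original = Individual()
clone = copy.deepcopy(original)  # a distinct instance of the same class

# The clone has no Genes_ of its own, so lookup still reaches the class
same_list = clone.Genes_ is original.Genes_
print(clone is original)  # False
print(same_list)          # True
```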

Consider this

    # In 2.2 <= Python < 3.0 you should ALWAYS inherit from 'object'.
    class Foobar(object):
        doodah = []

    a = Foobar()
    b = Foobar()
    assert id(a.doodah) == id(b.doodah) # True

In this case, as you can see, "a.doodah" and "b.doodah" are the same object!

    class Foobar(object):
        def __init__(self):
            self.doodah = []

    a = Foobar()
    b = Foobar()
    assert id(a.doodah) != id(b.doodah) # True

In this case, they are different objects.

It's possible to have your cake and eat it too. Consider this

    class Foobar(object):
        doodah = []

    a = Foobar()
    b = Foobar()
    a.doodah = 'hlaghalgh'
    assert id(a.doodah) != id(b.doodah) # True

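In this case the assignment adds a "doodah" attribute to the "a" instance, which shadows the class attribute for "a" only. The class attribute itself is untouched, so "b.doodah" still refers to the shared list.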
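The difference between assigning to an attribute and mutating the object it refers to explains why, in the question, class-level defaults such as generation_ behaved as expected while Genes_ did not. A short sketch (the count attribute is added here purely for illustration):

```python
class Foobar(object):
    doodah = []  # mutable class attribute
    count = 0    # immutable class attribute used as a default


a = Foobar()
b = Foobar()

a.count = 5         # assignment: creates an instance attribute on a only
a.doodah.append(1)  # mutation: changes the one list stored on the class

print(b.count)   # 0
print(b.doodah)  # [1]
print(vars(a))   # {'count': 5}
```

Assignment through an instance always binds a name on that instance, whereas a method call such as append() first looks the list up (finding it on the class) and then changes it in place.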

Hope this helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow