Question

I was recently introduced to genetic algorithms by this MSDN article (the author calls them combinatorial evolution, but it seems to be the same thing), and I'm struggling to understand how combining two potential solutions always produces a new solution that is at least as good as its parents.

Why is this so? Surely combining might produce something worse.

As far as I understand it, the algorithm is based on the idea that when a male and female of a species produce offspring, the offspring will have characteristics of both parents. Some combinations will be better, some worse, and some just as good. The better ones (for whatever definition of "better" is appropriate) stand a greater chance of surviving and producing offspring that inherit the improved characteristics. However, there will be combinations that are weaker. Why isn't this an issue with GA?


Solution

A genetic algorithm tries to improve at each generation by culling the population. Every member is evaluated according to a fitness function, and only a high-scoring portion of them is allowed to reproduce.

You're right, though: there is no guarantee that the next generation will improve on its predecessor's score.

Consider Dawkins' weasel program: "evolving" the string "Methinks it is like a weasel". Starting from a population of random strings, the fitness function scores each string by how closely it matches the target, and the closest matches are perturbed to produce the next generation. With simple crossover reproduction, two high-scoring strings that are combined could very easily produce lower-scoring offspring. Even "asexual" random mutation of a single high-fitness string can lower the child's fitness.
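To make that concrete, here is a minimal sketch (hypothetical code, not from the article) of single-point crossover on weasel strings, with the parents' mismatches placed so the child inherits both:

```python
import random

TARGET = "Methinks it is like a weasel"

def fitness(s):
    # Count characters that match the target at the same position
    return sum(a == b for a, b in zip(s, TARGET))

def crossover(a, b, point=None):
    # Single-point crossover: prefix from one parent, suffix from the other
    if point is None:
        point = random.randrange(1, len(a))
    return a[:point] + b[point:]

p1 = "Xethinks it is like a weasel"   # one mismatch, near the start
p2 = "Methinks it is like a weasXl"   # one mismatch, near the end
child = crossover(p1, p2, point=14)

# The child inherits p1's early mismatch AND p2's late one,
# so it scores below both parents.
print(fitness(p1), fitness(p2), fitness(child))  # 27 27 26
```

With a random crossover point the child could instead inherit neither mismatch and score a perfect 28; the point is only that crossover gives no guarantee either way.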

It's worth noting, I think, that this is not necessarily a flaw. With this kind of search, there is the idea of local maxima. A member of the population might represent a solution that's not the optimal result, but is the best that can be achieved without getting worse on the way.

Imagine that the fitness function for the weasel program doesn't only find the edit distance, but has some notion of "word", and tests whether the last word of the string is the name of an animal. Any animal name scores well, but "weasel" gets a big bonus.

Now what happens if "Methinks it is like a walrus" is evolved? It scores well. Not as well as the ultimate target string, but better than "Methinks it is like a walrut" or other close variations that could be reached by a single step of mutation.

The walrus string is a local maximum, and the search can get stuck there unless the program allows the next generation's score to be worse.
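One common way to allow the score to get worse is an acceptance rule borrowed from simulated annealing; this is a sketch of that idea, not something described in the answer above:

```python
import math
import random

def accept(old_score, new_score, temperature=1.0):
    # Always accept an improvement; accept a worse candidate with a
    # probability that shrinks as the score gap grows and as the
    # temperature falls, letting the search escape local maxima.
    if new_score >= old_score:
        return True
    return random.random() < math.exp((new_score - old_score) / temperature)
```

With `temperature=1.0`, a candidate one point worse is still accepted about a third of the time, which is what lets "walrus" eventually give way to "weasel".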

OTHER TIPS

We don't know that it will get better, but we do know that it won't get worse.

Each generation does not consist only of the offspring of the best elements; it also includes the best elements themselves -- clones, if you will. Since they are still present, they will score the same as before. That means if none of the offspring are better, the previous generation's winners will win again -- and be re-mutated/bred.

Consider the following notation: a progenitor individual is a letter, e.g. A; a mutated child is denoted by appending a number, e.g. A1; a cross-bred solution is written with brackets around its parents, e.g. (A1B2); and the fitness score of any individual is written after it in square brackets, higher being better, e.g. [12].

For demonstration, consider a pool of 5, where we keep the best 2 and refill the other 3 slots with one mutant of each, plus a cross-breed.

Generation 1

  • A [10]
  • B [5]
  • C [4]
  • D [3]
  • E [1]

Keep A and B, as they are the best two, and refill the other 3 slots with their descendants.

Generation 2

  • A [10]
  • B [5]
  • (AB) [12]
  • A1 [7]
  • B1 [4]

Keep A and (AB), as they are the best 2 -- this means that grandpa A is still in the pool, since most of the children scored worse than he did.

Generation 3

  • A [10]
  • (AB) [12]
  • (A(AB)) [14]
  • A2 [8]
  • (AB)1 [13]

Keep (AB)1 and (A(AB)) -- this time no grandparents were maintained, as two of their children beat them. But if (AB)1 had performed just slightly worse, we would have kept (AB) instead.

This continues until the scores stabilize, which indicates you've hit some kind of local maximum (potentially a global maximum). One way to detect this is to check whether the same individuals keep being "cloned" into the next generation (though for high-dimensional problems that might take too long, so it is probably better to just check whether the improvement falls below a particular tolerance).
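The keep-the-best-2 scheme above can be sketched as a loop (hypothetical Python; the `fitness`, `mutate`, and `crossover` functions are supplied by the caller):

```python
import random

def evolve(pool, fitness, mutate, crossover, elite=2, generations=100, tol=1e-9):
    # Keep the `elite` best individuals (the clones), then refill the pool
    # with one mutant of each elite plus a single cross-breed.
    best = max(fitness(x) for x in pool)
    for _ in range(generations):
        survivors = sorted(pool, key=fitness, reverse=True)[:elite]
        pool = survivors + [mutate(s) for s in survivors]
        pool.append(crossover(survivors[0], survivors[1]))
        new_best = max(fitness(x) for x in pool)
        if new_best - best < tol:
            break  # scores have stabilised: some kind of (local) maximum
        best = new_best
    return max(pool, key=fitness)
```

Because the elites are carried over unchanged, the best score in the pool can never decrease from one generation to the next.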

In general, genetic algorithms work by creating a number of (random) variations on the parents in each generation. Then some selection function is applied, and the offspring that are most fit according to this function survive. So an individual offspring is not necessarily better, since the variation is random, but combined with selection you get improvement over time.

When I studied genetic algorithms in college, it was explained like this:

Imagine a solution is a combination of "genes", where each gene affects how good the solution is as a whole. When two solutions are mated, their genes are picked randomly from each parent.

Now, if a gene generally leads to good solutions, its frequency in the gene pool increases. In the extreme case, the gene will dominate the population.

So when you think about genetic algorithms (and evolution in general), you should not think about individuals. You should think about genes and populations as a whole. Even if one "best" solution is lost, it doesn't mean its genes are lost.
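That gene-level view can be simulated with a toy example (hypothetical code; the gene weights are made up) in which one strongly beneficial gene comes to dominate the pool even though individuals are constantly discarded:

```python
import random

random.seed(42)
GENES, POP = 8, 100

def fitness(ind):
    # Gene 0 is assumed strongly beneficial; the rest help a little
    return 5 * ind[0] + sum(ind[1:])

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(20):
    # Breed only from the fitter half; each child picks every gene
    # at random from one of its two parents
    parents = sorted(pop, key=fitness, reverse=True)[:POP // 2]
    pop = [[random.choice(pair) for pair in zip(*random.sample(parents, 2))]
           for _ in range(POP)]

freq = sum(ind[0] for ind in pop) / POP
print(freq)  # gene 0's frequency has risen toward 1
```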

There is also the idea of elitism in genetic algorithms: the best solution(s) are always kept across generations. This can speed up convergence of the algorithm, but it also makes it easier for the algorithm to get stuck in local optima.

Genetic algorithms are not deterministic: they don't guarantee an improvement in each generation, and they also don't guarantee to find a global optimum. However, the selection phase of a GA, using a fitness function, makes it more likely that "good solutions" will survive.
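Fitness-proportionate ("roulette wheel") selection is one standard way to implement that bias -- sketched here as a hypothetical helper, not code from any of the answers:

```python
import random

def roulette_select(population, fitness):
    # Each individual's chance of being picked is proportional to its
    # fitness: good solutions are more LIKELY to survive, not guaranteed.
    weights = [fitness(x) for x in population]
    return random.choices(population, weights=weights, k=1)[0]

random.seed(0)
pop = ["strong", "weak"]
score = {"strong": 9, "weak": 1}
picks = [roulette_select(pop, score.get) for _ in range(1000)]
# "strong" is chosen roughly 90% of the time, but "weak" still appears
print(picks.count("strong"), picks.count("weak"))
```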

Licensed under: CC-BY-SA with attribution