Genetic Algorithm selection and crossover

https://stackoverflow.com/questions/8106111

27-02-2021

Question

I have been doing some research on genetic algorithms for a project in my AI class, but I am a little confused by what seems to be the traditional algorithm.

Basically, I wonder why they use selection schemes like roulette wheel to choose the parents that reproduce. Why not choose the parents with the best fitness score and call it a day?

Crossover confuses me as well. It randomly chooses points every time to splice parent information. But it would seem to make more sense for crossover to change based on previous information. If a chromosome string is known to be good up to a point, the crossover point could still be random, but not within the good part of the string.

Any thoughts?

Solution

Selection

If you only ever choose the best parent, what you get is hill climbing. Hill climbing works nicely on easy problems, but the more difficult the problem, the more likely you are to get stuck at a point from which you can make no further progress (a local optimum).

Generally, the harder the problem, the more such local optima there are. Selecting other individuals in addition to the best ones maintains the diversity of the population: the solutions are spread further out in the search space, and if a part of the population is stuck in a local optimum, a different part of the population can still make progress.
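
To make the contrast concrete, here is a minimal Python sketch of fitness-proportional (roulette wheel) selection next to best-only selection. The function names and the assumption that all fitness values are non-negative are mine, not part of the original answer.

```python
import random

def roulette_select(population, fitnesses, k, rng=random):
    """Pick k parents with replacement, each with probability
    proportional to its fitness (assumes non-negative fitness values)."""
    if sum(fitnesses) <= 0:
        # Degenerate case: no usable weights, so fall back to uniform choice.
        return [rng.choice(population) for _ in range(k)]
    return rng.choices(population, weights=fitnesses, k=k)

def best_only_select(population, fitnesses, k):
    """Always return k copies of the single fittest individual."""
    best_index = max(range(len(population)), key=lambda i: fitnesses[i])
    return [population[best_index]] * k

if __name__ == "__main__":
    pop = ["a", "b", "c"]
    fit = [1.0, 2.0, 7.0]
    print(roulette_select(pop, fit, 10))   # mostly "c", but "a" and "b" still appear
    print(best_only_select(pop, fit, 10))  # only "c": no diversity left
```

With roulette selection the weaker individuals still get an occasional chance to reproduce, which is what keeps part of the population exploring elsewhere.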

Modern genetic algorithms usually devote a lot of effort to maintaining the diversity of the population to prevent premature convergence. One technique for that is fitness sharing. Another simple way to do this is to divide the population into different species, so that individuals of different species can't (or only rarely can) reproduce with each other.
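
As a rough illustration of fitness sharing, the sketch below divides each individual's raw fitness by a "niche count" that grows with the number of similar individuals nearby. It uses the common triangular sharing function; the Hamming distance, the parameter names sigma_share and alpha, and their default values are assumptions chosen for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, fitnesses, sigma_share=2.0, alpha=1.0,
                   distance=hamming):
    """Return each raw fitness divided by the individual's niche count.

    Individuals closer than sigma_share contribute 1 - (d / sigma_share)**alpha
    to the niche count; more distant individuals contribute nothing.
    """
    shared = []
    for individual, fitness in zip(population, fitnesses):
        niche_count = 0.0
        for other in population:
            d = distance(individual, other)
            if d < sigma_share:
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because every individual is at distance 0 from itself.
        shared.append(fitness / niche_count)
    return shared

if __name__ == "__main__":
    pop = ["0000", "0000", "0000", "1111"]
    raw = [10.0, 10.0, 10.0, 8.0]
    print(shared_fitness(pop, raw))
```

In this toy population the three identical genomes split their fitness three ways, so the lone "1111" individual ends up with the highest shared fitness even though its raw fitness is lower.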

Crossover

Crossover tries to distribute good parts of the genome among individuals that have arisen due to mutation. It would indeed be nice if one could just swap the good parts of the genome, and this has been attempted; for example, you can look at each gene and measure the average fitness of individuals possessing that gene.

There are two main problems with this, though:

It is computationally expensive.
There might be interdependencies in the genome. Maybe gene A looks really good according to your metric, but gene B doesn't, so you leave it out. In reality though, it might be that gene A doesn't actually work without gene B being present.
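
The interdependency problem can be shown with a toy example. In the sketch below, the fitness function, the two-gene genome (gene 0 playing the role of "A", gene 1 the role of "B"), and the population are all hypothetical and chosen only to make the effect visible.

```python
def fitness(genome):
    """Hypothetical fitness: gene A pays off only when gene B is present,
    and gene B on its own is slightly harmful."""
    a, b = genome
    if a and b:
        return 10
    if b:
        return -3
    return 0

def average_fitness_with_gene(population, index):
    """Average fitness of the individuals that carry the gene at `index`."""
    carriers = [g for g in population if g[index] == 1]
    return sum(fitness(g) for g in carriers) / len(carriers)

if __name__ == "__main__":
    population = [(1, 1)] + [(0, 1)] * 5 + [(0, 0)] * 3
    print("A carriers:", average_fitness_with_gene(population, 0))
    print("B carriers:", average_fitness_with_gene(population, 1))
    print("A without B:", fitness((1, 0)))
```

Here the per-gene average makes A look excellent and B look harmful, yet an individual that keeps A and drops B scores no better than one with neither gene.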

OTHER TIPS

Picking only two parents and calling it a day converges to a solution too quickly. You are looking to adjust many different variables simultaneously. Imagine a two-variable scenario in which you use a genetic algorithm to find the lowest point in a room. Your approach might quickly find the lowest spot in one local trough, but if the plane has many undulations you risk not finding the trough with the lowest point.

Why not select only the best? Because otherwise you are very likely to get stuck in a local optimum. For a similar reason, roulette selection is a thing of the past, and the cool kids use rank-based selection (sorting the offspring by fitness and keeping, say, the best 1/10; look up "evolution strategies"). Roulette selection, a.k.a. fitness-proportional selection, does not work well if the fitness scale is not very regular, and in practice it never is.
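
One way to see the scaling problem: adding a constant to every fitness value leaves the ranking untouched but flattens the roulette probabilities almost to uniform. The following sketch compares the two; the helper names and the example numbers are illustrative assumptions.

```python
def roulette_probabilities(fitnesses):
    """Selection probability of each individual under fitness-proportional
    selection (assumes positive fitness values)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def truncation_select(population, fitnesses, fraction=0.1):
    """Rank-based selection: sort by fitness and keep the best `fraction`
    of the population (at least one individual)."""
    ranked = sorted(range(len(population)),
                    key=lambda i: fitnesses[i], reverse=True)
    keep = max(1, int(len(population) * fraction))
    return [population[i] for i in ranked[:keep]]

if __name__ == "__main__":
    pop = ["a", "b", "c", "d"]
    fit = [3.0, 1.0, 4.0, 2.0]
    shifted = [f + 1000.0 for f in fit]  # same ranking, different scale

    print(roulette_probabilities(fit))      # clearly favours "c"
    print(roulette_probabilities(shifted))  # nearly uniform
    print(truncation_select(pop, fit, fraction=0.5))
    print(truncation_select(pop, shifted, fraction=0.5))  # same survivors
```

Rank-based selection depends only on the ordering of the individuals, so it behaves the same however the fitness values happen to be scaled.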

Crossover: evolution strategies can use mutation alone and do totally fine without crossover. Crossover assumes that your objective function can be neatly decomposed into several parts that the crossover will find. That assumption is very naive and holds only on toy problems: in most genotypes, the various parts are related in a highly non-linear way. If you have no serious justification for using a crossover operator, just do without it, Occam's razor and all.
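
For reference, a mutation-only search of this kind can be very short. The sketch below is a simple (mu + lambda) evolution strategy with a fixed Gaussian mutation step and no crossover; the sphere objective, the parameter values, and the function names are assumptions made for the example, and real evolution strategies usually also adapt the step size.

```python
import random

def sphere(x):
    """Toy objective to minimise: sum of squares, optimum at the origin."""
    return sum(v * v for v in x)

def mu_plus_lambda_es(objective, dim=3, mu=5, lam=20, sigma=0.3,
                      generations=100, seed=0):
    """Minimise `objective` using mutation and rank-based survival only."""
    rng = random.Random(seed)
    parents = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
               for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = rng.choice(parents)
            # Mutation: add Gaussian noise to every coordinate.
            offspring.append([v + rng.gauss(0.0, sigma) for v in parent])
        # "Plus" selection: the best mu of parents and offspring survive.
        parents = sorted(parents + offspring, key=objective)[:mu]
    return min(parents, key=objective)

if __name__ == "__main__":
    best = mu_plus_lambda_es(sphere)
    print(best, sphere(best))
```

Because parents compete with their offspring for survival, the best objective value found never gets worse from one generation to the next.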

I think DataWraith answered the question quite well. Concerning crossover, I'll just add that John Holland argues that the GA works by implicitly calculating the fitness of each chromosome substring ("schema") using randomized crossover and selection, instead of calculating it explicitly, which would be extremely time-consuming (as DataWraith said). Holland calls this process "implicit parallelism".

-Ted

What about replacing individuals after crossover?

I choose individuals for reproduction with the roulette wheel selection method. My crossover rate is 0.7 (70%), but I don't actually know what that means. Does it mean that I choose 70 pairs of parents, cross them over, and replace the worst two individuals in the pool with each new pair of children? Or does it mean that I choose 70/2 = 35 pairs of parents, cross them over, and replace the worst individuals with their children?

I really don't know which individuals the new children should replace. What if the fitness of the children is worse than the fitness of the worst two in the pool? Please explain the replacement process when using proportional selection with a roulette wheel.

Licensed under: CC-BY-SA with attribution
