What's the difference between Uniform Crossover (with p=0.5) and Shuffle Crossover

https://stackoverflow.com/questions/19185266

30-06-2022
|

Question

In evolutionary-/genetic algorithms there are multiple recombination methods. Most of them suffer from a bias associated with the length of the chromosome (also called positional bias).

Uniform crossover and shuffle crossover can solve this problem. However, I don't understand the difference between the two, if in case of uniform crossover p(c)=0.5

Explanation

With uniform crossover with p(c)=0.5 every gene is a possible crossover point.
With shuffle crossover the chromosome-sequence is first shuffled (uniformly), then one crossover point is assigned and finally the original chromosome sequence is restored - this actually means although only one crossover has taken place it could affect every position in the chromosome independently.

As both methods involve a complete randomization, I see no reason why the results should be different.

I wanted to know exactly therefore I wrote a little script to test both mechanisms. Here's some R-code, if you like to try it yourself

parent1 <- rep(0, 10000) # 10.000 genes in the chromosome - change at will
parent2 <- rep(1, length(parent1))

# Uniform crossover
offspring1_unif <- rep(-1, length(parent1)) # initialize offspring1_unif
offspring2_unif <- rep(-1, length(parent1)) # initialize offspring2_unif

for(i in 1:length(parent1)) {
  if (runif(1) < 0.5) {
    offspring1_unif[i] <- parent1[i]
    offspring2_unif[i] <- parent2[i]
  } else {
    offspring1_unif[i] <- parent2[i]
    offspring2_unif[i] <- parent1[i]
  }
}

# Shuffle crossover

## Shuffle
shuffler <- seq(1, length(parent1))
shuffler <- sample(shuffler, length(parent1))

## perform the crossover
crossover_point <- sample(1:length(parent1), 1)

offspring1_shuffle <- rep(-1, length(parent1)) # initialize offspring1_shuffle
offspring2_shuffle <- rep(-1, length(parent1)) # initialize offspring2_shuffle

for(i in 1:length(shuffler)) {
  if (i < crossover_point) {
    offspring1_shuffle[shuffler[i]] <- parent1[shuffler[i]]
    offspring2_shuffle[shuffler[i]] <- parent2[shuffler[i]]
  } else {
    offspring1_shuffle[shuffler[i]] <- parent2[shuffler[i]]
    offspring2_shuffle[shuffler[i]] <- parent1[shuffler[i]]

  }
}

mean(offspring1_unif) # 0.493
mean(offspring1_shuffle) # 0.3295

mean(offspring2_unif) # 0.507
mean(offspring2_shuffle) # 0.6705

sd(offspring1_unif) # 0.499976
sd(offspring1_shuffle) # 0.4700552

sd(offspring2_unif) # 0.499976
sd(offspring2_shuffle) # 0.4700552

Solution

The difference is what distribution the number of swaps has in both of the methods.

uniform cross-over:
we pick a gene to be swapped with a probability p independent from the other swaps, i.e. a Bernoulli experiment and we perform this Bernoulli experiment, for the whole chromosome, i.e. let's say for n genes, so the number of swaps will follow a Binomial distribution.
shuffle-cross-over:
we randomly shuffle the chromosome first(this is done mainly to avoid positional bias, i.e. to decouple the probability swaps from the position of the genes in the chromosome -- we do take care of this bias in the uniform case too. What's different from the uniform case, is that we pick only one cross-over point, and all elements on one side of this cross-over point will be swapped, so we swap any number of genes with an equal probability of 1/2. Therefore we avoid also the so called distributional bias, i.e. probability of number of swaps being different.

OTHER TIPS

For uniform crossover there can be many crossover points. The number of crossover points essentially becomes a binomial distribution. With p(c)=0.5 you can expect to have around K/2 crossover points in a K bit long genetic string.

Shuffle crossover on the other hand picks out one, and only one, bit at random as crossover point.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow