Question

To set up my question, allow me to start with an example:

Suppose I have a set of 1000 arrays (i.e., row vectors), all the same length, each filled with random numbers between -1 and 1. I pull 500 of those row vectors at random and sum them up. Now, starting from the sum alone, I want to reverse-engineer which vectors were selected from the original 1000.
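
For concreteness, here is a minimal sketch of that setup in Python with NumPy (the names, the seed, and the vector length of 50 are illustrative assumptions, not part of the original question):

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed, for reproducibility only

N, LENGTH, K = 1000, 50, 500          # 1000 vectors, illustrative length, 500 picked
vectors = rng.uniform(-1.0, 1.0, size=(N, LENGTH))

# The hidden selection: which K of the N vectors were summed.
secret = np.zeros(N, dtype=bool)
secret[rng.choice(N, size=K, replace=False)] = True
target = vectors[secret].sum(axis=0)  # the observed sum to reverse-engineer
```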

I decide to solve this with a genetic algorithm. I start a population of bit strings, each 1000 bits long, and run through the mutation (i.e., flip-a-random-bit) and crossover procedures. After a tenth of a second, I'm 75% right. After another hour, I'm 76% right. Essentially, I'm stuck waiting forever for the last few dozen bits to be set correctly. It may be that my random number generator never introduces those flips in a way that lets them be merged into the solution.
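
The operators described above might look like the following, building on the setup code in the previous block (the fitness function is my assumption of what candidates are scored against, since the GA can only see the observed sum, not the secret selection):

```python
def fitness(bits: np.ndarray) -> float:
    """Negative distance between this candidate's sum and the target
    (higher is better)."""
    return -float(np.linalg.norm(vectors[bits].sum(axis=0) - target))

def mutate(bits: np.ndarray) -> np.ndarray:
    """Flip-a-random-bit mutation, as described in the question."""
    child = bits.copy()
    i = rng.integers(N)
    child[i] = ~child[i]
    return child

def crossover(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Single-point crossover: prefix from one parent, suffix from the other."""
    cut = int(rng.integers(1, N))
    return np.concatenate([a[:cut], b[cut:]])
```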

The algorithm does very well at the beginning but then fails to improve the solution further. I tried making sure that my population had at least one individual with each possible bit position set. That didn't help; you can't predict how quickly individuals will die out of the pool.

It seems that there must be an additional component to the algorithm: something driving the selection of the bit-flip (i.e., mutation) locations. What is the technical term for this piece? A gradient? Where does it come from?


Solution

So essentially you have a number s and you want to find a subset of numbers that adds up to s. This is known as the subset sum problem, which is a special case of the knapsack problem.

I think your expectations regarding the application of genetic algorithms are wrong. To get a sense of the number of possible solutions that have to be considered when selecting 500 items out of a set of 1000, consider the binomial coefficient C(1000, 500):

270288240945436569515614693625975275496152008446548287007392875106625428705522193898612483924502370165362606085021546104802209750050679917549894219699518475423665484263751733356162464079737887344364574161119497604571044985756287880514600994219426752366915856603136862602484428109296905863799821216320
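
You can verify that figure directly; Python's standard library computes it exactly:

```python
import math

# Number of ways to choose 500 items out of 1000.
print(math.comb(1000, 500))   # a 300-digit number, roughly 2.7e299
```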

Let me clarify first: genetic algorithms and related methods are metaheuristics, which means they are not designed to find the optimal solution, but rather a "good" solution in a very short time. To find the optimal solution to an NP-hard problem, you would in the worst case have to try every single possible combination. There are exact optimization methods that intelligently partition the search space and evaluate only a smaller number of solutions, but they can still take weeks to come up with the optimal answer.

If you are required to find the exact optimum, I would suggest you look into exact methods such as branch and bound. You could, for example, use the well-known CPLEX optimizer and describe your problem as an integer program. Look at the ILP formulation of the TSP to see how this can be done, and translate the idea to your problem.
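
CPLEX is commercial; as an illustration of the same idea with a free toolchain, here is a sketch of the 0-1 integer program using the PuLP modeling library (my substitution, reusing the names N, LENGTH, K, vectors, and target from the sketch in the question; the deviation variables linearize the absolute error):

```python
import pulp

# 0-1 integer program: pick exactly K vectors whose sum matches the target,
# minimizing the total absolute deviation, coordinate by coordinate.
prob = pulp.LpProblem("subset_sum_vectors", pulp.LpMinimize)

x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(N)]
dev = [pulp.LpVariable(f"d{j}", lowBound=0) for j in range(LENGTH)]

prob += pulp.lpSum(dev)        # objective: total deviation (0 for an exact match)
prob += pulp.lpSum(x) == K     # exactly K vectors are selected
for j in range(LENGTH):
    expr = pulp.lpSum(float(vectors[i, j]) * x[i] for i in range(N))
    prob += expr - float(target[j]) <= dev[j]    # dev[j] >= +deviation
    prob += float(target[j]) - expr <= dev[j]    # dev[j] >= -deviation

prob.solve()                   # default CBC solver; a CPLEX backend also exists
chosen = [i for i in range(N) if x[i].value() > 0.5]
```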

If you are not required to find the exact optimum there are several things that you can monitor in your genetic algorithm to improve its output:

  • Use a large enough population size and a matching selection pressure. You want to avoid the effects of genetic drift while still achieving convergence.
  • Monitor the variance (diversity) in your population. Is it decreasing quickly? If all solutions in your population are essentially the same, the algorithm has converged; once that happens you need to restart it or introduce new genetic information to revive the evolutionary process.
  • Vary the strength of the mutation: flip multiple bits at the beginning of the search and reduce to flipping only a few bits toward the end.
  • Use multiple crossover points (I assume you're using single-point crossover). For such a long string you may want to use 10 crossover points; see the sketch after this list.
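
A sketch of the last three points, again reusing the names from the earlier code (the ten crossover points come from the answer; the rest is illustrative):

```python
def diversity(population: np.ndarray) -> float:
    """Mean per-bit variance across the population; near zero means converged."""
    return float(population.astype(float).var(axis=0).mean())

def mutate_k(bits: np.ndarray, n_flips: int) -> np.ndarray:
    """Flip n_flips distinct bits; decrease n_flips as the search progresses."""
    child = bits.copy()
    idx = rng.choice(N, size=n_flips, replace=False)
    child[idx] = ~child[idx]
    return child

def crossover_multi(a: np.ndarray, b: np.ndarray, points: int = 10) -> np.ndarray:
    """Multi-point crossover: alternate between parents at sorted cut points."""
    cuts = np.sort(rng.choice(np.arange(1, N), size=points, replace=False))
    child, take_b, prev = a.copy(), False, 0
    for cut in np.append(cuts, N):
        if take_b:
            child[prev:cut] = b[prev:cut]
        take_b, prev = not take_b, cut
    return child
```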

Other tips

I think you mean "convergence speed" or "rate of convergence". There are many solutions to your problem in the literature. I would suggest decreasing the mutation rate every iteration until you get "stuck" in a local optimum (the rate of convergence gets too low, absolutely or relatively), then resetting or increasing the mutation rate to escape the local optimum.
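
A sketch of that schedule (the decay factor, floor, reset value, and stall threshold are illustrative choices, not tuned values):

```python
def adapt_mutation_rate(rate: float, recent_improvement: float,
                        decay: float = 0.99, floor: float = 1e-4,
                        reset: float = 0.05) -> float:
    """Decay the mutation rate while progress continues; reset it to a high
    value once the rate of convergence stalls, to escape the local optimum."""
    if recent_improvement < 1e-6:      # "stuck": fitness barely improved lately
        return reset
    return max(rate * decay, floor)
```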

Also, what kind of selection do you do? Do you keep the parents if they are better than their children? Do you select only the best individuals, or do you use roulette-wheel selection (which keeps sub-optimal solutions with some probability, preserving more diversity)?
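
Roulette-wheel selection can be sketched like this (fitness values from the earlier sketch are negative, so they are shifted to positive weights first):

```python
def roulette_select(population: np.ndarray, fitnesses: np.ndarray) -> np.ndarray:
    """Pick one individual with probability proportional to its shifted
    fitness, so sub-optimal solutions survive with some chance."""
    weights = fitnesses - fitnesses.min() + 1e-9   # make all weights positive
    i = rng.choice(len(population), p=weights / weights.sum())
    return population[i]
```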

A genetic algorithm is not an exact algorithm; it's a heuristic. That means that no matter how hard you try, only in a few very rare cases will you actually reach the optimal solution: you are trading precision for speed. This doesn't make the algorithm useless; it was simply designed for problems that don't demand the exact solution, just a good approximation. It also means the algorithm needs an end condition other than finding the optimum, so there has to be a function that evaluates a solution and tells you how good it is.

If you want to squeeze out a little more precision, you will have to adapt your heuristic using problem-specific knowledge. That means using what you know about the problem to design better mutations (instead of the predefined ones) and to improve how the population is refined, i.e., the selection algorithm. Again, this won't guarantee that you'll find the optimum.
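
For this particular problem there is such knowledge to exploit: the residual between a candidate's sum and the target tells you which vectors are the most promising to add or remove. Here is a sketch of a guided mutation along those lines (my illustration, not a prescription from the answer, reusing the names from the question's sketch):

```python
def guided_mutate(bits: np.ndarray) -> np.ndarray:
    """Flip the bit whose vector best cancels the current residual.

    With residual r = candidate_sum - target, removing an included vector
    aligned with r (or adding an excluded vector anti-aligned with r)
    reduces the error the most.
    """
    child = bits.copy()
    residual = vectors[child].sum(axis=0) - target
    scores = vectors @ residual                 # alignment of each vector with r
    gain = np.where(child, scores, -scores)     # expected benefit of flipping
    i = int(np.argmax(gain))
    child[i] = ~child[i]
    return child
```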

In your particular case, I believe that after a few minutes you're falling into a local maximum, which means that from there you're not going to get any better. This can sometimes be fixed by injecting random elements into the population to push the algorithm off the maximum; it can also sometimes be fixed by deliberately keeping some bad solutions.
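
This injection idea is often called "random immigrants"; a minimal sketch (the replacement fraction is an illustrative choice):

```python
def inject_immigrants(population: np.ndarray, fraction: float = 0.1) -> np.ndarray:
    """Replace a fraction of the population with fresh random bit strings
    to push the search off a local maximum."""
    pop = population.copy()
    n_new = max(1, int(fraction * len(pop)))
    idx = rng.choice(len(pop), size=n_new, replace=False)
    pop[idx] = rng.random((n_new, N)) < 0.5     # new random individuals
    return pop
```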

This is a pretty thorough tutorial; it should help you understand the training process and the convergence involved. You're right about it being a form of gradient descent.

http://informatics.indiana.edu/larryy/al4ai/lectures/03.IntroToGAs.pdf

That document has moved, and may now be found here:

http://shinyverse.org/al4ai/lectures/03.IntroToGAs.pdf

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow