Question

Disclaimer

First of all, this is for homework, so don't ask why it's so contrived, it is this way and it can only be this way. (I get a lot of "how about if you change something"), sorry ... I can't.

Also I must use evolutionary algorithms, that means parents have children, they can mutate / recombine, form new generations and eventually lead to a solution.

/Disclaimer

I have n*2 words of length n. I have to make a n^2 matrix containing all these words. The words can be gibberish, but they have to be able to fit in this matrix (it's a requirement on the user's part).

Thus AGE,AGO,BEG,CAB,CAD,DOG will give me this result (1 out of at least 2 possible):

C A B
A G E
D O G

I have to use an evolutionary algorithm. Thus I need to find a way to code my information into a chromosome.

What I did come up with:

Each word has to appear, has a start position in the matrix, and an orientation (left-right or up-down). Thus I have [Word][Orientation][StartPosition] where start position is [0][0] / [0][1] / [1][0] etc (left column and top line). But it has restrictions, I need to validate that the orientation fits the start position.

Problems:

A chromosome has to be a possible solution whereas this is only part of a solution.

Since my solution has to be a matrix containing all the words in such a way as to "fit" the chromosome also has to somehow represent the entire matrix. But that hits several problems. I can only have one word from one starting position in one orientation (except the first two words, they share the same start position with different orientations). I can't see this working as a valid way of attempting an evolutionary algorithm. I just can't see any of the phases working, especially mutation / recombination.

Am I thinking of it entirely wrong ? If so ... why ? and how could I attempt to code my data in such a way as to let me go trough all the phases (reproduction, mutation / recombination, natural selection ... be able to calculate a fitness and start a new generation) without having tons of garbage data (having a word appear twice, lose a word, have a word on a wrong orientation compared to it's start position) ?

EDIT

I shall be using this representation to implement many other nature-inspired algorithms, thus I need a "good" data representation. Nothing makeshift that could hurt me later.

I honestly can't think of a good way though. Because I have a lot of limitation (maybe I have been thinking on this too long and I can't get past them, and they maybe aren't really there). I would really like a binary representation, but that seems impossible.

Was it helpful?

Solution

You seem to be having a problem with Winston Ewert's answer. I'm making this a separate answer for reasons of length and formatting, but what Winston is suggesting is a reasonable approach.

You ask about the "array of characters" and what your parents and offspring are, how you'd do crossover, mutation, etc. I think maybe you're confused by that array of characters bit. Don't think of it as an array of characters -- it's an array of words. This is important, because you don't want to do crossover between word boundaries or change letters inside a word via mutation. You want to shuffle the words around. So instead of dealing with the words, let's take a slightly different (but equivalent) approach.

Let's number each of your 2n words from 1 to 2n. To generate a candidate solution, we just pick n random numbers between 1 and 2n (without replacement). The randomly selected words become the rows to your matrix. So if the words are AGE,AGO,BEG,CAB,CAD,DOG, we pick three at random and end up with, for instance, 2, 3, and 5, (giving a chromosome of 235) or AGO,BEG,CAD.

This yields a matrix of

AGO
BEG
CAD

Now we count how many of the 2n total input words appear in the matrix. In this case, it's only three, as none of the columns make valid words. Your fitness for the input 235 is 3. The input 416 works though -- its fitness is 6.

To generate a population, you just generate N random sets of three numbers (without repetition) between 1 and 2n. With a population size of 4, you might get

142
354
624
241

These give you four different potential solutions:

142 = AGE
      CAB
      AGO

354 = BEG
      CAD
      CAB

624 = DOG
      AGO
      CAB

241 = AGO
      CAB
      AGE

To do crossover, you'd need to design a method that avoided duplicated numbers in the offspring. I've in the past adapted the Cycle Crossover method designed for permutations to handle situations similar to this, but you can design any method that seems reasonable. For instance, you could do a simple uniform crossover and then just fix any duplicate values by changing one of them to something else.

Mutation is simply changing one number into another or swapping the positions of two numbers in the encoding.

You say you must use a GA, so you could stop here and try something like that. It's not likely to work extremely well, for a couple of reasons. First, there are going to be a lot of plateaus in the fitness function. Almost every possible answer is equally wrong, so there's not much of a gradient for the GA to follow. It's a needle in a haystack search problem. You could try to modify the way you score the fitness values to alleviate this somewhat. For instance, I know that only one word starts with D in your example set, so any solution with DOG in the first row is automatically wrong. I could award points to solutions that gave more "plausible" wrong answers. However, in general, this is going to be a tough problem for a GA to solve efficiently.

Second, constraint satisfaction problems like this one can be much more efficiently solved with specialized techniques, no matter how well you build your GA. You could easily solve this with backtracking.

OTHER TIPS

Some options:

Just have the matrix of letters as your representation. So your optimal solution from the example would be:

CABAGEDOG

Then for your fitness, just give on point for each requested word that's actually in the solution.

Or have the representation consist of three words, one for each row.

   CAB
   AGE
   DOG

Your fitness then rewards words that are correct in the columns. Mutations swap out one word for another.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top