You seem to be having a problem with Winston Ewert's answer. I'm making this a separate answer for reasons of length and formatting, but what Winston is suggesting is a reasonable approach.
You ask about the "array of characters" and what your parents and offspring are, how you'd do crossover, mutation, etc. I think maybe you're confused by that array of characters bit. Don't think of it as an array of characters -- it's an array of words. This is important, because you don't want to do crossover between word boundaries or change letters inside a word via mutation. You want to shuffle the words around. So instead of dealing with the words, let's take a slightly different (but equivalent) approach.
Let's number each of your 2n words from 1 to 2n. To generate a candidate solution, we just pick n random numbers between 1 and 2n (without replacement). The randomly selected words become the rows to your matrix. So if the words are AGE,AGO,BEG,CAB,CAD,DOG
, we pick three at random and end up with, for instance, 2, 3, and 5, (giving a chromosome of 235
) or AGO,BEG,CAD
.
This yields a matrix of
AGO
BEG
CAD
Now we count how many of the 2n total input words appear in the matrix. In this case, it's only three, as none of the columns make valid words. Your fitness for the input 235
is 3. The input 416
works though -- its fitness is 6.
To generate a population, you just generate N random sets of three numbers (without repetition) between 1 and 2n. With a population size of 4, you might get
142
354
624
241
These give you four different potential solutions:
142 = AGE
CAB
AGO
354 = BEG
CAD
CAB
624 = DOG
AGO
CAB
241 = AGO
CAB
AGE
To do crossover, you'd need to design a method that avoided duplicated numbers in the offspring. I've in the past adapted the Cycle Crossover method designed for permutations to handle situations similar to this, but you can design any method that seems reasonable. For instance, you could do a simple uniform crossover and then just fix any duplicate values by changing one of them to something else.
Mutation is simply changing one number into another or swapping the positions of two numbers in the encoding.
You say you must use a GA, so you could stop here and try something like that. It's not likely to work extremely well, for a couple of reasons. First, there are going to be a lot of plateaus in the fitness function. Almost every possible answer is equally wrong, so there's not much of a gradient for the GA to follow. It's a needle in a haystack search problem. You could try to modify the way you score the fitness values to alleviate this somewhat. For instance, I know that only one word starts with D
in your example set, so any solution with DOG
in the first row is automatically wrong. I could award points to solutions that gave more "plausible" wrong answers. However, in general, this is going to be a tough problem for a GA to solve efficiently.
Second, constraint satisfaction problems like this one can be much more efficiently solved with specialized techniques, no matter how well you build your GA. You could easily solve this with backtracking.